Data encoded as RTF - need to strip

akb

Registered User.
Local time
Today, 17:42
Joined
Jul 21, 2014
Messages
57
Hi, in our system we enter notes and I am looking to extract those notes. The notes are encoded in as RTF and I need to strip that data. I need the data after fs20 up to \par.



Below are some examples. The data I need is in bold pink. What formula do I use to do this?



{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Courier New;}}
{\colortbl ;\red8\green0\blue0;}
{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\cf1\highlight0\f0\fs20 Two bedrooms w/ closets, great room, steps up & loft\par


Thank you for your help.
 
This quick test may get you started

Code:
Sub GetPink()
    Dim sample As String 'input from OP
    Dim result As String   '
    Dim sVar As Integer  'start position 
    Dim eVar As Integer
    sample = "{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl {\f0\fnil\fcharset0 Courier New;}}" _
           & "{\colortbl ;\red8\green0\blue0;}" _
           & "{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\cf1\highlight0\f 0\fs20 Two bedrooms w/ closets, great room, steps up & loft\par "
    'Debug.Print sample
    sVar = InStr(sample, "fs20 ")
    eVar = InStrRev(sample, "\par")
    If sVar > 0 And eVar > 0 Then
        result = Mid(sample, sVar + 5, eVar - (sVar + 5)) 'string between markers
        Debug.Print result
    End If
End Sub
 
Try this

Code:
Function testme()

Dim strText As String

strText = "{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Courier New;}}" & _
"{\colortbl ;\red8\green0\blue0;}" & _
"{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\cf1\highlight0\f0\fs20 Two bedrooms w/ closets, great room, steps up & loft \par"

strText = Mid(strText, InStr(strText, "fs20") + 5)
strText = Left(strText, Len(strText) - 5)

Debug.Print strText

End Function

OR there's another solution if the required text is always bold and magenta
 
Or
Code:
Public Function GetAfterFS20(ByVal strIn As String) As String
  strIn = Trim(Split(strIn, "fs20")(1))
  GetAfterFS20 = Replace(strIn, "\par", "")
End Function

or in a harder to read single line
Code:
Public Function GetAfterFS20(ByVal strIn As String) As String
  GetAfterFS20 = Replace(Trim(Split(strIn, "fs20")(1)), "\par", "")
End Function
 
But I am also guessing there is likely a lot more to this with different possibilities.
 
Anther option:

Which RTF control are you using to format eh RTF data?

All the RTF controls I have worked with have a .Text (or similar) property that returns just the text without any of the formatting.
 
Word automation might be simpler.

Using Access to create a word object, open the .rtf file and save it as .txt. I don't have any samples handy but you can give yourself a start by using the Word macro recorder. Open Word, turn on the macro recorder. open the .rtf file, save it as .txt. Turn of the macro recorder and see what you have. Usually the generated code is very close to what you need in Access once you declare the objects correctly.
 
As Pat pointed out, Word automation does work. I have done it but it used lots of code.

Using an RTF control takes only a few lines of code to get the job done.

' 1 load the RTF control

Me.RTFCOntrol = MyRTFdata

' 2- ectract just teh text

MyString = Me.RTFCOntrol.Text

The Internet Explorer web control can be used similarly to the RTF control.
 
I still haven't found a solution for this. The font is not always bold and magenta. Also, the amount of text can be different depending on the note. It's never the same number of characters that I need to extract. I'm looking to extract the notes into an Access report. I'd like to avoid Word if possible. Thoughts?
 
Have you tried any of the offered solutions ?
There are at least three that didn't involve Word...
 
I used the Mid and InStrRev functions. I was able to extract the text starting after "fs20", what do I need to add to eliminate the text after "fs18"?



Mid([CommentBuffer],InStrRev([CommentBuffer],"fs20")+4)
 
I don't think I'm using the plaintext function properly. What formula would I use for that?
 
I used the Mid and InStrRev functions. I was able to extract the text starting after "fs20", what do I need to add to eliminate the text after "fs18"?

Mid([CommentBuffer],InStrRev([CommentBuffer],"fs20")+4)

Have another look at the suggestions in posts 2/3/4

I don't think I'm using the plaintext function properly. What formula would I use for that?

PlainText(expression)

For an example of its use, see https://www.access-programmers.co.uk/forums/showpost.php?p=1579161&postcount=4

However I don't think it will work on your string as it not 'standard' rich text
 
Ok -



Here is the info in the comment buffer field, I need to extract the text in bold:
{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fprq1\fcharset0 Courier New;}}
{\colortbl ;\red0\green0\blue128;}
{\*\generator Riched20 10.0.17134}\viewkind4\uc1
\pard\cf1\highlight0\f0\fs20 please use $8.64yard pricing\fs18\par
}



Using this formula:

Mid(CommentBuffer, InStr(CommentBuffer, "fs20") + 5)


I get:

please use $8.64yard pricing\fs18\par}



I need to remove:

\fs18\par}
 
Making a minor change to my solution in post 2

Code:
Function testme()

Dim strText As String

strText = "{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang10 33{\fonttbl{\f0\fnil\fprq1\fcharset0 Courier New;}}" & _
"{\colortbl ;\red0\green0\blue128;}" & _
"{\*\generator Riched20 10.0.17134}\viewkind4\uc1" & _
"\pard\cf1\highlight0\f0\fs20 please use $8.64yard pricing\fs18\par" & _
"}"

strText = Mid(strText, InStr(strText, "fs20") + 5)
strText = Left(strText, InStr(strText, "fs18") - 2)

Debug.Print strText

End Function

Result: please use $8.64yard pricing

To use with CommentBuffer field

Code:
strText = Mid(CommentBuffer, InStr(CommentBuffer, "fs20") + 5)
strText = Left(strText, InStr(strText, "fs18") - 2)
 
Thank you. I must be missing something, where am I putting this? If I put this in a query, it's not working.



strText = Mid(CommentBuffer, InStr(CommentBuffer, "fs20") + 5) strText = Left(strText, InStr(strText, "fs18") - 2)
 
The easiest way is to place this function in a standard module

EDIT: I've corrected an error in the code below after checking it

Code:
Function StripText(strText As String)

strText = Mid(strText, InStr(strText, "fs20") + 5)
[B][COLOR="Red"]StripText[/COLOR][/B] = Left(strText, InStr(strText, "fs18") - 2)

'Debug.Print StripText

End Function

Then in your query add a field: StripText([CommentBuffer])

Otherwise you'll need to combine the 2 lines of code in your query.
Code:
Left(Mid(CommentBuffer, InStr(CommentBuffer, "fs20") + 5), InStr(Mid(CommentBuffer, InStr(CommentBuffer, "fs20") + 5), "fs18") - 2)

Both methods now checked and working with your sample data
 
Last edited:
here is a Class (clsTRF2Text). copy and paste to
Class Module:
Code:
Option Explicit

Private oWd As Object
Private oDO As Object   'msforms.dataobject -- for clipboard access
Private strRTF As String
Private strPlaintext As String

Private Sub Class_Initialize()
    Set oWd = CreateObject("word.application")
    oWd.DisplayAlerts = 0   '=wdAlertsNone
    oWd.Documents.Add
    Set oDO = CreateObject("new:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
End Sub

Private Sub Class_Terminate()
    oWd.ActiveDocument.Close False  'no save
    oWd.Quit
    Set oWd = Nothing
    Set oDO = Nothing
End Sub

Public Property Get TextRTF() As Variant
    TextRTF = strRTF
End Property

Public Property Let TextRTF(ByVal vNewValue As Variant)
    Static oTS As Object
    strRTF = vNewValue
    On Error Resume Next
    'Convert to byte array and place in clipboard
    oDO.Clear
    oDO.SetText StrConv(strRTF, vbFromUnicode), "Rich Text Format"
    oDO.PutInClipboard
    'clear out whatever is in the document
    oWd.ActiveDocument.Range.Text = vbNullString
    'Paste clipboard contents into Word object
    oWd.ActiveDocument.Range.Paste
    If Err = 0 Then
    Else
        Err.Clear
        AppActivate "Microsoft Office Word"
        If Err = 0 Then
            SendKeys "{Enter}", True
        End If
        Err.Clear
    End If
    'Get the plain text
    strPlaintext = oWd.ActiveDocument.Range.Text
    
End Property

Public Property Get Text() As Variant
    Dim boolFoundOne As Boolean
    'remove trailing Word paragraph marks or CrLf before returning plain text
    Do
        boolFoundOne = False
        
        Do While Right(strPlaintext, 1) = vbCr
            strPlaintext = Left(strPlaintext, Len(strPlaintext) - 1)
            boolFoundOne = True
        Loop
        'remove any CrLf character pairs
        Do While Right(strPlaintext, 2) = vbCrLf
            strPlaintext = Left(strPlaintext, Len(strPlaintext) - 2)
            boolFoundOne = True
        Loop
    
    Loop While boolFoundOne
    
    Text = strPlaintext
End Property

'Public Property Let Text(ByVal vNewValue As Variant)
'   Text is a read-only property
'End Property

on a separate module, paste this test:
Code:
Private Sub test()
    Dim cRTF2Text As clsRTF2Text
    Dim strRTF As String
    
    strRTF = "{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl {\f0\fnil\fcharset0 Courier New;}}" & _
"{\colortbl;\red8\green0\blue0;}" & _
"{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\cf1\highlight0\f0\fs20 Two bedrooms w/ closets, great room, steps up & loft\par"
    
'    strRTF = _
'"{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fprq1\fcharset0 Courier New;}}" & _
'"{\colortbl;\red0\green0\blue128;}" & _
'"{\*\generator Riched20 10.0.17134}\viewkind4\uc1" & _
'"\pard\cf1\highlight0\f0\fs20 please use $8.64 yard pricing \fs18\par"

    With New clsRTF2Text
        .TextRTF = strRTF
        MsgBox Replace(.Text, "__", "")
        
    End With
    
End Sub
 

Users who are viewing this thread

Back
Top Bottom