Remove font face RTF tags (1 Viewer)

pdanes

Registered User.
Local time
Yesterday, 16:02
Joined
Apr 12, 2011
Messages
197
I have a memo field into which the user pastes text from various online sources. I made it an RTF field, because some of the formatting is useful - mostly italics for species names, and occasionally bold. Unfortunately, pasting web text often brings in a huge amount of superfluous formatting information, usually having to do with character substitution in foreign languages.

The text fills up with garbage like:

<font face=Arial size=3 color=black>ý</font>

<font face=PalatinoLinotype-Roman size=3 color=black> pušt</font>

<font face=Arial size=3 color=black>í</font>

<font face=PalatinoLinotype-Roman size=3 color=black>k bělav</font>

<font face=Arial size=3 color=black>ý</font>

<font face=PalatinoLinotype-Roman size=3color=black> (</font>

where the red characters are what should be in the box, and all the rest is Access's attempt to switch fonts around in an attempt to correctly display a character that may not be available in some font, but is in another. Naturally, this balloons the content enormously. What should be an article of a few hundred words swells to tens of thousands of characters in the RTF field, with all the problems that such inflation carries.

When I manually select the entire text and give a formatting command to make it all one font, this junk disappears. Is there a way to call this function from VBA?

Before someone tells me to use the PlainText function, I do NOT want to get rid of ALL the formatting. I want to make all the text one font face, but leave things like bold, italics and possibly color.

Access obviously knows how to do this, and I would like to use that ability, instead of having to code up an enormous kludge to slog through all the RTF tags.
 
Not in front of a computer now, but if you're using a menu to do that manually, then check if there's a corresponding RunCommand argument for it.

Sent from phone...
 
Not in front of a computer now, but if you're using a menu to do that manually, then check if there's a corresponding RunCommand argument for it.

Sent from phone...
Yes, that looks promising. RunCommand acCmdFont opens the dialog box for all those things, but I don't want to pester the user with it, I just want to apply it.

I suppose I could use SendKeys as a last resort, but that's icky. I wonder if there is a way to directly call something with the same instructions as the dialog box would return. But this is a good start. Thank you.
 
Code:
Sub test_FF()
    Const cText = "<font face=PalatinoLinotype-Roman size=3 color=black> pušt</font>"
    Debug.Print Change_FontFace(cText, "Arial")
End Sub

Public Function Change_FontFace(AnyString As String, NewFont As String) As String
    Change_FontFace = RegExReplace(AnyString, "(<font face=)([A-Z-]+)( size.*)", "$1" & NewFont & "$3")
End Function

Code:
Private pRegEx As Object

Public Property Get oRegEx(Optional Reset As Boolean) As Object
   If (pRegEx Is Nothing) Then Set pRegEx = CreateObject("Vbscript.Regexp")
   If Reset Then Set pRegEx = Nothing
   Set oRegEx = pRegEx
End Property

Public Function RegExReplace(ByVal SourceText As String, _
      ByVal SearchPattern As String, _
      ByVal ReplaceText As String, _
      Optional ByVal bIgnoreCase As Boolean = True, _
      Optional ByVal bGlobal As Boolean = True, _
      Optional ByVal bMultiLine As Boolean = True) As String
  
   With oRegEx
      .Pattern = SearchPattern
      .IgnoreCase = bIgnoreCase
      .Global = bGlobal
      .MultiLine = bMultiLine
      RegExReplace = .Replace(SourceText, ReplaceText)
   End With
End Function
 
Code:
Sub test_FF()
    Const cText = "<font face=PalatinoLinotype-Roman size=3 color=black> pušt</font>"
    Debug.Print Change_FontFace(cText, "Arial")
End Sub

Public Function Change_FontFace(AnyString As String, NewFont As String) As String
    Change_FontFace = RegExReplace(AnyString, "(<font face=)([A-Z-]+)( size.*)", "$1" & NewFont & "$3")
End Function

Code:
Private pRegEx As Object

Public Property Get oRegEx(Optional Reset As Boolean) As Object
   If (pRegEx Is Nothing) Then Set pRegEx = CreateObject("Vbscript.Regexp")
   If Reset Then Set pRegEx = Nothing
   Set oRegEx = pRegEx
End Property

Public Function RegExReplace(ByVal SourceText As String, _
      ByVal SearchPattern As String, _
      ByVal ReplaceText As String, _
      Optional ByVal bIgnoreCase As Boolean = True, _
      Optional ByVal bGlobal As Boolean = True, _
      Optional ByVal bMultiLine As Boolean = True) As String
 
   With oRegEx
      .Pattern = SearchPattern
      .IgnoreCase = bIgnoreCase
      .Global = bGlobal
      .MultiLine = bMultiLine
      RegExReplace = .Replace(SourceText, ReplaceText)
   End With
End Function
Thank you, but it appears that this will simply change all tags of one font to another font. I'm not just trying to change the appearance of the text to be uniform - I'm trying the get rid of the font tags entirely, while leaving any other formatting tags.
 
OK, deleting is practically replacing with null string.
Code:
Sub test_FF()
    Const cText = "<font face=PalatinoLinotype-Roman size=3 color=black> pušt</font>"
    Debug.Print Delete_FontFace(cText)
End Sub

Public Function Delete_FontFace(AnyString As String) As String
    Delete_FontFace = RegExReplace(AnyString, "(<)(font face=[A-Z-]+ )(size.*)", "$1" & "$3")
End Function
 
OK, deleting is practically replacing with null string.
Code:
Sub test_FF()
    Const cText = "<font face=PalatinoLinotype-Roman size=3 color=black> pušt</font>"
    Debug.Print Delete_FontFace(cText)
End Sub

Public Function Delete_FontFace(AnyString As String) As String
    Delete_FontFace = RegExReplace(AnyString, "(<)(font face=[A-Z-]+ )(size.*)", "$1" & "$3")
End Function
True, but again, this has no provision for leaving useful information. If I manually select a color, for instance, I want that color. Access does that by setting a color at the beginning and clearing it at the end. I would like to do THAT, and not have the color pointlessly set to black every few letters. If the size changes, I want to retain that, but I do not want to set it to 3 every few letters.

When I manually select a font from the font dialog, ALL the tags specifying a font in the text vanish, and are replaced by my selection at the beginning and a closing tag at the end. That is the functionality that the dialog box has, and I want to call it, not manually pick my way through all these tags. What I posted is only an example. There is all sorts of formatting garbage in these text, from all kinds of sources. Writing code to deal with every possible variety of it would be a huge duplication of something Access already knows how to do.
 
Here is some of the junk between angle brackets in those texts, if anyone is curious about what I'm dealing with.


</a>
</div>
</em>
</font>
</strong>
<0.0001). The comparison of yearly totals within the 37 yearsof coincident monitoring showed a significant correlation of fluctuation in numbers on the two localitiesstudied (GLS model, F=9.556, p=0.004). While <em>
<1.9 ind·ha-1). The results identified temperature and habitat quality as prime drivers of reproduction, and ultimately population dynamics. Such information could prove useful for successful rabbit management on Lemnos and other similar areas.</div>
<18 years). There were significant differences between coppices and high forests, but not between grazed and ungrazed woodlots, and rotational and non-rotational forests. Correspondence analysis corroborated the empirical evidence, showing a close association between presence of dormice and high forests, and, to a lesser degree, rotational forests. Concerning environmental parameters, favourable woods were those with high trees and with low stem number, while the specific tree assemblages proved to be scarcely important. Wood area was more important than isolation in predicting presence or absence of this rodent, as the highest probability was associated with woodlots of 40–50 ha area and more. These results did not provide evidences about the importance of the amount of residual habitat in the 10 km2 surrounding the woodlot and the presence of hedgerows as ecological corridors.</div>
<20.5°C). A low maximum temperature at conception (<20.0°C; 1 month time lag) favored pregnancy status (i.e. proportion pregnant). At a higher maximum temperature, pregnancy was favored by a high vegetation cover (>
<21.5°C), whereas under a higher maximum temperature male fertility was favored by a high vegetation quality (>
<32.8 mm) for this important identifying attribut of <em>
<5–10%). Aby bylo možno odpovìdìt na otázku, jak mohly populace plšíkù lískových pøežít v témìø odlesnìných oblastech, byla aktuální mapa rozšíøení plšíka v severozápadní Evropì analyzována pomocí systému GIS s pøekrytím rùzných datových vrstev a dále byly analysovány historické mapy severozápadního Nìmecka, ve snaze nalézt v historické krajinì rozhodující prvky umožòujíci pøežití plšíka. Tradice lokality zjevnì ovlivòuje možnost pøežívání a souèasný stav populace plšíkù. Pokryvnost mladšími lesy je velmi dùležitá, avšak nikoliv urèující. Tradice a spojitost biotopù je dùležitá pro užití stanoviš plšíky lískovými a identifikace historických systémù živých plotù a historických lesù mùže pomoci nalézt místa s dosud neznámým výskytem plšíkù. Zdá se, že pro plšíky lískové byl v severozápadní Evropì nedostatek lesních stanoviš úspìšnì vynahrazen vytvoøením systémù živých plotù. Živé ploty pøedstavují nejenom spojovací strukturu, ale biotop samy o sobì. Hustota 50 m dlouhých, setrvale vysoce kvalitnícha dobøe pr
opojených živých plotù na hektar se jeví jako minimální podmínka pro pøežití plšíka v krajinì severozápadní Evropy. Zachování starobylých biotopù a tvorba nových stanoviš jako jádrových biotopù a jejich propojení pøedstavuje klíèovou strategii pro umožnìní dlouhodobého pøežívání a rekolonisaci plšíka lískového i dalších druhù.</div>
<a href="https://babel.hathitrust.org/cgi/pt...1up&amp;seq=397&amp;skin=2021&amp;q1=Arvicola">
<a href="https://doi.org/10.1007/s42991-021-00200-8">
<a href="https://doi.org/10.1111/eth.12126">
<div>
<em>
<font color="#000000">
<font color="#534E48">
<font color=black>
<font face="Arial Black" size=3 color=black>
<font face="ARPRFT+WarnockPro-Regular" size=1 color=black>
<font face="KGJFM I+ Gulliver RM" size=1 color=black>
<font face="KGJHA F+ Gulliver IT" size=1 color=black>
<font face="Microsoft Sans Serif" size=1 color=black>
<font face="Times New Roman" size=1 color="#211D1E">
<font face="Times New Roman" size=1 color="#221E1F">
<font face="Times New Roman" size=1 color=black>
<font face="Times New Roman" size=2 color=black>
<font face="Times New Roman" size=3 color=black>
<font face="YRPADX+WarnockPro-It" size=1 color=black>
<font face=AdvPSPAL-B size=1 color=black>
<font face=AdvPSPAL-BI size=1 color=black>
<font face=AdvPTimes size=2 color=black>
<font face=AdvTT5e566946.I size=2 color=black>
<font face=AdvTT92c02321 size=2 color=black>
<font face=Arial size=1 color=black>
<font face=Arial size=3 color=black>
<font face=ArialMT size=1 color=black>
<font face=Calibri-Italic color=black>
<font face=Cambria size=3 color=black>
<font face=Courier size=1 color=black>
<font face=Courier size=2 color=black>
<font face=Fd467987-Identity-H color=black>
<font face=Fd547476-Identity-H color=black>
<font face=FlcgtrAdvTTb5929f4c size=2 color=black>
<font face=LiberationSerif size=1 color=black>
<font face=LiberationSerif-Italic size=1 color=black>
<font face=MinionPro-Bold size=2 color=black>
<font face=MinionPro-It size=2 color=black>
<font face=NewCenturySchoolbook size=2 color="#231F20">
<font face=NewCenturySchoolbook-NormalItal size=2 color="#231F20">
<font face=NotoSans size=2 color=black>
<font face=Plantin color="#231F20">
<font face=Plantin size=1 color=black>
<font face=Plantin-Italic size=1 color=black>
<font face=Rpxr size=1 color=black>
<font face=Sabon-Roman size=1 color=black>
<font face=TeXGyreTermes-Italic size=1 color=black>
<font face=TeXGyreTermes-Regular size=1 color=black>
<font face=TimesNewRoman size=3 color=black>
<font face=TimesNewRomanPS-ItalicMT size=1 color=black>
<font face=TimesNewRomanPS-ItalicMT size=2 color=black>
<font face=TimesNewRomanPSMT color=black>
<font face=TimesNewRomanPSMT size=1 color=black>
<font face=TimesNewRomanPSMT size=2 color=black>
<font face=TjpbvvAdvTT99c4c969 size=2 color=black>
<font face=TyfaITCOT size=1 color=black>
<font face=TyfaITCOT-Italic size=1 color=black>
<font face=URWPalladioL-Roma size=1 color=black>
<font size=1 color=black>
<font size=3 color=black>
<strong>
 
When I manually select a font from the font dialog, ALL the tags specifying a font in the text vanish, and are replaced by my selection at the beginning and a closing tag at the end.
It's nice when someone speaks in complete sentences and doesn't just scatter fragments of thoughts.

Sure, analogous to the examples shown above, you can remove formatting tags completely and then set your own formatting tags at the beginning and end (of the selected text).

Writing code to deal with every possible variety of it would be a huge duplication of something Access already knows how to do.
"every possible variety"
Those are very strong words.
Solving EVERYTHING, that means known tasks and still unknown tasks, using code requires a lot of time in addition to some knowledge.
The first thing to keep in mind is that other people's time is at least as valuable as your own.
 
It's nice when someone speaks in complete sentences and doesn't just scatter fragments of thoughts.

Sure, analogous to the examples shown above, you can remove formatting tags completely and then set your own formatting tags at the beginning and end (of the selected text).


"every possible variety"
Those are very strong words.
Solving EVERYTHING, that means known tasks and still unknown tasks, using code requires a lot of time in addition to some knowledge.
The first thing to keep in mind is that other people's time is at least as valuable as your own.
I don't know who peed in your cornflakes this morning, but if you have only snark to offer, go bother someone else.
 

Users who are viewing this thread

Back
Top Bottom