Help - complicated strings extractions

ma143hal

New member
Local time
Today, 14:31
Joined
Mar 22, 2006
Messages
7
Hello,

I have trying to figure how to extract strings from a cell with different starting location, number of strings, date and time format. I'm hoping you can help.

Below is an example of strings in a cell. I'm looking for the following value: 6BS1, 6BC1, 6BM1 and 6BZ1 only then I need to extract these 4 characters after the asterisks, the numbers following it which is the starting date and ending date. The date has many different formats (ddmmyy, mmddyy, mmddyyhhmm) and some do not have ending date. Some have dashes, spaces, etc. The 4 characters, starting date and ending date should be extracted to 3 separate fields. I have to a do bunch of calculations based on the output.

tblEscalationNotes EscalationNote
**2bz4-1215061430- **6bz5-1215060825-1215061332 **6bz1-1128061429-1214062343 **2bz5-1122061358-1128061429 **6bz5-1121060956-1122061329


tblEscalationNotes EscalationNote
**6BC1-1206060940-


tblEscalationNotes EscalationNote
**6BS11112061531 **2BS41112061531


tblEscalationNotes EscalationNote
**6BS10712061454-1212061510 **1BS20712061454-1212061510 **1BS70712061454-1212061510 **2BS60712061454-1212061510 **2BS40712061454-1212061510


tblEscalationNotes EscalationNote
**6BS1 11172006-12141600
**6BC1-1129061600


Thank you very much for your help!
 
You my try stripping away the first two '**'

Then testing for other '**'

If you find other '**' use the Split function to break them down into individual pieces.

After that you can use the Len function to test for lenghts which should tell you how much information is there which will allow you to further break them down into three seperate strings.

As for the date! I see a real problem. What does 040306 mean? Is that April 3rd or March 4th.

Without some standardization I don't really see a simple solution .
 
Last edited:
Standardization (a.k.a. Normalize the data) is definitely the missing link here. Without that, you're going to have to test for each combination. It looks like you're at least starting with "**XXXX" for each entry. You can use a InStr function to figure out where the "**" starts and from there, it will become a cumbersome series of IF..THEN statements to figure out what's next, using the LEN function, and all your possible data combinations (in conjunction with more InStr functions).

Play with InStr() to figure out where the "**" starts, and from there, it'll become math, LEN(), and logic constructs.

Really, if there's a way to normalize the data, that is by far a better solution.

~Moniker
 
ma143hal,

Interesting problem, unless you have data that is structured a little more consistently
it will be very hard to extract anything meaningful from it.

The dates present like --> ddmmyy, mmddyy, mmddyyhhmm
is a show stopper. Wny maintain the dates if you have
no faith in their validity?

Regardless, it is an interesting parsing challenge.
The following will traverse the text in a "Cell" on a form: Me.TheCell
As it encounters each "Field" definition, it will record it in tblHistory.

I didn't run this, but it should give you a start.
You will have to write some code even to do a half-hearted data collection.


Code:
'
'
' Sample Data: 
'
' TableName TableDesc
' ** FieldName StartDate Start4Digit EndDate End4Digit  <-- Can repeat on a line
'
' Save this information for each FieldName entry
'
' tblHistory
' ==========
' TableName   ' All fields are strings and
' TableDesc   ' can be Null.
' FieldName       ' 
' StartDate       ' These 5 fields are just my perception of the
' Start4Digit     ' typical entry like --> **6BS10712061454-1212061510 
' EndDate         '                          ----      ---- ------
' End4Digit       '                              ------           ----
'                 '                            1    2    3     4    5
'
'
Dim aryCellLines() As Variant
Dim i As Long
Dim strTemp As String
Dim strEntry As String
'
Dim strTableName As String
Dim strTableDesc As String

'
' The variant aryCellLines will hold all the lines of the cell.
'
aryCellLines = Split(Me.TheCell, vbCrLf)
For i = 0 To UBound(aryCellLines())
   If Left(aryCellLines(i), 3) = "tbl" Then  ' <-- Line Type = TableName TableDesc
                                             '     Extract names
                                             '
      strTableName = Mid(aryCellLines(i), 1, InStr(aryCellLines(i), " ") - 1)
      strTableDesc = Mid(InStr(aryCellLines(i), " ") + 1)
   Else
      '                                      ' Line TYpe is a FieldName Description
      '                                      ' Process each of possibly many Fields
      '                                      ' Enter each Field in the History Table
      '
      If Left(aryCellLines(i), 2) = "**" Then
            strTemp = aryCellLines(i)
            While Left(strTemp, 2) = "**"
               '
               ' Extract one entry
               ' Either whole string, or up to the next field's "**"
               '
               If InStr(3, strTemp, "**") > 0 Then
                  StrEntry = Mid(strTemp, 1, InStr(3, strTemp, "**"))
               Else
                  StrEntry = strTemp
               End If
               '
               ' Pad the entry so that the Mid function doesn't fail
               '
               strEntry = strEntry & "                              "
               '
               ' Here, the part of the string with the questionable data is processed.
               ' Below, all extraction is done statically -- fixed-length fields
               ' ANY DEVIATION FROM THIS FORMAT MAKES THIS EXTREMELY DIFFICULT !!!
               '
               '     The different date formats, and even absence of dates can only
               '     be addressed by parsing strEntry according to whatever logic is
               '     appropriate.
               '
               '     The point is that any data handling deviating from fixed length
               '     format should be addressed here.
               '
               Select Case Mid(strEntry, 3, 4)
                 Case "6BS1", "6BC1", "6BM", "6BZ1" 
                     Docmd.RunSQL "Insert Into tblHistory (TableName, TableDesc, FieldName, " & _
                                  "                        StartDate, Start4Digit, " & _
                                  "                        EndDate,   End4Digit) " & _
                                  "Values ('" & strTableName & "', '" & _
                                              & strTableDesc & "', '" & _
                                              & Mid(strEntry,  3, 4) & "', '" & _
                                              & Mid(strEntry,  8, 6) & "', '" & _
                                              & Mid(strEntry, 14, 4) & "', '" & _
                                              & Mid(strEntry, 19, 6) & "', '" & _
                                              & Mid(strEntry, 25, 4) & "');" & _
                 End Select
               '
               ' Remove the Field that was just processed from the Current Line
               ' Set to "" if no more fields left.
               '
               If InStr(3, strTemp, "**") > 0 Then
                  strTemp = Mid(strTemp, InStr(3, strTemp, "**"))
               Else
                  strTemp = ""
               Wend

      End If
   End If
   Next i

hth,
Wayne
 

Users who are viewing this thread

Back
Top Bottom