Email Parsing (vb code to find ASCII text)

deafmetal1

Senior Chief
Local time
Tomorrow, 02:46
Joined
May 2, 2008
Messages
30
I built an email parser that finds text based on message formatting.

For example, the messages I parse have slashes in them as follows:

SUBJ/MESSAGE SUBJECT/SENDER/MONTH//

I simply parse the SUBJ line and end it with the // delimiter.

Works GREAT! How to do code it to find text that isn't separated by slashes?

Each message (Outlook email) has a date time group in the first few lines of text. i.e., 020226Z MAY 08. That reads 02 May, 0226am ZULU time. That is the last piece I need to parse and don't know how to tell it to look for that exact ASCII format.

DDHHMMZ MMM YY
(######Z LLL ##)

Any ideas on how to tell my parser to look for 12 characters in the same format as above, but will obviously be different in each message?

Thank you in advance. Couldn't get into my old account (deafmetal) but have been a routine user and fan of this forum since 2002. THANK YOU!!!
 
Can post a sample db with data and what you're using to do the parsing to this point?
 
A sample message looks like this:

R 02226Z MAY 08
FM SENDER
TO RECIPIENT
INFO RECIPIENT
BT
SUBJ/TOPIC/FROM/SERIAL NUMBER/MONTH//

I have other Public Subs that take care of the Outlook stuff, like what folder to look in, etc...

I have a separate module that does the line delimiting for the slashes.

The below code does the actual line parsing:

Code:
Public Sub ParseMSG(strFileToParse As String)
On Error GoTo Err_ParseMSG

Dim fileToParse As Object
Dim fileObject As Object
Dim strLineToParse As String
Dim strSUBJ As String

Dim booFoundILS As Boolean
Dim booFoundExpectedFormat As Boolean

' SUBJ Line Strings
Dim strTOPIC As String
Dim strFROM As String
Dim strSERIAL As String
Dim strMONTH As String

Dim strOriginalMessage As String
    strOriginalMessage = strFileToParse

Dim db As Database
Dim rs As DAO.Recordset
Dim strSQL As String

    ' SUBJ Line recordset intialization
    strSUBJ = ""
    strTOPIC = ""
    strFROM = ""
    strSERIAL = ""
    strMONTH = ""

    'initialize
    booFoundExpectedFormat = True
    
  
  'move to SUBJ line, determine topic, sender, Month and Serial Number
    If InStr(strFileToParse, "SUBJ") <> 0 Then
    
        strFileToParse = Trim(Right(strFileToParse, Len(strFileToParse) - InStr(strFileToParse, "SUBJ") + 1))
        If InStr(strFileToParse, "//") <> 0 Then
            
            strLineToParse = CleanString(Left(strFileToParse, InStr(strFileToParse, "//") + 1))
            strSUBJ = strLineToParse
            
            'get Topic
            strTOPIC = GetStringBeforeTheNthDelimitor(strLine:=strLineToParse, intNthDelimitorNumber:=2)
            'get Sender
            strFROM = GetStringBeforeTheNthDelimitor(strLine:=strLineToParse, intNthDelimitorNumber:=3)
            'get Serial Number
            strSERIAL = GetStringBeforeTheNthDelimitor(strLine:=strLineToParse, intNthDelimitorNumber:=4)
            'get Month
            strMONTH = GetStringBeforeTheNthDelimitor(strLine:=strLineToParse, intNthDelimitorNumber:=5)
                        
              End If
        
    Else
    End If
    
    
            
         'enter this if it is not a duplicate
         Set db = CurrentDb
         Set rs = db.OpenRecordset("tImportMSG")
        
         rs.AddNew
        
         'Write data to Import Table from SUBJ line
    
         rs("TOPIC").Value = Left(strTOPIC, 255)
         rs("FROM").Value = Left(strFROM, 255)
         rs("SERIAL").Value = Left(strSERIAL, 255)
         rs("MONTH").Value = Left(strMONTH, 255)
                         
         rs.Update
        
        Set rs = Nothing
        Set db = Nothing
        
    
Exit_ParseMSG:
    Exit Sub
Err_ParseMSG:
    MsgBox Err.Description
    Resume Exit_ParseMSG
End Sub

End result in my table is:

TOPIC | SENDER | SERIAL NUMBER | MONTH

But I need a method to get that date time group, 020226Z MAY 08. That text is what I'm having a hard time grabbing during parsing.
 
Last edited:
Any ideas on how to grab/parse that Date Time Group?

I figure it may have something to do with specifying a specific amount grouping of alpha and numeric characters. I'm just unfamiliar with how to do that. It will always be ######Z LLL ##

# = Number
L = Letter
Z = ZULU time, will always be a Z after the 6 numbers.
There is always a space in between the Z, 3 letter month, and two digit year.

Thank you for your time.
 
Ok, how about just grabbing the ReceivedTime of the Outlook mail item? That would work good enough I suppose. How do I do that? I've tried strDTG = strFileToParse.ReceivedTime and it doesn't work.
 
Sorry, deaf...I just saw your thread for the first time.

Are you using Access 97? If it is A2K or greater, you can use the Split function. The split function creates an array variable (use variant for ease of use) that separates your text string into different elements. In your case, if you DIM an array variable, call it avarTime, then in code do the following:

avarTime = Split("R 02226Z MAY 08", " ")

...creates an array variable with the following assignments:

avarTime(0): "R"
avarTime(1): "02226Z"
avarTime(2): "MAY"
avarTime(3): "08"

(Of course your code would have the string equivalent of the data you are trying to parse, but I left the entire string in for ease of explanation.)

Now you can use the elements any way you need. It looks like you have some sort of military time designator in avarTime(1), with the "Z" indicating "zulu" time (Greenwich mean time civilian), but with 5 digits I'm not sure how it's broken up. Perhaps someone else can help with that. But at least you can break up the message that way.

EDIT: OK, I saw your original post, and you probably had a typo when you put in the original number. Let's assume element avarTime(1)="022226Z" (Note I added an extra 2 to make it a 6 digit number).

You can now provide a further operation on this string by using the "Mid" function. Create and DIM three more variables, all strings:

strDay
strHour
strMinute

Next, use the Mid function. You will note that you can parse strings based on character positions, as well as lengths. In this case, you need three different parsings:

1. To get the Day portion, strDay = Mid(avarTime(1), 1, 2)
2. To get the Day portion, strHour= Mid(avarTime(1), 3, 2)
3. To get the Day portion, strMinute= Mid(avarTime(1), 5, 2)

Now, you can manipulate the various text items to create a Date field, using the following steps:

(Assumption: datDate DIMmed as date/time datatype)

datDate = DateSerial (avarTime(3), avarTime(2), CLng(strDay))
datDate = DateAdd("h", cLng(strHour), datDate)
datDate = DateAdd("n", cLng(strMinute), datDate)

Depending on your regional settings, datDate will return May 2, 2008 10:26 PM.

This will give you a complete date variable from the message you received. There may be an easier way to encapsulize this process, but for now this should work.

One more note: you can use the "Z" designator as a constant, or universal, time, and if you need to calculate a localized time from that calculation, then refer to functions that calculate local times from universal time. In fact, if the letter is anything else, it indicates a time that is equivalent to that particular time zone, and more calculations would be needed as deemed appropriate.
 
Last edited:
mresann,

That is AWESOME! I will try that out today and let you know how it goes. I am learning as we go here, and yes this is for work applications. Z is indeed Zulu time, and appreciate you catching the 6 digit typo. Your assumptions were all on point.

I figured there was a way to parse based on char positions, just wasn't sure how. I've been learning by reverse engineering/disecting other projects and code. What is a truly good reference for learning these advanced VBA items? Any particular book worthy?

I've been working multiple angles of this project, and this was the only area not gaining any traction, so I'm excited to have received your assistance!

Edit:

Inserted that code, not working, maybe I didn't do it right. I attached the database (currently pulls from Outlook Inbox until I get the folderPath code working). One thing to note is that although we're working this at the moment to parse the example I have above, all the messages I'll be parsing will all have different Date Time Groups (DTG's we call them). Some are preceded by the letter R, some a P. The only constant is the structure and the letter Z.
 

Attachments

Last edited:

Users who are viewing this thread

Back
Top Bottom