View Full Version : Parsing To A Character Type
mstorer 10-20-2008, 02:58 PM Hi Everyone -
I have a string field from which I need to extract a number. The field is not split into fixed length units though. For Example:
00150
002250ABC
0032500XYZ
I am only interested in the numeric portion in bold.
I can omit the first three numbers easily enough, but since the value I want varies in length and there is no specific delimiter, I don't know to to get the instr() function to go to the first non-numberic value. Perhaps I'm barking up the wrong tree with instr()?
Thanks for any assistance!
- Matt
boblarson 10-20-2008, 03:00 PM How about using
Val(Mid([YourFieldNamehereInBrackets],4))
mstorer 10-20-2008, 03:09 PM Thank you Bob. A long day and not enough coffee can sometimes make a mountain out of a mole hill. For whatever reason, I was fearing I'd have to write a function that looped to the first character.... very inefficient with the size of the database. And I use the Val()function all the time!!!
Thanks again for the carity!
- Matt
deafmetal1 11-16-2008, 11:52 PM Great thread, however I have a similar issue as follows:
I am parsing as well using INSTR.
The data string is formatted as follows:
P/-/9 or P/1/9
I want it to grab the string irregardless of the middle character in between the delimiters (slashes) and yet take into account there must be a number 0-9 after the second slash, so that I'd parse any of the below examples:
P/-/6
P/1/6
P/8/7
etc...
I can write a thousand lines of code to grab P/-/0 then look for P/1/0, then P/2/0, etc... but that is WAY inefficient, and like I said, don't care what the middle character is as long as it's looking for the number after the 2nd slash. Reason is there are other lines in the message that may have a P/ in it.
I tried Like and nested IF statements to no avail. Was thinking defining a string with the criteria, then using INSTR(strSTRING, strCRITERIA). That on a path to success? Thank you in advance!!
Rob
raskew 11-17-2008, 12:38 AM Hi -
Not absolutely sure I understand the problem correctly. Having said that, try copying/pasting this to a standard module: http://www.access-programmers.co.uk/forums/showthread.php?t=156440 and calling it like this:
x = "P/-/6"
? val(mid(x, fFindNthOccur(x, "/", 2)+1))
6
Please post back if that's not what you're after.
Bob
deafmetal1 11-17-2008, 04:24 AM Bob,
Sorry to confuse you, I confuse myself some days!!
Let me try this; In the emails/messages I parse, I look for a line that looks like this:
P/-/####
Sometimes it will have a number in between, as such: P/#/####
There is more data in the lines, and they end with a //.
I have it figured out to find these lines based simply on starting with P/ and ending in //. But if there are any other accidental P/ in a different line that ends with a // it grabs it mistakenly.
Here's the existing code:
If InStr(strFileToParse, "P/") <> 0 Then
strFileToParse = Trim(Right(strFileToParse, Len(strFileToParse) - InStr(strFileToParse, "P/") + 1))
If InStr(strFileToParse, "//") <> 0 Then
strLineToParse = CleanString(Left(strFileToParse, InStr(strFileToParse, "//") + 1))
strP = strLineToParse
End If
Else
End If
It was grabbing extraneous stuff with just the P/, so I wanted to extend to the right a bit to ensure I get the right line.
That's where I get the P/*/# (asterich represents any singular char, will either be a dash or a number, the # is a number 0 thru 9)
That help clarify any further? Wish I could just use wildcards for characters and numbers in the InStr statement above.
R,
Rob
raskew 11-17-2008, 06:17 AM Doesn't sound as though you tried the function provided.
As written, it returns the value of the character(s) following the 2nd "/", without consideration to what might be between the slashes. Is that not what you are after?
Bob
gemma-the-husky 11-17-2008, 06:44 AM you could use the split function, then test each item in the resulting array with val until you get a value <> 0
this would mean that A\12\123XY would return 12 as the first value.
but A\AB\123XY would return 123
deafmetal1 11-17-2008, 01:46 PM That would mean I'd have to understand it, and it confuses me, but am trying it now. Th eonly reason I didn't was because I didn't understand the syntax: ? fFindNthOccur(x, " ", 3)
Will try and report back findings, wasn't trying to be an ignorant requester of help, my skill level of understanding sometimes hampers my ability.
Sorry to matt for hijacking his original thread!!!
raskew 11-17-2008, 02:46 PM Hi -
The ...
x = "P/-/6"
? val(mid(x, fFindNthOccur(x, "/", 2)+1))
6
...is the method used to test the function from the immediate (debug window) (Ctrl + G)
The first line (x = "P/-/6") followed by a carriage return simplifies the "P/-/6" so you can just refer to 'x', rather than repeating the whole thing.
The next line tests the function. '?' represents print. The call, as shown above, returns the numeric value of the character(s) immediately following the second slash in x -- in this case it's 6. Were x = "P/8/62A5", the same call would return 62. The fFindNthOccur() function is actually returning the location of the second slash in the string, which in this case is 4. The Mid() function is then used to return a value starting with 4 + 1. Were you using just the Mid() call, it'd read: ? val(mid(x, 5))
Using the example you showed (? fFindNthOccur(x, " ", 3)), the function returns the location of the 3rd space in the string, e.g. if
x = "The quick brown fox", the above call would return 16
Hope that clarifies it somewhat.
Best Wishes - Bob
deafmetal1 11-19-2008, 02:13 PM Bob,
That's an EXCELLENT explanation, and I truly appreciate you taking the time to do so!
I have fiddled with your code, it works great, I get the value after the 2nd Slash every time, but at the point that I'm using this code, I'm trying to get the parser to find the P/#/### line, not so much process the data after the 2nd slash yet. So although using the code against a line that starts with P/-/6, I get a resulting 6, I am having it grab the entire line based on the starting data (P/-/6), then later on I grab the internals of that line and store the "6" in a table.
Make sense?
raskew 11-19-2008, 02:37 PM Hi Chief -
Make sense?
Not entirely, I'm a tad slow.
Please provide a list of examples, showing in bold that numbers you're attempting to isolate. If some of the examples wouldn't be valid (should be overlooked), please include them anyway.
Are these strings coming from a field or what? Please clarify.
Best wishes - Bob (SGM, USA, Ret)
deafmetal1 11-19-2008, 03:44 PM Bob,
Great to see another Senior Enlisted!
Maybe if I start from ground zero, it will help. I'm parsing messages out of Outlook. The parser stores the file to parse as an object, then I have a Public Sub that looks within each object for specific lines. Once it finds that specific line to parse, which ends in a "//", it then passes the line to parse to a line specific function which strips the specific data fields and appends them to applicable tables within the db.
The whole thing works fine if the line has a very unique set name, but some of the messages have lines that simply start with one letter. In the previous examples, a P. Below is an example of one unique line:
MSGID/MSG TYPE/SENDER/SERIAL #//
So in this case, I have the parser use InStr(strFileToParse, "MSGID/") and a delimiter function to find the "//" to grab the specific line. Then I pass that line to a function that will pull out the specific items after each individual "/". In this case I send the MSG TYPE, the SENDER and the SERIAL # to a table. After some reverse engineering, this is easy.
The hard part is single letter lines, such as S/, P/, E/ etc... In the above example line, the second field ends in E/. So if I was parsing with my InStr method above simply using "E/", it would grab the E/ and all after that to the "//", although from the wrong set, because it exists previous to the real E/ line I want later in the message.
So I started trying to find ways to grab more from the line I want to parse. So if the E/ lines always looks like this: E/BROWN/FOX/, I could simply adjust my InStr statement to include all of that. I don't have that option with my P line.
It is always like this: P/(either a "-" or a single number)/#####(4 - 5 digit number)/
For example: P/1/5555/-/-/-/-/-/-// (the other dashes here represent other fields of data that may be populated in this line, but are unimportant to discuss at the moment but get stripped from the line after parsing.
Either way I just want it to look for that routine, hopefully with some kind of wildcard routine, and capture that line so I can then strip the specific sets. Using P/ isn't working, so I have to move to the right a little bit to ensure I get the right line for post parsing data stripping.
That help Bob?
Thanks Again Shipmate!!
raskew 11-19-2008, 04:06 PM Hi -
Did I mention slow?
Would you provide a series of examples, with the appropriate items to extract shown in Bold.
What I'm trying to see is the logic of if you detect MSG ID, you do this, but if you detect P/ you do that.
Appreciate your explanations but a bunch of examples (or a zip file, if that's feasible), in the format that you'd process them, would be more self-explanatory.
Thanks, Bob
deafmetal1 11-19-2008, 10:47 PM I have an example base DB attached. Included in the zipfile is also a pic of the VBA references, and 4 test emails. Reccomend creating an Outlook folder in your Inbox called MSG_PARSER or whatever. Not sure if you can just drag and drop the emails into it, though you should be able to. You'll be able to set the folder location in the parser itself.
The parser has 4 entries already in it's tables that you can empty and play with. They are from the 4 emails provided. You'll notice the P/ issue if you open up the MSG_Data table and look in the PNUM column. That illustrates the issue if I just use "P/" in my InStr method. Hence why I was trying to somehow incorporate more of the P line structure into my parsing of the line.
That any better?
Thanks a TON!! This is a 6+ month project by a VBA illiterate [me]. (unfortunately with this stripped example, you don't get to see all the stuff I've learned here from the TREMENDOUS help I've received. But you'll see some, like the Outlook folder option.)
raskew 11-20-2008, 12:37 PM Hi -
Outlook not being my strong-suit, I was unable to drag & drop the messages to folder MSG_Data, apparently since the messages are Read Only.
Having said that, I still don't see the problem. Given the examples in table Msg_Data, the function I provided will, in every case take the string in field pLine and correctly return the number shown in field pNum.
x = "P/-/9999/Test/Test/Test//"
? format(val(mid(x, fFindNthOccur(x, "/", 2)+1)), "0000")
9999
x = "P/-/1234/Test/Test/Test//"
? format(val(mid(x, fFindNthOccur(x, "/", 2)+1)), "0000")
1234
x = "P/4/0052/Test/Test/Test//"
? format(val(mid(x, fFindNthOccur(x, "/", 2)+1)), "0000")
0052
Since your 'numbers' are stored as text and may contain preceding zeros, I added the format() wrapper.
What am I missing? Please post an example string which may occur, where the above wouldn't work.
Bob
deafmetal1 11-20-2008, 01:50 PM If that's the case, then how do I have the Public Sub solve for x? In your intermediate code, you're providing the PLine data. Whereas my problem is finding and grabbing the PLine first, I'll worry about delimiting and getting specific fields from the line later. I need to get the PLine every time first. In the 4th example email (Problem Child), that's the one that shows where an errant P/ ruins my method.
That or I'm missing how best to implement your code. There are more than that numerical field that I need to grab from the PLine, I'm just simplifying for the sake of trying to reliably grab that line within an object every time.
raskew 11-20-2008, 02:11 PM Take a look at this, copied from the 4 test messages:
Test 1
x = "MSGID/TEST-MESSAGE/ME-MYSELF-AND-I/001//P/-/9999/Test/Test/Test//"
? mid(x, instr(x, "//")+2)
P/-/9999/Test/Test/Test//
Test 2
x = "MSGID/TEST-MESSAGE/ME-MYSELF-AND-I/002//P/-/1234/Test/Test/Test//"
? mid(x, instr(x, "//")+2)
P/-/9999/Test/Test/Test//
Test 3
x = "MSGID/TEST-MESSAGE/ME-MYSELF-AND-I/003//P/4/0052/Test/Test/Test//"
? mid(x, instr(x, "//")+2)
P/4/0052/Test/Test/Test//
Test 4
x = "MSGID/TEST-MESSAGE/FROM-A-PIMP/023//P/-/1234/Test/Test/Test//"
? mid(x, instr(x, "//")+2)
P/-/1234/Test/Test/Test//
I don't see the "errant P" in Test 4.
Bob
deafmetal1 11-20-2008, 02:29 PM Bob,
Open up the MSG_Data table, look under the PNUM column for Test #4 data. You'll find the Value of ERROR_FORMAT (because intNthDelimitorNumber:=3 doesn't exist due to grabbing the wrong line of data). Further, if you look at the actual PLINE it parsed (P/023//), it grabbed the wrong line due to the errant P/ in the originator field of test #4 (FROM-A-PIMP/). [Lame example, I know.]
The importance is all in my tables, I then export the data to Excel for Pivot Tabling. I understand the text format you mentioned above (appreciate the wrapper, but unnecessary), I just whipped together a sample db. In my live one, the import table is text, and the final data table is set to Single Number with 4 decimal places. There are some numbers I snag that I don't want to lose the zeros.
So if I read your code above correctly, you're counting the second instance ending in // to get the PLine? This may be doable, but for messages with more than one particular Line that may get repeated without a predictable sequence, that may not work so well, unless I come up with a flexible loop. Interesting.
raskew 11-20-2008, 02:47 PM How does (a verbatim copy of the Test 4 message):
MSGID/TEST-MESSAGE/FROM-A-PIMP/023//
P/-/1234/Test/Test/Test//
...turn into what is shown in the table? The message format is identical (with the exception that there is a '-', rather than an integer between the first two slashes) to Test 1 - 3 messages.
Is this your code doing this?
Bob
deafmetal1 11-20-2008, 02:56 PM It is my code doing that, same as in the attachment. Your code doesn't work with more than 2 lines of text in the message. i.e., If I put a full, complete message here like so:
MSGID/TEST-MESSAGE/FROM-A-PIMP/023//
P/-/1234/Test/Test/Test//
REF/B/PHONECON/20SEP2006//
T/1257PM/PACIFIC STANDARD TIME//
REMARKS/NOTHING FURTHER TO REPORT//
raskew 11-20-2008, 03:16 PM I do believe it's the code creating the problems.
In Public Sub ParseMSG (modInputMsg):
strFileToParse = "MSGID/TEST-MESSAGE/FROM-A-PIMP/023//P/-/1234/Test/Test/Test//"
' this line produces what's getting into your table
strFileToParse = Trim(Right(strFileToParse, Len(strFileToParse) - InStr(strFileToParse, "P/") + 1))
? strFileToParse
P/023//P/-/1234/Test/Test/Test//
strFileToParse = "MSGID/TEST-MESSAGE/FROM-A-PIMP/023//P/-/1234/Test/Test/Test//"
Try replacing the line in red (above) with this:
strFileToParse = mid(strFileToParse, instr(strFileToParse, "//")+ 2)
? strFileToParse
P/-/1234/Test/Test/Test//
Bob
deafmetal1 11-20-2008, 03:27 PM Bob,
Whoa! That worked!! I need to digest how, at that moment and point it does, but now I can see why you were getting frustrated that it was working on your end...lol
I'm gonna play with it a bit and report back final findings and a clean email parsing version for the code repository once done. Thank you again, my command and I appreciate it!
raskew 11-20-2008, 03:36 PM Alright! Maybe we're finally getting somewhere.
I'm somewhat crippled since I can't get the messages into Outlook, thus I can't run the complete code.
Think you're going to have to step thru it line-by-line to see if there are other problems of similar nature. I suggest testing each line of code containing InStr, Left or Right to see what's being output. Easy way might be to add a Debug.Print following the lines in question.
Bob
Added: You gotta watch them PIMP/'s, they'll do you in every time.
deafmetal1 11-20-2008, 07:06 PM I plugged that code in for the MSGID processing section... no joy.
'Move to MSGID line, determine Message Type, Originator, and Serial Number
If InStr(strFileToParse, "MSGID/") <> 0 Then
'strFileToParse = Trim(Right(strFileToParse, Len(strFileToParse) - InStr(strFileToParse, "MSGID/") + 1))
strFileToParse = Mid(strFileToParse, InStr(strFileToParse, "//") + 2)
If InStr(strFileToParse, "//") <> 0 Then
strLineToParse = CleanString(Left(strFileToParse, InStr(strFileToParse, "//") + 1))
strMSGID = strLineToParse
MsgBox strMSGID
'get Message Type
strMSGTYPE = GetStringBeforeTheNthDelimitor(strLine:=strLineToP arse, intNthDelimitorNumber:=2)
'get Originator
strFROM = GetStringBeforeTheNthDelimitor(strLine:=strLineToP arse, intNthDelimitorNumber:=3)
'get Serial Number
strSERIAL = GetStringBeforeTheNthDelimitor(strLine:=strLineToP arse, intNthDelimitorNumber:=4)
End If
Else
End If
It seems that the fFindNthOccur function is only working for a P/ line parsing, not any other line, in this case MSGID. Do I need to make another function, tailored for the MSGID line?
*EDIT: Ok, my first guess long ago was correct, your code is grabbing the line within the "//". If I edit my test email and put a // in front of the word MSGID, the above code works. However, I don't have that luxury to add anything to the 1000's of messages I have to parse. So for any lines after the MSGID line, your code will work, based on all preceding lines ending in a //, so we just gotta crack the nut the MSGID line, since it doesn't have a line preceding it ending in //.
deafmetal1 11-24-2008, 05:51 AM Figured it out!!!
Each line in the message ends with a carriage return and a line feed. Using a book I bought, Mastering VBA for MS Office, I figured out how to look for that, then the start of the line using the built in constant for carriageretun & line feed: vbCrLf
Dim strPCR As String
strPCR = vbCrLf & "P/"
strFileToParse = Trim(Right(strFileToParse, Len(strFileToParse) - InStr(strFileToParse, strPCR) + 1))
Works like a champ every time! Sorry for the troubles Bob, but thank you for the help, it taught me quite a few things!
raskew 11-24-2008, 11:39 AM Hi -
Sorry for the delay in getting back to you.
Glad the vbCrLf works for you. However, how 'bout the rest of it?
I ask because, in my opinion, your explanation in Post #25 is incorrect and is not going to do what you're looking for. If you'll look at post #18, you'll note all of the test messages are virtually identical, and I grabbed them based on the "//" at the end of the section, not the beginning. You were catching "P/" because that's what you were selecting on, rather than "//".
If you're still having problems, would it be possible to provide some test messages that are not read only?
Best wishes, Bob
|
|