Question about VBA Text Parsing

frustrating

All:

I'm building a parser that will evaluate a text document and return data to a database. I'm running into some roadblocks, however, and was wondering if I could get some input or if someone has any ideas on how to make this work. There are two fields I'm really looking at; as an example, we'll say Order and Stuff. The text file will follow this logic:
Order: 1234
Stuff: 1
Stuff: 29
Stuff: 27
Stuff: 14
Order: 5678
Stuff: 83
Stuff: 102
Stuff: 7
Order: 9999
Stuff: 22
Order: 2468
Stuff: 12
Stuff: 72

So what I want to do is take this data and group everything by their respective order numbers, then concatenate the "Stuff" field into one string (semicolon delimited) to insert into the DB. So the output would look something like this:

---
Order: 1234
Stuff: 1;29;27;14

Order: 5678
Stuff: 83;102;7
I would then run a SQL insert or update query to send Order and Stuff to their fields in the DB.

Where I'm getting stuck is when I'm writing my loop to reconcile the Order ID with the Stuff entries. I'd like to evaluate all the data between Order numbers, but I'm having a mental block on a good way of doing this. What is a good way to capture the Order ID and then apply that to respective Stuff IDs?

Any ideas?
 
To get focused assistance, you're going to have to describe WHAT the things you are working with are, in simple terms.
You're telling us HOW you plan to do something, but we have no idea WHAT you are really working with.

Here's a generic method from Barry Williams's site that many people would use, but you know your situation and we don't.

These are the Steps in a Top-Down Approach:

1) Define the Scope as the Area of Interest (e.g. the HR Department in an organization).
2) Define the "Things of Interest" (e.g. Employees) in the Area of Interest.
3) Analyze the Things of Interest and identify the corresponding Tables. Consider cases of 'Inheritance', where there are General Entities and Specific Entities. For example, a Customer is a General Entity, and Commercial Customer and Personal Customer would be Specific Entities. If you are just starting out, I suggest that you postpone this level of analysis. At this point, you can produce a List of Things of Interest.
4) Establish the relationships between the Tables. For example, "A Customer can place many Orders", and "A Product can be purchased many times and appear in many Orders."
5) Determine the characteristics of each Table (e.g. an Employee has a Date-of-Birth).
6) Identify the Static and Reference Data, such as Country Codes or Customer Types.
7) Obtain a small set of Sample Data, e.g. "John Doe is a Maintenance Engineer and was born on 1st August, 1965 and lives at 22 Woodland Street, New Haven. He is currently assigned to maintenance of the Air-Conditioning and becomes available in 4 weeks' time."
8) Review Code or Type Data which is (more or less) constant and can be classified as Reference Data, for example Currency or Country Codes. Where possible, use standard values, such as ISO Codes.
9) Look for 'has a' relationships. These can become Foreign Keys, or 'Parent-Child' relationships.
10) Define a Primary Key for all Tables. For Reference Tables, use the 'Code' as the Key, often with only one other field, which is the Description field. I recommend that names of Reference Data Tables all start with 'REF_'. For all non-Reference Data Tables, I suggest that you simply assign an auto-increment Integer for each Primary Key. This has some benefits: for example, it provides flexibility, and it's really the only choice for a Database supporting a Web Site. However, it complicates life for developers, who have to use the natural key to join on as well as the 'surrogate' key. It also makes it possible to postpone a thorough analysis of what the actual Primary Key should be, which means, of course, that it often never gets done.
11) Confirm the first draft of the Database design against the Sample Data.
12) Review the Business Rules with Users (if you can find any Users).
13) Obtain from the Users some representative enquiries for the Database, e.g. "How many Maintenance Engineers do we have on staff coming available in the next 4 weeks?"
14) Review the results of Steps 1) to 9) with appropriate people, such as Users, Managers, Development staff, etc., and repeat until the final Database design is reached.
15) Define User Scenarios and step through them with some sample data to check that the Database supports the required functionality.


Good luck.
 
I suggest looking at regular expressions. https://msdn.microsoft.com/en-us/library/28hw3sce(v=vs.90).aspx will give you an overview of how they work, and http://stackoverflow.com/questions/5539141/microsoft-office-access-like-vs-regex shows how a simple search is set up. You might be able to capture the "stuff" between orders in a collection and then iterate through the collection.

Regular expressions might not work for you if the data is too voluminous, and I can't really say how much data is too much. I once was using a regular expression, then tried the InStr function instead and saw a speed increase of more than two orders of magnitude.
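To make the regex idea concrete, here is a minimal sketch in Python (not VBA; the pattern itself would carry over to the VBScript.RegExp object with Global and MultiLine set to True, but the surrounding dictionary/loop code would need to be ported, e.g. to a Scripting.Dictionary):

```python
import re

text = """Order: 1234
Stuff: 1
Stuff: 29
Stuff: 27
Stuff: 14
Order: 5678
Stuff: 83
Stuff: 102
Stuff: 7"""

# One pattern matches both line types; group 1 is the keyword, group 2 the number.
pattern = re.compile(r"^(Order|Stuff):\s*(\d+)\s*$", re.MULTILINE)

orders = {}        # order id -> list of stuff values
current = None
for keyword, value in pattern.findall(text):
    if keyword == "Order":
        current = value
        orders.setdefault(current, [])
    elif current is not None:
        orders[current].append(value)

for order_id, stuff in orders.items():
    print(order_id, ";".join(stuff))   # e.g. 1234 1;29;27;14
```

The single pattern with an alternation is the key trick: you get the matches back in file order, so the "which Order does this Stuff belong to" question answers itself as you walk the match list.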
 
Do you have a good reason for wanting to concatenate the Stuff into a single field? That would not be the ideal data structure if you have a choice.
 
Do you have a good reason for wanting to concatenate the Stuff into a single field? That would not be the ideal data structure if you have a choice.

For this particular purpose, it actually makes sense to concatenate. This is basically flat-filing an EDI file, and everything applicable needs to fit within one record.
 
Welcome to the world that exists "under the covers" of a finite automaton. The problem has to do with "states" and when you transition among them. How much do you actually know about parser theory? You need to define states corresponding to what you expect to see, so that you know when to store and when to parse. In my opinion, you should not store ANYTHING you have previously accumulated until you can do the whole thing in one (complex) step.

I.e. if you have an Order number, you can erase your temporary strings, but you should not really store that Order number until you have reached the next Order keyword.

In between Order keywords, you will have Stuff keywords, whose numbers you append, semicolon-delimited, to a temp string. You can accumulate some number of Stuff entries but can't do anything with them yet.

When you hit the next Order keyword (or the end of the input stream) then you have your Order and Stuff entries ready for storage together, where you can associate them to each other as needed. Just don't rush to store what ain't ready to store.

If you were aware of parsing theory or finite automata theory, you would be able to define parser "states" corresponding to the various steps associated with parsing. For each individual line, the states would appear to be:

NewLine state; SawKeyword state; SawColon state; SawNumber state, SawEndOfLine state; and the SawSomethingIDidNotLike state that covers what happens when you have junk input.

However, there is a second level of this, because you have "outer" states corresponding to SawOrderLine state; SawStuffLine state, SawEndOfFile state, and SawUnexpectedInput state. The SawOrderLine state would trigger you to check for prior accumulated data before you store stuff for the new Order line. The SawStuffLine state would merely be a way to append stuff to your accumulating data.

This sort of thing will look very much like spaghetti code if you implement it in VBA, but that IS the nature of programming finite automata without some sort of parser-oriented support language that allows you to define tabular states for various actions.

The only REAL advice I can give you is to try to read up on finite automata as a way to organize your thoughts on how to parse what is essentially a semi-fixed-format input stream and extract the relevant data at those moments during the parse operation that you CAN have relevant data.
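As an illustration of making those "outer" states explicit (again in Python for brevity; a VBA version would use an Enum and a Select Case block), here is one way the per-line classification might look. The names LineState and classify are hypothetical, chosen to mirror the SawOrderLine / SawStuffLine / SawUnexpectedInput states described above:

```python
from enum import Enum, auto

class LineState(Enum):
    ORDER_LINE = auto()   # SawOrderLine: flush-and-start-new-record trigger
    STUFF_LINE = auto()   # SawStuffLine: append to the accumulating temp string
    BAD_LINE = auto()     # SawUnexpectedInput: junk we did not like

def classify(line):
    """Map one raw input line to its outer parser state."""
    parts = line.split(":", 1)
    if len(parts) == 2 and parts[1].strip().isdigit():
        if parts[0].strip() == "Order":
            return LineState.ORDER_LINE
        if parts[0].strip() == "Stuff":
            return LineState.STUFF_LINE
    return LineState.BAD_LINE
```

The payoff of naming the states is that the main loop reduces to one Select Case / match on the classification, which keeps the "spaghetti" confined to a single dispatch point instead of scattered If tests.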
 
