Issue with Parsing a Delimited Text File

rkane

Hello All!

I am having an issue trying to import a semicolon-delimited text file with 52 columns into a table.

I am using a piece of VBA to parse through the file; however, there are a pair of comment fields in the text file, and sometimes people use semicolons in their comments.

Is there a way to have it either ignore any semicolons found in text file column ranges 266-315 and 317-366, or have Access edit the file to replace those particular semicolons with a space or comma?

Thanks in advance!
 
"Is there a way to have it either ignore any semicolons found in text file column range 266-315 and 317-366"
I thought you specified there were only 52 columns!
This sounds like a nightmare using ; as a delimiter, and I don't see a way round the problem. Where do the files originate from? Is it possible to do a find & replace at the source and replace any ;'s in the comment text with a space or comma? Otherwise, could you use a different delimiter such as |, the pipe character, which is not used in normal text writing?
David
 
I guess I should clarify. The "columns" I was referring to with 266-315 and 317-366 are the character positions in Notepad, i.e. Ln 26, Col 366.

The source is a text file that our system sends to a network folder. This is a case where the "download" was set up to capture data before reporting could be built on it. We are changing the process to replace the extra semicolons in the comments section going forward; however, I have 2.5 months' worth of back data that needs to be worked, which is what this is for.

I can do each file individually by importing it into Excel, performing a text-to-columns on those sections, doing a find and replace, saving as a text file, and repeating. I was just hoping for a faster solution.
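Just to make concrete what I'm hoping for: if the character positions really are fixed on every line (a big if, I know), something like this rough sketch is what I have in mind. The path is made up, and position 316 is assumed to be the real delimiter between the two comment fields:

[code]
Sub ScrubCommentSemicolons()
    Dim strLine As String
    Dim intIn As Integer, intOut As Integer
    Dim i As Long

    intIn = FreeFile
    Open "C:\Downloads\backdata.txt" For Input As #intIn        ' made-up path
    intOut = FreeFile
    Open "C:\Downloads\backdata_fixed.txt" For Output As #intOut

    Do While Not EOF(intIn)
        Line Input #intIn, strLine
        ' Blank out semicolons inside the two comment ranges only;
        ' position 316 is assumed to be the real field delimiter
        For i = 266 To 366
            If i <> 316 Then
                If Mid$(strLine, i, 1) = ";" Then Mid(strLine, i, 1) = " "
            End If
        Next i
        Print #intOut, strLine
    Loop

    Close #intOut
    Close #intIn
End Sub
[/code]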
 
If you could put text delimiters around the comment fields you might be OK, but since these are old files already written, I can't think of an easy way to do this. As my old boss was fond of saying: GIGO (garbage in, garbage out).
 
To be able to import CSV files generated by the MS Office programs, I ended up needing to process the data one character at a time. Access and Excel are not consistent in the CSV data they generate; one of the apps produces files with mixed quoting... I suppose the logic is "if I think I need the quotes then I will add the quotes".

So my import code knows which fields could potentially be quote-delimited, tests for the presence of a quote, and uses that knowledge to turn a "comma lock" variable on and off.

The result is that my app can read CSVs generated by both Access and Excel. That was a nasty curve ball to encounter at the time.
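In outline the loop looks something like this. This is a simplified sketch rather than my production code (for one thing, it ignores escaped doubled quotes inside a quoted field):

[code]
' blnInQuotes plays the role of the "comma lock": while it is True,
' a delimiter character is treated as ordinary text.
Function ParseLine(ByVal strLine As String, ByVal strDelim As String) As Collection
    Dim colFields As New Collection
    Dim strField As String
    Dim blnInQuotes As Boolean
    Dim i As Long, ch As String

    For i = 1 To Len(strLine)
        ch = Mid$(strLine, i, 1)
        Select Case ch
            Case """"
                blnInQuotes = Not blnInQuotes       ' toggle the lock
            Case strDelim
                If blnInQuotes Then
                    strField = strField & ch        ' delimiter inside quotes: keep it
                Else
                    colFields.Add strField          ' end of field
                    strField = ""
                End If
            Case Else
                strField = strField & ch
        End Select
    Next i
    colFields.Add strField                          ' last field
    Set ParseLine = colFields
End Function
[/code]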
 
Can you post some sample data?
My preference is to bring the data into an Access table, then parse the records in that table into the "desired" destination table and format.

Not always straightforward, and not always possible, but that is my general approach.
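In rough outline (the spec, table, and path names here are placeholders):

[code]
Sub ImportViaStaging()
    ' 1) Land the raw rows in an all-text staging table
    DoCmd.TransferText acImportDelim, "MyImportSpec", "tblStaging", _
        "C:\Downloads\backdata.txt", False

    ' 2) Reshape/clean from staging into the real destination
    CurrentDb.Execute _
        "INSERT INTO tblFinal (Field1, Field2) " & _
        "SELECT Trim(F1), Trim(F2) FROM tblStaging", dbFailOnError
End Sub
[/code]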
 

I don't think that approach would work, to be honest. Here's one of the error lines that I get:

0;2;XXXX;20121025;12374610350;272105362;035013;XXXX;010;0518073;3;L;03;0000071.98 ;0000011.00 ;0000000.00 ;H2; ;BK;89903;2172;00001 ;00034.99 ;5;2310;00001 ;00036.99 ;5; ;00000 ;00000.00 ; ; ;00000 ;00000.00 ; ; ;00000;00000.00 ; ; ;00000 ;00000.00 ; ;USPS CAN'T FIND/WILL NOT DELIVER'CREDIT ALL;xxxxxx;xx AAB ;01;H; ; ;0000000.00 ;0000000.00 ;

The ; after CREDIT ALL is the extra one. Keep in mind that this is all one line, and it looks like the forum removed some of the extra spaces when I pasted it (the line length is 403 characters).
 
My preference is to bring the data into an Access table, then parse the records in that table into the "desired" destination table and format.

That doesn't work when you have extra delimiters. The extraneous delimiters make Access think there are extra columns in the record: if a stray semicolon occurs in a text string, the string ends at the semicolon and a new column is assumed to start after it. If you thought you had 10 columns but one of them contained a stray semicolon, that row would have 11 columns, which gives Access fits.

As Mike said, the only way around the problem is to read the file using file I/O rather than TransferText, so you can parse each record character by character and either double up or eliminate the extra delimiters. I can't find a code sample, but if you search you might find some; look for Open ... For Input and Input # / Line Input # to find file I/O samples.
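As a rough sketch only (it assumes a good row has exactly 52 fields and that any surplus semicolons all sit in one known comment field, here field 42, index 41, purely for illustration):

[code]
Sub RepairFile(ByVal strIn As String, ByVal strOut As String)
    Const EXPECTED_FIELDS As Long = 52
    Const COMMENT_IX As Long = 41       ' assumed position of the comment field

    Dim intIn As Integer, intOut As Integer
    Dim strLine As String, varFields As Variant
    Dim lngExtra As Long, i As Long, strComment As String

    intIn = FreeFile
    Open strIn For Input As #intIn
    intOut = FreeFile
    Open strOut For Output As #intOut

    Do While Not EOF(intIn)
        Line Input #intIn, strLine
        varFields = Split(strLine, ";")
        lngExtra = (UBound(varFields) + 1) - EXPECTED_FIELDS
        If lngExtra > 0 Then
            ' Re-join the comment pieces the stray semicolons split apart
            strComment = varFields(COMMENT_IX)
            For i = 1 To lngExtra
                strComment = strComment & " " & varFields(COMMENT_IX + i)
            Next i
            varFields(COMMENT_IX) = strComment
            ' Shift the remaining fields left over the absorbed pieces
            For i = COMMENT_IX + 1 To EXPECTED_FIELDS - 1
                varFields(i) = varFields(i + lngExtra)
            Next i
            ReDim Preserve varFields(EXPECTED_FIELDS - 1)
        End If
        Print #intOut, Join(varFields, ";")
    Loop

    Close #intOut
    Close #intIn
End Sub
[/code]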
 
Here's one of the error lines that I get:
What is an error line compared to your original data?
You may preserve spaces etc if you use [ c o d e ] [ / c o d e ] tags rather than [ q u o t e ].
2 or 3 records should suffice.

Can you show me the layout for your "table with 52 columns"?

EDIT after posting:

Pat,
Didn't see your post until after I submitted this one. Yes, I'm aware of the potential for extra semicolons, and I have used Input # in the past.

I just want to see what the poster is dealing with. I appreciate the heads up, thanks.
 
Using AC2010, I set up a delimited import based on the ; delimiter and let Access do the data type assignment. I got 54 fields, with one import error (a data type error in field5).

I then made all the fields text, set up a table, and re-imported the data into that structure (no import errors).

Since I don't know your field names and don't know what you expected, I am attaching the fields and values for the fields that were imported.

Now, if you are saying that the extra semicolons appear only after the CREDIT ALL, it would be easy enough to reference another table structure and concatenate the extra data into one field. However, if, as others are suggesting, the extra semicolons can appear anywhere, then all bets are off.

Attached is the output I have relating to your sample record.
 


jdraw, add a semi-colon in the middle of one of the text fields and Access will now think there are 55 columns. I would test with a table of just a couple of columns to minimize the work. If you import the following file, Access will assume 4 columns even though our intention is three; the second record has a stray semi-colon:
[code]
fld1;fld2;fld3
aa;aab;aac;
bb;b;bc;bbd
cc;ccd;cce
[/code]
 
Pat,

I agree 100% if the rampant semicolons can happen anywhere, but I didn't read it that way. I understood the post to say that only the semicolons after the CREDIT ALL were extraneous.

I really can't believe a production process is "this loose", but I asked the poster to confirm where the "unexpected semicolons" occurred.

In my view, if the semicolons could appear anywhere, then automation of any sort would be questionable. This user has not posted the layout of the desired table, so programming an import on a character-by-character basis does not seem viable either.

I was looking at an Access approach to try to address this:

I can do each file individually by importing it into Excel, performing a text-to-columns on those sections, doing a find and replace, saving as a text file, and repeating. I was just hoping for a faster solution.

I guess until we hear back we have no specs, no desired output format, and only one sample record, with no information on field names or data types, which the poster says may have been shortened by the forum software... (sounds like life 30 years ago).

We've probably spent more time on this than warranted.
 

As Pat points out, Access has no way of knowing which semicolons are delimiting fields and which are part of comments. Perhaps the best way to clean this up, if at all feasible, would be to scrub the source records before exporting the data. If it is possible to re-run the data through a routine that looks at the fields likely to offend before exporting, and fix the problem at that point, that would seem to be the optimal way out of your trouble.

Repairing this at any point after export is difficult. You have to rely on the ability of the code to reliably identify the start of a new record (given that each semicolon is read as a new field, the number of fields appears variable). In the sample record you provided, the first field appears to be a text-converted unsigned numeric or alphanumeric; if this is indeed the case, you can likely work on the records. The last two fields are zero-filled, fixed-length decimals (trailing sign). If, after the 50th semicolon, you have a ten-digit fixed decimal, and the same after the 51st, then you are on track: there are no extra semicolons. If there is none after the 50th but there is in the two fields following, you have one extra semicolon in the record, and so on. You can separate the healthy and exception records, fix the anomalies (probably best done manually), then merge the files and import into Access. I really do not see much hope beyond this.
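A rough sketch of that triage, assuming 52 fields per healthy record and the ten-character zero-filled decimal shape from your sample (both assumptions you would need to verify against the real files):

[code]
Sub TriageRecords(ByVal strIn As String)
    Const GOOD_UBOUND As Long = 51      ' assumes 52 fields; adjust after checking real files

    Dim intIn As Integer, intGood As Integer, intBad As Integer
    Dim strLine As String, varFields As Variant

    intIn = FreeFile
    Open strIn For Input As #intIn
    intGood = FreeFile
    Open strIn & ".good" For Output As #intGood
    intBad = FreeFile
    Open strIn & ".exceptions" For Output As #intBad

    Do While Not EOF(intIn)
        Line Input #intIn, strLine
        varFields = Split(strLine, ";")
        ' Healthy = right field count AND the last two money fields
        ' match the zero-filled 0000000.00 shape
        If UBound(varFields) = GOOD_UBOUND _
           And Trim$(varFields(GOOD_UBOUND - 1)) Like "#######.##" _
           And Trim$(varFields(GOOD_UBOUND)) Like "#######.##" Then
            Print #intGood, strLine
        Else
            Print #intBad, strLine
        End If
    Loop

    Close #intBad
    Close #intGood
    Close #intIn
End Sub
[/code]

The exception file can then be fixed up by hand and merged back before the import.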

You indicate you know the position in the buffer where the problem might occur. I trust this does not mean the fields are fixed length.

Best,
Jiri
 
