Importing a space-delimited text file

bimmer5

I have a challenge importing space-delimited text files into Access when one of the fields contains multiple words separated by spaces. For example, the "Description" field:

Date Description Amount
05/14/08 This is a first line $1234.50
05/15/08 Another line $9876.20

How can I tell Access to import "This is a first line" and "Another line" into the second field and not to try to make a column from each word?
Any suggestion is highly appreciated.
Peter
 
Will there always be a "$" at the beginning of the 3rd (last) field? And will there always be a space after the first field but not before?

If so:
Date: Left(myline, InStr(myline, " ") - 1)
Description: Trim(Mid(myline, InStr(myline, " "), InStr(myline, "$") - InStr(myline, " ")))
Amount: Right(myline, Len(myline) - InStr(myline, "$") + 1)

Depending on where you put this, you may have to leave out the "Trim" function.
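
Dropped into a routine that reads the file line by line, that might look something like this (the file path is hypothetical, and Debug.Print stands in for wherever the values need to go):

Public Sub ImportLines()
    Dim intFile As Integer
    Dim myline As String
    Dim strDate As String, strDesc As String, strAmount As String

    intFile = FreeFile
    Open "C:\sample.txt" For Input As #intFile    ' hypothetical path
    Do While Not EOF(intFile)
        Line Input #intFile, myline
        ' Skip the header row (and anything else with no "$")
        If InStr(myline, "$") > 0 Then
            strDate = Left(myline, InStr(myline, " ") - 1)
            strDesc = Trim(Mid(myline, InStr(myline, " "), _
                InStr(myline, "$") - InStr(myline, " ")))
            strAmount = Right(myline, Len(myline) - InStr(myline, "$") + 1)
            Debug.Print strDate, strDesc, strAmount
        End If
    Loop
    Close #intFile
End Sub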
 
No, the "$" sign will never be there in my real files. I just posted as an example. All fields are string fields. Please find attached a sample of the original file.
The file does not contain spaces before the first field. Yes, it will be always a space after the first field as well after any other logical field.
 

Attachments

If you select Fixed Width you can put vertical lines in that create fields. The lines (double-click 'em) can be removed or dragged around. You get this with the second screen on Import.

However, it is limited when the data is uneven like yours. On your sample I could split the date and year but not the rest. Still, if you can split some of it, that makes the Right/Left/Mid work easier.
 
Mike,
I wish all files were as simple as the first sample. I've attached another sample where you can see my real challenges. Thanks, though.
 

Attachments

Just had a better look. :) Attached is an Excel file where I just copied and pasted.

I have a "thing" :D made in Access that will split up to 6 items in a field into 6 fields. If there are commas, slashes, dashes and other characters between them, it will dump them.
 

Attachments

Sample2 is a bit much for the Excel copy/paste approach.

But you can do it another way... though it's a bit of work.

Attached is a DB with a query and the first line of Sample2 put in the field FirstName.

From Sample2 line 1 as example: 52-60-600 -726.00 Apr 28, 2007 GJ008403 BENEFITS PREMIUM ACCRUAL FOR APRIL 2(Rev) GJ00 316113 Y C49226
If you check the query, I have done it to the extent that it is getting
52-60-600
-726.00
Apr

There are two sets of Left() and Right() and they are feeding off each other. You need a pair of them for each group of characters split by a space.

The other way of doing it, which is how I have my own (but I only have 6 entities done), is that the Left() takes the first entity, which in the example would be 52-60-600, and a macro SetValue action sticks that in another field. The Right() takes everything except 52-60-600. The result of the Right() function is put back into the field with the data by another SetValue action. That means the Left() now grabs -726.00 and SetValues that to another field, and so it goes.

But for what you have I would just do the Lefts and Rights as required in the query. I use the macro because I also pull -()/#& etc and it runs through several times and then moves on to the next record.

If you just extend what is in the attached query you can do it quite quickly, because you just copy/paste the calculated fields and change the field name. Just stick with calculated field names like abc, abc1, abc2, xyz, xyz1, xyz2 etc because then you only have one change to make to the field name.
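
To make the "feeding off each other" part concrete, here's a minimal VBA sketch of the first two steps (the variable names are mine, not from the attached query):

Public Sub SplitFirstTwo()
    Dim strLine As String
    Dim strFirst As String, strRest As String

    strLine = "52-60-600 -726.00 Apr 28, 2007 GJ008403"
    ' The Left() grabs everything before the first space...
    strFirst = Left(strLine, InStr(strLine, " ") - 1)              ' 52-60-600
    ' ...and the Right() grabs everything after it, which the next
    ' Left()/Right() pair then works on, and so on down the line
    strRest = Right(strLine, Len(strLine) - InStr(strLine, " "))
    Debug.Print strFirst
    Debug.Print strRest
End Sub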
 

Attachments

Try the attached DB. You would import the text file so each line goes into one field, and then put the records in the table field.

Put the entry in the field in the table and open Form1 and click.

This is for up to 15 entities with each separated by a space.

I tested it on the first record of the text file

52-60-600 -726.00 Apr 28, 2007 GJ008403 BENEFITS PREMIUM ACCRUAL FOR APRIL 2(Rev) GJ00 316113 Y C49226

Whatever the last entity is will be left in the table field, and the other entities will each land in a different field.
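
Mike's attachment does the splitting with Left()/Right() and SetValue actions; as a rough sketch of the same end result in plain VBA, using Split() and hypothetical table/field names (tblImport with a raw Data field and Field1 through Field15):

Public Sub SplitEntities()
    Dim rs As DAO.Recordset
    Dim varParts As Variant
    Dim i As Integer

    Set rs = CurrentDb.OpenRecordset("tblImport")   ' hypothetical table
    Do While Not rs.EOF
        varParts = Split(rs!Data, " ")              ' raw imported line
        rs.Edit
        ' Each space-separated entity lands in its own field, up to 15
        For i = 0 To UBound(varParts)
            If i >= 15 Then Exit For
            rs.Fields("Field" & (i + 1)) = varParts(i)
        Next i
        rs.Update
        rs.MoveNext
    Loop
    rs.Close
End Sub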
 

Attachments

Hey Mike,

I seem to remember helping someone out with something like this once. I wonder who that person was...:rolleyes: I wonder if that person really knows what they're doing... :)
 
I already had them done and asked you if you could do it in code:D

I've been playing around with this one and some variations.

Public Function FindNum2(strName As String) As String
    Dim strTemp As String
    Dim i As Integer

    For i = 2 To Len(strName)
        strTemp = Mid(strName, i, 1)
        ' Asc 65-90 is A-Z, 97-127 is a-z plus a few symbols,
        ' 32-47 is space and punctuation
        If (Asc(strTemp) < 91 And Asc(strTemp) > 64) _
                Or (Asc(strTemp) < 128 And Asc(strTemp) > 96) _
                Or (Asc(strTemp) < 48 And Asc(strTemp) > 31) Then
            ' A companion version uses Right$ here instead of Left$
            FindNum2 = Left$(strName, Len(strName) - i)
        End If
    Next i
End Function
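
As a quick illustration of what it returns (my own test string, not from the thread), from the Immediate window:

?FindNum2("ABC123")
ABC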
 
Where does the file come from?

You are bound to struggle with this - space-delimited seems really unusual, and it only makes sense if you can be sure there are no embedded spaces. If you can get the text strings surrounded by quote marks, or use a comma or other delimiter, that will help.
 
Thank you all who helped me with this issue, special thanks to you Mike!
I think I'll be O.K. for now.
 
My pleasure.

After it is split you can then take the fields that have

BENEFITS PREMIUM ACCRUAL FOR APRIL

and join them into one field with [a] & " " & [b] & " " & [c] etc

You could also put " BENEFITS PREMIUM ACCRUAL FOR APRIL" in place of the " " in the Right and Left and it would break the line at that point. However, if you try to get a bit fancy with this sort of thing you end up spending a week making it all.

You can also replace " " with "-" or "(" or ")" or "," etc. You could use the ( and ) to remove the brackets around (Rev). To do that you would do it first and then join the results and then do the space split on the result. You can also use Replace() as in Replace([FieldName],"("," ")
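
For example, stripping the brackets first could be a calculated field like this (the field names are hypothetical), after which you run the usual space split on the result:

NoBrackets: Replace(Replace([FieldName], "(", " "), ")", " ")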

Have fun:D
 
First, let's be brutally honest. Your problem is a design flaw. If you accepted data with neither fixed width nor fixed delimiters, you ASKED for trouble - and got it. If there is no way to get the data back in some other format that supports a more orderly programmed approach, I'll be blunt. You are in trouble.

I'm going on a flight of logic here, so if I spin, crash, and burn somewhere, don't say I didn't warn you it could happen.

I assume that you believe this data set would be suitable in a table, which means you have SOME knowledge of the structure, even if not complete knowledge. You must work "backwards" here. Let's look at your data.

52-60-600 -726.00 Apr 28, 2007 GJ008403 BENEFITS PREMIUM ACCRUAL FOR APRIL 2(Rev) GJ00 316113 Y C49226

Those fields look predictable in format except for that mess in the middle. Take what you know about the fields and parse this thing out. For instance, you probably expect one field to be nn-nn-nnn. Look for that. Next, you expect a number that could be signed. Then a date with a predictable format of MMM dd, yyyy.

Next comes the messy one. So work from the OTHER end and scan backwards. Did you expect a field with a letter and several numbers? Pull that out from the rear. What about a single alphabetic character? Pull that out. Maybe now a 6-digit number? (Remember, working backwards.) OK, how about 2 alphas and 2 digits?

Now you have worked backwards from the tail to find the things you expect and forwards from the head of the line for things you can predict in format. Whatever is in between those two known items is your wild field that defies analysis.
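
Here's a minimal VBA sketch of that two-ended scan, using Like patterns as stand-ins for the formats Doc describes (the patterns are guesses from the one sample line, not a tested spec):

Public Sub ParseBothEnds()
    Dim varParts As Variant
    Dim lngFront As Long, lngBack As Long, i As Long
    Dim strMiddle As String

    varParts = Split("52-60-600 -726.00 Apr 28, 2007 GJ008403 " & _
        "BENEFITS PREMIUM ACCRUAL FOR APRIL 2(Rev) GJ00 316113 Y C49226", " ")

    ' Forwards from the head: account code, signed amount,
    ' the three pieces of the date, then the GJ reference
    lngFront = 0
    If varParts(lngFront) Like "##-##-###" Then lngFront = lngFront + 1
    If varParts(lngFront) Like "-*#.##" Then lngFront = lngFront + 1
    If varParts(lngFront) Like "[A-Z][a-z][a-z]" Then lngFront = lngFront + 3
    If varParts(lngFront) Like "[A-Z][A-Z]######" Then lngFront = lngFront + 1

    ' Backwards from the tail: C49226, Y, 316113, GJ00
    lngBack = UBound(varParts)
    If varParts(lngBack) Like "[A-Z]#####" Then lngBack = lngBack - 1
    If varParts(lngBack) Like "[A-Z]" Then lngBack = lngBack - 1
    If varParts(lngBack) Like "######" Then lngBack = lngBack - 1
    If varParts(lngBack) Like "[A-Z][A-Z]##" Then lngBack = lngBack - 1

    ' Whatever is left in between is the wild description field
    For i = lngFront To lngBack
        strMiddle = strMiddle & varParts(i) & " "
    Next i
    Debug.Print Trim(strMiddle)   ' BENEFITS PREMIUM ACCRUAL FOR APRIL 2(Rev)
End Sub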
 
First, let's be brutally honest. Your problem is a design flaw. If you accepted data with neither fixed width nor fixed delimiters, you ASKED for trouble - and got it. If there is no way to get the data back in some other format that supports a more orderly programmed approach, I'll be blunt. You are in trouble.

Often you don't have that luxury. I buy names, addresses, phone numbers, occupation, business name and business type for telemarketing lists. The price can run from 40c to $1.50 per name depending on the format. 40c is junk format, but I can split it. A lot of the data entry is done in third world countries and a lot is often from OCR conversion of scanned lists. 10,000 names is $15,000 in a nice format vs $4000 in junk format. Guess which one I take.:D
 
Bimmer,

Working with data like this can be very troublesome. In addition to Doc's points,
I'd like to add that you may never be able to reconstruct the data in this format.

For example:

If you normally receive data --> String string number number

What if one of the numbers is blank?

You have no way of knowing which number was provided, it's just a guess.

Unless you have a standardized fixed-column format, or a nicely delimited file,
you can generally expect problems.

hth,
Wayne
 
Wayne

I think what you are referring to is frequently seen with name, address, phone number data

John Smith 4 Jones Street Double Bay 2120 0294566789
Bill Andrews 11 Cullens Rd Punchbowl 0291235467
Tony Gale 57 Alex St NSW 0291274834 0299159067 02 45579246

Let's say you can't get the data in another format. What do you do? Is there an alternative to splitting into fields? I guess if someone works in a gov't dept or large company they can just reject it... "we don't do that" etc.... but for small business it can be different.

I would say most people getting such data can get it together OK because they will be familiar with what the data is about. For example, it is easy to split any number entries from letter entries that land in the same field. In my case I know any number like 2120 is a post code, and Len() will separate those from phone numbers. It is also easy to pick up where a double-word suburb lands in two fields. I also know that NSW, QLD etc are states in Australia, and that 02, 03 etc are dialling codes.
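
As a tiny example of the Len() idea, a calculated field like this would separate the two (the field name and the 4-digit/10-digit rule are mine, based on the Australian data described):

Kind: IIf(Len([Entry]) = 4, "Post code", IIf(Len([Entry]) >= 10, "Phone", "Other"))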
 
10,000 names is $15,000 in a nice format vs $4000 in junk format.

So how much do you think your time is worth? When it gets to where you would charge $100/hour, then in 110 hours (nearly three weeks) you break even: $15,000 - $4,000 = $11,000, which is 110 hours at $100/hour. If you are charging your time to a customer, you might be charging more than that. And,... when you spend 110 hours massaging data by hand, that is SO labor intensive that you aren't doing anything else constructive.

You tell ME what it is worth.
 
So how much do you think your time is worth? When it gets to where you would charge $100/hour, then in 110 hours (nearly three weeks) you break even: $15,000 - $4,000 = $11,000, which is 110 hours at $100/hour. If you are charging your time to a customer, you might be charging more than that. And,... when you spend 110 hours massaging data by hand, that is SO labor intensive that you aren't doing anything else constructive.

You tell ME what it is worth.

This is where the realities of small business come into play. For starters, I am the customer:)

When I buy 10,000 names, the time to get from that list to having the names in the database and ready for telemarketing is not much different one way or the other. The major time factor is culling the list. Getting the list from all entries being in one field to different fields does not take too long.

On average I will get the names into a format that is close to what I want before lunch on the day I start the split process. I then start the cull process. First up, I get the phone numbers all into the same format, so brackets, dashes and spaces are gone. This is to check whether the names duplicate names already held: names yet to be called, and names that have been called and are at various stages of a call, including having been rejected and, if rejected, how long ago and by which of my telemarketers. I also do a check using a field where I have joined first and last name, because some people will have changed jobs and thus the phone number is now different.

On average I cull out 15% of the names for various reasons. Within that 15% there is a very small number, perhaps 50 to 200 names out of 10,000, that get messed up in the split process. They just get deleted.

It does not matter how I do it, there will be manual stuff to do that is a far bigger deal than splitting the names. On average it takes me 2 or 3 days to get the list ready, and nearly all the time is in the culling and in splitting the names into different groups.
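
The phone-number clean-up described above (brackets, dashes and spaces gone) is the same nested Replace() trick from earlier in the thread, e.g. as a calculated field (the field name is hypothetical):

CleanPhone: Replace(Replace(Replace(Replace([Phone], "(", ""), ")", ""), "-", ""), " ", "")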
 
