Parsing Problems

indyaries

Registered User.
Local time
Today, 22:44
Joined
Apr 22, 2002
Messages
102
Greetings,

Up front, I am posting this question in both Access-Queries and Excel forums. I am using Access-Excel 97 SR2 at work, and Access-Excel 2000 at home.

I have a large Excel file where the addresses are all in one field (Address). Here are some examples:

807 E.South St. Apt.33 Crown Point, IN 46307
5269 Cedar Point Dr. G152 Crown Point, IN 46307
717 Moraine Trace #16 Schererville, IN 46375
8128 Mount Ct. #A Crown Point, IN 46307

I need this data broken into these fields:
Address = Street Address
City
State
Zip

I have phone numbers that need the dashes stripped out.
I have dates in the format mm/dd/yy that need to be mm/dd/yyyy.
I have names in Last, First format in one field...need these to be split also.

I've not found anything useful in the Excel forum, and I've not been able to get anything I've found in the Access-Queries forum to work the way I need it to.

Any help would be greatly appreciated. This info needs to be fixed before this Thurday morning...

Thanks in advance to all who reply !!!!!

Sincerely,

Bob in Indy.
 
I've done this before and its takes a little ingenuity.

I'd build a macro unpacking from the right, i.e. first picking off the zip code, then the state. Storing in separate cells as you go.

The city poses a problem because it may be two, possibly more words, therefore have your macro prompt you for the number of word in the city, then what's left is the street address.

I hope that this gives you a runnihng start.
 
llkhoutx,

It was a tedious process, but I finally got the conversion done using Excel, autofilter, and a lot of UPPER and CONCATENATE routines. Let's not forget the cutting and pasting afterwards <smile>.

The macro you spoke of...was that using Access or Excel? Do you have any examples to provide?

I've not had one response from the Excel forum yet. There just HAS to be an easier way to do this.

Thanks again for replying!
 
Sorry, no longer have any such code as it was long ago.

The unpacking I did was in a Lotus 1-2-3 macro, but there is no reason it can't be similarly done in Excel.

To accomplish the task, use the macro recorder to develop what you need.

I'll give it a try this weekend on the data that you posted with your question and will post the macro or email it to you.
 
As long as you have an ill-behaved data set, you will have terrible headaches.

807 E.South St. Apt.33 Crown Point, IN 46307
5269 Cedar Point Dr. G152 Crown Point, IN 46307
717 Moraine Trace #16 Schererville, IN 46375
8128 Mount Ct. #A Crown Point, IN 46307

If you had commas as field separators in strategic places, this would be a piece of cake. But you don't have uniformly placed separators between the street address and the city. Nor do you have uniform content for the city. One word vs. two words, other issues as well - such as apartment number vs. raw street addresses, different formats for apartment specifiers (Apt.33 vs. #16 vs #A vs G152). With neither uniformity nor reliable separators, I don't see an obvious algorithm that jumps out to say, "HERE I AM! USE ME!"

I once built a string parser that could find punctuation and could separate numbers from letters, spaces, and various types of punctuation, but the truth is that even my best parser would have barfed on that data set.

So the next thing that comes to mind is, how many different city/town/municipality names do you have? Because one possibility is to predefine a list of things that can go in this field. Then you could use some rather messy searches to identify the starting point in each string where a valid city name began.

Now, if I were doing it, I would first pull in the raw string to a temp table. Then I would write a query that did a "like" search for the city names in those strings, updating a second field in the temp table with the column number where the city name started, and optionally a third field showing which city name I think matched. For non-match cases, update your list of cities with the names (discovered by visual inspection or by a query to find null names in the match field) and run the query again. You could run this iteratively until no entry in the temp table had a blank in the "Matching City" field.

From there, you could break out the first part, city, state, and zip because the zip and state have commas, and you would recognize the start and end of the city field based on what name was matched by the query.

As to your telephone stuff, this is again a brute-force parse that has to look for ( ) and -, plus simple spaces as separators in some cases. Looking into my crystal ball, I see VBA code in your future. Lots of VBA code...

As to breaking apart LastName, FirstName - good luck. If you have titles, suffixes, and middle initials in the pile, you are going to have a massive headache. I've tried to handle this more times than I can remember, and there is NO hard-and-fast rule that you can apply.

In general, if your dataset is not well-behaved, you can expect all sorts of uglies to occur in breaking apart fields. And I have yet to find the perfect parser for this stuff. So it might be a good idea to be willing to accept less than perfect results.
 
I started a macro in Excel, but decided that I'd do it in Access with VBA.

Your posting doesn't say anything about the Apt Number being stripped out into a separate field, I'll include that also.

Names are a big problem with optional potenital prefixes and suffixes as well as middle names and initials. Some have initials instead of a first name.

I invision my code prompting the user the identify what's being parsed to branch appropriatelt througn the code.
 
I think you might be better off producing a temp table with errors and letting the user fix them there. Prompting the user interactively could get annyoning expecially if there are hundreds of errors.
 
User's data may be ratty, but is quite easy to identify what the data is on a record by record basis.

I cleaned up 4,500 records this way one time. It was tedious identifying the type entries I had and then properly responding to the prompts, but it worked the first time with minimal recpords being fixed on viewing the data.
 
Given your example, this shouldn't be difficult.

(1) Copy/paste Function xLastInStr() and Function Removeomatic(),
below, to a new module.

(2) Create the following table:
tblAddress
Field: MyAddress; Text
Field: Street; Text
Field: City, Text
Field: State; Text
Field: ZIP; Text

(3) Create and run Append Query (your problem) to populate
field MyAddress with your list of mal-structured addresses.

(4) From the example, everything following the last space in
MyAddress = the ZIP code. So you could use this update
query to populate field ZIP.
Code:
UPDATE tblAddress SET tblAddress.ZIP = 
Mid([myAddress],xLastInStr([myAddress]," ")+1,5);

(5) Download a ZIPCODE table. There are a variety of free
sources. Try: http://www.granite.ab.ca/accsmstr.htm or http://asbdc.ualr.edu/free/

(6) Create a new query. Bring tblAddress and tblZips into the
grid. Create a left-join on Zipcode. Bring in fields City and
State from tblAddress. Turn the query into an Update Query
and, in the Update To cells, specify the City and State fields
from tblZips. The query would look like this but will vary
depending on your field names.
Code:
UPDATE tblAddress LEFT JOIN tblZips ON tblAddress.ZIP =
tblZips.Zipcode SET tblAddress.City = [CityName], 
tblAddress.State = [StateAbbr];

(7) After running the query from (6), all that should remain is to
to fill in the street address, which will be everything to the
left of the City name in field MyAddress. This update query
should do it:
Code:
UPDATE tblAddress SET tblAddress.Address = 
Left([myAddress],InStr([myAddress],[City])-1);

(8) You can remove dashes from a string using Function
Removeomatic (see below).


****************************************************************************
Code:
Function xLastInStr(ByVal tstr As String, twhat As String) As Integer
'*******************************************
'Name:      xLastInStr (Function)
'Purpose:   Return location of last instance of a character or phrase.
'Inputs:    Call xLastInStr("the quick brown fox jumped the lazy dog", "the")
'Output:    28 - Location of last occurence of "the"
'*******************************************

Dim i As Integer, N As Integer, tlen As Integer

N = 0
tlen = Len(twhat)
For i = Len(RTrim(tstr)) To 1 Step -1

  If Mid(tstr, i, tlen) = twhat Then
      N = i
      Exit For
  End If
Next i

xLastInStr = N

End Function

Code:
Function Removeomatic(ByVal pstr As String, ByVal pchar As String) As String
'*******************************************
'Name:      Removeomatic (Function)
'Purpose:   Removed specified characters from a string.
'Inputs:    from debug window:  ? Removeomatic("123-45-6789", "-")
'Output:    123456789
'*******************************************

Dim strHold As String
strHold = RTrim(pstr)
Do While InStr(strHold, pchar) > 0
  strHold = Left(strHold, InStr(strHold, pchar) - 1) & Mid(strHold, InStr(strHold, pchar) + 1)
Loop
Removeomatic = strHold
End Function
 
Last edited:
Attached is a zipped Access 97 database which parses the data you originally posted, excepting that I added names in various permutations.

I imported your data from an Excel spreadsheet to which I added names in to table "ExcelImport".

My code is in function basParseData.

This ought to give you an idea of how to do this. The code need not be elegant because usually it's a one show deal. You use it and then throw it away.

Bon chance
 

Attachments

llkhoutx

Checked out your example. It's just too painful - requiring the user to answer a series of questions regarding each and every record. That's not automation and we might as well provide the user with a hard-copy form and ask him/her to fill in the blanks. With just a five-record example, I found myself asking 'when will this ever end?' and checking any option, just to make it go away. OK, I may be short on patience, but what if it were a 5,000 record table or a 50,000 record table? Granted, it may have worked for you but, THIS IS NOT WHAT AUTOMATION IS ALL ABOUT!!

We need to ask indyaries how the name is displayed, in a separate field or as the first part of the example provided. Since he didn't mention otherwise, my guess that it's in a separate field, so we can hopefully deal with it independent of the address issue.

Regarding name parsing, readers may want to checkout this MSKB article. It seems to cover most of the bases and I've used it successfully in the past.
http://support.microsoft.com/default.aspx?scid=kb;en-us;Q168799

I'd always been led to believe that within the Zipcode system, a particular city may have multiple ZIPs, but that a particular ZIP would never represent more than one city, unless an individual user had added records representing districts/communities/etc., that didn't have their own post office.

When I wrote my response, it was with this assumption. However, after downloading the zipcode files from http://www.granite.ab.ca/accsmstr.htm, (which I'd noted but not used before) I find that they often represent multiple cities with the same Zip. Checking out a few with which I'm familiar, it's apparent that the specific data is incorrect. Not being a postal code expert, I'm unsure which approach is technically correct. However, to accommodate these problem cases, I revised the code. What follows is a function that sequentially processes each of the procedures previously described.

Code:
Function UpdateAddress()
Dim db As DATABASE
Dim strSQL As String

Set db = CurrentDb

'(1) populate Zipcode field
strSQL = "UPDATE tblAddress SET tblAddress.ZIP = " _
       & "Mid([myAddress],xLastInStr([myAddress],' ')+1,5);"

Call RunMySQL(strSQL)

'(2) populate City and State fields
strSQL = "UPDATE tblAddress LEFT JOIN tblZipcode ON " _
       & "tblAddress.ZIP = tblZipcode.Zip SET " _
       & "tblAddress.City = [tblZipcode].[City], " _
       & "tblAddress.State = [tblZipCode].[State]" _
       & "WHERE (((InStr([tblAddress].[MyAddress],[tblZipcode].[City]))>0));"

Call RunMySQL(strSQL)

'(3) populate Address field
strSQL = "UPDATE tblAddress SET tblAddress.Address = " _
       & "Left([myAddress],InStr([myAddress],[City])-1);"

Call RunMySQL(strSQL)

'(4) view results
docmd.OpenTable "tblAddress", acViewNormal
       
End Function

'******************************************

Function RunMySQL(pSQL As String)

docmd.SetWarnings False
docmd.RunSQL pSQL
docmd.SetWarnings True

End Function

Best wishes,

Bob
 

Users who are viewing this thread

Back
Top Bottom