Query a Mailing List

buratti

This isn't so much a database design question as a request for a suggested technique. What I am doing is taking a master mailing list from the post office for my town, and I want to remove all of my current customers from that master list. I have imported both lists as separate tables in a new database, and obviously the easiest way would be to run the Find Unmatched Query Wizard on the address field and I would be finished. But it's not that simple in my case...

My customer address table has hundreds of misspellings, typos, or street-name abbreviations that differ from the master list. Example: the master list may have an address of 25 N Main St, while my customer list has that same address spelled out as 25 No Main St, or the word Circle abbreviated as Cr when it should be Cir, etc. Because the spelling is not an exact match to the master list, those addresses would NOT be removed.

In the past I have removed what I could automatically with the find unmatched query, then went through each remaining record manually, one by one, to see if it should have been removed, find the reason it wasn't, and remove it if need be.

Does anyone have any suggestions or a technique to better automate this process so I do NOT have to go through each record one by one again? Thanks for any suggestions.
 
Make a list of common errors... then auto-transcribe them...

The tricky part is extracting just the street name from your combined house number/street name field.
You can probably do something like:
Mid(yourfield, InStr(yourfield, " ") + 1)
This returns: "N Main St" from your example of 25 N Main St

Then do a "group by" and create a new table from ONLY your streetnames.
Now you can go find streets that are simular but different and "translate" them into "better" or "corrected" street names. then take your original table, and either fix your database or use your translation table to join to your third table where you can exclude.
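For instance, a make-table query along these lines (YourTable and StreetNames are just placeholder names here):
Code:
SELECT Mid(yourfield, InStr(yourfield, " ") + 1) AS StreetName
INTO StreetNames
FROM YourTable
GROUP BY Mid(yourfield, InStr(yourfield, " ") + 1);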

Either way, some manual work will be required. Good luck.
 
Thanks for the quick reply. I think I mostly understood you, but by "translate" do you mean to still manually change the fields? I guess grouping them this way would be easier than comparing 2 tables manually.
I've never used it, but isn't there something like a "like" function? If so, do you think that would work, and how would I use it?
 
You can use:
Street Like "*Main*"
which will find both "N Main" and "No Main". That doesn't, however, do quite what you want it to, I think.

Yes well, manually, kind of, except instead of correcting 100 addresses, you're mapping the same "typo" only once.
Plus you're not actually fixing the typo, but instead leaving it in place.

Orig table => Fixing table => Check table

where the fixing table holds the (single) faulty value on one side and the (single) fixed value on the other.
 
I.e. Your fixing table could look something like:
Code:
Original       Fixed
N Main St      No Main St
Nr Main St     No Main St
Nbr Main St    No Main St
Number Main St No Main St
N Main Street  No Main St
N Main Str     No Main St
No Main St     No Main St

Your N Main St "error" could be happening 100 times, so instead of fixing 100 addresses you fix the single one here.
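A rough sketch of how the fixing table gets used: first split the street name out of the customer addresses (a saved query, say qryCustomerStreets with Address and StreetName columns; the names are just placeholders), then left-join it to the fixing table so anything without a known "typo" falls through unchanged:
Code:
SELECT q.Address, Nz(f.Fixed, q.StreetName) AS CleanStreet
FROM qryCustomerStreets AS q
LEFT JOIN FixingTable AS f ON q.StreetName = f.Original;
CleanStreet is then what you match (or exclude) against the master list.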
 
You may want to do a search on Fuzzy Logic; there are a few threads that cover this topic.
 
Thanks. I haven't gotten a chance to actually try those suggestions yet, but I will soon and will post back on how it worked out.
 
Hey guys, I'm back with some problems. I was able to create an automated way to remove about 90% of my customer addresses from the master list, and now I'm stuck with the last 200 or so records.

For the record and if anybody else is following this, what I did to remove the vast majority of them is as follows:

Started with 2 tables: Customer List and Master List.

I created a query based on the Customer list and added a field to the query named Modified Address that ran a custom function reformatting the address field. In that function I hard-coded the top 5 or so most common mismatches and fixes, e.g. removed punctuation, changed "No" to "N", "So" to "S", "West" to "W", etc. The last thing it did was remove all letters/words after the LAST space in the string. So an address of, let's say, "123 No Main St" would be returned as just "123 N Main".
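To illustrate, a simplified version of that calculated field with just the "No" fix and the last-word chop (the real function chains a few more Replace() calls and strips punctuation the same way, and this sketch assumes every address contains at least one space):
Code:
Modified Address: Left(Replace(Trim([Phy Street]), " No ", " N "),
    InStrRev(Replace(Trim([Phy Street]), " No ", " N "), " ") - 1)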

I created a similar query based on the Master list and added a field just as in the Customer list query, but in this query I only formatted the address to remove the last set of letters after the last space in the string (like the last step in the customer query above).

I then created a simple find unmatched query based on those queries just created, joining on those added fields, and it "removed" about 90% of what I needed (as opposed to about 70% before reformatting).
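For anyone following along, the unmatched SQL boils down to a left join on those calculated fields where the master side finds no customer match; the query names below are just examples:
Code:
SELECT m.Address
FROM [Master Modified] AS m LEFT JOIN [Customer Modified] AS c
    ON m.[Modified Address] = c.[Modified Address]
WHERE c.[Modified Address] Is Null;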

Those 200 or so records that were not removed/still do not match now need to be removed.

I figure that namliam's suggestion would be the way to go here to sort out these last records. I apologize; I thought I understood what you were talking about on how to "auto-transcribe" the records by creating a table and grouping by, but I guess I don't. Would you be able to give me a more detailed explanation of what you were talking about?
 
Basically you did it already... but hardcoded in a function...

You would create a table with all the types of (common) mistakes people make, then add a column to fix the mistake.

Then take your customer table, join that to this correction table, then join the corrected result to your master table.
The ultimate solution would be to do address validation upon data entry :P
 
Thanks. To try to filter out and remove those last 100 or so records, I'm going to try and experiment with using the Like operator. It would be too confusing to try to explain exactly what I'm trying to do, so instead here's a simpler example with the same concept:

First of all, can I use the syntax [field1] Like '*[field2]*', or does [field2] need to be some type of string or integer?

I have tried that and variations on the syntax, but it either returns no records or makes my computer hang for 5 minutes or so before returning all records. Any suggestions?
 
[field1] Like '*' & [field2] & '*'

This only works if both fields are text fields.
 
Thanks, but I can't quite seem to get that working. For experimental purposes, and to get the syntax correct to yield the results I should have, for now I'm basically just trying to get matching records from the original master address list and the original customer list using the Like operator.

Technically speaking (correct me if I am wrong), the "Like" expression without any wildcards ("*") would work just like the "=" operator, correct? Here's what I have:

SELECT [Master Addresses].Address, [Customer Addresses].[Phy Street]
FROM [Master Addresses], [Customer Addresses]
WHERE ((([Master Addresses].Address) Like [phy street]));
Takes five minutes to return results, but no results are returned

And variations like:
...
WHERE ((([Master Addresses].Address) Like '*' & [phy street]));
Also takes 5 minutes with no results
...
WHERE ((([Master Addresses].Address) Like '*' & [phy street] & '*'));
Also takes 5 minutes with no results
...
WHERE ((([Master Addresses].Address) Like " & [phy street] "));
Runs instantly but no results

It would not let me just put & [phy street] without automatically adding the quotes (") around it

But...
...
WHERE ((([Master Addresses].Address) = [phy street] ));
works perfectly, showing the matching records/addresses.

Do you see anything wrong that I am doing?
 
Yes, Like works more or less like = when there are no wildcards; the problem, though, is that it cannot use an index if one exists.
Also, if you have a [phy street] containing wildcards, they will be evaluated as wildcards.

The reason it takes forever (5 minutes) is the Cartesian product: each address from your master is being checked against every [phy street], which can potentially be huge.

100,000 masters checked against 1,000 phy's is 100,000,000 records generated :(

Sanity-check against a small table first: make an empty database, create a first table... stick some (10?) records in.
Create a second table, stick some searches in...
Do your query...
I.e.
Table3:
Field1
a
b
c
d
e
f
g
gh
bbbbbabbbbb

Table4:
Field1
a
g
f
h
t
r
v
d
s
w

Query:
Code:
SELECT Table3.Field1
FROM Table3, Table4
WHERE (((Table3.Field1) Like "*" & [table4].[field1] & "*"));
returns
Field1
a
bbbbbabbbbb
g
gh
f
gh
d

As per what I would expect...
 
I think I figured it out, somewhat. My problem (I think) was that when exporting my customer list (from my industry-specific software), it adds, for whatever reason, about 30 spaces to the end of the address. So the Like operator was looking for those extra spaces in the comparison. I used the Trim() function along with it and got better results, which is what I was looking for. Still experimenting with ideas; I'll let you know if I run into any more problems. Thanks!!!
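In case it helps anyone else, the comparison ended up looking something along these lines:
Code:
WHERE [Master Addresses].Address Like "*" & Trim([Customer Addresses].[Phy Street]) & "*"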
 
