Fuzzy Matching Address

mosh

Registered User.
Local time
Today, 14:19
Joined
Aug 22, 2005
Messages
133
Hey Guys,

I've created a couple of scripts that cleans the address data from dataset 1 and 2 and then matching on a LIKE function within the join. But I'm having problems with certain values not matching due to the limitations of the LIKE function.

Match Code:

SELECT distinct
,ADDRESS1
,[Postal_Code]
,[post_code]

INTO [SOURCE].[dbo].[SC1_3]
FROM [SOURCE].[dbo].[SC1_2] INNER JOIN [SOURCE].[dbo].[SCD_NEW]
ON sc1_2.[Postal_Code]=scd_new.[Post_Code]
AND
SCD_New.ADDRESS1 LIKE '%'+SC1_2.Street1+'%'
WHERE SUBSTRING(SCD_New.ADDRESS1,CHARINDEX(SC1_2.Street1,SCD_New.ADDRESS1)-1,1)=' '
OR
CHARINDEX(SC1_2.Street1,SCD_New.ADDRESS1)=1
ORDER BY ID


But I get the following examples where it doesn't work that well:

Address_1 Ad1_Cleaned DATA_2_ADDRESS
77-79 CANON ROAD 77 79 CA 77 CANON
GRD FL F LN T 27 REDCLIFFE GD 27A REDC 27 REDCLIFFE
1-5 Shower Street 1 5 SHOW 1 SHOW ST


Thanks for your help!
 
You may find MarKK's observations HERE useful
 
Guys,

Sorry, forgot to mention it's sql server 2008.
 
mosh,

People have made careers out of cleaning up names and address data. There are many algorithms and routines to assist, but unlikely 1 to do exactly what you want in all cases.

You may have a few routines that are run in series, each focusing on specific issues.

I'm not sure where you are located, but many Post Offices/Postal Services have guidelines/standards related to street types and abbreviations. That may be one way to approach inconsistent terminology.

see http://www.canadapost.ca/tools/pg/manual/PGaddress-e.asp for examples.

You may also find these posts of value

Good luck.
 

Similar threads

Users who are viewing this thread

Back
Top Bottom