Checksum with SQL

liddlem
Registered User, joined May 16, 2003
Hi All
In order to ensure that I have clean(er) data in my address table, I am trying to ensure that the same address is not captured twice. (Impossible, I know, but I HAVE to try - 17 different spellings for the same town is getting ridiculous!)

For this, I have tried to normalise just about every component of an address and put as much of it as I can into drop-down boxes. (Unfortunately, users still need a 'free-text' field to insert organisation names etc., but I can't help that.)

Anyhow, the SQL table has a field where I can store a CHECKSUM value for a combination of all the fields.

The question is: how can I, from an Access 2016 front end, call CHECKSUM(field1, field2, ... fieldn) in T-SQL to do the calculation for me?
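A couple of notes before wiring this up. T-SQL's CHECKSUM (or the far less collision-prone HASHBYTES) isn't something Access can call through a linked table directly; a pass-through query can run it, or SQL Server can maintain the value itself as a persisted computed column so the front end never computes anything. Either way, whatever builds the fingerprint should normalise the fields first, or 'Foo Street' and 'foo  street' still produce different values. A minimal Python sketch of that normalise-then-hash idea (the field values are illustrative, not from the real table):

```python
import hashlib

def address_checksum(*fields):
    """Stable fingerprint for an address: trim, collapse whitespace and
    lowercase each component, then hash the concatenation. Same spirit as
    T-SQL CHECKSUM, but hash-based, so collisions are far less likely."""
    normalised = "|".join(
        " ".join(str(f or "").split()).lower() for f in fields
    )
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

# Two captures of the same address differing only in case and spacing:
a = address_checksum("4", "17", "Foo Street", "Alice Springs", "0870")
b = address_checksum("4", "17", "foo  street", "ALICE SPRINGS", "0870")
print(a == b)  # True
```

The `|` separator matters: without it, ("ab", "c") and ("a", "bc") would hash identically.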

Thanks
 
I wouldn't waste time implementing it because what are effectively duplicates rarely match absolutely perfectly across all fields.

Most developers are more concerned with recognising values for names and addresses that are almost the same.
 
Hi Galaxiom
Normally, I would agree with you.
However, the school that I am working for is 100% reliant on govt funding.
Recently, we were asked to provide a physical address that is broken down to the following parts. (This is a part of my table structure...)

Code:
CREATE TABLE [dbo].[CMTY_Address](
	[ID_CMTY_Address] [int] IDENTITY(1,1) NOT NULL,
	[ID_CMTY_Postal] [int] NOT NULL,
	[Lot_Section_Portion] [varchar](50) NULL,
	[Building_Type] [varchar](20) NULL,
	[UNIT_FlatNo] [varchar](20) NULL,
	[HouseNumber] [varchar](20) NULL,
	[Property_commune_Name] [varchar](100) NULL,
	[Street_Name] [varchar](100) NULL,
	[Street_Type] [varchar](50) NULL,
	[CAMP_Community] [varchar](50) NULL,
	[Suburb_LOCALITY] [varchar](50) NULL,
	[PostCode] [varchar](6) NULL,
	--[City] [varchar](100) NULL,
	[State] [varchar](3) NOT NULL,
	[NearestCrossRoadOrComment] [varchar](100) NULL,

On the last two occasions that I analysed our existing data and broke it down like this, it took me two weeks.

The existing DB is structured so that each family member is connected to their OWN (individual) address record. On average, we have 8 people living at the same address.
Also - (thanks to my conscientious users - NOT!) I have 17 different ways just to spell 'Alice Springs'. I lost count of how many ways there are to record 'Unit 4/17 Foo Street'
I HAVE to change both the data structure (Many users can be connected to the same address) and also the way that the info is captured.

Creating an address key based on the concatenation of all the above fields produces a value too long to use as a key.

The checksum is the solution for this problem.

Thanks for your input anyhow.
 
Also - (thanks to my conscientious users - NOT!) I have 17 different ways just to spell 'Alice Springs'. I lost count of how many ways there are to record 'Unit 4/17 Foo Street'

The checksum will recognise precisely none of these as duplicates. You are barking up the wrong tree if you think a checksum addresses anything to do with this problem.

You should first be looking for near matches on individual parts of the addresses then combining the results to give a result for the whole address.

I recommend applying the Damerau-Levenshtein distance to street and city names to recognise spelling mistakes. The results for the different parts can then be combined, multiplying them together, to give a distance for the whole address; the parts can also be weighted before taking the product.
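For illustration, here is a minimal Python sketch of the distance (the optimal-string-alignment variant, which counts insertions, deletions, substitutions and transpositions of adjacent characters); the example strings are made up:

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Optimal-string-alignment variant of the Damerau-Levenshtein
    distance. Comparison is case-insensitive."""
    a, b = a.lower(), b.lower()
    # d[i][j] = distance between the first i chars of a and first j of b
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

print(damerau_levenshtein("Alcie Springs", "Alice Springs"))  # 1 (transposition)
print(damerau_levenshtein("Allice Spings", "Alice Springs"))  # 2
```

A small distance (say, 1 or 2) on a town name is a strong hint that two records are really the same place spelled differently.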

See this thread for a simple usage example.
 
