Compare same data - delete

blander · Aug 15, 2006

Hi guys,

I have 23k records

I have run the wizard to find all the duplicates according to one field.

So i have the results, except that some of the records contain nothing within the searchable field - so it returns all the records with nothing in the searchable field as duplicates.

How would i modify the query to perform the search but exclude the records with no data.

The SQL query below is the default wizard output for find duplicates, how would i change this??

SELECT Master.EMAIL, Master.MemID, Master.MEMBERNO, Master.FIRSTNAME, Master.LASTNAME, Master.POSITION, Master.ORGAN, Master.ADDRESS1, Master.ADDRESS2, Master.ADDRESS3, Master.ADDRESS4, Master.COUNTY, Master.COUNTRY, Master.POSTCODE, Master.SALUTATION, Master.MOBILE, Master.TELEPHONE, Master.WORKPHONE, Master.FAX, Master.RATE, Master.CLASS, Master.ADDREF, Master.DATEJOIN, Master.WEB, Master.Field26
FROM Master
WHERE (((Master.EMAIL) In (SELECT FROM [Master] As Tmp GROUP BY [EMAIL] HAVING Count(*)>1 )))
ORDER BY Master.EMAIL;

Another thing is, once i have the result... which query would i use to actually delete the duplicate data??

Cheers, and advance thanks!!

FoFa · Aug 15, 2006

I assume they are NULL (not blank) so a [myfield] IS NOT NULL would eleminate those

blander · Aug 15, 2006

Thanks for you help

but it looks like they are blank fields and not null - (imported from excel where some of the fields were blank)

would i put

IS NOT =" ";

ok so that dosent work...

any ideas??

Cheers

Ben

FoFa · Aug 15, 2006

Have you tried the NOT NULL code?
If NOT NULL does not work, you can try:
Setup a check (criteria only column) as such:
CHKIT: TRIM([MyColumnName])
then in the criteria you can use <> ""
OR
CHKIT: LEN(TRIM([MyColumnName]))
then in the criteria you can use <> 0

These will work if it is spaced filled. If it has special characters or some such nonsense as that, well that's a bit tougher. Try those first.

blander · Aug 15, 2006

Thanks FoFa

ok so i'll explain the entire process of what i am trying to achieve..

(This is a bit of a long story)

I have contacts - heaps of them (25k)

They all belong to a certain category - (like Mail List (one category) and Northern Contacts (another category) for example)

Each contact may or may not belong to one or more category

So i have collected these contacts into an excel spreatsheet..

Name address phone email etc

Then assigned each contact with a category id

So there are duplicate contacts within this spreadsheet which have a different category id

Then i have placed this data into access...

Proceded to Normaize the data 1NF and 2NF

So I have 3 Tables

Master (contains all the persons contact details Name address ect as well as a member ID as the primary ID)

CatDes (Contains the CatDesID as the primary key and the category description)

CatBel (Contains CatBelID as PK and CatID (FK to CatID in CatDes Table) and MemID (FK to MemID in the Master table))

So, i still have duplicates within the master table as the one member can belong to more than 1 category.

Confused - so am I!!

So for Example

001 Bob Smith Add1 Add2 Email
002 Bob Smith Add1 Add2 Email

Are both in the master table because MemID 001 belongs to CatID 004 and MemID 002 belongs to CatID 008

I know that i can search for all duplicates according to an email address.. but it will show the same member but from different categories... due to the member id being associated to only one category ID and not members to multiple categories within the CatBel table..

So what i really want is for one of the memberID records to reference to the CatBel Table..

I know that many of my problems started when putting the data in the excel sheet.. but hindsight is a beautiful thing..

Any ideas,

Cheers

Ben

FoFa · Aug 15, 2006

OK, try this:
Create a work table to store MemberID and Email (if Email is the field you are detecting duplicates on). Populate this table with a SELECT MIN(Memberid), Email from Master group by Email. This will give a unique list of members basically. Now join this work table to your CatBel table on Email and update all the CatBell records with the MemberID from your worktable where the Emails match (basically this will make them all point to the one master record per unique Email). Then you can delete any master records that do not exist in the work table. Then delete your work table.
That said, VERIFY each step to the nth degree before moving on. Backup your DB in between each step for safety AND BEFORE YOU START.

blander · Aug 15, 2006

Thanks FoFa i'll give it a go..

Really appreciate the help

blander · Aug 16, 2006

Thanks FoFA,

But;

The CatBel table does not contain the Email field since the CatBel table is made up of the MemID and the CatID, the link is made between the MemID in the Members table and CatID in the Category Table... hence it contains only three fields - CatBelID, MemID and CatID

The CatID Table was created to fix the many-to-many problem that one member can belong to many categories..

And some of the contacts dont contain email data, but they do belong to some categories... so doing this would mean than all contacts that do not have email addresses belong to the one category.. which would not be the case..

Any ideas??

Cheers

Ben

FoFa · Aug 16, 2006

You can join the Catbel table to the master to get the Email address and use that. You just have to exclude EMAIL IS NULL is all.
This NULL emails you mayhaps to fix manually, depending on how good a comparison you can do on say NAME or something else.

blander · Aug 25, 2006

Cheers FoFa, I really appreciate all you help.. you have saved me heaps and heaps of time and a headache!!

Compare same data - delete

blander

Registered User.

FoFa

Registered User.

blander

Registered User.

FoFa

Registered User.

blander

Registered User.

FoFa

Registered User.

blander

Registered User.

blander

Registered User.

FoFa

Registered User.

blander

Registered User.

Similar threads

Users who are viewing this thread