Merge two tables with duplicates

stormin_norm

Registered User.
Local time
Yesterday, 22:06
Joined
Apr 23, 2003
Messages
213
My head is spinning from reading posts from every website on the best method for deleting dups. I could use some direction on the best method and sql/vba code to do this:

I have two dbs, I want to merge these 2 high school tables. Problem is each table has dups and then I want to merge as unique. I can't do a select unique because the HS_ID is different in each.
Here is an example:

A_HS (in DB1)
HSid HSname HSaddress HScity HSstate HSotherInfo
1, JB HS, 1313 Mockingbird Lane, Somewhere, NY, zaxakjdjd
2, XX HS, 1 Main Street, Here, NY, jdjdkdkd
3, JB HS, 1313 Mockbird Street, Somewhere, NY, aaaaaaaa

I have to first dedup 1 & 3 by looking only at Name, City, State keeping record 1 (doesn't actually matter which one I keep). THEN make this the 'master HS table'. so have to merge in 'address' & 'other info' after the dedup.

NEXT- Do the same thing for table B in the other db. to reduce dups internal to this db.

B_HS (in DB2)
HSid HSname HSaddress HScity HSstate HSotherInfo
1, JB HS, 1313 Mockingbird Lane, Somewhere, NY, zaxakjdjd
2, YY HS, 1 Main Street, Here, NY, jdjdkdkd
3, YY HS, 1 Main Street Suite B, Here, NY, bbbbbbbbb
200, JB HS, 1313 Mockbird Street, Somewhere, NY, ccccccccccc

THEN- Merge the B info A skipping dups, add only 'new recs' like YY

During each step I have to point back to the student and change their HSID to this new HS table. lets call it C_HS.
 
Determine what fields exactly makes a dup.
Look at a UNION query for merging into a final table.
http://www.techonthenet.com/sql/union.php

A dup is same Name, City, State

The tricky part is the id numbers are not relliable cross both db's. They do not match.
the id number may or may not be the same in both db's. So a union is good after both tables are cleaned up of dups. Then, B_HS could union append.
I just can't loose the original id#'s since I need those to 'fix' the student records.
 
I'll try the union something like:

select distinct hs.name, hs.city, hs.state into new_hs.
union both tables w/ new hs_id.

then go back and match hs_a by hs.name, hs.city, hs.state and create a cross-ref table of old id to new id.

do the same with hs_b.

that may work. I'll see. There may be some 'outliers' but lets see how that does.
 
Take a look at the attached database.

I have, however, actually made the assumption that HSname, HSaddress, HScity and HSstate give a unique combination.

If you follow the queries Step001.. to Step008.. in sequence it will combine A_HS and B_HS records (at least the name and address info) into Table HS and create a second Table, called InfoTable, containing the OtherInfo with a link to the new HsID.

Step001_Mk_tmpHS_from_A_HS_Unique
Create the table tmpHS using unique combinations of HSname, HSaddress, HScity and HSstate from A_HS.

Step002_Add_unique_records_from_A_HS
Add, or update table HS with values from tmpHS (A_HS)

Step003_Mk_tmpHS_from_B_HS_Unique
Create the table tmpHS using unique combinations of HSname, HSaddress, HScity and HSstate from B_HS.

Step004_Add_unique_records_from_B_HS
Add, or update table HS with values from tmpHS (B_HS)

Step005_Mk_tmpInfo_From_A_HS
Create the table tmpInfo using unique instances of HsId from HS and HSotherInfo from A_HS.

Step006_Add_OtherInfo_from_A_HS
Add, or update table InfoTable with values from tmpInfo (A_HS)

Step007_Mk_tmpInfo_From_B_HS
Create the table tmpInfo using unique instances of HsId from HS and HSotherInfo from A_HS.

Step008_Add_OtherInfo_from_B_HS
Add, or update table InfoTable with values from tmpInfo (B_HS)

Et voila!


I've also included HS(Empty) and InfoTable(Empty) which are empty copies of the data tables if you want to delete the HS and InfoTables and start from scratch.


I hope this proves useful.
 

Attachments

Nigel-
Excellent idea. I was starting to look into vba coding and stepping through recs. I like your technique. I am going to modify it a bit.

After A_HS is loaded into the new table, I'll mark records in B_HS which match in the new HS table. This way I can filter out all those in B which already exist in HS table.

THANKS for your time and assistance!
 

Users who are viewing this thread

Back
Top Bottom