Removing duplicates from a CSV import

SeriesIII

We have a customer who sends us an email with all our jobs in a CSV file. The problem is that this is a list of all the open jobs, so each day's file contains duplicates of jobs that are still being worked on, rather than just the new ones. The records do at least contain a unique ID. I have been talking to their IT bods and have suggested many solutions, but they don't want to go for them. Finally, I have been told to work with the data I have.

So, I need to be able to import the data (with minimal user intervention), less the duplicates. This is the solution I am considering:

Set up two identical tables (importtemp and importlog). Set the customer's job number as the primary key in importlog only. Add an additional yes/no field in importlog called 'imported', or similar, with 'no' as the default value.
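For illustration, a minimal DDL sketch of the two tables, with hypothetical field names (your CSV will have more columns). In Access a Yes/No field starts out as No for new records anyway, so the default largely takes care of itself:

CREATE TABLE importtemp (
    JobID TEXT(20),
    JobDescription TEXT(255)
);

CREATE TABLE importlog (
    JobID TEXT(20) CONSTRAINT pk_importlog PRIMARY KEY,
    JobDescription TEXT(255),
    Imported YESNO
);

(Run each statement separately; Jet SQL only takes one statement at a time.)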

The procedure would work as follows:

Clear the importtemp table.

Import the CSV data into importtemp, then append it to importlog; the append should only add new records that don't have a corresponding primary key (the unique ID).
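A sketch of that append query, assuming the unique ID lands in a hypothetical JobID field. The unmatched LEFT JOIN keeps only rows with no matching primary key already in importlog, so nothing is rejected and no import-error tables are created:

INSERT INTO importlog (JobID, JobDescription)
SELECT t.JobID, t.JobDescription
FROM importtemp AS t LEFT JOIN importlog AS l
    ON t.JobID = l.JobID
WHERE l.JobID IS NULL;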

Append all records with a 'no' in the imported field into the main database table as required, carry out any required manipulation, etc. Then set the imported field to 'yes' for all those records in importlog using an update query.
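Again as a sketch, with tblJobs standing in for your main table and the same hypothetical field names:

INSERT INTO tblJobs (JobID, JobDescription)
SELECT JobID, JobDescription
FROM importlog
WHERE Imported = False;

UPDATE importlog SET Imported = True
WHERE Imported = False;

Save these as two separate queries and run the append before the update.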

I have used two tables because if I just import the CSV directly into the importlog table, the 'failed' rows (i.e. the duplicates) will result in lots of 'import error' tables clogging up my database.

This doesn't appear to be a very elegant solution, as I now have the same data (albeit in unmanipulated form) in two places in my database, importlog and the main table. But I can't think of a better way of doing this.

Any advice appreciated.
 
I do this all the time. You should set this up semi-manually until you get the process down pat, then you can automate it.

Set up a table that exactly duplicates your incoming data, but with an added autonumber PK (you may not need this if you're sure they're sending a unique key along with the data).

If your database is normalized correctly, the input table won't be a 1:1 match for your live tables, so you'll need to update your data in phases. That means you'll need a way to keep track of the status of the updates in the input table. I create a status table with a FK from the input table. Initially, this has no records.
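A sketch of that layout, all names assumed; tblInput mirrors the CSV columns plus the autonumber PK, and tblInputStatus carries the FK back to it:

CREATE TABLE tblInput (
    InputID COUNTER CONSTRAINT pk_tblInput PRIMARY KEY,
    JobID TEXT(20),
    JobDescription TEXT(255)
);

CREATE TABLE tblInputStatus (
    InputID LONG CONSTRAINT fk_tblInputStatus REFERENCES tblInput (InputID),
    Status TEXT(30)
);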

First, run an insert query on the status table, inserting a new row for each row in the input table that is already IN the "main" database (you may be able to skip this step). Then, run another insert query on the same table for all rows where the data is NOT IN the "main" database.
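Those two status-seeding queries might look like this, with tblJobs standing in for the "main" table and the status strings purely illustrative (note that NOT IN misbehaves if the subquery can return Nulls, so make sure the main table's key is required):

INSERT INTO tblInputStatus (InputID, Status)
SELECT InputID, 'already in main'
FROM tblInput
WHERE JobID IN (SELECT JobID FROM tblJobs);

INSERT INTO tblInputStatus (InputID, Status)
SELECT InputID, 'need to insert into main'
FROM tblInput
WHERE JobID NOT IN (SELECT JobID FROM tblJobs);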

Now, run an insert query on the first "main" database table for all records in the input table that have a status record with a type of "need to insert into main" or whatever you chose. If all is OK, change the status and do the same for the next table.
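Sketched with the same assumed names:

INSERT INTO tblJobs (JobID, JobDescription)
SELECT i.JobID, i.JobDescription
FROM tblInput AS i INNER JOIN tblInputStatus AS s
    ON i.InputID = s.InputID
WHERE s.Status = 'need to insert into main';

UPDATE tblInputStatus SET Status = 'inserted into main'
WHERE Status = 'need to insert into main';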

When you're done, delete everything from the status and input tables.

Some things you'll need:
DoCmd.TransferText (TransferSpreadsheet is for Excel files; a CSV is delimited text)
IN
NOT IN
DoCmd.OpenQuery (there is no DoCmd.RunQuery)
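Tying it together, a minimal VBA sketch; the file path, import spec and saved query names are all assumptions, not gospel:

Sub RunDailyImport()
    ' Pull the CSV into the input table
    DoCmd.TransferText acImportDelim, "JobImportSpec", "tblInput", "C:\Imports\jobs.csv", True

    ' Run the saved status-seeding and append queries in order
    DoCmd.SetWarnings False
    DoCmd.OpenQuery "qrySeedStatusExisting"
    DoCmd.OpenQuery "qrySeedStatusNew"
    DoCmd.OpenQuery "qryAppendToMain"
    DoCmd.OpenQuery "qryFlagInserted"
    DoCmd.SetWarnings True

    ' Clear down for the next run; child (status) table first because of the FK
    CurrentDb.Execute "DELETE FROM tblInputStatus;", dbFailOnError
    CurrentDb.Execute "DELETE FROM tblInput;", dbFailOnError
End Sub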
 
