Hi,
I've run into a big problem with Access 2010: I can't seem to remove duplicates from a table containing millions of entries midway through a series of queries.
The table is formed like this:
Date - Serial Number - Reading 1 - Reading 2 - Reading 3
30.7.2013 - 1122334455 - 12345 - 987654 - 654321
30.7.2013 - 1122334466 - 12345 - 987654 - 654321
29.7.2013 - 1122334455 - 12210 - 987654 - 643210
29.7.2013 - 1122334455 - 12210 - 987654 - 643210
29.7.2013 - 1122334466 - 12344 - 986013 - 453213
As you can see, some rows are exact duplicates of each other while others differ slightly. I want to remove the extra copies of the exact duplicates, and only those, keeping one row of each. Running "Find duplicates" in Access gives me about 250,000 rows of such data.
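For reference, the wizard's actual query is more convoluted, but logically it lists every row whose full combination of fields occurs more than once. The duplicated combinations themselves can be found with roughly this (I'm calling the table Readings here, which isn't its real name, and Date needs brackets because it's a reserved word). This version returns one row per duplicated combination, so about 125,000 rows, while the wizard lists all the copies:

    SELECT [Date], [Serial Number], [Reading 1], [Reading 2], [Reading 3], Count(*) AS HowMany
    FROM Readings
    GROUP BY [Date], [Serial Number], [Reading 1], [Reading 2], [Reading 3]
    HAVING Count(*) > 1;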
I've tried a few options already:
- I cannot use the date as primary key, as the same date appears for several serial numbers. I cannot use the serial number either, because each one appears on several dates. Using a reading value as the key isn't an option at all.
- Microsoft's help says I should mark every duplicate row with an x and then run a delete query to get rid of the x-marked rows. For 250,000 duplicates, that's a bit too much manual x-ing.
- If I build a delete query on top of the Find Duplicates query, it removes all 250,000 entries from the table instead of just the 125,000 that have a duplicate elsewhere in it (see the first sketch after this list for what I mean).
- If I make a query that identifies the duplicate data and returns just one row per duplicate group, a delete query built on it still deletes both entries from the original table.
- I could build a Totals query that returns only unique values (for instance, using First on the Date column). However, that still leaves the duplicates sitting in the original table. Note that this database is already 800 MB, and new data will be imported once a week for the next decade or so; I can't have the table gain duplicates every week and just leave them there.
- If I make a macro that first creates the unique-values table and then deletes the old table, what happens next week when I import new readings? I'd have to make a new macro every time, as the table names change. Or is there a way to extract the unique rows first and then replace the original table's contents with them (something like the second sketch after this list)? With an 800 MB database, that would put me dangerously close to Access's 2 GB size limit, and I couldn't use it as part of a macro, because the database would need a Compact & Repair each time it deletes the original table, midway through a longer series of queries.
- Removing the duplicates at import time isn't an option either, as the data arrives as an Excel sheet that overwrites the past 3 months of readings, once a week.
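To show what I mean by the delete query removing everything: my attempts all boil down to something like the following (same assumed names as above). Because the two copies of a duplicate row are completely indistinguishable, the WHERE clause matches both of them, and both get deleted:

    DELETE *
    FROM Readings
    WHERE (SELECT Count(*)
           FROM Readings AS Tmp
           WHERE Tmp.[Date] = Readings.[Date]
             AND Tmp.[Serial Number] = Readings.[Serial Number]
             AND Tmp.[Reading 1] = Readings.[Reading 1]
             AND Tmp.[Reading 2] = Readings.[Reading 2]
             AND Tmp.[Reading 3] = Readings.[Reading 3]) > 1;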
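And this is roughly the table-swap sequence I have in mind (ReadingsTmp is a scratch name I made up). It would keep the original table name stable for next week's import, but as far as I can tell it still bloats the file until a Compact & Repair:

    SELECT DISTINCT [Date], [Serial Number], [Reading 1], [Reading 2], [Reading 3]
    INTO ReadingsTmp
    FROM Readings;

    DELETE * FROM Readings;

    INSERT INTO Readings
    SELECT * FROM ReadingsTmp;

    DROP TABLE ReadingsTmp;

Each statement would have to be its own saved query or macro step, since Access only runs one statement per query.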
As you can see, it's quite a pickle getting the duplicates out of the original table. This is just a small part of a very long macro that takes about 15 minutes to complete, and because of the duplicates the database is getting far too large.
Any ideas?
Cheers!
-YariLei