Using vba to remove old duplicates from an Access table (1 Viewer)

lucky245 · May 18, 2019

I have 31 tables which as they are updated are appended to the Master table (bad name should call holding but for this example I called it Master).

The Master Table could have upwards of 5000 records depending on when it is updated. I need to remove the duplicates before moving the data to the Output Table.

I can automatically import the data and get it to append the data that has not been appended since last use to the Master Table.

I cannot figure how to efficiently remove the duplicate records.

In the sample database attached the key fields are Spec_number, spec_result, and date_authorised.

if the spec_number and spec_result is the same I need to delete any record (row) which is older that the newest date_authorised.

However if the date_authorised are the same but the spec_result is different then all different spec_results need to be kept.

Basically spec_Number and spec_result uniquely identify the record after which I only need the latest version (newest date_authorised).

theDBguy · May 18, 2019

Hi. Just as a test, if you run this query, would it show you all the records you want to keep?

Code:

SELECT * FROM MASTER T1 
INNER JOIN (SELECT Master.Spec_Number, Master.SPEC_RESULTS, Max(Master.Date_last_Auth) AS LastDate
FROM Master
GROUP BY Master.Spec_Number, Master.SPEC_RESULTS
ORDER BY Master.Spec_Number) T2
ON T1.Spec_Number=T2.Spec_Number 
AND T1.Spec_Results=T2.Spec_Results
AND T1.Date_Last_Auth=T2.LastDate;

Micron · May 18, 2019

Rather than remove duplicates then append elsewhere, why not just set the target table to not allow dupes via composite index (or composite PK if that is preferred) and just append? Access will balk if this is a query but if you run it anyway, it will tell you how many records were not appended and why. If it's not a query but is vba, just suppress the error. This all assumes you don't wan to pick which record of a dupe gets appended.

Going forward, the "master" table should be designed to prevent dupes if it is to be repeatedly used.

lucky245 · May 18, 2019

theDBguy said:
Hi. Just as a test, if you run this query, would it show you all the records you want to keep?

Code:

SELECT * FROM MASTER T1 INNER JOIN (SELECT Master.Spec_Number, Master.SPEC_RESULTS, Max(Master.Date_last_Auth) AS LastDate FROM Master GROUP BY Master.Spec_Number, Master.SPEC_RESULTS ORDER BY Master.Spec_Number) T2 ON T1.Spec_Number=T2.Spec_Number AND T1.Spec_Results=T2.Spec_Results AND T1.Date_Last_Auth=T2.LastDate;

Yes this works perfectly. I was playing with using a recordset and a cloned recordset but this does the job without all the code.

lucky245 · May 18, 2019

Micron said:
Rather than remove duplicates then append elsewhere, why not just set the target table to not allow dupes via composite index (or composite PK if that is preferred) and just append? Access will balk if this is a query but if you run it anyway, it will tell you how many records were not appended and why. If it's not a query but is vba, just suppress the error. This all assumes you don't wan to pick which record of a dupe gets appended.

Going forward, the "master" table should be designed to prevent dupes if it is to be repeatedly used.

I do understand where you are coming from however the problem is they are not true duplicates as the later records will be an updated version but the original record is required to ensure tests are in the system. Only a few fields will be identical and I need to be able to remove the old ones once the updated version is released. As there are 100's output each day and often only with minor changes if the system is not viewed for a few days the older records could be repeated hence my term Duplicates.

1268 · May 18, 2019

If that is the case can you not grab max?

Sent from my SM-G950U using Tapatalk

isladogs · May 18, 2019

In that case, why not update the existing records rather than append new ones and delete the original records.

theDBguy · May 18, 2019

lucky245 said:
Yes this works perfectly. I was playing with using a recordset and a cloned recordset but this does the job without all the code.

Hi. Hope you can manage from here. If not, please don't hesitate to holler. Cheers!

Mark_ · May 19, 2019

I'm with Colin for this.

Basic logic is
SELECT Master WHERE Spec_number = yourSpecNumber AND spec_result = YourSpecResult
IF error THEN ' It doesn't exist
AddRecord
Else
UpdateRecord
EndIF

That said, are you SURE there is no reason to keep previous history for a given Spec_Number? I've encountered too many cases where someone says "We don't care about X" only to have them ask later "Why can't you tell me about X??? What's wrong with your program!!". Keeping history is often far easier than deleting it as you'll often find you need to look at it just when you'r sure you won't.

lucky245 · May 19, 2019

Mark_ said:
I'm with Colin for this.

Basic logic is
SELECT Master WHERE Spec_number = yourSpecNumber AND spec_result = YourSpecResult
IF error THEN ' It doesn't exist
AddRecord
Else
UpdateRecord
EndIF

That said, are you SURE there is no reason to keep previous history for a given Spec_Number? I've encountered too many cases where someone says "We don't care about X" only to have them ask later "Why can't you tell me about X??? What's wrong with your program!!". Keeping history is often far easier than deleting it as you'll often find you need to look at it just when you'r sure you won't.

The following is in no way meant to be or sound condescending

Ok to explain a little more. The original record could output one line or more depending on current results for specimen as the output delivers 2 lines if 2 comments are added.

This could then change to no result eg only 1 line needed or there may be 3 organisms detected therefore 2 additional lines required. As this result may be out 4 or five times we could end up with 11 lines of data where only 3 are needed. A “no result yet” line is pretty useless once an organism is detected. The reason it’s over Multiple days is that some take longer to detect than others and some have mid stages so the result is output at each stage. The end result is the only result ever needed to treat a patient on and history of its stages will never be useful or they would have been kept in the main system.

Lastly eg 11 lines instead of 3 * 1000s of results will be a waste of space and no doubt slow the search process down exponentially.

Hope that helps

gemma-the-husky · May 19, 2019

I have to say that this doesn't sound right. If you have duplicate records, there is most likely going to be something amiss with your data structure. A proberly noramlised data structure would detect dupiicates before they can be entered.

lucky245 · May 19, 2019

gemma-the-husky said:
I have to say that this doesn't sound right. If you have duplicate records, there is most likely going to be something amiss with your data structure. A proberly noramlised data structure would detect dupiicates before they can be entered.

This problem is more about getting the data in from multiple automated inputs into a flat file table which will be the source file for another system.

Sent from my iPhone using Tapatalk

Using vba to remove old duplicates from an Access table (1 Viewer)

lucky245

Registered User.

Attachments

theDBguy

I’m here to help

Micron

AWF VIP

lucky245

Registered User.

lucky245

Registered User.

1268

Registered User.

isladogs

MVP / VIP

theDBguy

I’m here to help

Mark_

Longboard on the internet

lucky245

Registered User.

gemma-the-husky

Super Moderator

lucky245

Registered User.

Similar threads

Users who are viewing this thread