Finding Duplicates in two tables- MS Access 2000

Attila001122

New member
Local time
Today, 10:47
Joined
Dec 14, 2007
Messages
8
Hello,

I am just starting Access...
Or at least trying to learn it on a small project, and I need a jumpstart, if someone woild be so kind and help me out.

I need to organize my e-mail sendings.

In one table (Table Sent) I have e-mail addresses where I have already sent messages.
In an other (Table New) I have some other e-mail addresses, I am preparing to send out.

To make sure to not send mail to those who already got one, I need to compare the two tables, and delete from the Table New those addresses which are already present in Table Sent.

I guess that would be a 2-3 line SQL script...
All of my attempts so far produced no results

Any guidance would be great!

Thanks

Attila
 
Please post your database so we can play with it and provide a solution for you.
 
The zipped database

Thanks for the offer of help !

It is really simple...
I would like to learn it step by step, so I sending a small example of my tables.

So... I would like to delete from the NewMail table those e-mails which are in the SentMail table.
After that, in a separate step, I would like to add the records from the NewMail table o the SentMail table...

Thanks

Attila
 

Attachments

Based on the result of the query :qryOccureBothNewAndSent
in the attached database, you can filter out the duplicates.

Enjoy!
 

Attachments

A great step forward !

Hi Guus!

Thanks for the help.
Now I can find the duplicates, and this is a great step ahead.

It would be great if you could spend one more minute with this (but I really dont want to use your time) and point me to the direction how can I delete the duplicates from the NewMail table, and/or add the "non duplicates" to the SentMail table .

Thanks for your input, it is really good help to me to learn a bit of the basics of Access!

Attila
 
Code:
delete * from sentmail where [e-mail] in (SELECT qryOccureBothNewAndSent.[E-Mail]
FROM qryOccureBothNewAndSent
)
Duplicates are erased from the sentmail table. Replace "SentMail" with "NewMail" if you want.

Enjoy!
 
first look at your datamodel

A succesful database application is founded on a sound Entity Relationship Model (ERM).
When i took a look at your database, it occurred to me that you have 2 tables for one entity: emails. They are distinguished by a property, namely if they are sent or not. Furthermore there is no information about the email itself. There is just a reference to an email address. How will you ever distinguish which email you are dealing with?
Another issue is normalization. In designing a data model, it is adviced to mention each bit of information only once. So if there is an entity, like "email address", you store it once and when you refer to it, you use a unique identification or primary key.
If i understand the functional needs of your application well, my advice would be to have one table for email addresses (e.g. EMAD) and one for emails (e.g. MAIL). The table fields could be as follows (this is of course a minimal example):

EMAD:
emad_id (long) --> primary key
emad_text (text) --> the email address

MAIL:
mail_id (long) --> primary key
mail_subject (text) --> the subject
mail_date (date) --> the composition date

Finally you will have to relate the emails to the addresses. Since we are dealing with a so called many to many relationship (1 email can be sent to many addresses and 1 address can receive many emails) you will need a junction table. Lets call it ADMA (from Address-Mail). The fields will be something like:

ADMA:
emad_id (long) --> primary key
mail_id (long) --> primary key
adma_sent (date) --> the send date

The primary key involves two fields, which therefore guarantees uniqueness, so each email can only be combined once with a specific email address

The relationship between these 3 tables can be displayed as follows:
EMAD <-- ADMA --> MAIL

Of course, it is impossible to give a complete introduction into the topic of Database Design, but i hope i have wettened your appetite ... just look on the internet. If you need any more specific help on this subject, please let me know
 
Last edited:
Sorry about the wrong naming

Hello,

Thanks for the very detailed answer !

I guess you become the victim of my shameful habit of clumsy naming...
Sincere apology for that !

I am talking here rather about email addresses and not really the email messages...

The main purpose of this project is to collect all the e-mail addresses scattered on many folders, computers and e-mail addresses of new visitors.

I am trying to consolidate all of them into a new database of opt-in news letter receivers, while continously sending out messages...

To avoid becoming a spammer, I would like not to send the same messages to the same address...

So I made a simple table with just the e-mail addresses.
Before I can add the new ones, I would like to filter out the duplicates.

The process would like to look like this:

Check the "new" batch of addresses against the "old" table, (the table what has the addresses where the intro e-mail was sent).

If there are any duplicates, delete them from the new batch...

Send the new e-mail intro to the "new" batch.

Add these email addresses to the "sent" (old) table - with the addition date...

I am using Access for quite a while, but rather on a passive way, with some taught-myself way...
Inevitably, I reached the limit what I can do myself.

Therefore I am grateful for any input what helps me to get forward with my project, and I do feel lucky to find someone who's knowledge is so much superior to mine, and willing to share it.
Yes, I got my appetite to do it "properly"...
(And it's limit only your patience)

Thanks again,

Attila
 
The process would like to look like this:

Check the "new" batch of addresses against the "old" table, (the table what has the addresses where the intro e-mail was sent).

If there are any duplicates, delete them from the new batch...

Send the new e-mail intro to the "new" batch.

Add these email addresses to the "sent" (old) table - with the addition date...
Why don't you just use one table with an email address and an email_sent field (either a date or a boolean flag)? Instead of maintaining two tables, you could distinguish between the two by having the email_sent field empty or not. I think that would save you a lot of unnecessary work.
 
Last edited:
Why don't you just use one table

I am always happy to listen to those who obviously know more than me...

Maybe I am tackling the problem from the wrong angle...
Or just simply having trouble to express myself clearly. (English is not my first language)

But I still have problem with the idea of keep adding email addresses to a single table without checking if they are already exist there...
This way I can end up with multiply occurence of the same email addresses, some with a "sent" flag, some without...

I guess I would not "maintaining" two tables as such, but having a "temporary parking" for e-mail addresses for those which should get into the main table...

Getting ONE table without duplicate email addresses and eventually sending an Intro message IS the aim of the project...
To build this while sending messages and checking for duplicates I only can imagine with two tables.

My problem is not how to set the "sent" field in a table, boolean or date.
The aim is NOT TO SEND emails more than once to the same address, because nothing stops duplicates to get into the (single?)table.

Thanks for your patience

Attila
 
When you have a candidate email address to your table first check if the address is already in the table. If it is don't add it. If it is not then add it. Where's the problem?

To find existing duplicates use the Query builder wizard to check for duplicates in the email address field.

Once you know which are duplicates you can delete them so there is only 1 occurence of each address.
 
Checking for multiple duplicates

When you have a candidate email address to your table first check if the address is already in the table. If it is don't add it. If it is not then add it. Where's the problem?

To find existing duplicates use the Query builder wizard to check for duplicates in the email address field.

Once you know which are duplicates you can delete them so there is only 1 occurence of each address.

Where is the problem?
Surely, my rather scattered knowledge of Access.

Yes...

Having a candidate e-mail (one) and check it out is not a big deal...
Having 58 today.... 27 tomorrow... is a different story...

Where do you put them before adding to the table ?
 
Where is the problem?
Surely, my rather scattered knowledge of Access.

Yes...

Having a candidate e-mail (one) and check it out is not a big deal...
Having 58 today.... 27 tomorrow... is a different story...

Where do you put them before adding to the table ?
The numbers you mention is not a problem. All you need to do before adding them is to use a temporary table to store them in and then search your existing table programmatically to check if they are already present.
 
work with one table!

You should really set it up as simple as possible. Why taking the trouble of working with 2 tables, comparing duplicates, and deleting them? If i understand your functional description well, this is not necessary.

Just try out the following. Create one table for all your email addresses (EMAD), with the following fields:
emad_id (autonumber) = primary key
emad_address (text, mandatory)
emad_sent (date, optional)

Make a unique index on the emad_address field. This will ensure that you cannot insert the same email address twice.

To obtain a list of all email addresses that did not receive an email yet:
Code:
SELECT EMAD.emad_id, EMAD.emad_address, EMAD.emad_sent
FROM EMAD
WHERE (((EMAD.emad_sent) Is Null));

To update the email_sent date to the current date after you sent them an email:
Code:
UPDATE EMAD SET EMAD.emad_sent = Date()
WHERE (((EMAD.emad_sent) Is Null));

Create a dummy table DUAL with 1 field and 1 record (this is because we need a table reference in the following INSERT INTO statement to attach the WHERE clause).

To programmatically add email addresses from a text file (i created a test form with a button to activate this code):
Code:
Option Compare Database
Option Explicit

Private Sub but_test_Click()
    Dim str_address As String
    
    Open "addresses.txt" For Input As #1
    Do While Not EOF(1)
        Line Input #1, str_address
        DoCmd.RunSQL _
        "INSERT INTO emad ( emad_address ) " & _
        "SELECT '" & str_address & "' As Address " & _
        "FROM dual " & _
        "WHERE NOT EXISTS " & _
        "(SELECT emad_id FROM emad WHERE emad_address = '" & str_address & "');"
    Loop
    Close #1
End Sub

Before you run it it is advisable to disable confirmation of action queries (MS Access menu: Options, tabpage Edit/Find)
If you do not want to disable confirmation of action queries, there is another option: add a reference to Microsoft DAO 3.6 Object Library,
(VB menu Tools:References) and add the following code to the procedure:
Code:
Dim db as Database
Set db = CurrentDb()
... and replace the DoCmd.RunSQL statement with db.Execute

Success!
 
Last edited:
It seems to put me on the right track !

Hello everybody !

Thanks for all of your efforts, it seem to got me started on this and give me some new ideas !

I am sure I will be back for more advice, and it is is nice to know that there are some really professional guys out there who are not only good but helpful too!

Thanks again !

And if I wont bothering you soon, have a very nice Christmas or any way or name you celebrate the New Year period !

Attila
 
To prevent duplicates, you can apply an index, no duplicates to the field that holds the email address.
 

Users who are viewing this thread

Back
Top Bottom