append distinct records

slimjen1 · Jun 5, 2013

Hi all. Using a database 2010. I have a monthly import from a spreadsheet to one of my tables. Sometimes it has the same previous data from prior month or months that has not changed. I don't want to keep appending this data to my table if none of the fields have changed.

Is there a way to append data to table only if fields have changed in an append query? Ex. (Name, Address, PhoneNum, Rate) If all fields are the same; don’t append. If one or more fields are changed; append.
Thanks

CJ_London · Jun 5, 2013

Is there a unique ID in the new data which can be found in the old data? If not, can one be created by combining fields? (may be a combination of dates, ids etc)

If the answer is yes then create a query which inner joins the new data to the old data which will give you exact matches. (If the answer is no then it can't be done since you cannot associate a new data record with an old data record)

Then for all the other fields in the new table put the field to the field row in the query design and set the criteria to <> the equivalent field in the old table. These are 'OR's so within the query builder (one per line) you are limited to 9 criteria but you can then go to sql view and type in the rest.

This query will then only return records in the new data which match on the 'unique fields' but differ in some other field.

It is then very straight forward to change it to an append query and completing the append row.

Sorry, just realised it doesn't answer your question but I'll leave it in since it may answer another!

To answer your question you still need to be able to create a unique ID as above and link the tables as above but this time use a left join between new and old data (i.e include all new data records and only old data where it exists)

Then for the criteria only include a single field from the old data and set its criteria to equal null - this will limit the records returned from the new data to those where it does not exist in the old data. Then convert to an append query and assign as before.

Note: if the tables are quite large significant improvement in performance can be gained by indexing all 'unique' fields (with duplicates set to OK) in the old table. Further improvement can be had by copying the excel data to a temporary table, also indexed. In the case of a similar scenario a few years ago query run time was reduced from several hours to about 15 minutes (it was dealing with about 3m records)

slimjen1 · Jun 6, 2013

Thank you for replying. The records append each month in a table with an autonumber id and each record has an append date. I'll try your suggestion tomorrow. Thanks

slimjen1 · Jun 11, 2013

I think i've stumbled on a way to this by accident. I needed to get the latest date for the same table and had to group by and called for the max date and it gave me the results I was looking for! Apparently grouping only gives me one record if all of the records for that particular unique id are the same and any additional records for that same unique id if each record is different. I am still reviewing the data but it looks good. What do you think.

CJ_London · Jun 12, 2013

As an alternative to grouping you can use SELECT DISTINCT which can be faster.

The benefit of grouping is you can sum, count etc. relevant columns - however you won't be able to do this if you are simply grouping to 'remove' duplicates since you will end up double counting the results.

The consequence of what you suggest (or using SELECT DISTINCT) is that your data will contain duplicates and it is better design not to have them so I would still recommend my original post.

slimjen1 · Jun 14, 2013

Point taking. I am trying your suggestion and still having problems. I think im just getting frustrated. I'll try again tomorrow.
Thanks

slimjen1 · Jun 14, 2013

Ok. I think I followed your instructions correct and it did not append any records. I know for a fact that there are records to append. The unique field in the old data table is SSN:

Code:

SELECT qryMed.SSN, qryMed.[Client ID], qryMed.[Participant Type], qryMed.Relationship, qryMed.[Policy Holder SSN], qryMed.[First Name], qryMed.[Last Name], qryMed.[Address 1], qryMed.[Address 2], qryMed.City, qryMed.State, qryMed.Zip, qryMed.[Date of Birth], tblMasterData.SSN
FROM qryMed LEFT JOIN tblMasterData ON qryMed.SSN = tblMasterData.SSN
WHERE (((tblMasterData.SSN) Is Null));

I ran it as a select query first to see if I would get any records and there were none. It wouldn't because of the unique field being SSN right? So how do I get the updated records? I do have a date field so that if say the address has changed; I would like it to take the record for the latest date to replace the existing record.

From New Table:

Code:

111-11-111  John		Doe		1 Some Ct.	Fish	FL	1-1-2013
111-11-111  John		Doe		1 Some Ct.	Fish	FL	2-1-2013
111-11-111  John 		Doe		2 Green Ct.	Avon	CO	5-30-13

To OldTable(I want to append to this table)

Code:

111-11-111  John		Doe		1 Some Ct.	Fish	FL	1-1-2013

I want to append the record with data from the new table with the date 5-30-13 because something has changed for this SSN. The reason the New table have the two records because it’s a history table that is updated with records each month and sometimes there are no changes but has a date it was updated. The old table would not have repeating SSN because it’s the unique field if I can help it. Maybe I am getting mixed up because of what I am trying to accomplish. Probably going about it the wrong way. It seems like I would need an “if statement”. Please help.

CJ_London · Jun 14, 2013

You need to look at the first part of my original post - the bit that didn't answer your question

- but is relevant to what you are asking now

In your example, it won't bring through the first two records because all the fields are identical but it will bring through the last record.

However if line 2 was different to line 1 and you only wanted to bring through line 3 you will need to add a dlookup to limit the records in the new table. This would be:

WHERE NewTable.myDate=dmax("myDate","NewTable","SSN=" & NewTable.SSN)

I'm assuming SSN is numeric.

using these domain functions can slow things down, unfortunately I don't think you can use subqueries for an update query (I can check if performance becomes an issue)

slimjen1 · Jun 14, 2013

Ok. SSN is text since its the social. Is this wrong? I can change it. Then in your original post I think you said I need a inner join but put <> in criteria field. I know where to put the <> but what comes after it? Do I repeat the field:

Code:

INSERT INTO tblAddress ( SSN, [Address 1], [Address 2], City, State, Zip, SnapshotDate )
SELECT qryMed.SSN, qryMed.[Address 1], qryMed.[Address 2], qryMed.City, qryMed.State, qryMed.Zip, tblAddress.SnapshotDate
FROM qryMed LEFT JOIN tblAddress ON qryMed.SSN = tblAddress.SSN
WHERE (((qryMed.SSN)<>"SSN") AND ((tblAddress.SSN) Is Null)) OR (((qryMed.[Address 1])<>"Address 1")) OR (((qryMed.[Address 2])<>"Address 2")) OR (((qryMed.City)<>"City")) OR (((qryMed.State)<>"State")) OR (((qryMed.Zip)<>"ZIP")) OR (((tblAddress.SnapshotDate)<>"SnapshotDate"));

This throws error: type mismatch

slimjen1 · Jun 14, 2013

Sorry: I changed it to inner join. Still got the error

slimjen1 · Jun 14, 2013

I am getting close. Something is still wrong because i am getting repeating data thats the exact same. I think something is wrong with my dlookup:

Code:

SELECT qryMed.SSN, qryMed.[Address 1], qryMed.[Address 2], qryMed.City, qryMed.State, qryMed.Zip, tblAddress.SnapshotDate, tblAddress.SSN
FROM qryMed INNER JOIN tblAddress ON qryMed.SSN = tblAddress.SSN
WHERE (((qryMed.SSN)<>[tblAddress].[SSN]) AND ([qryMed].[SnapshotDate]=DMax("SnapshotDate","qryMed","SSN=" & [qryMed].[SSN])) AND ((tblAddress.SSN) Is Null)) OR (((qryMed.[Address 1])<>[tblAddress].[Address 1])) OR (((qryMed.[Address 2])<>[tblAddress].[Address 2])) OR (((qryMed.City)<>[tblAddress].[City])) OR (((qryMed.State)<>[tblAddress].[State])) OR (((qryMed.Zip)<>[tblAddress].[Zip]));

I need help please.

slimjen1 · Jun 17, 2013

help please

CJ_London · Jun 17, 2013

The bit in red is mutually exclusive. Also is ssn is text you need to use quotation marks

Code:

SELECT qryMed.SSN, qryMed.[Address 1], qryMed.[Address 2], qryMed.City, qryMed.State, qryMed.Zip, tblAddress.SnapshotDate, tblAddress.SSN
FROM qryMed INNER JOIN tblAddress ON [COLOR=red]qryMed.SSN = tblAddress.SSN[/COLOR]
[COLOR=red]WHERE (((qryMed.SSN)<>[tblAddress].[SSN])[/COLOR] AND ([qryMed].[SnapshotDate]=DMax("SnapshotDate","qryMed","SSN=" & [qryMed].[SSN])) AND ((tblAddress.SSN) Is Null)) OR (((qryMed.[Address 1])<>[tblAddress].[Address 1])) OR (((qryMed.[Address 2])<>[tblAddress].[Address 2])) OR (((qryMed.City)<>[tblAddress].[City])) OR (((qryMed.State)<>[tblAddress].[State])) OR (((qryMed.Zip)<>[tblAddress].[Zip]));

Try this - I've also built in protection against nulls

Code:

SELECT qryMed.SSN, qryMed.[Address 1], qryMed.[Address 2], qryMed.City, qryMed.State, qryMed.Zip, tblAddress.SnapshotDate, tblAddress.SSN
FROM qryMed INNER JOIN tblAddress ON qryMed.SSN = tblAddress.SSN
WHERE [qryMed].[SnapshotDate]=DMax("SnapshotDate","qryMed","SSN='" & [qryMed].[SSN] & "'") 
AND 
(nz(qryMed.[Address 1])<>nz([tblAddress].[Address 1]) 
OR nz(qryMed.[Address 2])<>nz([tblAddress].[Address 2]) 
OR nz(qryMed.City)<>nz([tblAddress].[City]) 
OR nz(qryMed.State)<>nz([tblAddress].[State]) 
OR nz(qryMed.Zip)<>nz([tblAddress].[Zip]));

slimjen1 · Jun 18, 2013

Ok. I'll try this first thing tomorrow. I really appreciate your help!

slimjen1 · Jun 18, 2013

Great; besides the fact that it took 10mins to run; it gave me the results expected with the exception of the max date. Ex. The record that changed had a date of 1/1/2013 but it gave the date of the previous record 12/31/11. Could it be that dmax only looks at the month instead of the month, day and year? If I could get the date the record actually changed; all would be perfect! Thank you so much for your help.

CJ_London · Jun 18, 2013

It is not clear from your post whether all records are picking up the wrong date or just one, if one then I would suspect the data.

Re time to run, domain functions can slow things down if you have a lot of data so you could try a sub query instead (see below). I would also refer you to the last paragraph of my first post - I had a similar situation to yours, but with about 3m lines of data in a text file - running time was reduced from 16-17 hours to 20 minutes by importing the file to a temporary db and indexing - as a minimum you need to index SSN and Snapshotdate - ultimately in both tables, source and destination.

Code:

... 
WHERE [qryMed].[SnapshotDate]=(SELECT Max([SnapshotDate]) FROM qryMed AS Tmp WHERE SSN=[qryMed].[SSN]) 
AND
...

Could it be that dmax only looks at the month instead of the month, day and year?

No! but it may have something to do with excel formatting - if you are not going to follow my advice above, try saving the excel file as a text file and linking to that instead.

slimjen1 · Jun 18, 2013

GM Cj! Thank you for your quick reply. I am ready to put this to rest. Yes; all records that returned were the incorrect date (12/31/11 over 1/1/13). I am confused about when you say the excel format??? I am using ms 2010. I have several years of records with data against multiple records for the same ssn. As for your suggestion about the temp db? Are you suggesting I use the query in a temporary database and import it back to the original db or am I misunderstanding. The file it searchs does have about 15000 records to search. I will make sure I have the table indexed on both sides. Thanks so much!

CJ_London · Jun 18, 2013

I am confused about when you say the excel format??? I am using ms 2010

In your first post you said you were importing from a spreadsheet

As for your suggestion about the temp db? Are you suggesting I use the query in a temporary database and import it back to the original db or am I misunderstanding

I've been working on this basis:

The principle is import the data from excel to a temporary table then use a query to find data in the temporary table that is not in the main table (which was the first part of this thread) and then run another query (the current one under discussion) to update the main table where data in the temporary table (latest view) is different from the main table

slimjen1 · Jun 18, 2013

O Now I understand. Will try this. I am still curious why I am not getting the latest date. It just seems based on the returning records all of them return the 12/31/11 instead of the 1/1/13 data. 12 is higher than 1 so I was guessing it had something to do with this. Thanks!

CJ_London · Jun 18, 2013

Dates are actually stored as double type numbers - the value to the left of the decimal relates to the date and the value to the right relates to the time (expressed as a ratio of 24hrs*60mins*60secs) so .1 =10% of 86400=8640 seconds=144 mins or 2 hrs 24mins.

Excel/Access may be interpreting 12/31/11 as a string due to incorrect formatting (can happen if the first line is blank or a 'wrong' date structure so the column is interpreted as a default of string). And in string terms, 12/31/11 is greater than 1/1/13.

One way you can check this is to open the linked table view of the excel spreadsheet in design view and look at the datatype. - The fix is to correct the spreadsheet or write a special function in Access to force a correction. - but easier to convert excel to a text file.

append distinct records

Registered User.

Super Moderator

Registered User.

Registered User.

Super Moderator

Registered User.

Registered User.

Super Moderator

Registered User.

Registered User.

Registered User.

Registered User.

Super Moderator

Registered User.

Registered User.

Super Moderator

Registered User.

Super Moderator

Registered User.

Super Moderator

Similar threads

Users who are viewing this thread