query avoiding duplicates

thewiseguy

Registered User.
Local time
Today, 21:24
Joined
Apr 14, 2004
Messages
56
hi -

sorry if this has been answered before but i couldn't find it exactly...

i have a table with data similar to so:

ID number Date Area Region
26784 14/2/03 Lower 2
37772 27/8/03 Upper 1
37772 1/3/04 Upper 3
50101 12/12/03 Lower 2
50101 4/2/04 Lower 2
50101 10/5/04 Lower 1

i would like to use a query to select only entries that are not duplicates on ID number and also the most recent. i.e the first, third and last rows of the sample data above. is there any way to do this...?

many thanks in advance...

thank you very much in advance...
 
In query design view, go to the View menu and choose totals. You'll see a new row in the QBE grid called "Total:". Under the ID field choose "Group By" in the "Total:" row. Under the date field choose "Last". For the Area and Region fields also choose "Last". Run the query.
 
thewiseguy said:
hi -

sorry if this has been answered before but i couldn't find it exactly...

i have a table with data similar to so:

ID number Date Area Region
26784 14/2/03 Lower 2
37772 27/8/03 Upper 1
37772 1/3/04 Upper 3
50101 12/12/03 Lower 2
50101 4/2/04 Lower 2
50101 10/5/04 Lower 1

i would like to use a query to select only entries that are not duplicates on ID number and also the most recent. i.e the first, third and last rows of the sample data above. is there any way to do this...?

many thanks in advance...

thank you very much in advance...
Code:
Select [table].[id number], max([table].[date]) as [MxDate], [table].[area], [table].[region]
From [table]
Group by [table].[id number], [table].[area], [table].[region]


Vince
 
thanks to both dcx and vince -
can i ask is there a difference between using max and last?
 
You have raised a good point. Should we use Last or Max?

The following is taken from Access's Help regarding the First and Last aggregate function:
The First and Last functions are analogous to the MoveFirst and MoveLast methods of a DAO Recordset object. They simply return the value of a specified field in the first or last record, respectively, of the result set returned by a query. Because records are usually returned in no particular order (unless the query includes an ORDER BY clause), the records returned by these functions will be arbitrary.

Since Last will arbitrarily return one record and since date is internally stored as a double precision number, we should use Max.


To correctly return the three records in your sample data using Max, we normally do it with a series of two queries. You can run the second query in the attached database, which contains your sample data.

(In order to show that Max can correctly return the three records based on the dates, the positions of the 5th and the 6th records in your data were deliberately switched in the table in the database.)
.
 

Attachments

Last edited:
Very cool Jon. Thanks for the info.
 
Jon K said:
To correctly return the three records in your sample data using Max, we normally do it
with a series of two queries.

Vince has offered a query using Max:
Code:
Select [table].[id number], max([table].[date]) as [MxDate], 
[table].[area], [table].[region]
From [table]
Group by [table].[id number], [table].[area], [table].[region]

Jon, why do we need two queries?

DLB
 
DLB,

Vince's query will return duplicate records for ID numbers 37772 and 50101, which is not what TheWiseGuy wanted.


To use a one-query approach correctly, you will have to use a subquery, for example:
SELECT Table.*
FROM

WHERE [Date] = (Select Max([Date]) from
as S where S.[ID number]=
.[ID number]);

This kind of subquery is optimised poorly in Access. (If you run it against a table of say 2,000 records, you can see how much longer this query will take to run than the two-query approach used in my database.)


Alternatively, you can use a subquery like this:
SELECT Table.*
FROM

WHERE [ID number] & " " & [Date] in (Select [ID number] & " " & Max([Date]) from
group by [ID number]);

This query runs faster than the former but is still slower than the two-query approach. (You can easily see their differences when you increase the number of records to 5000 or 8000.) And some people find this query less easy to understand or to construct in query Design View.


The two-query approach is the fastest and is the easiest to build in Design View.
.
 
Last edited:
Thank you very much, Jon.

I admit I have difficulty understanding the second subquery, too. :o

DLB
 

Users who are viewing this thread

Back
Top Bottom