Problem With Aggregation and Join Queries

silversurfer19

Registered User.
Local time
Today, 19:54
Joined
Aug 20, 2008
Messages
26
I have been using aggregation queries to seperate data into those where there is a count of more than one for that particular combination of premise name, address line 1 and postcode, and those where there is only one.

Here is the code for these queries:

Code:
SELECT [InputTable].[Premises Name (Full)], [InputTable].[Address 1], [InputTable].Postcode INTO [Duplicates Only]
FROM InputTable
GROUP BY [InputTable].[Premises Name (Full)], [InputTable].[Address 1], [InputTable].Postcode
HAVING (((Count(*))>1));

and

Code:
SELECT [InputTable].[Premises Name (Full)], [InputTable].[Address 1], [InputTable].Postcode INTO [Non Duplicates]
FROM InputTable
GROUP BY [InputTable].[Premises Name (Full)], [InputTable].[Address 1], [InputTable].Postcode
HAVING (((Count(*))=1));

From the results of these, I then have two more queries which reconstruct the original records, providing me with full details of records in two seperate lists, those with duplicate addresses, and those without.

Here are the queries for these:

Code:
SELECT InputTable.[Address Source], InputTable.[External Unique ID], InputTable.URN, InputTable.[Premises Name (Full)], InputTable.[Address 1], InputTable.[Address 2], InputTable.[Address 3], InputTable.City, InputTable.County, InputTable.Postcode, InputTable.Country, InputTable.[Telephone 1], InputTable.[TPS 1], InputTable.[Telephone 2], InputTable.[TPS 2], InputTable.SOHO, InputTable.[SIC Code], InputTable.[SIC Description], InputTable.[Major Category], InputTable.[Minor Category], InputTable.Ethnicity, InputTable.[Employee Number], InputTable.[Num Sites], InputTable.[SUV Contact ID], InputTable.[Contact Title], InputTable.[Contact First Name], InputTable.[Contact Last Name], InputTable.[Contact Job Title], InputTable.[Campaign Code], InputTable.Outcome, InputTable.[Date & Time], InputTable.[Inbound/Outbound], InputTable.[Gross Revenue], InputTable.[New Premises Name (Full)], InputTable.[New Address 1], InputTable.[New Address 2], InputTable.[New Address 3], InputTable.[New City], InputTable.[New County], InputTable.[New Postcode], InputTable.[New Country], InputTable.[New Telephone 1], InputTable.[New Contact Title], InputTable.[New Contact First Name], InputTable.[New Contact Last Name], InputTable.[New Contact Job Title]
FROM [Duplicates Only] INNER JOIN InputTable ON [Duplicates Only].Postcode = InputTable.Postcode
WHERE (([InputTable].[Premises Name (Full)] = [Duplicates Only].[Premises Name (Full)]) OR ([InputTable].[Premises Name (Full)] IS NULL AND [InputTable].[Premises Name (Full)] IS NULL)) AND (([InputTable].[Address 1] = [Duplicates Only].[Address 1]) OR ([InputTable].[Address 1] IS NULL AND [Duplicates Only].[Address 1] IS NULL))

UNION

SELECT InputTable.[Address Source], InputTable.[External Unique ID], InputTable.URN, InputTable.[Premises Name (Full)], InputTable.[Address 1], InputTable.[Address 2], InputTable.[Address 3], InputTable.City, InputTable.County, InputTable.Postcode, InputTable.Country, InputTable.[Telephone 1], InputTable.[TPS 1], InputTable.[Telephone 2], InputTable.[TPS 2], InputTable.SOHO, InputTable.[SIC Code], InputTable.[SIC Description], InputTable.[Major Category], InputTable.[Minor Category], InputTable.Ethnicity, InputTable.[Employee Number], InputTable.[Num Sites], InputTable.[SUV Contact ID], InputTable.[Contact Title], InputTable.[Contact First Name], InputTable.[Contact Last Name], InputTable.[Contact Job Title], InputTable.[Campaign Code], InputTable.Outcome, InputTable.[Date & Time], InputTable.[Inbound/Outbound], InputTable.[Gross Revenue], InputTable.[New Premises Name (Full)], InputTable.[New Address 1], InputTable.[New Address 2], InputTable.[New Address 3], InputTable.[New City], InputTable.[New County], InputTable.[New Postcode], InputTable.[New Country], InputTable.[New Telephone 1], InputTable.[New Contact Title], InputTable.[New Contact First Name], InputTable.[New Contact Last Name], InputTable.[New Contact Job Title]
FROM [Duplicates Only] INNER JOIN InputTable ON [Duplicates Only].[Premises Name (Full)] = InputTable.[Premises Name (Full)]
WHERE (([InputTable].[Postcode] = [Duplicates Only].[Postcode]) OR ([InputTable].[Postcode] IS NULL AND [InputTable].[Postcode] IS NULL)) AND (([InputTable].[Address 1] = [Duplicates Only].[Address 1]) OR ([InputTable].[Address 1] IS NULL AND [Duplicates Only].[Address 1] IS NULL))

UNION SELECT InputTable.[Address Source], InputTable.[External Unique ID], InputTable.URN, InputTable.[Premises Name (Full)], InputTable.[Address 1], InputTable.[Address 2], InputTable.[Address 3], InputTable.City, InputTable.County, InputTable.Postcode, InputTable.Country, InputTable.[Telephone 1], InputTable.[TPS 1], InputTable.[Telephone 2], InputTable.[TPS 2], InputTable.SOHO, InputTable.[SIC Code], InputTable.[SIC Description], InputTable.[Major Category], InputTable.[Minor Category], InputTable.Ethnicity, InputTable.[Employee Number], InputTable.[Num Sites], InputTable.[SUV Contact ID], InputTable.[Contact Title], InputTable.[Contact First Name], InputTable.[Contact Last Name], InputTable.[Contact Job Title], InputTable.[Campaign Code], InputTable.Outcome, InputTable.[Date & Time], InputTable.[Inbound/Outbound], InputTable.[Gross Revenue], InputTable.[New Premises Name (Full)], InputTable.[New Address 1], InputTable.[New Address 2], InputTable.[New Address 3], InputTable.[New City], InputTable.[New County], InputTable.[New Postcode], InputTable.[New Country], InputTable.[New Telephone 1], InputTable.[New Contact Title], InputTable.[New Contact First Name], InputTable.[New Contact Last Name], InputTable.[New Contact Job Title]
FROM [Duplicates Only] INNER JOIN InputTable ON [Duplicates Only].[Address 1] = InputTable.[Address 1]
WHERE (([InputTable].[Premises Name (Full)] = [Duplicates Only].[Premises Name (Full)]) OR ([InputTable].[Premises Name (Full)] IS NULL AND [InputTable].[Premises Name (Full)] IS NULL)) AND (([InputTable].[Postcode] = [Duplicates Only].[Postcode]) OR ([InputTable].[Postcode] IS NULL AND [Duplicates Only].[Postcode] IS NULL));


Code:
SELECT InputTable.[Address Source], InputTable.[External Unique ID], InputTable.URN, InputTable.[Premises Name (Full)], InputTable.[Address 1], InputTable.[Address 2], InputTable.[Address 3], InputTable.City, InputTable.County, InputTable.Postcode, InputTable.Country, InputTable.[Telephone 1], InputTable.[TPS 1], InputTable.[Telephone 2], InputTable.[TPS 2], InputTable.SOHO, InputTable.[SIC Code], InputTable.[SIC Description], InputTable.[Major Category], InputTable.[Minor Category], InputTable.Ethnicity, InputTable.[Employee Number], InputTable.[Num Sites], InputTable.[SUV Contact ID], InputTable.[Contact Title], InputTable.[Contact First Name], InputTable.[Contact Last Name], InputTable.[Contact Job Title], InputTable.[Campaign Code], InputTable.Outcome, InputTable.[Date & Time], InputTable.[Inbound/Outbound], InputTable.[Gross Revenue], InputTable.[New Premises Name (Full)], InputTable.[New Address 1], InputTable.[New Address 2], InputTable.[New Address 3], InputTable.[New City], InputTable.[New County], InputTable.[New Postcode], InputTable.[New Country], InputTable.[New Telephone 1], InputTable.[New Contact Title], InputTable.[New Contact First Name], InputTable.[New Contact Last Name], InputTable.[New Contact Job Title]
FROM [Non Duplicates] INNER JOIN InputTable ON [Non Duplicates].Postcode = InputTable.Postcode
WHERE ([InputTable].[Premises Name (Full)] = [Non Duplicates].[Premises Name (Full)] OR ([InputTable].[Premises Name (Full)] IS NULL AND [Non Duplicates].[Premises Name (Full)] IS NULL)) AND ([InputTable].[Address 1] = [Non Duplicates].[Address 1] OR ([InputTable].[Address 1] IS NULL AND [Non Duplicates].[Address 1] IS NULL))

UNION

SELECT InputTable.[Address Source], InputTable.[External Unique ID], InputTable.URN, InputTable.[Premises Name (Full)], InputTable.[Address 1], InputTable.[Address 2], InputTable.[Address 3], InputTable.City, InputTable.County, InputTable.Postcode, InputTable.Country, InputTable.[Telephone 1], InputTable.[TPS 1], InputTable.[Telephone 2], InputTable.[TPS 2], InputTable.SOHO, InputTable.[SIC Code], InputTable.[SIC Description], InputTable.[Major Category], InputTable.[Minor Category], InputTable.Ethnicity, InputTable.[Employee Number], InputTable.[Num Sites], InputTable.[SUV Contact ID], InputTable.[Contact Title], InputTable.[Contact First Name], InputTable.[Contact Last Name], InputTable.[Contact Job Title], InputTable.[Campaign Code], InputTable.Outcome, InputTable.[Date & Time], InputTable.[Inbound/Outbound], InputTable.[Gross Revenue], InputTable.[New Premises Name (Full)], InputTable.[New Address 1], InputTable.[New Address 2], InputTable.[New Address 3], InputTable.[New City], InputTable.[New County], InputTable.[New Postcode], InputTable.[New Country], InputTable.[New Telephone 1], InputTable.[New Contact Title], InputTable.[New Contact First Name], InputTable.[New Contact Last Name], InputTable.[New Contact Job Title]
FROM [Non Duplicates] INNER JOIN InputTable ON [Non Duplicates].[Address 1] = InputTable.[Address 1]

WHERE ([InputTable].[Premises Name (Full)] = [Non Duplicates].[Premises Name (Full)] OR ([InputTable].[Premises Name (Full)] IS NULL AND [Non Duplicates].[Premises Name (Full)] IS NULL)) AND ([InputTable].[Postcode] = [Non Duplicates].[Postcode] OR ([InputTable].[Postcode] IS NULL AND [Non Duplicates].[Postcode] IS NULL))

UNION SELECT InputTable.[Address Source], InputTable.[External Unique ID], InputTable.URN, InputTable.[Premises Name (Full)], InputTable.[Address 1], InputTable.[Address 2], InputTable.[Address 3], InputTable.City, InputTable.County, InputTable.Postcode, InputTable.Country, InputTable.[Telephone 1], InputTable.[TPS 1], InputTable.[Telephone 2], InputTable.[TPS 2], InputTable.SOHO, InputTable.[SIC Code], InputTable.[SIC Description], InputTable.[Major Category], InputTable.[Minor Category], InputTable.Ethnicity, InputTable.[Employee Number], InputTable.[Num Sites], InputTable.[SUV Contact ID], InputTable.[Contact Title], InputTable.[Contact First Name], InputTable.[Contact Last Name], InputTable.[Contact Job Title], InputTable.[Campaign Code], InputTable.Outcome, InputTable.[Date & Time], InputTable.[Inbound/Outbound], InputTable.[Gross Revenue], InputTable.[New Premises Name (Full)], InputTable.[New Address 1], InputTable.[New Address 2], InputTable.[New Address 3], InputTable.[New City], InputTable.[New County], InputTable.[New Postcode], InputTable.[New Country], InputTable.[New Telephone 1], InputTable.[New Contact Title], InputTable.[New Contact First Name], InputTable.[New Contact Last Name], InputTable.[New Contact Job Title]
FROM [Non Duplicates] INNER JOIN InputTable ON [Non Duplicates].[Premises Name (Full)] = InputTable.[Premises Name (Full)]
WHERE ([InputTable].[Postcode] = [Non Duplicates].[Postcode] OR ([InputTable].[Postcode] IS NULL AND [Non Duplicates].[Postcode] IS NULL)) AND ([InputTable].[Address 1] = [Non Duplicates].[Address 1] OR ([InputTable].[Address 1] IS NULL AND [Non Duplicates].[Address 1] IS NULL));


I have used this process with a number of files without any problems. However I have tried it with another two files and both are dropping a fairly large number of records somewhere within the process, but I am unable to determine where or why.

Could anyone suggest to me why this might be and how I could fix the problem?

Many thanks
 
From access help:

By default, no duplicate records are returned when you use a UNION operation; however, you can include the ALL predicate to ensure that all records are returned. This also makes the query run faster.
All queries in a UNION operation must request the same number of fields; however, the fields do not have to be of the same size or data type.
Use aliases only in the first SELECT statement because they are ignored in any others. In the ORDER BY clause, refer to fields by what they are called in the first SELECT statement.

So try with

Select .....
Union All
Select ....

JR
 
The reason for the unions is because I have to do three seperate joins of the duplicate records back to the full record set, one for each field, to account for posssible nulls in any of the fields. Any record where all three fields is null is being dealt with seperately, and thats being taken into consideration when the counts are made. However, if I use Union All won't I just end up with a huge number of duplicate records caused by merging the joins which didn't even exist in the original full record set? I just want to have only the records from the original table, which had a count of >1 in the aggregate query, hence the 'Union'. Is there a way I can use Union All but prevent incorrect duplicates?
 
I haven't really read the whole thread but to answer your question -->
Is there a way I can use Union All but prevent incorrect duplicates?
Code:
SELECT DISTINCT Q.* FROM 
     (Select .....
      Union All
      Select ....) AS Q;
At quick glance I feel there's a better of doing what you're trying to achieve but if you give some more information then maybe it will help us in proposing the way forward.
 
Last edited:
I believe I tried


SELECT DISTINCT Q.* FROM
(Select .....
Union All
Select ....) AS Q;
before I used my current method, and it was providing more records than were contained in the original file, suggesting that it wasn't working correctly, tho I will try again making sure brackets are in the correct place and see if it works.
 
Ok, I have tried
Code:
SELECT DISTINCT Q.* FROM 
(SELECT InputTable.[Address Source], InputTable.[External Unique ID], InputTable.URN, InputTable.[Premises Name (Full)], InputTable.[Address 1], InputTable.[Address 2], InputTable.[Address 3], InputTable.City, InputTable.County, InputTable.Postcode, InputTable.Country, InputTable.[Telephone 1], InputTable.[TPS 1], InputTable.[Telephone 2], InputTable.[TPS 2], InputTable.SOHO, InputTable.[SIC Code], InputTable.[SIC Description], InputTable.[Major Category], InputTable.[Minor Category], InputTable.Ethnicity, InputTable.[Employee Number], InputTable.[Num Sites], InputTable.[SUV Contact ID], InputTable.[Contact Title], InputTable.[Contact First Name], InputTable.[Contact Last Name], InputTable.[Contact Job Title], InputTable.[Campaign Code], InputTable.Outcome, InputTable.[Date & Time], InputTable.[Inbound/Outbound], InputTable.[Gross Revenue], InputTable.[New Premises Name (Full)], InputTable.[New Address 1], InputTable.[New Address 2], InputTable.[New Address 3], InputTable.[New City], InputTable.[New County], InputTable.[New Postcode], InputTable.[New Country], InputTable.[New Telephone 1], InputTable.[New Contact Title], InputTable.[New Contact First Name], InputTable.[New Contact Last Name], InputTable.[New Contact Job Title]
FROM [Duplicates Only] INNER JOIN InputTable ON [Duplicates Only].Postcode = InputTable.Postcode
WHERE (((InputTable.[Premises Name (Full)])=[Duplicates Only].[Premises Name (Full)]) AND ((InputTable.[Address 1])=[Duplicates Only].[Address 1])) OR (((InputTable.[Premises Name (Full)]) Is Null) AND ((InputTable.[Address 1])=[Duplicates Only].[Address 1]) AND (([Duplicates Only].[Premises Name (Full)]) Is Null)) OR (((InputTable.[Premises Name (Full)])=[Duplicates Only].[Premises Name (Full)]) AND ((InputTable.[Address 1]) Is Null) AND (([Duplicates Only].[Address 1]) Is Null)) OR (((InputTable.[Premises Name (Full)]) Is Null) AND ((InputTable.[Address 1]) Is Null) AND (([Duplicates Only].[Premises Name (Full)]) Is Null) AND (([Duplicates Only].[Address 1]) Is Null))

UNION ALL

SELECT InputTable.[Address Source], InputTable.[External Unique ID], InputTable.URN, InputTable.[Premises Name (Full)], InputTable.[Address 1], InputTable.[Address 2], InputTable.[Address 3], InputTable.City, InputTable.County, InputTable.Postcode, InputTable.Country, InputTable.[Telephone 1], InputTable.[TPS 1], InputTable.[Telephone 2], InputTable.[TPS 2], InputTable.SOHO, InputTable.[SIC Code], InputTable.[SIC Description], InputTable.[Major Category], InputTable.[Minor Category], InputTable.Ethnicity, InputTable.[Employee Number], InputTable.[Num Sites], InputTable.[SUV Contact ID], InputTable.[Contact Title], InputTable.[Contact First Name], InputTable.[Contact Last Name], InputTable.[Contact Job Title], InputTable.[Campaign Code], InputTable.Outcome, InputTable.[Date & Time], InputTable.[Inbound/Outbound], InputTable.[Gross Revenue], InputTable.[New Premises Name (Full)], InputTable.[New Address 1], InputTable.[New Address 2], InputTable.[New Address 3], InputTable.[New City], InputTable.[New County], InputTable.[New Postcode], InputTable.[New Country], InputTable.[New Telephone 1], InputTable.[New Contact Title], InputTable.[New Contact First Name], InputTable.[New Contact Last Name], InputTable.[New Contact Job Title]
FROM [Duplicates Only] INNER JOIN InputTable ON [Duplicates Only].[Premises Name (Full)] = InputTable.[Premises Name (Full)]
WHERE (((InputTable.[Address 1])=[Duplicates Only].[Address 1]) AND ((InputTable.Postcode)=[Duplicates Only].[Postcode])) OR (((InputTable.[Address 1])=[Duplicates Only].[Address 1]) AND ((InputTable.Postcode) Is Null) AND (([Duplicates Only].Postcode) Is Null)) OR (((InputTable.[Address 1]) Is Null) AND ((InputTable.Postcode)=[Duplicates Only].[Postcode]) AND (([Duplicates Only].[Address 1]) Is Null)) OR (((InputTable.[Address 1]) Is Null) AND ((InputTable.Postcode) Is Null) AND (([Duplicates Only].Postcode) Is Null) AND (([Duplicates Only].[Address 1]) Is Null))

UNION ALL

SELECT InputTable.[Address Source], InputTable.[External Unique ID], InputTable.URN, InputTable.[Premises Name (Full)], InputTable.[Address 1], InputTable.[Address 2], InputTable.[Address 3], InputTable.City, InputTable.County, InputTable.Postcode, InputTable.Country, InputTable.[Telephone 1], InputTable.[TPS 1], InputTable.[Telephone 2], InputTable.[TPS 2], InputTable.SOHO, InputTable.[SIC Code], InputTable.[SIC Description], InputTable.[Major Category], InputTable.[Minor Category], InputTable.Ethnicity, InputTable.[Employee Number], InputTable.[Num Sites], InputTable.[SUV Contact ID], InputTable.[Contact Title], InputTable.[Contact First Name], InputTable.[Contact Last Name], InputTable.[Contact Job Title], InputTable.[Campaign Code], InputTable.Outcome, InputTable.[Date & Time], InputTable.[Inbound/Outbound], InputTable.[Gross Revenue], InputTable.[New Premises Name (Full)], InputTable.[New Address 1], InputTable.[New Address 2], InputTable.[New Address 3], InputTable.[New City], InputTable.[New County], InputTable.[New Postcode], InputTable.[New Country], InputTable.[New Telephone 1], InputTable.[New Contact Title], InputTable.[New Contact First Name], InputTable.[New Contact Last Name], InputTable.[New Contact Job Title]
FROM [Duplicates Only] INNER JOIN InputTable ON [Duplicates Only].[Address 1] = InputTable.[Address 1]
WHERE (((InputTable.[Premises Name (Full)])=[Duplicates Only].[Premises Name (Full)]) AND ((InputTable.Postcode)=[Duplicates Only].[Postcode])) OR (((InputTable.[Premises Name (Full)]) Is Null) AND ((InputTable.Postcode)=[Duplicates Only].[Postcode]) AND (([Duplicates Only].[Premises Name (Full)]) Is Null)) OR (((InputTable.[Premises Name (Full)])=[Duplicates Only].[Premises Name (Full)]) AND ((InputTable.Postcode) Is Null) AND (([Duplicates Only].Postcode) Is Null)) OR (((InputTable.[Premises Name (Full)]) Is Null) AND ((InputTable.Postcode) Is Null) AND (([Duplicates Only].[Premises Name (Full)]) Is Null) AND (([Duplicates Only].Postcode) Is Null))) AS Q;

but that drops more records than the way I was doing it previously. Would using a different kind of JOIN help in any way?
 
It simply drops duplicate records so maybe you want to revise your goal.
 
I'm going to see if I can better explain exactly what it is I need to do. I'm being given files with a large number of records in them. Some of these records may be exact duplicates of each other, while a great many or not exact duplicates but have the same Premise Name, Address Line 1 and Postcode, i.e refer to the same property. All I want to do is seperate the original table into two data sets, one containing records referring to a property which is unique, and one containing records which refer to a property which is also referred to in another record.

The record counts of these two data sets MUST add up to the original record count. (this, unfortunately is not my decision). I have used the aggregate queries that I posted the code for above to split records into combinations of Premise Name, address 1 and postcode which are unique, and those which are not.
However when I try and join those combinations back to the original table (Using the union queries I've posted) to get the full details of the records they relate to, the two data sets record count does not add up to the original table's.

I know the problem is not due to records where all three of the fields I am aggregating on are null, because I have had this problem previously and htey are now being dealt with and considered in the final count.
 
I don't see how you hope to achieve the same record count if you are applying criteria on all subqueries.
 
There are no criteria I'm applying which should reduce the final number of records, thats why I don't understand the count not matching. All I'm doing is splitting into two sets which form the whole, and on many occasions, works perfectly, but in a few others it doesn't, so there must be some kind of record which is causing a problem, but I can't figure out what it is
 
In your previous posts I see criteria applied in each subset.

It might be best to see your db.
 
I have not fully read and digested the whole thread, however you seem to be treating duplicate postcodes as indicating duplicate premises, that is not so a postcode can apply to upto 14 houses, I believe that is the correct figure.

Brian
 

Users who are viewing this thread

Back
Top Bottom