View Full Version : Deleting duplicates but keeping totals


kupe
05-16-2003, 06:55 AM
I have a table of 60,000 records many of which are repeated. Each record has two fields - a url, and a number.

I need to delete all duplicate names, but I need also the totals of the duplicates to be added to the total of the url which will remain.

For example: Name Totals
www.access.com 12
www.access.com 2
www.access.com 9

I need to be left with
www.access.com 23

I can't find a way to do this by QBE. Perhaps sql would manage it? Is it a job for coding? As I am very far from that far on with VBA, would anyone have a code that could be used for this please?

I need to perform this process with many similar tables.

Hopefully

Stressed Out of Warwickshire

kupe
05-16-2003, 07:14 AM
Been there and done that, MoP, but I can't find how to make it help. It shows 431 duplicates and then a field with the number of duplicates.

I could manually delete each of the records - but in a table of 25,000 records, it could be a long night's work, and the weekend. And I need to add the totals as well.

Alternatively, I can't just delete the duplicates in the automatic way (by duplicating the table etc as per Access Help) because the totals won't be added up.

It's a bit of a dilemma. But I do appreciate your help.

Mile-O
05-16-2003, 07:19 AM
Sorry, deleted my post when I realised hyperlinks weren't included.

Are you wanting to do this programmatically? I know how to do it manually. With a bit of thought I should be able to do it in code.

kupe
05-16-2003, 07:23 AM
When I couldn't find your reply, I thought this problem had become a bit too heavy.

I really don't know the answer. I would have thought manually would work, but totalling the totals that will be deleted makes it a challenge. Or impossible.

SQL would probably work, and I'm sure VB would, but it's much further on than my studies so far. Cheers, MoP.

Mile-O
05-16-2003, 07:26 AM
Assuming your fields are SiteLink and Total in a table called tblData.

Make a query with this SQL and call it qryInitial.

SELECT SiteLink, Total, CStr([SiteLink]) AS NewLink
FROM tblData;


Make a new query with this SQL and call it qrySecondStep.

SELECT DISTINCTROW First(NewLink) AS [NewLink Field], Count(NewLink) AS NumberOfDups, Sum(Total) AS SumOfTotal, Left([NewLink],InStr(1,[NewLink],"#")-1) AS Site
FROM qryInitial
GROUP BY Left([NewLink],InStr(1,[NewLink],"#")-1)
HAVING (((Count(NewLink))>=1));

One more query with this SQL and call it qryFinal.

SELECT Site, SumOfTotal AS [Number] INTO tblFixed
FROM qrySecondStep;


Run the third query. You'll have a new table called tblFixed.
Enter its design view and change the data type of the field called Site to a hyperlink.

Rename the table if you wish.

All done.

kupe
05-16-2003, 07:33 AM
Looks great, MoP. What a star! Being dragged into a meeting, will report back asap. But it looks really good. Very grateful.

kupe
05-19-2003, 12:40 AM
Hi, MoP. The first query works fine.

Access stumbles with the second, and the trouble seems to be around

First(NewLink)

Be very grateful if you could look your sql over and see if you can see the error. I've tried any number of variations without success. It needs an expert, and sadly I am still far from that. Cheers

Mile-O
05-19-2003, 01:00 AM
Have a look at this

Apologies for the poor naming conventions :rolleyes:

kupe
05-19-2003, 02:35 AM
It looks good ... only the problem is that the number of duplicates has to be added to the total.

So the result from your table1 which displays in tblFixed would be 27 for www.mileout.org.uk, because the total would be added to the number of times the url has been repeated.

Similarly www.bt.com would total 9.

Sorry that it isn't all straight forward, MoP. It's really very good of you to work over this. I hope your patience will last a little longer. Thanks enormously.

Mile-O
05-19-2003, 02:38 AM
Change the SQL on the Make Table query to this then:

SELECT [Find duplicates for Query1].Site, [SumOfNumber]+[NumberOfDups] AS [Number] INTO tblFixed
FROM [Find duplicates for Query1];

kupe
05-19-2003, 06:03 AM
A couple of hours later, and I seem to be back to Saturday's delights where no matter what I did with the second qsl, I was told nothing but "invalid prodedure call". Frustrating is one of the words that leaps to mind.

I just mention this in case you might have thought of something since ...

Mile-O
05-19-2003, 06:11 AM
I don't understand where you are going wrong as the example I posted does what you want (after the third SQL statement is amended to what I rewrote above.

kupe
05-19-2003, 06:17 AM
You and me both, MoP. I'll get there yet. I do appreciate all that guidance. Must be a mental collapse. But, I think, I'm almost there ...

kupe
05-19-2003, 08:19 AM
Access is happy with yours, but won't let me through with the second query. You'll spot the mistake, of course, but I can't. All I get is the message 'invalid procedure call'. It was identical to yours in sql, and now it's identical via qbe. Same result. If you'd take a quick look, that would be brilliant of you. Cheers.

Mile-O
05-19-2003, 08:32 AM
Make a copy of your database - remove everything except the table.

Leave only a few duplicate records in the table.

Post it here.

kupe
05-19-2003, 08:45 AM
It's zipped down to 137kb but the forum won't accept it. Perhaps it's the fates saying I'll solve it myself given time. But appreciate very much all the guidance.

Mile-O
05-19-2003, 08:54 AM
Send it to this address then: Mile-O-Phile (takeahikefatboy@hotmail.com)

WayneRyan
05-19-2003, 12:55 PM
kupe,

I think Mile's queries are the best approach, but if
this is really driving you crazy, you can try this
code from a Command Button.

I haven't tested it, but it should work.

BACK UP YOUR DATA FIRST THOUGH !!


Dim dbs As Database
Dim rst As Recordset
Dim sql As String
Dim LastSite As String
Dim ThisTotal As Long
'
' Define a recordset
'
sql = "Select * from tblData Order by SiteLink"
Set rst = dbs.OpenRecordset(sql)
'
' Get the First Record
'
rst.MoveFirst
LastSite = rst!SiteLink
'
' Process all records
'
rst.MoveNext
While Not rst.EOF and Not rst.BOF
'
' If the current record is a duplicate, get it's total + 1 and
' add it to the previous record, then delete it.
' If not a duplicate then save the Key field
'
If rst!SiteLink = LastSite Then
ThisTotal = rst!Total + 1
rst.MovePrevious
rst.Edit
rst!Total = rst!Total + ThisTotal
rst.Update
rst.MoveNext
rst.Delete
Else
LastSite = rst!SiteLink
End If
'
' Move to next record
'
rst.MoveNext
Wend
Set rst = Nothing
Set dbs = Nothing


hth,
Wayne

kupe
05-20-2003, 12:36 AM
Trying here again. Thanks for the patience, MoP.

kupe
05-20-2003, 12:39 AM
Sorry, try a different approach. (I used to wonder why people changed their names regularly on these forums.)

Mile-O
05-20-2003, 12:45 AM
Can you convert it to '97, please?

kupe
05-20-2003, 01:01 AM
Converted. Cheers.

Mile-O
05-20-2003, 01:09 AM
The table you have: tblTable1

You have the field SiteURL which is a text field - was this always a text field? If so, I had thought that the field was a hyperlink field.

kupe
05-20-2003, 01:22 AM
Actually, you mentioned that the other day, I think. But, no they are hyperlinked because they are just being totalled. Cheers.

Mile-O
05-20-2003, 01:28 AM
I've put the make table query in an autoexec macro.

kupe
05-20-2003, 01:54 AM
I'm back. Dragged off to a meeting as I was sending that message, which ought to have confirmed they were not hyperlinked. Will look at this with great interest, and come back. Cheers.

kupe
05-20-2003, 02:41 AM
My beginning was not all that dissimilar. (I'm almost too nervous to mention the following for fear it happens again. ) But when grouping the sites for duplicates, there still were duplicates. It was as if the query saw numbers of "hits" in the record, saw they were different, and assumed the string was different too.

I offered the result to the duplicates wizard and it found, I think, about 400 duplicates still remained out of the 24,000 urls.

Despite the great respect I have for you, I was a bit anxious, MoP, when I put yours to the duplicate wizard test.

But, brilliant! Not a one. It's perfect. It's been a painful learning curve this one. I'm really glad you were the teacher, and very grateful for it - and for your patience. Many many thanks.

Mile-O
05-20-2003, 02:45 AM
The problem from querying on the original table came from it having an AutoNumber field.


No probs - have fun in Access Land! :cool:

kupe
05-20-2003, 02:49 AM
Hi, Wayne, thanks very much for the code and for the trouble you went to. Mile-O-Phile's works really well. I will borrow your code if that's okay and use it in another place. So your thoughtfulness is being put to good use, and is appreciated. Many thanks.