Delete operation having unexpected results

Isaac

Lifelong Learner
Local time
Today, 12:46
Joined
Mar 14, 2017
Messages
11,111
I'm trying to delete some records from a large table, a one time clean-up operation, and having some unexpected results. Wondering whether there is something I'm missing that anybody might identify!

Large table, ~half a million records: TableMain. Among many other fields, it has sp_id and [bu]
Small table, TableMapping, intended to be a mapping table that shows the ONLY "allowed" combinations of fields [sp_id] and [bu]. This table has one record per distinct combination of [sp_id] and [bu]........and also, one record per [bu]. So there can be multiple bu's assigned to one sp_id, so you might see a few records that look like this:

sp_id | bu
001 | buname1
001 | buname2
002 | buname3
002 | buname4

which means that buname1 should ONLY belong to sp_id 001, and buname3 should ONLY belong to sp_id 002

However, in my TableMain, it is a mess. We have records where buname1 occurs for numerous different sp_id's - even though the only records we want to preserve are those where bu=buname1 and sp_id=001

So here is my code:

Code:
Sub DeleteJunkFromMainTable()
'out of the >half million records imported from the combination of all

Dim rs As DAO.Recordset, db As DAO.Database, strID As String, strBU As String, strSQL As String, strIn As String
Set db = CurrentDb
Set rs = db.OpenRecordset("select DISTINCT [BU],[SP_ID] from TableMapping")

Do Until rs.EOF = True
    strBU = Replace(rs.Fields("BU").Value, "'", "''")
     
    strID = rs.Fields("SP_ID").Value
    'delete all records from TableMain where the BU equals this particular mapping table BU, and the sp_id is NOT the proper one:
    strSQL = "delete * from TableMain where BU='" & strBU & "' and sp_id<>" & strID
 
    CurrentDb.Execute strSQL, dbFailOnError
    'Debug.Print strSQL
 
    rs.MoveNext
Loop

End Sub

That should take care of cases where TableMain actually has records with the same BU from TableMapping, BUT, the sp_id was not the correct one.

Now, in case TableMain has records with a totally unknown BU--one that didn't even exist in TableMapping, I want to eliminate all those too:

Code:
Currentdb.Execute "delete * from TableMain where BU not in(select bu from TableMapping)"

At the end of all this, as expected, TableMain is much smaller. Instead of half a million records, it has about 6,000.

However, this process seems to have eliminated too many records.

For example: The following record was eliminated: Where the BU was xxx and the sp_id was yyy, even though that combination of BU/sp_id (xxx/yyy) WAS on the TableMapping, and was an allowable combination.

What have I screwed up here??

I'm about to do some more 'stepping' than I already have, but figured I'd put it out here in case anyone noticed anything logically flawed too...
 
Last edited:
I don't think VBA code is the right way to tackle this. You can knock this out with a few queries. Step #1 in all methods is making a backup of your database, then:

Method #1--Set up a query joining your MainTable to MappingTable by the 2 appropriate fields. Bring in all the fields from MainTable and make a table out of it: MainTable2. Delete MainTable and rename MainTable2 as MainTable. Done.

Method #2--Setup a query similar to method#1, but make it a LEFT JOIN such that you are showing all records from MainTable and just those matches in MappingTable. Bring down sp_id from MappingTable and set its criteria to Null. The query will now show all the records you want to delete.

Method #3--Add a "DeleteRecord" Yes/No field to MainTable and set its default property to True. Set up the same query as Method #1 and turn it into an UPDATE query and change the DeleteRecord value to false for all matches in MainTable. Then set up a DELETE query using just MainTable and delete those where DeleteRecord is True.
 
Okay, I'm on board with that. I went ahead and used method #1. The # of records impacted seemed within the ballpark of what I was expecting - about 6300 results in the new table.

However, for some reason, I'm still missing the same pair (in the new table) that I had been focused on for troubleshooting purposes, which I was missing using the old method of doing it. I guess I need to troubleshoot some more, I've compared the text of the BU over and over in Notepad, it appears identical, but maybe there is something else about the data/value itself that I'm not seeing.

I will keep researching and see what I can find..
 
I guess there must be something - however invisible it is - that IS indeed different about that BU value, because testing it by joining the original main table with the mapping table, on sp_id and BU, and specifying only this sp_id value, the "missing" BU is not showing up at all. So I guess it is a data problem.

Anyway - I'll figure that out - thanks for your help, it made me realize the whole thing was because of some truly differing data value, and yes, I should have done the whole thing via the joins.

I've mostly worked in environments with tidy keys and all my queries are usually simple single-column joins, so I tend to get uncomfortable/uncertain with multi-column joins and then turn to VBA - perhaps inappropriately, at times.

Thanks again.
 
That's another reason numbers are better than text for being foreign/primary keys. If your eyes see 26 then you can be assured 26 is the value the computer sees, But if you see ABC123 its possible there's a tab, space, newline, etc. at the end of it that you can't perceive but makes a world of difference to a computer.

You might do a Trim() on the field you think has invisible characters to strip them out.
 
No visible characters are a Royal PITA with this sort of data cleansing.

I've been caught out by the difference between - and – before and it's very very difficult to spot when you are scanning down a massive list of things.
 
Gentlemen - Too true! Clean up operations without proper keys is a thankless task.

A big one today was (admittedly), visible, but annoying - someone had typed hyphens that had become 'longer' hyphens (in Word or something), but one thing led to another, I ultimately found out my mapping table was filled with every manner of inexcusable differences! I strive to overhaul this situation with terrible or nonexistent data structures, but it is going to take serious time...in the meantime I have to keep the plane flying with duct tape, while also rebuilding it as it flies (or rather, sputters along with daily near-death experiences)

Finding out the 'mapping table' I was given is about 90% wrong, I've decided to query the ACTUAL distinct BU values from the Main table and give the business partner a task....go through this list and assign the numeric sp_id to each of them. We'll fix the mapping table, that's the shortest way to the end from here.

Thanks again to all who took the time out to review.
 
Last edited:

Users who are viewing this thread

Back
Top Bottom