Challenge: Recategorise records based on multiple criteria

Stormin

Nawly Ragistarad Usar
Local time
Today, 17:56
Joined
Dec 30, 2016
Messages
76
Hello,

I've been trying for a couple of weeks to solve an issue with the design of a new database in Access 2010. I'm new-ish to database design but have been an end-user of Access for many years (mostly A'97 though).

Essentially I am duplicating a "database" that has been created, stored and used in Excel over many years. The idea is to duplicate it, run both in parallel and then phase out the obsolete Excel file.

One field that has got me scratching my head is a "re-categorise" field. In Excel this is a whole bunch of nested IF statements to convert a sales channel from the computer generated string to a new string based on certain criteria.

For example, if the sales channel is "Open Auction" and the buyer code is "011111A" then the new category is "Auctions Ireland".

Unfortunately not all the IFs use the same determination category, and some have lookups to check if they are contained within another table and act accordingly.
Also unfortunately, I cannot change the way the raw input data is generated.

To give a couple more examples in sloppy Excel paraphrasing:

IF [SalesChannel] = "Open Auction" AND [BuyerCode] = "099999Z" THEN "Special Buyer"
IF [SalesChannel] = "ABC-Chan" THEN LOOKUP [MainTable].[VIN] against [ExtraTable].[VIN], return [ExtraTable].[NewChannel] ((if value not found, return "Direct Sale"))

...and so on.

I have 8 different values of [SalesChannel] that need to be converted to 13 possible results with max of three possible outcomes per [SalesChannel] value, on a table with ~30k records.
There are two [SalesChannel] values that need to be checked against a separate table each, and whether or not a certain field is found within that other table determines the outcome.

Preferably I am looking to:-

  • Avoid lots of confusing nested IIF statements
  • Have the re-category field as a query result i.e. not a calculated field in the main table
  • Have the re-categorisation rules stored in a table so that they can be changed and/or added to easily (unlike nested IIFs)
...but really at this point my only necessary condition is that I find a solution that works. :banghead:

Does anyone that can decipher my blabbering have any ideas on how/if this is possible? I have tried a few things but as I said I am inexperienced with creating databases; however I have a good idea of how they work and a basic grasp of VBA (but not SQL).

Thanks in advance!!
 
rather than nested iif's have you tried a function with select case?

also, have you normalized your data that is being imported from excel? just a reminder that excel and access are 2 different beasts.
 
It sounds like an ideal candidate for a lookup table. Put all your expected criteria in a table with a column for the returned result.

I think we would probably like to see all the possible results before committing to a structure but this is the normal way of dealing with multiple look-ups.
 
It sort of depends on how many of these things you have and how many columns will participate in the recategorization tests. It also depends on how "clean" your input data happens to be. For instance, if "Open Auction" is one of the choices, you would NEVER want a record that contains "OpenAuction" (without the space) because my suggestion would not work well with unclean data.

Let's do the simple case first.

Import your raw table to Access. Add a text field to the table, "NewBuyer" - which will initialize to an empty string (zero-length string.) Be sure that you have indexes on the SalesChan and BuyerCode fields, set as Dups Allowed. Also on any other field that participates in the recategorization decision. You can have up to 10 such indexes, but note that if there IS a prime key, it takes up one of those indexes, too.

Build an "Xlate" table with your translations using fields like "SalesChan" and "BuyerCode" and "NewBuyer" plus maybe a few others.

Now build an INNER JOIN that might look like this:

Code:
UPDATE MainTable INNER JOIN Xlate ON 
    ( Xlate.SalesChan = MainTable.SalesChan ) AND 
    ( Xlate.BuyerCode = MainTable.BuyerCode )
    SET MainTable.NewCode = Xlate.NewCode
          WHERE MainTable.NewCode = "" ;

This is phase one of your recategorization and you use Xlate table to hold the translations you want to use.

Phase two is to do another query like this that uses another field from the main table such as your VIN field. Build a series of queries to cover all your bases.

You will note that the queries I suggested include a safeguard to prevent from double-dipping on setting the new values. It is either that you use the restrictive WHERE clause or that you would have to execute these queries in proper order so that you would skip cases that could overlap.

I don't know if it would ever happen, but if your MainTable contains the correct BuyerCode value, the trick COULD be as simple as doing an UPDATE query with a single ON-clause instead of the double-barrelled one I unloaded on you earlier, where the BuyerCode and NewBuyer match in your table. Or you could have an UPDATE by exception approach where your last UPDATE query looks ONLY at the records with no translation yet, and have it UPDATE the NewBuyer field from the BuyerCode field. Potato, potahto.

Now, once each query works on its own, you can make a MACRO to execute the action queries in whatever sequence appears to be correct.

Since you suggest that you might have to tune and re-tweak the results a few times, keep an UPDATE query around that would clear out the MainTable NewBuyer field to an empty string so that you can manually update your translations table and re-run the sequence if your categories change.

This would be your table-based translation that retains your translation criteria in a separate table from the main table.
 
Wow, thank you The_Doc_Man, that turned out to be exactly what I had in mind!
I now have a very nicely laid out 'Xlate' table that contains the parameters for my re-categorisations (much better than the first one I made anyway, before I reached out for help).

I replaced the
WHERE (...) = ""
with
WHERE (...) Is Null
as for some reason the first method didn't work.

I also added, after WHERE via AND, an extra check to make sure that the field that I am writing from in the Xlate table Is Not Null. This makes sure that when I'm running the queries individually the "are you sure?" dialogue is not counting replacing blank strings with blank strings.

For completeness I have several queries strung together with a macro that perform the following actions (paraphrasing again):

1. Check the SalesChan and BuyerCode across MainTable and Xlate, set MainTable.NewSC to Xlate.[Rewrite from BuyerCode]
2. Check SalesChan across MainTable and Xlate, match MainTable.BuyerCode against ExtraTable1.BuyerCode, set MainTable.NewSC to ExtraTable1.ChangeTo WHERE Xlate.CodeLookup = "TRUE"
3. Check SalesChan across MainTable and Xlate, match MainTable.VIN against ExtraTable2.VIN, set MainTable.NewSC to ExtraTable2.NewChannel WHERE Xlate.VINLookup = "TRUE"
4. Check SalesChan across MainTable and Xlate, set MainTable.NewSC to Xlate.[Rewrite Direct]
5. Set MainTable.NewSC to "Other" (WHERE MainTable.NewSC Is Null)

#4 handles both the direct re-writes and the non-values from the two ExtraTable lookups, and #5 gives anything that has been missed out a catch-all category (of which there should be none so will be a flag for incomplete categorisation parameters).

Additionally I have queries to clear all MainTable.NewSC entries and just the "Other" entries.

This all works on my test table with 1k records. Tomorrow I will backup and redefine to act upon the ~30k record main table. In theory it will work :)

Thanks again for everyone's help!
(challenge completed)
 
Last edited:
Using { IS NULL } long-term probably isn't right either. You know that query I said to keep around to erase the NewTrans fields by resetting them to "" ? If you ran that first, then your { IS NULL } could be replaced with the { = "" } and it would work.

The performance from using queries rather than a series of recordset scans and a bunch of IF-ladders or CASE-ladders should also be better. The fastest VBA code you can write is still slower than a good sequence of SQL queries. The DB engine is optimized to do queries whereas the VBA compiler is only a semi-compiler or pseudo-code compiler.
 
Using { IS NULL } long-term probably isn't right either. You know that query I said to keep around to erase the NewTrans fields by resetting them to "" ? If you ran that first, then your { IS NULL } could be replaced with the { = "" } and it would work.

My clearing queries set all the records in the NewSC field to Null. It works accurately in my tests and I see no (theoretical) reason why it wouldn't work.
Thoughts?

and yes, performance is very fast, so much so that I had to put a MsgBox at the end to confirm that the actions had been performed! :D
 
OK, if setting to Null works for you then do it. However, some of us tend to avoid use of nulls because the generally intractable nature of nulls with any formula involving them. In the final analysis, if it works for you, then you are where you need to be.

Glad to help, and also glad to know that you like the performance of this solution.
 

Users who are viewing this thread

Back
Top Bottom