Help Me Normalize my Table Of Doom

Swatkins · Oct 27, 2016

I inherited a database in my workplace a number of years ago. Initially the entire database consisted solely of a single table with all the associated data and a second table that was (and still is) as far as I can see essentially a copy of all the data in the first table. I still don't understand why the original designer did it that way.

There is a single form for the database, which linked to the first of the two tables. Clicking the Save Data button copied all that data into the second table. A few years ago, I was forced to do some rudimentary normalization - pulling a few fields out of the table and into some secondary tables - because the number of data points I needed to track had increased somewhat, and the table was already so large that adding more exceeded the maximum number of fields that Access could handle.

For a long time I resisted normalizing this thing because of the sheer amount of work involved: especially having to reconnect everything, fixing all the reports, etc. But... this thing is a beast, and it needs to be done right, for the future.

Here's the problem. This isn't a standard normalization problem, where you've got fields with tons of repeated data. Let me see if I can explain somewhat generically. My department does a lot of pro-forma finance analysis for the purchase of some large and expensive Widgets. The Widgets are very expensive, but are also an important revenue driver for our company: i.e. the widgets are a key raw material in our products. The Widgets can be used in a few different ways: we can burn a lot of the Widget's value in a heavy run of our product that we call a "Burst" that produces a lot of a specific product all at once, or we can use it in a rotational "Schedule" that uses the Widget up at a more steady rate over time.

My database tracks how we thought we would use the Widget at the time that we acquired it versus how we actually used the Widget. So there's how many "Bursts" of the Widget, and all of the associated details related to the burst, and how much "Schedule" use of the Widget we anticipated, etc. And then how many "Bursts" we actually had, and all associated details, and how much "Schedule" use, etc. So... each anticipated "Burst" at the the time of our acquisition Analysis and each of the detailed datapoints has a field. Each actual "Burst" and data points likewise. So... There's fields like "AcqBurst1Plant", "Acq1BurstDate", "AcqBurst1Cycles", "AcqBurst1Prod", "AcqBurst1Revenue", "AcqBurst1Cost" (for the amortized cost associated with that burst), etc. Then "AcqBurst2Plant", "Acq2BurstDate", "AcqBurst2Cycles", and so on. Then "ActBurst1Plant", "ActBurst1Date", "ActBurst1Cycles", and so on and on, for the Acquisition Estimate and the Actual, respectively. You see how this is an awful format?

I have groups of fields going out for up to 6 "Bursts" of the widget in production. Most times when we get a Widget into production, it's either strictly for Scheduled use, or there's only one or maybe two "Bursts". Occassionally we have a Widget that gets used up in 4, 5, or even 6 "Bursts". If we ever have a Widget that was going for a 7th or 8th... our database couldn't handle it. Consequently most of those Burst fields are empty, and for a small handful they're all used up.

It seems evident to me that there should be a separate "Bursts" table that refers back to the WidgetID key field in a main table. The Burst table probably should have a field for an "Acq" burst versus an "Actual" burst, to differentiate between our acquisition estimated Bursts vs. Actual bursts, then fields for Date, Plant, Cycles, etc.

This doesn't seem to be something the Analyze Table tool knows how to handle. I'm hoping there's an automated way to pull this data into a better format. But I'm afraid I'm going to have to brute-force manually copy this data for each widget into a proper Burst table...

Does anybody have any thoughts on a better approach?

plog · Oct 27, 2016

Does anybody have any thoughts on a better approach?

Build a new database.

No seriously. You have a system that works, but want to move to one with a different, better structure. Build that database with the better structure. Verify the structure makes sense and will accomodate the data you need. Then move the old data to it---don't switch off the old database though, you are still in testing mode. I would formalize/and streamline this data movement process as best as I could because you will be doing it more than once to test this new system.

Once the data has been moved that first time, make sure you are able to get data out like you want (via Reports/Queries). Once that test is passed, build new forms to allow you to edit/add/delete data from the new system. Once that test is passed, clear out the data from the new system and reload it once again from the live system. Verify the new system works and matches the output from the live system. Do this purgin/reloading for however long it takes for you to be confident in the new system (few weeks, few months). Then, one day, make the switch--reload the data one last time and turn off the old system.

Minty · Oct 27, 2016

Plog has it in one - start from scratch. But as to moving / copying / normalising your data across it is actually quite easy.

Put your data into an excel spreadsheet, strip out fluff and leave the columns you are trying to reformat. Have a look at this video https://www.youtube.com/watch?v=pUXJLzqlEPk
Basically your data is a bit like a cross tab / pivot table. This shows you how to turn it back into a normalised format.

Swatkins · Oct 27, 2016

You know what... you might be right about that.

I finally decided I needed to normalize this beast because it used to be just me making updates to this system. I knew that the engine under the hood wasn't so awesome, but the output was fine, and the reports were useful to Sr. Management.

As long as there was never likely to be anybody but 1 person making updates, I figured it was fine and not worth the work.

But now I have a coworker who's helping me make updates. And it's happened a couple times now I needed to be in there at the same time as her, and I knew that would cause data corruption issues. So, it was time to split the database, but splitting a beast like this just makes no sense to me. It's like putting lipstick on a pig.

I'll start work on building a new system with better data structure. Then, once it's running correctly, I'll split that.

Swatkins · Oct 27, 2016

Minty, using Excel to unpivot the data might work... But how do you load it back into Access with the foreign key reference?

Assuming I'm using an auto-number as the PK for the general data on my "Widgets" in a separate "Widget" tab, I'm not sure how I'd be able to automate replacing the Widget name with the appropriate FK in my new "Burst" table data to refer back to the Widget table...

ETA... Hmm... wait... I'm betting there's some way I could do this with some kind of query inside Access, or more likely a set of queries... maybe without having to go back to Excel to unpivot. I'll have to give this a bit more thought.

Minty · Oct 27, 2016

If you always include the PK in the data you are "Unpivoting" it will include it.
Play with it and see. after all its only a excel copy of your data not the real stuff. No harm can be done. (Unless you are using sharp implements without adult supervision)

Swatkins · Oct 27, 2016

We'll just keep the sharp pointy things between us... no need to involve adult supervision.

Okay, here's a question:

The current table already has an auto-number PK. So, like you said, if I preserve that in the data I unpivot, I'll have that key in my flattened table.

Now, supposing I am building a new Widget table from scratch with an auto-number PK, and load the Widget data into, I imagine they'll be assigned PKs that may differ from the PKs they had in the original table. To avoid this, I could make the PK field an Integer field, so I can force the new table to accept the pre-existing PK values.

But can I then, once the data is loaded, convert my any-old-integer-I-want PK field into an auto-number field? Will Access let me do that?

I'd have a reasonable level of confidence that the integer values in that field will be unique, as they're coming from a table that already uses an auto-number PK.

Minty · Oct 27, 2016

Short no pointy answer - yes. The process would be;

Import and sanity check data, with PK field set to Long Integer.
Take a backup.
Once all is well change PK field to autonumber.
Sanity check again.

Bob is your uncle.

Simon_MT · Oct 31, 2016

Perhaps look at the problem from the Widget level. After acquisition (Purchase) the attrition is either Burst or Scheduled. Or Usage method could be "A", "B" or "S" so looking at the Stock usage as long as you can separate the type of transaction you know the type of movement. Counting the number of Bursts is mathematical or could be stored.

The problem will arise that the number of anticipated Burst don't match the Actual Burst but that is something you will have to deal with. the only difference between Scheduling and Burst seems to be frequency so I for have a table for Burst and Scheduling sort of like a JOB table.

I would create this table and then dredge the "usage" table and insert this into a new table keeping the original source table ID in a separate field.

You don't need to worry about the number of Bursts or Scheduling events anymore just create a new record in the "Jobs" table or whatever you want to call it.

Simon

Simon

Personally

The_Doc_Man · Oct 31, 2016

Doing this from scratch gives you a chance to step away from the curb so that you can do the design the right way. Since you are familiar with Access, I will merely offer a couple of suggestions.

Your discussion of normalizing the BURST events seems to have that concept more or less well in-hand. You might need to consider that this is actually just an inventory system where your "stock draw" transactions have separate flavors - scheduled draw vs. burst draw, and that a scheduled draw has no burst event. This would mean an extra code in your draw transaction history to link to a burst event that would be null for a scheduled event. Think about this part of the design carefully because you are potentially defining a many-to-many relationship here - many transactions to many bursts or NO bursts. (Technically, a none-to-one and a many-to-one relationship are treated the same, which is to say "not a one-to-one relationship".) As an alternative, you might find it better to have a default "scheduled" event descriptor shared by all such scheduled draws to avoid the expected nulls.

In your proposed new database, lay our your tables and relationships first, based on your knowledge of the inner workings and data flow of the original database and your business. Since this is the new, better version, here is an important rule to remember: If there is an apparent disagreement between the data flow you see in the business today vs. the data flow of the old app, the older app was WRONG.

Once, about 15 years ago, I was in your shoes. I inherited a P.O.S. (and no, that ain't "Point of Sale") database. Had to tinker under the hood forever, including break a dependency on using a spreadsheet as one of the external tables. (Still get waves of nausea at THAT particular design choice...) The point of my digression is that now you can make it right by starting from the ground up knowing that you have a working clunker gathering data for you.

Help Me Normalize my Table Of Doom

Swatkins

Registered User.

plog

Banishment Pending

Minty

AWF VIP

Swatkins

Registered User.

Swatkins

Registered User.

Minty

AWF VIP

Swatkins

Registered User.

Minty

AWF VIP

Simon_MT

Registered User.

The_Doc_Man

Immoderate Moderator

Similar threads

Users who are viewing this thread