Normalizing (and fixing) a large, messy DB (1 Viewer)

JoeBruce

Registered User.
Local time
Today, 14:49
Joined
Jan 13, 2017
Messages
32
Howdy,

I work at a place that includes a popular rock climbing area. They have used an Access DB for many years to track their stats. After learning a fair amount about Access/DB design for my own purposes, I've taken a look at this climbing database, and it seems a pretty big mess. I would like some advice on trying to redesign or make a totally new climbing database.

After starting a new DB file and working on the underlying tables and relationships, I dived into the old climbing DB; now I'm not sure if I can handle this. My first thought was to export the table data to excel so I could clean it up a bit, but I don't know how well that will work. To give you an example, there is simply a "name" field and some of it is "First Last", some of it is "Last, First". There are ~150k records in one table, about 85k in another, so it is pretty large (at least by my standards).

Attached is an image of the current relationships screen. I will post one of my effort for a new DB later. I feel like I'm not giving you all much to go on, but I will do my best to fill in more information.
 

Attachments

  • Climbing DB relationships.jpg
    Climbing DB relationships.jpg
    60.5 KB · Views: 360

CJ_London

Super Moderator
Staff member
Local time
Today, 20:49
Joined
Feb 19, 2013
Messages
16,553
Think you need to explain how the business works - along the lines of

a climber climbs many routes and needs a permit for each route, permits are issued to the climber and can cover many routes (or permits are by route and issued to many climbers)

etc

I would concentrate initially of getting the db structure correct first and then worry about how to clean the data for the new structure.

You should also google/bing 'normalisation' which will help you to design your db. Simplistically, you should only store data. It looks like for your current structure, if a climber has 10 permits, you have to enter their name 10 times
 

MarkK

bit cruncher
Local time
Today, 13:49
Joined
Mar 17, 2004
Messages
8,178
What I commonly do if the data is a mess is make new tables, and then write code to step through old tables, and move the data into the new ones. The problem is that 1 to many relationships in the old tables will rely on unique IDs that will not be the same in the new tables, so sometimes you need to create a parent row, grab the new ID, then create a series of child rows and push that new parent ID into the child rows Foreign Key link to the parent.

For this problem it is common, in my experience, to write a pile of code that opens a pile of recordsets that you never use again.

But if that image is the design of your new system, I think there are still problems. The way you have it, the "ClimberName" is always a child of the permit. In this case if you have repeat climbers, you HAVE to add a new row to ClimberName, but the entity that you are naming is not different. I would expect to see a table called "Climber"--or even a table called "Person" with a yes/no field called "IsClimber." Then add a table between Climber and Permit, maybe called ClimberPermit, and then wouldn't the route table be a child of the ClimberPermit? Or are there a series of fixed routes too? Maybe you want a table that stores a list of routes too, or do you create a new route for every permit? Probably there exists a fixed set of routes. That would be a table. Then, if a permit will take climbers to one of those existing routes, that is a new table, which stores the ClimberPermitRoute object. Does that make sense?

But whatever the case, do get your good tables right before you do the one-off job of moving the old data.

hth
Mark
 

jdraw

Super Moderator
Staff member
Local time
Today, 16:49
Joined
Jan 23, 2006
Messages
15,364
Getting a clear description of what the database represents (or what "business" it was intended to support) would be a major first step. I would not recommend changing things until you had a description that your colleagues could vet/adjust and agree upon. Getting your tables and relationships designed (normalized, modeled and tested with some sample data and test scenarios) is a critical part of database.

Here is a link to database planning and design info that may be useful.

Good luck with your project.
 

Mark_

Longboard on the internet
Local time
Today, 13:49
Joined
Sep 12, 2017
Messages
2,111
In addition to what others have posted, what outputs will you be needing? I am guessing there are assorted "Trip tickets" and "Customer lists" that go into this? Likewise company information for the companies providing the guides. The original references a "Climbing Patrol", so what structures you need to support for patrols?

More important, what does the business want out of this that it isn't doing now? This is the part that will require you to force those who will be needing this to participate. Expect them to be as forward, open, and willing to participate as your average cat is when its time to be bathed. Exception being the cat will tend to scratch less.
 

jdraw

Super Moderator
Staff member
Local time
Today, 16:49
Joined
Jan 23, 2006
Messages
15,364
Expect them to be as forward, open, and willing to participate as your average cat is when its time to be bathed. Exception being the cat will tend to scratch less.

Or, in other words (from experience)

As willing and forthcoming as a cornered rat
 

JoeBruce

Registered User.
Local time
Today, 14:49
Joined
Jan 13, 2017
Messages
32
Thanks for the advice so far.

I learned Access last year by designing a database from scratch. I don't claim to be an expert on anything, but I have learned a ton, including the all-important necessity of normalization. Attached is a screeny of the tables I created and their relationships, my first crack at making a new one. (Note there is no data yet in this new DB.)

So no, that first screenshot in my OP is not representative of what I would want for this DB; some of you have even pointed out a major issue with having people entered multiple times. After everything I've learned, this DB and its underlying structure make me cringe.

That's why I'm not sure if I can go about fixing it. The stuff the MarkK suggested in the start of his post is probably the direction I need to head; if you all have any resources that could me help me figure that out, I would appreciate it.

I won't skip jdraw's recommended step - entering basic test data to make sure everything is working and proper (normalized). But I am pretty confident I can manage that phase with time, patience and my current knowledge. I just know that converting all the old data into new relationships is something I know nothing about, hence my true chore.
 

Attachments

  • New Climbing DB relationships.jpg
    New Climbing DB relationships.jpg
    60.2 KB · Views: 256

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 15:49
Joined
Feb 28, 2001
Messages
27,001
That's why I'm not sure if I can go about fixing it.

Which is why SOMETIMES you need to ask the old repair/replace question.

And don't discount this possibility. You could design the new database and then, when you have figured out how the tables map between old and new, you could (temporarily) declare a linked table to the old mess, use queries to extract what you want from old to new, and then later dissolve the linkage. (Obviously, do so AFTER you have sucked all the usable data from the old tables.)

The trick is to identify the dependent data and the independent data. Because when taking the approach I just suggested, you copy independent data first.

You got some suggestions on the subject of database normalization. Let me add one more topic to your reading agenda, and the Search features of this forum will help in finding the articles you might want. Look up how to model your database to match your work flow so that you will avoid the problems of "the tail wagging the dog." If you try to run your business from a bad database, you find yourself suddenly unable to do something you might wish to do because the bloody database can't support it. And THAT is when the repair/replace discussion becomes trivial.
 

plog

Banishment Pending
Local time
Today, 15:49
Joined
May 11, 2011
Messages
11,611
So here's the issues I see in your latest screen shot:

1. Circular relationships. There should only be one way to travel from one table to another in your relationships, you have created a loop among 4 of your tables--this is incorrect. You need to rethink how tblClimbRecords, tblGuideCompanies, tblClimbers and tblGuides relate. They shouldn't be connect in a loop.

2. Storing calculable data. You are explicitly tracking each climber that goes on a climb, there should be no need to record the number of climbers who went on a climb (tblClimbRecords.NumberClimbers). Instead, you simply count them when you need to know how many.

The same thing can be done for tblClimbRecords.GuidedClimb. You are storing the name of the Guide for a climb--so logically if there is a name attached its a guided climb, if not it is not a guided climb. No need to make a whole field to store that information, instead you deduce it from the other fields you have.

3. Tables with only 1 real field of data should not exist (autonumbers are not real data). I see 3 unnecessary tables that only have an ID and another field. Instead of storing the ID value in the foreign table, just store the value that ID relates to in the foreign table. For example, tblRouteNames.ClimbTypeID_FK should not store the ID of tblClimbType, but simply the ClimbType it relates to.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 20:49
Joined
Feb 19, 2013
Messages
16,553
There is more to resolve but in respect of circular references I suspect the issue is with climbers and guides tables where a guide is also a climber. If so, then if you combine the two tables (i.e. add a guide companyID and current guide fields to climbers) then delete the guide table and link the climber table to the guide companies table (without referential integrity set) it will solve your problem.

You can use the population of the guidecompany field to indicate the type of climber - just a climber if blank and is a guide if not.

Although the guidecompany table only has the one field at the moment, so perhaps is not required, you may at some future point decide to populate it with additional information such as address, contact details etc. Howevr I agree that the climb rating and type tables are not really required if it is a finite list. Just use a value list in the combo control to list the types.
 

jdraw

Super Moderator
Staff member
Local time
Today, 16:49
Joined
Jan 23, 2006
Messages
15,364
Joe,
Before adjusting existing relationships and/or tables, take some time and write a description of the "business". Start at the 30,000 ft level and gradually add detail. Identify the business rules/facts -these are what the relationships are based on.
If the existing database is a "mess" as you say, then what is the cause of the "mess"? It no longer represents the business? It is convoluted/un-normalized and difficult to get meaningful output? It doesn't represent the business anymore?

Much easier to do this with pencil and paper than to adjust a physical database.

Good luck.
 

Mark_

Longboard on the internet
Local time
Today, 13:49
Joined
Sep 12, 2017
Messages
2,111
To add to what others have mentioned regarding "Climb Type" / "Climb Rating", when you do use a lookup in a database there are two ways you do so; by reference or by value.

By reference is when you wish to attach one record to another AND have changes reflected in records you have not edited. An example would be if you link a company to a guide. If you change the contact information for the company, if you have the link by reference you would see the contact information "auto-magically" updated for the guide as you link by a reference.

By value is simply looking up a value out of a list. If the value in the list changes, you don't go back and change records. In the case of "Type" and "Rating" both can be looked up out of a list and, as I gather, if the list needs to be changed you DON'T go back and change history.

Easiest way to do this is to have ONE table used for by value lookups. The table normally has two fields, Sort and value. In your case you would have one of two values in "Sort", "Type" or "Rating". You could then use the same list to hold values for both and set up one query for each that you will use. This allows you to add other by value lookups fairly easily and manage their values in a single table.

Please note, this also allows you to add extra optional fields for by value lookups, such as descriptions of the value, an "Order" numeric field so you can order the lookup to your taste, secondary sorts for when you are using cascading combo boxes, and other more elaborate ways of looking up values to keep consistency in data you need consistent.

I do notice something is missing in this version from the current; Patrol. Is that used to indicate if this climb IS a patrol or if there is a patrol available to the climbers? Would that need its own table(s)? Or is it vestigial and being discarded?
 

Tieval

Still Clueless
Local time
Today, 20:49
Joined
Jun 26, 2015
Messages
475
Before adjusting existing relationships and/or tables, take some time and write a description of the "business". Start at the 30,000 ft level....
So you are suggesting a climber starts at the top:D:D:D
 

JoeBruce

Registered User.
Local time
Today, 14:49
Joined
Jan 13, 2017
Messages
32
Hey folks; I am resurrecting this thread because I have been working on this problem again. I've attached a screenshot of my latest relationships.

My current concern is the relationship between tblPermits and tblClimbers. Since there can be multiple climbers on a permit, I used a trick that worked for me in another database: creating a third table (tblClimberRecords) that stores foreign keys and links the other two tables. Is there a cleaner (better) way to do this?

One of the last points of discussion was about tables with only one field (plus primary key); I eliminated those and created a tblValues as was recommended, and that seems to work well. So thanks for that advice.
 

Attachments

  • Climb DB Relationships.jpg
    Climb DB Relationships.jpg
    85.9 KB · Views: 222

Mark_

Longboard on the internet
Local time
Today, 13:49
Joined
Sep 12, 2017
Messages
2,111
One question,
Why are guides a child record of climbers? Wouldn't one (or more) guides be on a permit? As is it looks like each climber has multiple personal guides.

If a climber can actually be a guide, you may want to put in a linking table between climbers who are guides to what ever you need to link them to.
 

JoeBruce

Registered User.
Local time
Today, 14:49
Joined
Jan 13, 2017
Messages
32
@Mark - Not all climbs that happen here are guided. Most climbs do not have a guide, some do. And several of the guides also climb recreationally (i.e. they aren't working/guiding).

Hopefully that helps clarify the situation. The park needs records of the guides and who they work for. And they want to track which climbs are led by a commercial company (guide). Any guide would be a climber. Maybe the error here is a one-to-many relationship; in reality it would be a one-to-one. (I think; a guide = a climber, but a climber <> a guide; so maybe join the tables on the two PKs?) That side of the relationship layout may still need some work. Well, the whole thing may still need work!

Any thoughts or advice on my recent question? Is there a better way for tblPermits to store multiple climbers per record?

@jdraw - I understand your advice, but the context of my scenario is a bit different. I will be coordinating with the folks who use the climbing DB, but we're all front-line employees at a unit of a large public agency. I'm just the guy who knows Access and we're all really annoyed with the current DB we have here. I doubt this new one will actually get used, this is more of a personal challenge to see if I can develop something better.
 

Mark_

Longboard on the internet
Local time
Today, 13:49
Joined
Sep 12, 2017
Messages
2,111
JoeBruce,

The linking record IS the best way to attach climbers to a climb.

Do your guides ever work for more than one company?
 

jdraw

Super Moderator
Staff member
Local time
Today, 16:49
Joined
Jan 23, 2006
Messages
15,364
JoeBruce,
What Mark is leading you through are the questions to determine the rules of your business. They are key elements in determining the relationships for tables in your database. This is part of analysis and the "stump the model" exercise is to verify those rules and the proposed structure/blueprint for your database.
Good luck with your project.
 

Users who are viewing this thread

Top Bottom