Need advice on DB design.

Just to make sure, your project is to do reporting on large data sets. Does this also involve data mining or is this for simple analysis or tracking?

More to the point, will you be required to pull something as exotic as "How often is product X sold in colour Y on a Wednesday afternoon"? How often are reports pulled? What is the greatest time period you will be running reports for?

Also can you get away with simply having one "Per month" record instead of needing an actual date/time?

It is possible that your record could very well be smaller if there is data you don't need to store. Depending on need, you may be able to get away with

RepID: Long FK (4 bytes)
ProdID: Long FK (4 bytes)
Month: Byte (1 bytes)
Count: Byte (1 bytes)

If you know you will never see more than 255 of a given item sold by one rep in one month. Much of this will be driven by what outputs you will be required to support though.
 
Mark's comments are on-point. In order to know how much to store, you must first determine how much you intend to ask - and then more or less work backwards - to see what it will take to support that intent. The more detail you intent to request, the more detail you will need to store.
 
SPECIAL NOTE ABOUT COUNT.

In my example I'm using a byte with the assumption most salespeople will not sell more than 255 of an item in one month. If one does, you add an extra record (or records) for the same person/month/product. This is not a LIMIT, but instead a space saving method. It does mean that every single report will need to total all transactions for one person and show the total, just in case there is one (or more) high performance salespeople.

Also, as it can be very important, are you tracking ONLY by salesperson or do you also (as is oft the case) reporting primarily by location? If by location, there is an entire extra layer that gets added where you verify who is working where when, and accumulate those to a "Location" accumulator.

In a normal database that is being used regularly accumulators are highly disliked as they take up a fair amount of resources and duplicate data. In your case, with large static recordsets, doing the work once but being able to access it multiple times may be the difference between an hour to run a report and few seconds.

In addition to what you need to output, please also let us know if there are time constraints or other special needs not normally found in most databases.
 
To Doc,


Attributes always 12. Number of groups and subgroups can change. Regarding using Access, we're trying to see if it's a fit before jumping to something like Azure SQL Server.



To Mark,
Will you be required to pull something as exotic as "How often is product X sold in colour Y on a Wednesday afternoon"? How often are reports pulled? What is the greatest time period you will be running reports for?


Nothing exotic. Who's selling the most of group x and in what subgroup and attribute? Which group was sold the most and in what subgroup and attribute? Recent Month? Past 12 months? I believe some general queries will be run at first, creating some result tables which will be used to generate other reports.


I think my only shot with Access is to create a table for each group in its own database. Fields would be something like:
RepID: Long FK (4 bytes)
ProdID: text FK (4 bytes) {g(1-20),s(a-z),a(1-12) e.g., 20z12} mapped to group/code table
MMYYY: Byte (1 bytes)
Quantity Sold: Byte (1 bytes)


Use linked table manager to sum across groups.


Thoughts?


Thanks.
 
For a moment, lets looks at the sales person and Product tables.

What information do you have/need per salesperson?
What information do you have/need per product?

If you can store location information on a per-sales person basis, that will mean you won't have to address it in your main table. I am bringing up location as this often drives these kinds of reports.

From what I can tell your products will have
ProductID (AutoNumber)
ProductName (String)
ProductGroup (Long OR String)
ProductSubGroup (Long OR String)
ProductAttribute (Long OR string)

For the group, subgroup, and attribute, you will either want to store the actual value that goes on reports and such OR a reference value if it is looked up from another table.

For your main table, you would want to reference this as a number.
If you are limited to a couple thousand possible "products" you may want to use a WORD (2 bytes) instead of a LONG (4 bytes) as your ProductID. That could save 20% in space per record. This is based on the assumption that your products do not change, or do not change often.

Of additional importance is what is the longest period you will need to report on for a by/month report? Will you have to show the monthly results from 15 years ago? Will you be doing trend analysis based on same month but different years?
 
What information do you have/need per salesperson?
What information do you have/need per product?
Each salesperson and product will have unique identifiers along with other descriptive information that are contained in other tables.


ProductID (AutoNumber)
This will be text field, several characters.


Products will start at 6 and may go to a dozen or so but not much more.


longest period
Probably a rolling13 months. Archiving in a different database at 14th month and later will be a maintenance challenge but I hope doable.
 
As soon as you say "with its own database" you are opening up Pandora's box, which contains a can of worms. Multiple databases mean you CANNOT create a formal relationship between / among the files in question. You are struggling to avoid the obvious. To do any kind of analysis on a data set this size, you're gonna need a bigger boat.

I understand that people want to avoid the cost of going to something more formal and more expensive than Access, but part of wisdom is learning to tell when the problem is just too complex for a development system designed for lower-end applications. And what you have, volume-wise, is not low-end.
 
I'm going to give Access a try. I've managed to reduce the amount of data but still need to split into multiple databases. I will be asking for help under the 'Queries' forum. Thanks again to all who contributed.
 
Last edited:

Users who are viewing this thread

Back
Top Bottom