View Full Version : Database with 20k-40k items per day.


Marty J
03-14-2011, 03:45 AM
Hello,

I am only new to Access & need to create a database that will record 20k-40k transactions per day.
I am wondering how i should design it. Would keeping all the transactions in the one database eventually reach maximum capacity? I have tried creating a database with 6 month's worth of transactions but it became very slow when running queries searching for a single transaction.
My queries would range from searching for 1 transaction to creating monthly tables. Should i create monthly databases or tables and create relationships with them. If so, would a primary key be necessary?

Thankyou

GalaxiomAtHome
03-14-2011, 04:44 AM
Access has a limit of 2GB for the database. I know from my own experience it supports a table with over nine million records by 15 fields quite happliy. If you need more you can use MSSQL Server 2008 Express (supports 10GB) or MySQL. (Both free)

Always limit the queries with date criteria as well as your main search.

Index any fields you search or join. It makes a huge difference.

Normalize any repetitive text values by representing them as integers or single alpha characters via a lookup table. Ensure search queries on joined tables apply the Where clause to the lookup table.

Primary Keys are not essential unless you have other tables that are related to unique records or need to uniquiely identify otherwise identical records.

gemma-the-husky
03-14-2011, 04:54 AM
That's getting on for 1 million items a month

At some point searching and managing that many items is likely to become slow -depending on what you want to do.

I assume this is a corporate database, and important to your company. Personally, I would think you probably need some professional assistance at an early stage, to get the best out of this.

Marty J
03-14-2011, 06:04 AM
Yes it is for a corporate database. Currently, each days work is stored in 7 seperate Excel spreadsheets. It is my Green Belt Six Sigma Project to manage all that data more efficiently.
I think the only database software i have available to use is Access since there are restriction on me placing software on our machines.
When searching for a particular transaction, i know roughly the month it was made in, so would it be best just to create a table for each month?

gemma-the-husky
03-14-2011, 06:45 AM
no - you would have a basic single table holding all your data. more tables might be necessary to analyse the data correctly, but not at all because the data extends over several periods. Do not separate data based on date - you will just extract a subset of the data based on the data range you require.

However the underlying problem is the quantity of data - 20-40K items per day = 1Million items per month = 10 million items per year.

This is a very large database.

eg - Data of that quantity doesn't lend itself to full inspection - so you have to process it by exception, and in summaries - so it may need a different way of thinking about the data.

Galaxiom
03-14-2011, 02:48 PM
I agree with Dave. This is a big project for a relative beginner to tackle on their own.

First thing is to optimise the data model. What kind of information do you have to store?

Working with that much data really does need the backend on a database server.

One thng for sure though, anything you do is going to work better than the seven spreadsheets.;)

Galaxiom
03-14-2011, 03:18 PM
The size of the table itself does not have to make it slow.

My big table has 9.35 million records. I just queried it by date and transaction amount, (both indexed fields) and the result was instant. Indexes are crucial.

It really depends what you are querying. You certainly would not want to be searching for a name in a text field.

Marty J
03-14-2011, 09:15 PM
Thankyou.. the index part works great
I will only be searching for either number or dates.
Still not sure about the Primary Key if i should have one.
The data i will import is the transaction date of deposit, deposit id number, deposit amount and deposit break-up. The deposits that don't balance will generate a duplicate record of the deposit id and with error id letter. Therefore my main record which is the deposit id is not always unique. Would i then need to create a primary code?

Galaxiom
03-14-2011, 09:47 PM
Doesn't sound like you have any other tables related to the records or have any need to update or delete them. No real need for a key per se.

However there are other considerations. A composite index (No Duplicates) on DepositID and ErrorID would prevent the possibility of duplicate records inadvertently being added to the table. Such an index is virtually equivalent to a composite key.

Presumably the ErrorID would be Null if the record is normal so the default of this field would need to be set as the NullString because the Null cannot be be one of the key values. Just type two doublequotemarks in the Default property.

gemma-the-husky
03-15-2011, 02:14 AM
Thankyou.. the index part works great
I will only be searching for either number or dates.
Still not sure about the Primary Key if i should have one.
The data i will import is the transaction date of deposit, deposit id number, deposit amount and deposit break-up. The deposits that don't balance will generate a duplicate record of the deposit id and with error id letter. Therefore my main record which is the deposit id is not always unique. Would i then need to create a primary code?


What has been said is that "searching" is a problem.

If you have an index on the search field (eg, the transaction number, or the date) then the system will find the transaction virtually "instantly", based on a unique value.

Searching for a partial match is not so simple. If the index doesn't help, then the database has to search every item for a match(es) - which is the point Galaxiom was making - you don;t want this with millions of items.

Consider the date. Given you have at least 20K items every day, then the date on its own isn't going to help much at all - you will have 20K items for that date, and you still need to find the right item within those.

Which is why this is a very difficult database for a beginner to start on, and in a corporate situation it's something where you I would advise you to spend some money getting some professional help to set this up properly. This isn't wasted money - I am sure you will spend far more in your own time floundering with this project.

Take the initial import. You no doubt need a mechanism to find the import file, validate it, check that the data is consistent, report any errors, and then integrate it with the existing data. This is straightforward for someone experienced, but not trivial by any means. It is much easier and more friendly to navigate interactively to the file you want to import, than manually type in the folder and file name.

You also need to get the structure correct, as I have no doubt that you need more than 1 table in your database.

Finally, A primary key is just a normal index, designated as a PK. In access you MUST have a PK to be able to do certain other things.