Designing Tables for Ease of CSV Import Updates

DimWit

(Posted here b/c the referenced post was here--feel free to suggest a different forum)
I'm in the planning stage of a database.

Background
A non-profit wants to track member progression through training courses hosted by a third-party Learning Management System. The NP has arbitrarily split the courses into levels. (Level content is subject to change, so a LvlComplete table records completions against the then-current level requirements.)

Access Questions
The only reports the LMS provides are in CSV ‘flat file’ format.

I’ve studied site posts on importing CSV files and The_Doc_Man’s post on doing updates.
(see updating-existing-table-in-access-with-csv-file.284630/post-1475289)

Specific questions:
(see attachment for field/table info)
-- I only need 8 of 26 fields contained in the export. Do I need to strip the unneeded fields before import (e.g., manually massaging in Excel, or via VBA) or can I ignore the excess fields during updates?

-- Do I need to maintain the ‘flat file’ table format, or can I normalize the structure? If the latter, how do I carry out Doc’s scheme: update against a query that includes all the fields from the disparate tables, or run multiple updates, one for each new table?

FWIW, I am quite confident that there will be a single user (i.e., no concurrent users) and the data table will never exceed 30K records (2.2K records after 2 years--hence the request for a management tool to encourage progression), if that figures into the inefficiency-vs-update-simplicity calculation.

TIA
 

Attachments

  • CSV-to-AccDB (FBSN).jpg
For myself, I would create a small app that "Reads" the CSV in through a query. This will allow you to review the data and resolve any issues you have. Then create an append query that reads your properly formatted query to actually add to your table.

This also allows you to create other queries to update any other tables you need. When you have redundant data, your read query can use DISTINCT to give you just the unique rows you want.

For specifics, your DISTINCT query can return the "User" fields: grab just the User UniqueID, User First Name, and User Last Name to match against your user data.
You can then grab the fields that you need to update your "CompleteData".
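
Purely as a sketch (the linked-CSV name lnkLMSExport and all field names below are guesses at what's in your attachment, so adjust to suit), the DISTINCT read query and the append that consumes it might look like:

qryDistinctUsers:
SELECT DISTINCT [User UniqueID], [User First Name], [User Last Name]
FROM lnkLMSExport;

qryAppendNewUsers:
INSERT INTO tblLMSUserData (UserID, FirstName, LastName)
SELECT q.[User UniqueID], q.[User First Name], q.[User Last Name]
FROM qryDistinctUsers AS q LEFT JOIN tblLMSUserData AS u
ON q.[User UniqueID] = u.UserID
WHERE u.UserID Is Null;

The LEFT JOIN ... Is Null test is just one way to append only the users you don't already have.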

From your table layout, I'd personally break "CompleteData" out into UserID, Date, and CourseLevel as separate records, NOT Lvl1, Lvl2, and Lvl3 columns. This will save you headaches when they decide to add Lvl4, Lvl5, and "Gold".
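
If you go that route, a rough DDL sketch (table and field names are only placeholders, CompletedDate avoids the reserved word Date, and the unique constraint assumes a member completes a given level only once):

CREATE TABLE tblLvlComplete (
    CompleteID COUNTER CONSTRAINT pkLvlComplete PRIMARY KEY,
    UserID LONG NOT NULL,
    CourseLevel TEXT(50) NOT NULL,
    CompletedDate DATETIME,
    CONSTRAINT uqUserLevel UNIQUE (UserID, CourseLevel)
);

You can run that from a DDL query or CurrentDb.Execute, or just build the same thing in the table designer. Adding "Lvl4" later is then just another row value, not a table change.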
 
First, do you need to use Access? A lot can be accomplished with Pivot Tables in Excel.

Why structure a database, build APPEND/INSERT queries to populate your tables, build SELECT queries to feed reports and then build reports if you can build a much simpler process in Excel? What do your reports need to look like?

Then for your questions:

Do I need to strip the unneeded fields before import (e.g., manually massaging in Excel, or via VBA) or can I ignore the excess fields during updates?

Ignore excess fields during appends and updates.
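
In other words, your append query simply never references the other 18 columns. A minimal sketch, with the linked CSV (lnkLMSExport), the destination table, and the source column names all made up:

INSERT INTO tblYourTable (UserID, CourseLevel, CompletedDate)
SELECT [User UniqueID], [Course Level], [Completion Date]
FROM lnkLMSExport;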

Do I need to maintain the ‘flat file’ table format, or can I normalize the structure? If the latter, how do I carry out Doc’s scheme: update against a query that includes all the fields from the disparate tables, or run multiple updates, one for each new table?

Not only can you normalize, you should. tblLvlCompleteData is not normalized. When you start enumerating fields (Lvl1, Lvl2, etc.), it's time for a new table. Can you provide sample data for that table? My guess is you might have too many fields and only need that one table. Actually, sample data for all the tables in your database would help us get you where you need to go.
 
Is the CSV an export of cumulative data, or only new data since the previous export?

Importing to Access can limit which fields you want, so there is likely no need for an Excel intermediary. Or just set a link to the CSV file that pulls all fields, then build queries to use only what you really want.
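
Linking takes one line of VBA (or External Data > New Data Source in the ribbon); the table name and file path here are just examples:

' Attach the LMS export as a read-only linked table; deleting and
' re-creating the link later points the same name at a fresh file.
DoCmd.TransferText acLinkDelim, , "lnkLMSExport", "C:\LMSExports\Progress.csv", True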

Updating a non-normalized import to normalized tables would likely involve multiple SQL actions. It gets complicated when there are related tables, as the "parent" record must be committed before its "child" records for new data.
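
A sketch of that ordering, assuming you have saved a parent append query and a child append query (the names below are made up), run in sequence inside a transaction so a failure part-way through doesn't leave orphaned child data:

Public Sub RunImport()
    Dim ws As DAO.Workspace
    Dim db As DAO.Database
    Set ws = DBEngine.Workspaces(0)
    Set db = CurrentDb
    ws.BeginTrans
    On Error GoTo RollbackOut
    db.Execute "qryAppendNewUsers", dbFailOnError      ' parent rows first
    db.Execute "qryAppendCompletions", dbFailOnError   ' then child rows
    ws.CommitTrans
    Exit Sub
RollbackOut:
    ws.Rollback
    MsgBox "Import failed: " & Err.Description
End Sub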

For such a small db, you could just use a UNION query on the non-normalized imported data to arrange it as normalized for reports. But Mark makes a good point about changes to structure: if more Lvl# fields are added, then tables, queries, reports, and code would have to be modified. Also, the file might grow more than anticipated.
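
That UNION query would look something like this, assuming the Lvl# fields hold completion dates (adjust if they are Yes/No flags):

SELECT UserID, 1 AS CourseLevel, Lvl1 AS CompletedDate
FROM tblLvlCompleteData WHERE Lvl1 Is Not Null
UNION ALL
SELECT UserID, 2, Lvl2 FROM tblLvlCompleteData WHERE Lvl2 Is Not Null
UNION ALL
SELECT UserID, 3, Lvl3 FROM tblLvlCompleteData WHERE Lvl3 Is Not Null;

It works for reporting, but as noted, every new Lvl# field means another UNION branch to maintain.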

"Normalize until it hurts, denormalize until it works."


Side note: your image shows ID as the PK in tblLMSUserData, but the link is on UserID. The link should use the designated PK. Either change UserID to the PK or store ID as the FK in tblLvlCompleteData.
 
Darn, you made me look up a 9-year-old post so I could remember what I advised. My memory is still pretty good for 77 years old, but it is FAR from perfect. ;)

Start with what you have - a .CSV file that contains information that you want to merge in some specified way with one or more extant (and therefore fully designed) tables.

The first question is how big is one (typical) .CSV file?

If we are talking less than 50-100 rows, I might take a VBA approach and open the .CSV file as an Excel Application Object. Then in a row-by-row loop, I might extract data from the columns of the current row to build an INSERT INTO query and use db.Execute of the constructed query.
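
A bare-bones sketch of that loop (the file path, column positions, and destination field names are all placeholders, and it assumes a header row):

Public Sub ImportSmallCsv()
    ' Open the .CSV via Excel, walk the rows, build one INSERT per row.
    Dim xl As Object, wb As Object, sht As Object
    Dim db As DAO.Database
    Dim r As Long, lastRow As Long
    Dim sql As String

    Set xl = CreateObject("Excel.Application")
    Set wb = xl.Workbooks.Open("C:\LMSExports\Progress.csv")
    Set sht = wb.Worksheets(1)
    Set db = CurrentDb

    lastRow = sht.Cells(sht.Rows.Count, 1).End(-4162).Row    ' -4162 = xlUp
    For r = 2 To lastRow                                     ' row 1 = headers
        sql = "INSERT INTO tblLvlCompleteData (UserID, CourseLevel, CompletedDate) " & _
              "VALUES (" & sht.Cells(r, 1).Value & ", '" & _
              Replace(sht.Cells(r, 5).Value, "'", "''") & "', #" & _
              Format(sht.Cells(r, 8).Value, "yyyy-mm-dd") & "#)"
        db.Execute sql, dbFailOnError
    Next r

    wb.Close False
    xl.Quit
End Sub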

However, if the .CSV file is a CUMULATIVE file and you have seriously high odds of having duplicate data in a larger file, that VBA loop is not a good choice. VBA would be too slow for this alternative case.

If there is a chance that the cumulative file might include data updates to previously entered records, such that you can't ignore the old records entirely, that is a third and very different scenario. It is the worst-case scenario in that it would require both INSERT and UPDATE operations, determined by the content of the individual rows. So clarification is in order.
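
For what it's worth, that worst case is commonly handled as two actions keyed on whatever uniquely identifies a row: an UPDATE joined on the key for records that already exist, then an INSERT of the unmatched remainder. A sketch with assumed names, and assuming the CSV rows have first been landed in a local staging table (tblStaging is a made-up name):

UPDATE tblLvlCompleteData AS t INNER JOIN tblStaging AS s
    ON t.UserID = s.UserID AND t.CourseLevel = s.CourseLevel
SET t.CompletedDate = s.CompletedDate;

INSERT INTO tblLvlCompleteData (UserID, CourseLevel, CompletedDate)
SELECT s.UserID, s.CourseLevel, s.CompletedDate
FROM tblStaging AS s LEFT JOIN tblLvlCompleteData AS t
    ON s.UserID = t.UserID AND s.CourseLevel = t.CourseLevel
WHERE t.UserID Is Null;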

The next question is whether any of the .CSV rows contain data that has to be normalized - like, say, information about parts of a class, where one row of the .CSV is the class and certain columns represent individual parts of the class. That requires special attention, too.

In general, you don't want to over-process the .CSV file by importing the whole thing and then running direct queries against it. The more you "touch" this kind of file, the greater the chance of processing errors or difficulties. You have a mechanical source for the files, so that improves the odds of them being "clean." But I did say "IN GENERAL, don't import the raw file first." Since you would have to later erase or delete the import table, you would generate "bloat," which adds to your long-term operational overhead. There are ways to avoid that bloat, too, but I'd rather avoid that discussion until we know it is needed.

We need to know what you would do with the dataset if you were doing this on paper. There might be cases where a full pre-import is correct for the problem you face, and from what I read, I'm not sure which case we are looking at.
 
The attached little demo file illustrates how to import data from Excel and then decompose it into a set of normalized related tables. It would apply equally to importing data from a .CSV file. The zip archive includes two versions: one in which each cell in the Excel file contains a single value, the other in which the cells in one column contain value lists.

The demo firstly imports the data into a single table, MasterTable. This is then used as the basis for the decomposition, which is undertaken by executing a set of 'append' queries in a specific order. A brief explanation of each stage is given as you step through the demo. The rule of thumb is that data is first inserted into the referenced table in each relationship before inserting data into the referencing tables. The data is then presented in the conventional user interface of a form/subform.

Once the data from the MasterTable has been decomposed, that table can be emptied. It can then be used for further imports. By virtue of the indexes in the operational tables, any duplication of data from further imports will be suppressed.
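
For anyone adapting the idea: the suppression relies on a unique index over each operational table's natural key, something like the line below (table and field names here are illustrative, not necessarily those in the demo). When the append queries are re-run, rows that would violate the index are discarded, with a key-violation warning in the UI, or silently if executed from code without dbFailOnError.

CREATE UNIQUE INDEX uqUserLevel
ON tblCompletions (UserID, CourseLevel);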
 
