Designing Tables for Ease of CSV Import Updates

DimWit
(Posted here b/c the referenced post was here--feel free to suggest a different forum)
In the planning stage of a database.

Background
A non-profit wants to track member progression through training courses hosted by a third-party Learning Management System. The NP has arbitrarily split courses into levels. (Level content is subject to change, so a LvlComplete table records then-current satisfactions.)

Access Questions
The only reports the LMS provides are in CSV ‘flat file’ format.

I’ve studied site posts on importing CSV files and the The_Doc_Man post on doing updates.
(see updating-existing-table-in-access-with-csv-file.284630/post-1475289)

Specific questions:
(see attachment for field/table info)
-- I only need 8 of 26 fields contained in the export. Do I need to strip the unneeded fields before import (e.g., manually massaging in Excel, or via VBA) or can I ignore the excess fields during updates?

-- Do I need to maintain the ‘flat file’ table format or can I normalize structure? If the latter, how to carry out Doc’s scheme—update against a query that includes all the fields from disparate tables or do multiple updates, once for each new table?

FWIW, I am quite confident that there will be a single user (i.e., no concurrent users) and the data table will never exceed 30K records (2.2K records after 2 years--hence the request for a management tool to encourage progression), if that figures into the inefficiency-vs-update-simplicity calculation.

TIA
 

Attachments

  • CSV-to-AccDB (FBSN).jpg
For myself, I would create a small app that "Reads" the CSV in through a query. This will allow you to review the data and resolve any issues you have. Then create an append query that reads your properly formatted query to actually add to your table.

This also allows you to create other queries to update any other tables you would need. When you have redundant data, your read query can use DISTINCT to give you just the unique data you want.

For specifics, your DISTINCT query can be used to return the "User" fields, grabbing just the User UniqueID, User First Name, and User Last Name to match to your user data.
You can then grab the fields that you need to update your "completedData".
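
For example, the user append might look something like this (air code; tblCSVImport and the field names are placeholders for whatever you call your read query or linked import):

SQL:
INSERT INTO
    tblLMSUserData (UserID, FirstName, LastName)
SELECT DISTINCT
    c.UserUniqueID,
    c.UserFirstName,
    c.UserLastName
FROM
    tblCSVImport AS c;

With UserID uniquely indexed in tblLMSUserData, users already in the table are simply rejected by the append rather than duplicated.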

From your table layout, I'd personally break out "CompleteData" to be userID, date, and courselevel as separate records, NOT "lvl1", "lvl2", and "lvl3". This will save you headaches when they decide to add "lvl4", "lvl5", and "Gold".
 
First, do you need to use Access? A lot can be accomplished with Pivot Tables in Excel.

Why structure a database, build APPEND/INSERT queries to populate your tables, build SELECT queries to feed reports and then build reports if you can build a much simpler process in Excel? What do your reports need to look like?

Then for your questions:

Do I need to strip the unneeded fields before import (e.g., manually massaging in Excel, or via VBA) or can I ignore the excess fields during updates?

Ignore excess fields during appends and updates.

Do I need to maintain the ‘flat file’ table format or can I normalize structure? If the latter, how to carry out Doc’s scheme—update against a query that includes all the fields from disparate tables or do multiple updates, once for each new table?

Not only can you normalize, you should. tblLvlCompleteData is not normalized. When you start enumerating fields (Lvl1, Lvl2, etc.) it's time for a new table. Can you provide sample data for that table? My guess is you might have too many fields and only need that one table. Actually, sample data for all the tables in your database would help us get you where you need to go.
 
Is the CSV an export of cumulative data or only new data since previous export?

The import to Access can limit which fields you bring in, so there's likely no need for an Excel intermediary. Or just set a link to the CSV file that pulls all fields, then build queries to use what you really want.
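
Could even query the text file directly with a connect string in the FROM clause, no link needed. Something like this should work (folder path and field names are placeholders; note the dot in the file name becomes #):

SQL:
SELECT
    UserUniqueID, UserFirstName, UserLastName, CourseName, CompletedDate
FROM
    [Text;FMT=Delimited;HDR=Yes;DATABASE=C:\LMSExports].[export#csv];

HDR=Yes reads field names from the header row; a schema.ini file in the same folder gives more control over data types.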

Updating non-normalized import to normalized tables would likely involve multiple SQL actions. Gets complicated when there are related tables as "parent" record must be committed before "child" records for new data.

For such a small db, could just use a UNION query on non-normalized imported data to arrange as normalized for reports. But Mark makes a good point about changes to structure - if more lvl# fields are added then tables, queries, reports, code would have to be modified. Also, file might grow more than anticipated.
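
For example, something along these lines, assuming the lvl# fields hold completion dates:

SQL:
SELECT UserID, 1 AS CourseLevel, Lvl1 AS DateCompleted
FROM tblLvlCompleteData WHERE Lvl1 IS NOT NULL
UNION ALL
SELECT UserID, 2, Lvl2
FROM tblLvlCompleteData WHERE Lvl2 IS NOT NULL
UNION ALL
SELECT UserID, 3, Lvl3
FROM tblLvlCompleteData WHERE Lvl3 IS NOT NULL;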

"Normalize until it hurts, denormalize until it works."


Side note: your image shows ID as PK in tblLMSUserData but link is on UserID. This link should include designated PK. Either change UserID to PK or save ID as FK in tblLvlCompleteData.
 
Darn, you made me look up a 9-year-old post so I could remember what I advised. My memory is still pretty good for 77 years old, but it is FAR from perfect. ;)

Start with what you have - a .CSV file that contains information that you want to merge in some specified way with one or more extant (and therefore fully designed) tables.

The first question is how big is one (typical) .CSV file?

If we are talking less than 50-100 rows, I might take a VBA approach and open the .CSV file as an Excel Application Object. Then in a row-by-row loop, I might extract data from the columns of the current row to build an INSERT INTO query and use db.Execute of the constructed query.
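
Each pass of the loop would build and execute one statement of roughly this shape (table and field names here are invented purely for illustration):

SQL:
INSERT INTO
    tblLvlCompleteData (UserID, CourseID, ClassDate, Score)
VALUES
    (1234, 'ACC-101', #11/30/2025#, 92);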

However, if the .CSV file is a CUMULATIVE file and you have seriously high odds of having duplicate data in a larger file, that VBA loop is not a good choice. VBA would be too slow for this alternative case.

If there is a chance that the cumulative file might include data updates to previously entered records, such that you can't ignore the old records entirely, that is a third and very different scenario. It is the worst-case scenario in that it would require both INSERT and UPDATE operations, determined by the content of the individual rows. So clarification is in order.

The next question is whether any of the .CSV rows contain data that has to be normalized - like, say, information about parts of a class, where one row of the .CSV is the class and certain columns represent individual parts of the class. That requires special attention, too.

In general, you don't want to over-process the .CSV file by importing the whole thing and then run direct queries. The more you "touch" this kind of file, the greater the chance of processing errors or difficulties. You have a mechanical source for the files, so that improves the odds on it being "clean." But I did say "IN GENERAL, don't import the raw file first." Since you would have to later erase or delete the import table, you would generate "bloat" which adds to your long-term operational overhead. There are ways to avoid that bloat, too, but I'd rather avoid that discussion until we know it is needed.

We need to know about what you would do with the dataset if you were doing this on paper. There might be cases where a full pre-import is correct for the problem you face, and from what I read, I'm not sure of which case we are looking at.
 
The attached little demo file illustrates how to import data from Excel and then decompose it into a set of normalized related tables. It would equally apply to importing data from a .CSV file. The zip archive includes two versions: one in which each cell in the Excel file contains a single value, the other in which the cells in one column contain value lists.

The demo firstly imports the data into a single table, MasterTable. This is then used as the basis for the decomposition, which is undertaken by executing a set of 'append' queries in a specific order. A brief explanation of each stage is given as you step through the demo. The rule of thumb is that data is first inserted into the referenced table in each relationship before inserting data into the referencing tables. The data is then presented in the conventional user interface of a form/subform.
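
For instance, the first append in the sequence is against the top-level referenced table and is the simplest of the set, along these lines:

SQL:
INSERT INTO
    Countries (Country)
SELECT DISTINCT
    MasterTable.Country
FROM
    MasterTable;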

Once the data from the MasterTable has been decomposed, that table can be emptied. It can then be used for further imports. By virtue of the indexes in the operational tables, any duplication of data from further imports will be suppressed.
 


For myself, I would create a small app that "Reads" the CSV in through a query. This will allow you to review the data and resolve any issues you have. Then create an append query that reads your properly formatted query to actually add to your table.
...
From your table layout, I'd personally break out "CompleteData" to be userID, date, and courselevel as separate records, NOT "lvl1", "lvl2", and "lvl3". This will save you headaches when they decide to add "lvl4", "lvl5", and "Gold".
Thanks Mark. I'll get back with a few questions if I go this way. Great point about level-creep (don't forget silver, platinum, titanium, lanthanum, cerium, praseodymium, etc. 😁)
 
Thanks plog
First, do you need to use Access? A lot can be accomplished with Pivot Tables in Excel.
I started in Excel using vlookup and got inconsistent results appending levels. Plus Excel (what's the technical term? oh, yes) 'sucks' at reports. And, sigh, I understand Access better than Excel [that is a relative comparison].
Why structure a database, build APPEND/INSERT queries to populate your tables, build SELECT queries to feed reports and then build reports if you can build a much simpler process in Excel? What do your reports need to look like?
My goal is to deliver a user-maintainable solution that allows them to import updates, make changes to what and how their curriculum is divided (courses/levels), track individual progress, and run stock reports without continuous hand-holding--and to minimize the chance for unintended disaster if someone mucks about (envisioning running from a console with tables hidden).
 
Is the CSV an export of cumulative data or only new data since previous export?
I'd prefer it be cumulative.
Data retrieval was demonstrated to me, and the person responsible for retrieving information (a 3rd-year law student who, lawyer jokes aside, is both intelligent and smart) had so thoroughly bollixed up the four date selection criteria that it took minutes of gentle coaching to get a complete record set. So I'm concerned that any instruction beyond "before running the export, make sure all date selection boxes are empty" will yield missed records and degraded confidence. As demonstrated, and from what I could observe, there is no 'flag' that could be used as a starting point for the next export.
Side note: your image shows ID as PK in tblLMSUserData but link is on UserID. This link should include designated PK. Either change UserID to PK or save ID as FK in tblLvlCompleteData.
Yup, I was rearranging existing fields/tables to produce a concept (that was poorly executed)
Thanks much
 
Okay, cumulative - could probably just set a link to the CSV and then work with that linked table. Give the CSV file a generic name (no date part) then every time you do cumulative export from LMS overwrite the file so name is unchanged and link is stable. Use the linked table as source for normalization. Normalize into other tables or just use a UNION query. For such a small db, not sure it is worth effort of re-structuring and coding for normalized tables. Doesn't matter if there are 30K or 30M records, coding effort for normalized tables is basically same. What is your coding expertise level? The more user-friendly, the more code. Title of your thread includes "ease of CSV import" - easiest is likely just setting a link and building queries from source structure. You decide where to dedicate your time and effort.

Excel reads csv file as if it were xlsx. An xlsx might be better for linking in Access. However, I have linked to both without issue.
 
I'd prefer it be cumulative.

The key to that is correct indexing of the columns in the Access tables, along with correctly structured INSERT INTO statements. In my demo, for instance, the Country column in the Countries table is indexed uniquely, whereas the City column in the Cities table is not, because city names can legitimately be duplicated. Consequently a row with an existing Country name won't be inserted into the Countries table. A row with an existing city name would result in duplication if a simple INSERT INTO statement were executed, as there would be no index violation. To avoid this, the City and RegionID columns are included in a single unique index, and the INSERT INTO statement is:

SQL:
INSERT INTO
    Cities (City, RegionID)
SELECT DISTINCT
    MasterTable.City,
    Regions.RegionID
FROM
    Countries
    INNER JOIN (
        MasterTable
        INNER JOIN Regions ON MasterTable.Region = Regions.Region
    ) ON (Countries.CountryID = Regions.CountryID)
    AND (MasterTable.Country = Countries.Country);

So a new row would only be inserted into Cities if the city is in a region and country for which no city of the same name yet exists, e.g. a new city Paris would only be inserted if it were Paris, Texas, USA, where the only existing row in the Cities table for Paris is for Paris, Île-de-France, France. The same principle is applied in the case of insertions into all other tables. Consequently no additional rows would be inserted into any tables if the row from the source file had previously been imported.
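
For reference, the composite unique index can equally be created in DDL when the table is set up, the index name being arbitrary:

SQL:
CREATE UNIQUE INDEX idxCityRegion
ON Cities (City, RegionID);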
 
Darn, you made me look up a 9-year-old post so I could remember what I advised. My memory is still pretty good for 77 years old, but it is FAR from perfect. ;)
In deference to your advanced age (;)) I wanted to make it easy and tried to include the full link, but when I went to post, it barked about spam. [I hope in 2 years I have a fraction of your memory and recall.]
The first question is how big is one (typical) .CSV file?
The (up-to-date, complete) export at the end of November was 2.8K rows. But the request to me is to help them better monitor and understand individual progress through a curriculum—and if that’s successful, the rate of growth will accelerate.
However, if the .CSV file is a CUMULATIVE file and you have seriously high odds of having duplicate data in a larger file, that VBA loop is not a good choice. VBA would be too slow for this alternative case.
Door number 2 is the correct choice; a cumulative file is my preference (for the reasons expressed in my reply to June7).
The next question is whether any of the .CSV rows contain data that has to be normalized - like, say, information about parts of a class, where one row of the .CSV is the class and certain columns represent individual parts of the class. That requires special attention, too.
I mean this in the most congenial and respectful way, but sometimes I read your responses and it’s like Charlie Brown’s teacher—all I initially hear is “wa-wa-wa-wa”—BUT I’ve learned that persistence in decoding your wisdom is a virtue with great rewards! We are people separated by a common language…

If I’m correctly understanding, one row should be a complete ‘snapshot’ of one attempt [outcome could be a failure and the course would have to be re-taken—generating a new row appearing just like a first-time attempt]. But there are no columns inside a row that are dependent on another in the way your example describes.
We need to know about what you would do with the dataset if you were doing this on paper. There might be cases where a full pre-import is correct for the problem you face, and from what I read, I'm not sure of which case we are looking at.
I’ve never met a word I didn’t appreciate—but which you might not appreciate having to wade through. I’ll work on a more robust problem description and post as a reply asap.
 
Okay, cumulative - could probably just set a link to the CSV and then work with that linked table. Give the CSV file a generic name (no date part) then every time you do cumulative export from LMS overwrite the file so name is unchanged and link is stable. Use the linked table as source for normalization. Normalize into other tables or just use a UNION query. For such a small db, not sure it is worth effort of re-structuring and coding for normalized tables. Doesn't matter if there are 30K or 30M records, coding effort for normalized tables is basically same. What is your coding expertise level? The more user-friendly, the more code. Title of your thread includes "ease of CSV import" - easiest is likely just setting a link and building queries from source structure. You decide where to dedicate your time and effort.

Excel reads csv file as if it were xlsx. An xlsx might be better for linking in Access. However, I have linked to both without issue.
Thanks for the insights.
My initial port to Access was from an .xlsx file--I initially tried to work in Excel but ran into inconsistent results using vlookup.
Coding background: Back in the last millennium I started with VB3 and stepped away from VB6 when it appeared that VB.NET was C#-based and I didn't want to climb that learning curve. Recognizing that 20 years have intervened since my last DO WHILE, I still grasp the concepts even if my syntax is wanting.
 
OK, let's put better words to the wa-wa-wa.

The thing I was asking about is the content of one row of the .CSV file, whether it was a complete entity unto itself, or whether it carried baggage.

The thing I hoped to avoid is when you have a row that has something like:
<PersonID>, <ClassID>, <DateTime>, <other-info-about-overall-class>, <SCORE-for-part-I>, <SCORE-for-part-II>, <SCORE-for-part-III>

Given the above layout, those three partial scores don't belong in the main table; they would belong in a child table that represents the results of the individual parts. The part-score stuff would be the unwelcome baggage and would also be the part needing normalization.

The "cumulative file" problem can probably be done in two parts plus some setup, but there is still a decision to consider.

In setup, you have to make the .CSV content available to SQL, so that means importing or mapping a table from that .CSV file. Let's hope that there are no anomalies in the file.

The insertion part can be done by building a JOIN between that cumulative table and the target table, where you insert from the cumulative table to the target where the PersonID, ClassID, and ClassDate are NOT already in the target table. (This allows a re-take by having the class date as a part of the JOIN.) See also NOT IN syntax as a potential element in a WHERE clause. You basically want to insert records that are not already in the target table that you are working on.
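
In Access SQL that insertion would look roughly like this, where tblCSVImport stands in for the imported/linked cumulative file and the target table and field names are placeholders:

SQL:
INSERT INTO
    tblLvlCompleteData (UserID, CourseID, ClassDate, Score)
SELECT
    c.UserID, c.CourseID, c.ClassDate, c.Score
FROM
    tblCSVImport AS c
    LEFT JOIN tblLvlCompleteData AS t
        ON (c.UserID = t.UserID)
        AND (c.CourseID = t.CourseID)
        AND (c.ClassDate = t.ClassDate)
WHERE
    t.UserID IS NULL;

The LEFT JOIN with the IS NULL test keeps rows already in the target out of the insert; a NOT IN subquery in the WHERE clause would accomplish the same thing.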

The update part becomes trickier. You need rules on whether you allow an update to a score when the person, class, and date match between the record with the old score and the related record in a more recent iteration.
 
(Posted here b/c the referenced post was here--feel free to suggest a different forum)
In the planning stage of a database.

Background
A non-profit wants to track member progression through training courses hosted by a third-party Learning Management System. The NP has arbitrarily split courses into levels. (Level content is subject to change, so a LvlComplete table records then-current satisfactions.)

Access Questions
The only reports the LMS provides are in CSV ‘flat file’ format.

I’ve studied site posts on importing CSV files and the The_Doc_Man post on doing updates.
(see updating-existing-table-in-access-with-csv-file.284630/post-1475289)

Specific questions:
(see attachment for field/table info)
-- I only need 8 of 26 fields contained in the export. Do I need to strip the unneeded fields before import (e.g., manually massaging in Excel, or via VBA) or can I ignore the excess fields during updates?

-- Do I need to maintain the ‘flat file’ table format or can I normalize structure? If the latter, how to carry out Doc’s scheme—update against a query that includes all the fields from disparate tables or do multiple updates, once for each new table?

FWIW, I am quite confident that there will be a single user (i.e., no concurrent users) and the data table will never exceed 30K records (2.2K records after 2 years--hence the request for a management tool to encourage progression), if that figures into the inefficiency-vs-update-simplicity calculation.

TIA
If I wanted to track who completed which courses and:
  1. Each Level has multiple courses
  2. Each course has multiple Terms (with Begin and End Dates)
  3. Each Term has multiple members attending
  4. Each member attending has hours attending
  5. Each member attending has completed the course or not (Yes/No)
I might start with a design like this:
(attached image: 1766017479733.png)

You can add, move or delete individual fields as required and which tables import what data is something for later. Start with a normalized design first, then worry about importing.
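
If it helps, here is a rough DDL sketch of that design (each statement run separately; the field names and types are only a starting point, not gospel):

SQL:
CREATE TABLE tblLevels (
    LevelID AUTOINCREMENT CONSTRAINT pkLevels PRIMARY KEY,
    LevelName TEXT(50)
);

CREATE TABLE tblCourses (
    CourseID AUTOINCREMENT CONSTRAINT pkCourses PRIMARY KEY,
    LevelID LONG CONSTRAINT fkCourseLevel REFERENCES tblLevels (LevelID),
    CourseName TEXT(100)
);

CREATE TABLE tblMembers (
    MemberID AUTOINCREMENT CONSTRAINT pkMembers PRIMARY KEY,
    FirstName TEXT(50),
    LastName TEXT(50)
);

CREATE TABLE tblTerms (
    TermID AUTOINCREMENT CONSTRAINT pkTerms PRIMARY KEY,
    CourseID LONG CONSTRAINT fkTermCourse REFERENCES tblCourses (CourseID),
    BeginDate DATETIME,
    EndDate DATETIME
);

CREATE TABLE tblAttendance (
    AttendanceID AUTOINCREMENT CONSTRAINT pkAttendance PRIMARY KEY,
    TermID LONG CONSTRAINT fkAttendTerm REFERENCES tblTerms (TermID),
    MemberID LONG CONSTRAINT fkAttendMember REFERENCES tblMembers (MemberID),
    HoursAttended DOUBLE,
    Completed YESNO
);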
 
The scenario The_Doc_Man describes assumes that an existing record for a PersonID/ClassID/ClassDate combination would never be edited by the LMS. If that is the case (which seems reasonable for your situation) then the NOT IN restriction is appropriate. Otherwise, explore what is called an UPSERT action. It is an UPDATE with a table join that will both edit existing and add new records: https://stackoverflow.com/questions/6199417/upserting-in-microsoft-access
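
The upsert in Access looks roughly like this (table and field names are placeholders, and I would stage the CSV in a local table first since linked text files are read-only and the joined query has to be updatable):

SQL:
UPDATE
    tblLvlCompleteData
    RIGHT JOIN tblCSVImport
        ON (tblLvlCompleteData.UserID = tblCSVImport.UserID)
        AND (tblLvlCompleteData.CourseID = tblCSVImport.CourseID)
        AND (tblLvlCompleteData.ClassDate = tblCSVImport.ClassDate)
SET
    tblLvlCompleteData.UserID = tblCSVImport.UserID,
    tblLvlCompleteData.CourseID = tblCSVImport.CourseID,
    tblLvlCompleteData.ClassDate = tblCSVImport.ClassDate,
    tblLvlCompleteData.Score = tblCSVImport.Score;

Because of the RIGHT JOIN, import rows with no match become new records while matching rows just get their Score overwritten.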
 
Thanks, June7 - had forgotten about the UPSERT since I'm doing more home-oriented stuff now that I'm retired.
 
...that will both edit existing and add new records https : / / stackoverflowDOTcom/ questions/ 6199417/ upserting-in-microsoft-access
There's so much good stuff going on in this thread but I stop for a trivial issue.
I tried to include the entire link to Doc_Man's 6-year-old post at the top of this thread and when I went to post, I got a message saying it looked like spam. I was able to proceed by removing https : / / wwwDOTaccess-programmersDOTcoDOTuk/ forums/ threads/. But you posted a complete link.
Why did it work for you? (For future reference)
Edit: It just happened again trying to post this, until I 'disabled' both links.
 

Attachments

  • Screenshot 2025-12-17 221505.jpg
