Ideas on flagging a row of a table, if there is an expected change to it?

Sampoline

Member
Local time
Tomorrow, 02:59
Joined
Oct 19, 2020
Messages
161
Hi all. I have a database that retrieves the metadata properties of a file:

Path, Name, Size, Owner, Date Created, Date Last Modified, Date Last Accessed, Type.

Now I'll be getting people to run imports on a directory every morning. The idea of that is to:

1) Import new files into the database
2) Track the changes to the existing columns

Now my issue is that I'm not able to figure out how to do point 2, because I don't want to import files that already exist in the database. So I added a check: if the file exists, don't import it. But in order to track changes made to files since the last import, I probably need to import them again, right? So what should I try here? I want something like, 'if the file exists then skip, except if Date Last Modified has changed'. But I don't know VBA well enough to make it do this for me. Or perhaps this isn't a VBA thing? Not sure.

I've attached a sample of my database. Thanks.
 

Attachments

  • Example1.accdb
    392 KB · Views: 175

theDBguy

I’m here to help
Staff member
Local time
Today, 08:59
Joined
Oct 29, 2018
Messages
21,358
You might be able to use the FileDateTime() function to compare the modify dates and also check if the file size is different.
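A minimal sketch of that idea (untested, and assuming the field names `FPath` and `FDateLastModified` and the table name `tblDirectoryDefault` quoted elsewhere in this thread; the `FSize` field is an assumption):

```vba
' Compare the stored metadata against the file on disk using the
' built-in FileDateTime() and FileLen() functions.
' ASSUMPTION: tblDirectoryDefault has FPath, FDateLastModified, FSize.
Public Function FileHasChanged(ByVal strPath As String) As Boolean
    Dim varDate As Variant
    Dim varSize As Variant
    Dim strCrit As String

    ' escape embedded apostrophes so paths like O'Brien don't break the criteria
    strCrit = "FPath = '" & Replace(strPath, "'", "''") & "'"

    varDate = DLookup("FDateLastModified", "tblDirectoryDefault", strCrit)
    varSize = DLookup("FSize", "tblDirectoryDefault", strCrit)

    If IsNull(varDate) Then
        FileHasChanged = True   ' never imported, treat as changed/new
    Else
        ' flag when either the modify date or the size differs
        FileHasChanged = (FileDateTime(strPath) <> varDate) _
                      Or (FileLen(strPath) <> varSize)
    End If
End Function
```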
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 23:59
Joined
May 7, 2009
Messages
19,169
See this modification to your ListFiles() sub.
 

Attachments

  • Example1.accdb
    504 KB · Views: 177

Pat Hartman

Super Moderator
Staff member
Local time
Today, 11:59
Joined
Feb 19, 2002
Messages
42,970
What defines a new file? A unique file name?
What defines changes to an existing file? A duplicate file name with a later modified date?
What exactly do you want to happen with a modified file? Delete all records and replace with what is in the updated file? Update matching rows and ignore adds/deletes? Seems like the first option makes the most sense.

I would keep a log table of the files that were imported and assign each import a batchID when I import the details. That way, if I want to replace all the rows imported in file 1023, I can run a delete query that deletes all the rows for 1023. Then I can update the demographics in the log table and import the replacement rows for batch 1023.
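A rough sketch of that batch approach (the table names `tblImportLog` and `tblImportDetail` and the `Replaced` field are illustrative, not from the sample database):

```vba
' Delete all detail rows for one batch so its file can be cleanly
' re-imported. The log row is kept so the import history survives.
' ASSUMPTION: tblImportDetail rows carry the BatchID they came in with.
Public Sub DeleteBatch(ByVal lngBatchID As Long)
    With CurrentDb
        ' remove the detail rows belonging to the batch being replaced
        .Execute "DELETE FROM tblImportDetail WHERE BatchID = " _
                 & lngBatchID, dbFailOnError
        ' mark (not delete) the log entry for that batch
        .Execute "UPDATE tblImportLog SET Replaced = True WHERE BatchID = " _
                 & lngBatchID, dbFailOnError
    End With
End Sub
```

The point of `dbFailOnError` is that a failed delete raises a trappable error instead of silently leaving partial rows behind.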
 

Sampoline

Member
Local time
Tomorrow, 02:59
Joined
Oct 19, 2020
Messages
161
You might be able to use the FileDateTime() function to compare the modify dates and also check if the file size is different.
see this modification on your ListFiles() sub.
What defines a new file? A unique file name?
What defines changes to an existing file? A duplicate file name with a later modified date?
What exactly do you want to happen with a modified file? Delete all records and replace with what is in the updated file? Update matching rows and ignore adds/deletes? Seems like the first option makes the most sense.

I would keep a log table of the files that were imported and assign each import a batchID when I import the details. That way, If I want to replace all the rows imported in file 1023, then I can run a delete query that deletes all the rows for 1023. Then I can update the demographics in the log table and import the replacement rows for batch 1023.
@theDBguy so, like, add all the records and then use FileDateTime() to compare whether the last modified date has changed?

@arnelgp
So with:

Code:
If Nz(DLookup("FDateLastModified", mysql, "FPath = '" & MyFile.Path & "'"), #1/1/1890#) < MyFile.DateLastModified Then

Does this add the record to the table only if the date last modified has changed? If I add a column, how could I make it a flagging column that, when a duplicate record is created, lets me compare the data? I guess if I do that I can run a make-table query to compare them.

@Pat Hartman yeah that does sound like a good idea.. i'll try that out
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 11:59
Joined
Feb 19, 2002
Messages
42,970
In order for us to help you, you need to answer questions. You quoted my comments but didn't answer the questions.
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 23:59
Joined
May 7, 2009
Messages
19,169
It doesn't flag any field; it just adds the file again if the stored DateLastModified is earlier than the one on the actual file.
You then just compare those "same" records.
 

Sampoline

Member
Local time
Tomorrow, 02:59
Joined
Oct 19, 2020
Messages
161
In order for us to help you, you need to answer questions. You quoted my comments but didn't answer the questions.
My apologies

What defines a new file? A unique file name?
When the targeted folder location has a new file in it. So when I run the sub in my db, it will add a record into the table with the metadata for it. E.g. C:\Users\%USERNAME%\Documents had 8 unique files in it on Monday. Then when I ran the sub again on Tuesday there were 21. So the other 13 unique files would be added to the db.

What defines changes to an existing file? A duplicate file name with a later modified date?
Yeah, so if the file already existed in that folder location but the metadata had changed, e.g. the Date Last Modified was different from the last import.

What exactly do you want to happen with a modified file? Delete all records and replace with what is in the updated file? Update matching rows and ignore adds/deletes? Seems like the first option makes the most sense.
So merging with my previous answer: once a field (or fields) in a record was found to be different from the last import, I wanted it to flag (perhaps in another column in the table) with a message telling me which record, and maybe even which field, was causing it. Then the user would check the actual file to investigate the reason. I guess the only way to compare the new and old data is for the table to contain duplicate filenames whenever a change to the metadata occurred.

Hope I'm making sense.
 

Sampoline

Member
Local time
Tomorrow, 02:59
Joined
Oct 19, 2020
Messages
161
It doesn't flag any field; it just adds the file again if the stored DateLastModified is earlier than the one on the actual file.
You then just compare those "same" records.
Ahhh! Yep, that makes sense. I'll try to work out how to flag it and get back to you. Thanks Arnel.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 11:59
Joined
Feb 19, 2002
Messages
42,970
Rather than allowing files to accumulate in the input folder, move them to the processed folder at the end of the input process. If you want to reprocess a file, you can move it back to the input folder after deleting that batch.
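A sketch of that move step (the folder paths are illustrative; VBA's `Name` statement moves a file when the source and target folders differ):

```vba
' Move a processed file out of the input folder so the next run
' cannot pick it up again. ASSUMPTION: both folders exist and the
' target does not already contain a file with the same name.
Public Sub MoveToProcessed(ByVal strFileName As String)
    Const INPUT_DIR As String = "C:\Imports\Incoming\"
    Const DONE_DIR As String = "C:\Imports\Processed\"

    ' Name ... As ... renames across folders, which moves the file
    Name INPUT_DIR & strFileName As DONE_DIR & strFileName
End Sub
```

To reprocess a batch, delete its rows first and then `Name` the file back the other way.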

If you have a duplicate input file, I would not import it; link to it instead. But you really need to think about what to do. Having the user process every row with differences could be difficult, and you need to know how to store the intermediate results until all the review is done. My suggestion to delete and re-import is clean, but it does assume that the user knows what he's doing. Don't build a complicated review process if the user doesn't actually need it.
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 15:59
Joined
Sep 12, 2006
Messages
15,613
I think I would do what Pat just suggested.
Rename any processed file, and/or move it to a different folder.

What do you do with the rows in the import tables? Do you check to see if they are new or pre-existing? What you could do is store the date imported and the filename, so you can identify the source of any imported data. A less expensive option, in data terms, is to store the row id of the metadata table you have.
 

Isaac

Lifelong Learner
Local time
Today, 08:59
Joined
Mar 14, 2017
Messages
8,738
+3. Any time I'm processing files, one of the first things I do on Development, Day 1 is create a few folders like Incoming, Processed, Archive, etc. etc.

Save yourself a lotta loopin'
 

Sampoline

Member
Local time
Tomorrow, 02:59
Joined
Oct 19, 2020
Messages
161
It doesn't flag any field; it just adds the file again if the stored DateLastModified is earlier than the one on the actual file.
You then just compare those "same" records.
Sorry all for the late replies; I had some internet issues last week, but I'm back now properly. Thanks for your responses.

In regards to your replies @Pat Hartman & @gemma-the-husky my idea was that I wanted whoever ran the db to see all the files in a table and track the changes to the metadata, in particular date modified. Maybe instead of importing into the same table, I could create an exception table.

And going off our last conversation @arnelgp say that I want those modified files to populate another table, how would I do that?

So for example, on Monday I imported the metadata from a folder. Tuesday ran import again and it adds ones that didn't already exist in the metadata table into the table. But for the ones that are already pre-existing, it checks whether the date modified has changed. If it hasn't changed, then it should end the process and move onto checking the Next file. But if the date modified has changed, it imports that new import record into another table. Let's just call this table "tblExceptions" for example. These date modified records get added to that exception table for review. Is that possible?

To sum up:
1. Every time the import is run, it first checks if the file exists in (tblDirectoryDefault)
2. If the file already exists:
2a. It must see if the new imported file's 'Last Date Modified' and the FLastDateModified in (tblDirectoryDefault) matches - if they match, then End at step 4
2b. Else if those two fields do NOT match, then it must log that new import into a new table (tblDirectoryExceptions)
2c. Else any other issues, then log into (tblDirectoryExceptions)
3. But if the file doesn't exist previously in (tblDirectoryDefault) then it should simply import into that table
4. End
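The steps above could be sketched like this (untested; `FPath`, `FDateLastModified`, `tblDirectoryDefault`, and `tblDirectoryExceptions` follow the post's own naming, while `AppendFileTo` is a hypothetical helper standing in for the insert code already inside ListFiles()):

```vba
' One pass of steps 1-4 for a single file, meant to be called from
' the existing ListFiles() loop. MyFile is a Scripting.File object.
' ASSUMPTION: AppendFileTo(tableName, file) is the existing insert logic.
Public Sub ProcessFile(ByVal MyFile As Object)
    Dim varStored As Variant
    Dim strCrit As String

    strCrit = "FPath = '" & Replace(MyFile.Path, "'", "''") & "'"
    varStored = DLookup("FDateLastModified", "tblDirectoryDefault", strCrit)

    If IsNull(varStored) Then
        ' step 3: file not seen before, import into the main table
        AppendFileTo "tblDirectoryDefault", MyFile
    ElseIf varStored = MyFile.DateLastModified Then
        ' step 2a: dates match, nothing to do (step 4: end)
    Else
        ' steps 2b/2c: modified since last import, log it for review
        AppendFileTo "tblDirectoryExceptions", MyFile
    End If
End Sub
```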

Attached the db again for reference, thanks.
 

Attachments

  • Example1.accdb
    520 KB · Views: 179

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 15:59
Joined
Sep 12, 2006
Messages
15,613
I think you are describing a problem similar to tracking changes in excel files.

If you have csv or other files that you import, I can see no good reason to re-visit those files. If the files change after you have processed them, then really this is a management issue. You could re-name the processed files, move them to a different location, change them to read only. All sorts of things. What you shouldn't do (as an organisation) is allow them to be changed after they have been used. You just can't work on the basis of having to revisit old data time and again. IMO.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 11:59
Joined
Feb 19, 2002
Messages
42,970
We don't know what type of files you are importing so we can't assess the validity of the solution. I'm pretty sure keeping two separate tables isn't actually necessary.

I have several import processes, including one that does bank statements so a checkbook can be reconciled. In some, the file name includes the date to make it unique (like the bank statement: it includes the account number and the closing date in the name). In others, the file create date needs to be included in the unique index. As I explained, I assign a batch number when I log the file name, so if I get a replacement, I can cleanly delete the batch and reimport the file. I would never build a process that tried to match the rows between two import files. In most cases the import files would NOT have unique identifiers for each row, so matching becomes impossible since there is nothing to match on. If any field can change, how do you know which two rows should match?

If the user wants to import a duplicate file, I ask if they want to delete the previous import first. If they do, I delete the original rows and import the replacements. They can also request to delete an import. Each new import gets a new batch number and the record of the prior import stays in the log table even though the detail rows are deleted.

The bank statement files not only have to be unique, they have to be sequential. They can't be imported out of sequence because that would mess up the reconciliation process. This is the most complicated kind of import, since the imported records update the "checkbook" to log what happened to a particular check on this statement.
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 15:59
Joined
Sep 12, 2006
Messages
15,613
@Pat Hartman

Do you not get problems if you have partly processed a previous import? It can't be as easy as just deleting the previous import.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 11:59
Joined
Feb 19, 2002
Messages
42,970
It depends on what the data is. For a bank statement, it doesn't make sense because you can't undo the updates you did as a result of the import. That is why you have to make sure to import the files in the correct order. For most of the other imports, deleting and reimporting works. Updating records with a replacement file is not possible unless there is some uniqueID. We have no idea what kind of data we are talking about. We are just guessing as to what is reasonable.

Regarding the bank statement example, it would be a pain, but you should be able to identify the records that were updated by the previous import, revert them to their state as of the end of last month, and then reimport. This "recovery" only works for files with fixed periods and appropriate backups.
 

Isaac

Lifelong Learner
Local time
Today, 08:59
Joined
Mar 14, 2017
Messages
8,738
In my experience, I've never come across a time when a system or vendor needed to "edit" an existing, already-exported, already-processed file.
It definitely seems weird, and if at ALL possible, you should try to get the involved parties to re-export the information rather than trying to 'edit' already-exported-and-processed files.
 

Sampoline

Member
Local time
Tomorrow, 02:59
Joined
Oct 19, 2020
Messages
161
We don't know what type of files you are importing so we can't assess the validity of the solution. I'm pretty sure keeping two separate tables isn't actually necessary.

I have several import processes including one that does bank statements so a checkbook can be reconciled. In some, the file name would included the date to make it unique (like the bank statement, it includes the account num and the closing date in the name). In others, the file create date needs to be included in the unique index. As I explained, I assign a batch number when I log the file name so If I get a replacement, I can cleanly delete the batch and reimport the file. I would never make a process where I tried to match the rows between two import files. In most cases the import files would NOT have unique identifiers for each row so matching becomes impossible since there is nothing to match on. If any field can change, how do you know which two rows should match?

If the user wants to import a duplicate file, I ask if they want to delete the previous import first. If they do, I delete the original rows and import the replacements. They can also request to delete an import. Each new import gets a new batch number and the record of the prior import stays in the log table even though the detail rows are deleted.

The Bank statement files not only have to be unique, they have to be sequential. They can't be imported out of sequence because it would mess up the reconciliation process. This is the most complicated kind of import since the imported records update the "checkbook" to log what happened to a particular check on this statement.
Sorry, I should have clarified this before. The files I want to import are either TIFs or JPEGs. The reason I wanted the second table was so that if the date modified had changed since the first import, I could see that, then investigate the actual image in its folder and ask the person who created the image for the reason for the change. Having that second table would simply make it easier for anyone viewing it to understand that one table is the initial import of those images and the second table is for monitoring issues. That's just the idea I felt would work for me, but the VBA behind it is a little daunting.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 11:59
Joined
Feb 19, 2002
Messages
42,970
OK, we're getting closer. Why would a replacement file be sent? Why would you not simply replace the old image? Are you going to keep the old images? Why does the user need to know that the image was replaced? Will they have to take some action? What happens in the meantime if you have an "accepted" image and a "replacement" image? I assume the "accepted" version will continue to be used until the "replacement" is approved, if that is where you're going with this.

It just seems like a lot of moving parts that might not be necessary and which might cause the wrong file to be used if the approval takes too long.
 
