Ideas on flagging a row of a table, if there is an expected change to it?

Sampoline
Hi all. I have a database that retrieves the metadata properties of a file:

Path, Name, Size, Owner, Date Created, Date Last Modified, Date Last Accessed, Type.

Now I'll be getting people to run imports on a directory every morning. The idea of that is to:

1) Import new files into the database
2) Track the changes to the existing columns

Now my issue is that I can't figure out how to do point 2, because I don't want to import files that already exist in the database. So I added a check: if the file exists, don't import it. But in order to track changes made to files since the last import, I probably need to import them, right? So what should I try here? I want something like 'if the file exists then skip, except if the date last modified has changed'. But I don't know VBA well enough to make it do this for me. Or perhaps this isn't a VBA thing? Not sure.

I've attached a sample of my database. Thanks.
 


You might be able to use the FileDateTime() function to compare the modify dates and also check if the file size is different.
 
See this modification to your ListFiles() sub.
 


What defines a new file? A unique file name?
What defines changes to an existing file? A duplicate file name with a later modified date?
What exactly do you want to happen with a modified file? Delete all records and replace with what is in the updated file? Update matching rows and ignore adds/deletes? Seems like the first option makes the most sense.

I would keep a log table of the files that were imported and assign each import a batchID when I import the details. That way, if I want to replace all the rows imported in file 1023, I can run a delete query that deletes all the rows for 1023. Then I can update the demographics in the log table and import the replacement rows for batch 1023.
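For what it's worth, Pat's batch-log idea might be sketched in DAO something like this. The table and field names (tblImportLog, tblImportDetail, BatchID) are placeholders for illustration, not taken from the attached db:

```vba
' tblImportLog: BatchID (AutoNumber), FileName (Text), ImportDate (Date)
' tblImportDetail: every imported row carries the BatchID it came in with.

Public Function LogNewBatch(strFileName As String) As Long
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Set db = CurrentDb
    Set rs = db.OpenRecordset("tblImportLog", dbOpenDynaset)
    rs.AddNew
    rs!FileName = strFileName
    rs!ImportDate = Now()
    rs.Update
    rs.Bookmark = rs.LastModified      ' move to the row we just added
    LogNewBatch = rs!BatchID           ' return the new batch number
    rs.Close
End Function

Public Sub ReplaceBatch(lngBatchID As Long)
    ' Delete every detail row from a previous import so the
    ' replacement file can be re-imported cleanly under a new batch.
    CurrentDb.Execute "DELETE FROM tblImportDetail " & _
                      "WHERE BatchID = " & lngBatchID, dbFailOnError
End Sub
```

The log row for the old batch stays behind, so you keep a record of the prior import even after its detail rows are deleted.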
@theDBguy so, like, add all the records and then use FileDateTime to compare whether the last modified date has changed?

@arnelgp
So with:

Code:
If Nz(DLookup("FDateLastModified", mysql, "FPath = '" & MyFile.Path & "'"), #1/1/1890#) < MyFile.DateLastModified Then

Does this add the record into the table only if the date last modified has changed? If I add a column, how could I make it a flag column for when a duplicate record is created, so I can compare the data? I guess if I do that I can run a make-table query to compare them.

@Pat Hartman yeah that does sound like a good idea.. i'll try that out
 
It doesn't flag any field; it just adds the file again if the stored DateLastModified is earlier than that on the actual file.
You then just compare those "same" records.
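Spelled out, the logic inside the ListFiles() loop might look roughly like this (untested sketch; mysql is the variable from the attached sample, and AddFileRecord is a placeholder for whatever insert code the sub already uses):

```vba
' Look up the last-modified date we stored for this path, if any.
Dim varLastMod As Variant
varLastMod = DLookup("FDateLastModified", mysql, _
                     "FPath = '" & MyFile.Path & "'")

If IsNull(varLastMod) Then
    ' File not seen before: import it.
    AddFileRecord MyFile
ElseIf varLastMod < MyFile.DateLastModified Then
    ' File was imported before but has since changed:
    ' add it again so the two rows can be compared.
    AddFileRecord MyFile
End If
' Otherwise the file is unchanged: skip it.
```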
 
In order for us to help you, you need to answer questions. You quoted my comments but didn't answer the questions.
My apologies

What defines a new file? A unique file name?
When the targeted folder location has a new file in it. So when I run the sub in my db, it will add a record into the table with the metadata for it. e.g. C:\Users\%USERNAME%\Documents had 8 unique files in it on Monday. Then when I ran the sub again on Tuesday there was 21. So the other 13 unique files would be added to the db.

What defines changes to an existing file? A duplicate file name with a later modified date?
Yeah, so if the file already existed in that folder location but the metadata had changed, e.g. the Date Last Modified was different from the last import.

What exactly do you want to happen with a modified file? Delete all records and replace with what is in the updated file? Update matching rows and ignore adds/deletes? Seems like the first option makes the most sense.
So merging with my previous answer, once a field(s) in a record was found to be different to the last import, then I wanted it to flag (perhaps in another column in the table) with a message telling me which record and maybe even which field was causing it. Then the user would check the actual file to investigate the reason. I guess the only way to compare the new and old data is for the table to contain duplicate filenames whenever a change to the metadata occurred.

Hope I'm making sense.
 
It doesn't flag any field; it just adds the file again if the stored DateLastModified is earlier than that on the actual file.
You then just compare those "same" records.
Ahhh! Yep that makes sense. I'll try to work out how to flag it and get back to you. Thanks Arnel.
 
I think I would do what Pat just suggested.
Rename any processed file, and/or move it to a different folder.

What do you do with the rows in the import tables? Do you check to see if they are new or pre-existing? What you could do is store the date imported and the filename, so you can identify the source of any imported data. Less expensive, in data terms, is to store the row id of the metadata table you have.
 
+3. Any time I'm processing files, one of the first things I do on Development, Day 1 is create a few folders like Incoming, Processed, Archive, etc. etc.

Save yourself a lotta loopin'
 
Sorry all for the late replies, I had some issues with internet last week. But I'm back now properly. Thanks for your responses.

In regards to your replies @Pat Hartman & @gemma-the-husky my idea was that I wanted whoever ran the db to see all the files in a table and track the changes to the metadata, in particular date modified. Maybe instead of importing into the same table, I could create an exception table.

And going off our last conversation @arnelgp say that I want those modified files to populate another table, how would I do that?

So for example, on Monday I imported the metadata from a folder. Tuesday ran import again and it adds ones that didn't already exist in the metadata table into the table. But for the ones that are already pre-existing, it checks whether the date modified has changed. If it hasn't changed, then it should end the process and move onto checking the Next file. But if the date modified has changed, it imports that new import record into another table. Let's just call this table "tblExceptions" for example. These date modified records get added to that exception table for review. Is that possible?

To sum up:
1. Every time the import is run, it first checks if the file exists in (tblDirectoryDefault)
2. If the file already exists:
2a. It must see if the new imported file's 'Last Date Modified' and the FLastDateModified in (tblDirectoryDefault) matches - if they match, then End at step 4
2b. Else if those two fields do NOT match, then it must log that new import into a new table (tblDirectoryExceptions)
2c. Else any other issues, then log into (tblDirectoryExceptions)
3. But if the file doesn't exist previously in (tblDirectoryDefault) then it should simply import into that table
4. End
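The four steps above might be sketched like this (rough, untested; ImportToDefault and ImportToExceptions are placeholder names for your own insert routines, and the field names follow the post):

```vba
Public Sub ImportFolder(strFolder As String)
    Dim fso As Object, fil As Object
    Dim varLastMod As Variant
    Set fso = CreateObject("Scripting.FileSystemObject")

    For Each fil In fso.GetFolder(strFolder).Files
        ' Step 1: does this file already exist in tblDirectoryDefault?
        varLastMod = DLookup("FDateLastModified", "tblDirectoryDefault", _
                             "FPath = '" & fil.Path & "'")
        If IsNull(varLastMod) Then
            ' Step 3: brand-new file, import normally.
            ImportToDefault fil
        ElseIf varLastMod <> fil.DateLastModified Then
            ' Step 2b: dates don't match, log to the exception table.
            ImportToExceptions fil
        End If
        ' Step 2a: dates match, so do nothing and check the next file.
    Next fil
End Sub
```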

Attached the db again for reference, thanks.
 


I think you are describing a problem similar to tracking changes in excel files.

If you have csv or other files that you import, I can see no good reason to re-visit those files. If the files change after you have processed them, then really this is a management issue. You could re-name the processed files, move them to a different location, change them to read only. All sorts of things. What you shouldn't do (as an organisation) is allow them to be changed after they have been used. You just can't work on the basis of having to revisit old data time and again. IMO.
 
@Pat Hartman

Do you not get problems if you have partly processed a previous import? It can't be as easy as just deleting the previous import.
 
In my experience I've never come across a time when a system or vendor needed to "edit" an existing, already-exported, already-processed file.
It definitely seems weird, and if at ALL possible you should try to get the involved parties to re-export more information rather than 'edit' already-exported-and-processed files.
 
We don't know what type of files you are importing so we can't assess the validity of the solution. I'm pretty sure keeping two separate tables isn't actually necessary.

I have several import processes including one that does bank statements so a checkbook can be reconciled. In some, the file name includes the date to make it unique (like the bank statement, which includes the account number and the closing date in the name). In others, the file create date needs to be included in the unique index. As I explained, I assign a batch number when I log the file name so if I get a replacement, I can cleanly delete the batch and reimport the file. I would never build a process that tried to match the rows between two import files. In most cases the import files would NOT have unique identifiers for each row, so matching becomes impossible since there is nothing to match on. If any field can change, how do you know which two rows should match?

If the user wants to import a duplicate file, I ask if they want to delete the previous import first. If they do, I delete the original rows and import the replacements. They can also request to delete an import. Each new import gets a new batch number and the record of the prior import stays in the log table even though the detail rows are deleted.

The Bank statement files not only have to be unique, they have to be sequential. They can't be imported out of sequence because it would mess up the reconciliation process. This is the most complicated kind of import since the imported records update the "checkbook" to log what happened to a particular check on this statement.
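If it helps, a composite unique index like Pat describes can be created once with Access SQL (table and field names here are purely illustrative):

```vba
' Enforce uniqueness on file name + create date, so the same file can
' only appear once per create date in the import detail table.
CurrentDb.Execute _
    "CREATE UNIQUE INDEX uqImport ON tblImportDetail " & _
    "(FileName, FileCreateDate)", dbFailOnError
```

With that in place, an attempt to re-import the same file/date combination fails at the engine level instead of silently creating duplicates.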
Sorry, I should have clarified this before. The files I want to import are either TIFs or JPEGs. The reason I wanted the second table was so that if the date modified had changed (from the first import), I could see that, investigate the actual image in its folder, and ask the person who created the image the reason for the change. Having that second table would simply make it easier for anyone viewing it to understand that one table is the initial import of those images and the second table is there to monitor issues. That's just the idea I felt would work for me, but the VBA behind it is a little daunting.
 
Yeah I understand that @Pat Hartman so here is how things are working at the moment.

So there will be some photoshop work done on the images. The person doing the edit work will quality check if the original image is good or not first of all. If the image wasn't camera captured correctly, then they would inform the photographer of an issue. Then that original image would then be replaced by another one. So essentially it becomes a rescan.

I want to see this in the database. So that the editor is confirming the requirement for the replacement. So I wanted the editor to see both the original and the replacement in two different tables to compare and confirm the changes. There have been instances of miscommunication and the replacement wasn't required to be made. In that case, the original remains. That's why I feel having both tables would help. So when the date last modified changes, it logs into that 2nd table. Then editor would check and confirm, or simply delete that row from the table.

This is sort of the requirement I was given by my higher ups too when I explained the situation. Hence why I wanted the two tables, it just seemed to make logical sense to all parties involved.
 
Surely it would be better to renumber the new file.
Make the first image file 1234-A.jpg, and renumber revisions with a different suffix so you get versions

1234-A
1234-B
1234-C etc

Then for image 1235 you get

1235-A
1235-B etc

and it becomes easy to see what's going on. Releasing new versions is much better than editing existing versions. This way, you don't have to examine the changes personally. You can just get the person who released the changes to annotate the database.
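A hypothetical helper for gemma's naming scheme might look like this (assumes .jpg files and single-letter suffixes A–Z; the function name and layout are made up for illustration):

```vba
' Given a folder and a base number like "1234", return the next unused
' versioned file name: 1234-A.jpg, then 1234-B.jpg, and so on.
Public Function NextVersionName(strFolder As String, _
                                strBase As String) As String
    Dim i As Integer
    For i = 0 To 25
        If Dir(strFolder & "\" & strBase & "-" & Chr(65 + i) & ".jpg") = "" Then
            NextVersionName = strBase & "-" & Chr(65 + i) & ".jpg"
            Exit Function
        End If
    Next i
End Function
```

The first call for base "1234" in an empty folder would return "1234-A.jpg"; once that file exists, the next call returns "1234-B.jpg".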
 
Yeah Gemma, you're making a lot of sense. I was instructed that only the supervisor of the project and the editor were going to be reviewing this database. So that's why I was trying to figure out how they could know whether the change being made is required or not. To give you more context, I have a file-movement process ongoing too. Once the image is made, the file(s) is copied from the imaging folder to a separate location. The file(s) in the imaging folder are then moved to the folder that the editor works on. So there is always a copy of the original available.

Say that a rescan is required, the editor will delete those images from their folder and the photographer will reshoot and it will go into the imaging folder as usual. Then when I run the file movement, this image will copy again to the separate location and now the original image will move to a subfolder with the create date and time added to the filename.

The editor will work with this information and review the original and rescan images in the copy folder; deleting the one he doesn't require henceforth. And this keeps repeating.

What do you suggest in this case? Because I was told to do it the two table method. And I'm kinda pressed about getting it done, just stuck on the VBA of it.
 
(quoting just the bottom of your post)

What do you suggest in this case? Because I was told to do it the two table method. And I'm kinda pressed about getting it done, just stuck on the VBA of it.

Hopefully @arnelgp has just given you a practical solution

I just think having to re-examine images that might have changed in the past takes you down a very difficult cycle. I think your organisation needs to find a way that works that hopefully doesn't involve changing data that has already been managed and processed. I think we thought you were talking about files of data, such as invoices. However, even though you are talking about single images, the same principles apply. The current image ought to be the one with the highest version, and all you need is a way to record the changes and reasons for anyone interested in tracing the development. That's why there really needs to be no question of modifying work that has been previously issued as "fit for processing". It's a matter of version management. Point out that it is next to impossible to examine images and spot the changes. There are whole quiz books of spot-the-difference comparisons.

I have a system that saves versions of quotations. Every time a new item gets added, or a price changed, a new version of the quote, with a new version number, gets stored. Anyone who is interested can go back and see the changes. I didn't actually track the changes themselves, although I could have. What users couldn't do is just change an old version. Any change to the active data would produce a new version. I suppose I could have kept an audit trail of changes, and then played the whole thing back as a sort of "game history", but then you don't get the historical snapshots. I didn't think of the alternative at the time. In practice the old versions are hardly ever looked at anyway.
 
