Looking for Faster way to update Table - boolean field

One question that hasn't been asked is how often you will want to do this. If it is seldom, once only, or just an initial load, then anything that does the job is fine.

Only if it is a regular occurrence does seeking the best way make sense. Also, there is no hint at the number of files involved: tens, hundreds, thousands, tens of thousands?

I must admit the whole approach suggested at the start of the thread seems flawed to me. My archive database processing is achieved with the FileSystemObject and update queries. Similar, I expect, to Moke123's approach, but without any dictionary step.
 
The original number was stated to be about 40,000 files. For me it was a question of recursion - hard to think there would be that many files in a single directory.
 
Correct. I'm using a recursive directory function on about 25TB of miscellaneous drives and directories. The frequency I run this function is perhaps every couple of weeks. The database is essentially a personal multimedia catalog that I want to maintain as "current" relative to deletions, downloads, creations, etc. I have several directories where the source files live, and the database gives me a direct way to search and launch the multitude of various files. It's basically a pet project I started back in 2011, and as my collection grew, my elementary and crude coding cobbled together from the internet has proven highly inefficient.

Hence this thread. The ideas presented are helping me better understand the ins & outs of database design, use, and the associated VBA coding. I am NOT a competent and experienced programmer, but I do enjoy working on and growing my understanding of such subjects. Thanks for all the advice. I'm now slowly rebuilding the VBA so that I build tables and queries rather than the multitude of dictionaries I had been building.
 
The paradigm I use, albeit on only about 8% of that number of files, is a two-part process.

First, iterate through the table (assuming you have stored the full paths), checking that every file listed still exists (using FSO.FileExists()), and if not, delete the record from the table.

Secondly, iterate through the relevant directories, checking that a record exists for each file (rst.FindFirst) and, if not, create one with rst.AddNew using the details from the FSO.File object.

This is all predicated on having the full paths in the table and on using the FileSystemObject.

As a throwaway observation, this seems to work about 20% faster using UNC names rather than mapped drives. The whole process lasts about 6 to 10 seconds depending on network traffic.
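Not my exact code, but a minimal sketch of that two-pass idea, assuming a table tblFiles with a FullPath field and a UNC root folder (all hypothetical names):

Code:
Dim fso As Object, fil As Object
Dim rst As DAO.Recordset

Set fso = CreateObject("Scripting.FileSystemObject")
Set rst = CurrentDb.OpenRecordset("tblFiles", dbOpenDynaset)

'Pass 1: remove records whose file no longer exists
Do While Not rst.EOF
    If Not fso.FileExists(rst!FullPath) Then rst.Delete
    rst.MoveNext
Loop

'Pass 2: add a record for any file not yet in the table
For Each fil In fso.GetFolder("\\server\share\archive").Files
    rst.FindFirst "FullPath = '" & Replace(fil.Path, "'", "''") & "'"
    If rst.NoMatch Then
        rst.AddNew
        rst!FullPath = fil.Path
        rst.Update
    End If
Next fil

rst.Close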
 
Cool. I have a bit of sophistication in my code (If statements to detect specific directories and exclude certain conditions), but perhaps I could simplify things a bit using your method. It may be another option to speed up the process.
 
Iterate twice?

I find it more efficient to write all the relevant folders and their files into a table; you need one pass to enter the new files anyway. Once I have such a table, I can simply compare it with the main table - a DB developer can write queries.

A worse choice:
rs.FindFirst... it's really worth it to transfer the table into a dictionary so that you can then query it quickly and specifically.
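To illustrate the table-compare idea (a sketch; tblScan, tblFiles, FullPath and the folder are stand-in names):

Code:
Dim fso As Object, fil As Object
Dim rst As DAO.Recordset

'Start with an empty scan table, then fill it in a single pass.
CurrentDb.Execute "DELETE FROM tblScan", dbFailOnError
Set fso = CreateObject("Scripting.FileSystemObject")
Set rst = CurrentDb.OpenRecordset("tblScan", dbOpenDynaset)

For Each fil In fso.GetFolder("C:\Archive").Files
    rst.AddNew
    rst!FullPath = fil.Path
    rst.Update
Next fil
rst.Close

'From here every comparison is just a query, e.g. files with no record yet:
'SELECT tblScan.FullPath FROM tblScan LEFT JOIN tblFiles
'    ON tblScan.FullPath = tblFiles.FullPath
'WHERE tblFiles.FullPath Is Null;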
 
it's really worth it to transfer the table into a dictionary so that you can then query it quickly and specifically.
I've looped a table of 180,000 records into a dictionary and was surprised it took only 2 seconds (using the universally accepted one-mississippi, two-mississippi method of timing).
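For reference, that load is essentially this (a sketch; tblFiles and FullPath are stand-in names):

Code:
Dim dict As Object
Dim rst As DAO.Recordset

Set dict = CreateObject("Scripting.Dictionary")
Set rst = CurrentDb.OpenRecordset("SELECT FullPath FROM tblFiles", dbOpenSnapshot)

Do While Not rst.EOF
    dict(rst!FullPath.Value) = True    'key = full path; the value is unused
    rst.MoveNext
Loop
rst.Close

'Afterwards a lookup is a single call: If dict.Exists(strPath) Then ...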
 
I'll just say that I have tried many methods, including dictionaries, and this method has proven to be the quickest. Just remember we are talking of thousands of files.

I do the iterations twice because they are separate logical processes. One is to remove extraneous records, the other is to record new files.

Both writing to an extra table and using dictionaries are time-consuming and unnecessary steps. Dictionaries are very slow and inefficient for more than a few tens of records, as they reorder themselves as every record is added. (An exponential process!)

The iteration through the actual files is the slowest part of the process, and all methods described have to do this.
 
I also base my statements on my own experience. rs.FindFirst is also noticeably slow with large tables. In this case, it would be better to use Seek or the intermediate step via a collection/dictionary.

When writing to a table, I would only create the index to be used later after the data has been entered.
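For example, the index can be added after the bulk load with a single DDL statement (table and index names assumed):

Code:
'Bulk-load tblScan first, then add the index in one DDL statement.
CurrentDb.Execute "CREATE UNIQUE INDEX idxFullPath ON tblScan (FullPath)", dbFailOnError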

The iteration through the actual files is the slowest part of the process, and all methods described have to do this.
One should keep that in mind for now.
For larger quantities and performance considerations, one should then go directly to the Windows API.
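For the record, a minimal sketch of the Win API route (64-bit VBA assumed; the ANSI FindFirstFileA variant keeps the type declaration simple - long or Unicode paths would need the W versions with byte buffers):

Code:
Private Type FILETIME
    dwLowDateTime As Long
    dwHighDateTime As Long
End Type

Private Type WIN32_FIND_DATA
    dwFileAttributes As Long
    ftCreationTime As FILETIME
    ftLastAccessTime As FILETIME
    ftLastWriteTime As FILETIME
    nFileSizeHigh As Long
    nFileSizeLow As Long
    dwReserved0 As Long
    dwReserved1 As Long
    cFileName As String * 260
    cAlternate As String * 14
End Type

Private Declare PtrSafe Function FindFirstFileA Lib "kernel32" _
    (ByVal lpFileName As String, lpFindFileData As WIN32_FIND_DATA) As LongPtr
Private Declare PtrSafe Function FindNextFileA Lib "kernel32" _
    (ByVal hFindFile As LongPtr, lpFindFileData As WIN32_FIND_DATA) As Long
Private Declare PtrSafe Function FindClose Lib "kernel32" _
    (ByVal hFindFile As LongPtr) As Long

Public Sub ListFilesViaAPI(ByVal strFolder As String)
    Dim fd As WIN32_FIND_DATA
    Dim hFind As LongPtr
    Dim strName As String

    hFind = FindFirstFileA(strFolder & "\*", fd)
    If hFind = -1 Then Exit Sub    'INVALID_HANDLE_VALUE: folder empty or missing

    Do
        'cFileName is null-terminated; trim at the first vbNullChar
        strName = Left$(fd.cFileName, InStr(fd.cFileName, vbNullChar) - 1)
        If strName <> "." And strName <> ".." Then
            Debug.Print strFolder & "\" & strName
        End If
    Loop While FindNextFileA(hFind, fd) <> 0
    FindClose hFind
End Sub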
 
I'm using a recursive directory function on about 25TB of miscellaneous drives and directories.

You obviously have been making this work for a while, so I'll just add a little warning here. Recursion can be pretty effective for directory structures, but be very careful about how you implement it. True recursion is possible in VBA. I've used it for a family tree application. But the problem is that if you do true recursion - in the sense of a routine calling itself to do something at a deeper directory level before returning to a previous level by exiting from a subroutine - you risk running into the problem of stack overflows. (No, not the Stack Overflow web site... the actual error message.) If you ever get the "Stack overflows heap" (or equally bad, "Heap overflows stack") you will have a mess on your hands to clean up. When I was working with my recursive function, I got that a few times and it was NO FUN.
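One way to sidestep that risk entirely is to make the traversal iterative, managing your own to-do list instead of the call stack. A sketch:

Code:
'Walk a directory tree without recursion: a Collection acts as the to-do stack.
Public Sub WalkFolders(ByVal strRoot As String)
    Dim fso As Object, fld As Object, fldChild As Object
    Dim colPending As New Collection

    Set fso = CreateObject("Scripting.FileSystemObject")
    colPending.Add strRoot

    Do While colPending.Count > 0
        Set fld = fso.GetFolder(colPending(colPending.Count))
        colPending.Remove colPending.Count      'pop the last entry

        'Process fld.Files here...

        For Each fldChild In fld.SubFolders
            colPending.Add fldChild.Path        'push subfolders for later
        Next fldChild
    Loop
End Sub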
 
IMO the fastest way to update a single record in a table with thousands of records is to use the Seek and Update methods on an indexed table-type recordset (only for an MS Access BE).
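A sketch of that pattern, assuming a local table tblFiles whose primary key is the path field (Seek only works on a table-type recordset, so the table cannot be linked):

Code:
'Seek on an indexed table-type recordset, then update in place.
Dim rst As DAO.Recordset

Set rst = CurrentDb.OpenRecordset("tblFiles", dbOpenTable)
rst.Index = "PrimaryKey"            'must name an actual index on the table

rst.Seek "=", "C:\Archive\clip1.mp4"
If Not rst.NoMatch Then
    rst.Edit
    rst!blnAlive = True
    rst.Update
End If
rst.Close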
 
This whole thread has been fascinating to me, so I did a bit of experimenting with my archive database tables: dictionaries, temp tables, FindFirst, Seek, etc.

There was actually little to choose between any of the methods. Surprisingly, using Seek instead of FindFirst gave only a marginal improvement, which I suspect was because of the length of the text entries in the index, which are all complete file paths.

It turns out the simplest method, and easily quick enough not to impact the speed of the process, was to traverse the document folders, try to write a new record (.AddNew) for every file, and let the write (.Update) fail on duplicate records.
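In other words, something like this (a sketch with assumed names; it relies on a unique index on the path field, so .Update raises error 3022 for existing records and the failure is simply ignored):

Code:
'Insert every file found; duplicates fail on .Update and are skipped.
Dim fso As Object, fil As Object
Dim rst As DAO.Recordset

Set fso = CreateObject("Scripting.FileSystemObject")
Set rst = CurrentDb.OpenRecordset("tblFiles", dbOpenDynaset)

On Error Resume Next                'error 3022 = duplicate index value
For Each fil In fso.GetFolder("C:\Archive").Files
    rst.AddNew                      'a new AddNew discards any failed buffer
    rst!FullPath = fil.Path
    rst.Update                      'fails silently for existing records
Next fil
On Error GoTo 0
rst.Close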
 
Unfortunately, when I tried the code on a new db, it halts my db, probably waiting to see if there are any changes in the folder. I then added a file to the monitored folder and the db temporarily wakes up. Since I have the code in the Timer event, the db again goes into limbo, waiting for any activity in the folder.
 
Therefore, it would be beneficial to monitor the file system itself and to immediately record current changes and pass them on to the database. Something like this could help.

This would be a useful idea, but not for me. I only run the complete process occasionally as housekeeping, and it seldom identifies any discrepancies. Mostly my database adds records as I add new files to my archive, and deleting a record deletes the associated file.
 
In this topic, however, we are primarily dealing with the task described in #1: marking the records in the table for which the associated file no longer exists.

Something missing can only be determined by comparing what should be there with what is actually there => a classic task for SQL, and for this you need two usable tables.
Or the file system itself reports immediately what changes have occurred.
 

I think you are oversimplifying it. As I read it, the task was also identifying the opposite - files for which no record exists - and this has been the basis of most of the efficiency discussion. The missing-files task you refer to is the simple part of the operation!
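Both directions are straightforward once the scan results are in a table. A sketch, with tblScan as the hypothetical scan table and tblFiles as the catalog (the UPDATE needs a unique index on tblScan.FullPath to stay updatable):

Code:
'Direction 1: flag records whose file has disappeared.
CurrentDb.Execute "UPDATE tblFiles LEFT JOIN tblScan " & _
    "ON tblFiles.FullPath = tblScan.FullPath " & _
    "SET tblFiles.blnAlive = False " & _
    "WHERE tblScan.FullPath Is Null", dbFailOnError

'Direction 2: add files on disk that have no record yet.
CurrentDb.Execute "INSERT INTO tblFiles (FullPath, blnAlive) " & _
    "SELECT tblScan.FullPath, True " & _
    "FROM tblScan LEFT JOIN tblFiles " & _
    "ON tblScan.FullPath = tblFiles.FullPath " & _
    "WHERE tblFiles.FullPath Is Null", dbFailOnError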
 
Perhaps you could store the files incrementally in a table as they are generated/come into existence, so that finding all "unprocessed" files would be as simple as a query you could loop through to perform the updates. This is very similar to Doc's suggestion; I'm just emphasizing that maybe you can control the process as the files are generated rather than the heavyweight process of looping through them all from scratch.
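For instance, with a hypothetical staging table tblIncoming carrying a blnProcessed flag, the remaining work is just:

Code:
'Files logged but not yet processed (tblIncoming/blnProcessed are assumed names).
Dim rst As DAO.Recordset
Set rst = CurrentDb.OpenRecordset( _
    "SELECT * FROM tblIncoming WHERE blnProcessed = False", dbOpenDynaset)

Do While Not rst.EOF
    'process the file here, then mark it done:
    rst.Edit
    rst!blnProcessed = True
    rst.Update
    rst.MoveNext
Loop
rst.Close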
 
Thanks for all the help. I'm close to completing the update to my db, but I've hit a snag when trying to work with SQL. Per the recommendations in this thread, I've done some testing and have gotten to the point where a LEFT JOIN query accomplishes the task of showing all records in the main table that were not found during the recursive directory search. When this query is run in the Access query builder, I realized I can take it further and change the value of blnAlive directly in the query window, and the results flow back to the primary data table. Now I'm struggling with how to do that entirely in VBA code.

See my snippet below and let me know if it is possible to execute an UPDATE that moves the entire set of records from TRUE to FALSE.


Code:
Dim strSql As String
Dim rsFilesDead As DAO.Recordset
Dim intResponse As VbMsgBoxResult

'Only do the section below if the user did NOT choose a directory to search.
If Not Me.chkDir Then
    'SQL below identifies all records not found in the active directory search
    '(rsFilesFound is the table populated by the recursive search).
    strSql = "SELECT tblClips.ID, tblClips.blnAlive " & _
                "FROM tblClips LEFT JOIN rsFilesFound ON tblClips.fldLocation = rsFilesFound.fldFileAddress " & _
                    "WHERE (((tblClips.blnAlive)=True) AND ((rsFilesFound.fldFileAddress) Is Null));"

    Set rsFilesDead = CurrentDb.OpenRecordset(strSql)
    If Not rsFilesDead.EOF Then
        rsFilesDead.MoveLast    'populate the full RecordCount
        rsFilesDead.MoveFirst
    End If

    intResponse = MsgBox("The SQL Query contains " & rsFilesDead.RecordCount & " references.", vbOKCancel, "TESTING")
    If intResponse = vbCancel Then Exit Sub

    'Insert UPDATE Function on rsFilesDead to set the blnAlive field to FALSE.

End If
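To finish this off entirely in code, one option (a sketch, reusing the table and field names from your snippet) is to recast the same LEFT JOIN as an UPDATE query; if Access complains that either query or recordset is not updatable, the loop variant is the fallback:

Code:
    'Option 1: run the SELECT's logic as an UPDATE query.
    Dim strUpdate As String
    strUpdate = "UPDATE tblClips LEFT JOIN rsFilesFound " & _
                "ON tblClips.fldLocation = rsFilesFound.fldFileAddress " & _
                "SET tblClips.blnAlive = False " & _
                "WHERE tblClips.blnAlive = True AND rsFilesFound.fldFileAddress Is Null;"
    CurrentDb.Execute strUpdate, dbFailOnError

    'Option 2: edit each row of the recordset you already opened.
    Do While Not rsFilesDead.EOF
        rsFilesDead.Edit
        rsFilesDead!blnAlive = False
        rsFilesDead.Update
        rsFilesDead.MoveNext
    Loop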
 
