Is there a more efficient way? (1 Viewer)

Frothingslosh

Premier Pale Stale Ale
Local time
Today, 04:26
Joined
Oct 17, 2012
Messages
3,276
I just got handed an assignment at work that involves creating a system that includes a number of challenges:

  1. Several types of spreadsheets will be imported, such as "Eligbility", "Consents", and "Clinic Info", each with their own data (and thus, header) requirements.
  2. All customers will use the same column headers on the above submissions.
  3. These headers will not be in the same order between customers.
  4. The sheet I want is not always in the same position in the workbook, but will always contain the text 'data' in the name.

On the bright side, I already have #4 taken care of, as I already have a procedure that locates the sheet I want in a given workbook based on the name of the sheet.

So far I've written a header verification procedure that should work fine that I'll include below, but I have this nagging feeling that I could do this more efficiently another way, so I'm asking if anyone has any suggestions along this line.

Basically, I have a table that stores the column headers for each type of spreadsheet. I pull a recordset with just those column headers, sorted by the primary key, then using that, find each header in turn, and add them to a global array that stores the column number and column name.

Assuming this is the best approach for the verification, my next step will be the import routine, but I wont' worry about that until Monday. My question here is honestly just if anyone has suggestions on any way to make this more efficient. I'll attach a small test database with the code, some small tables for testing, and my two custom-made test spreadsheets. Since I'm currently at home (yeah, yeah), they're indicated as being saved in C:\Development, but that can be changed easily enough on the test form.

Code:
Option Compare Database
Option Explicit
Option Base 0

Public gavarColumnHeaders() As Variant      'Public array because it is generated by verify routine and used in import.
Public gdbsCurrDB As DAO.Database           'This is the current database.  Used only in verify and import routines.
Public grstColumnList As DAO.Recordset      'This recordset is used by both the verify and the import routines.

Public Function funVerifyHeaders(ByVal xlS As Worksheet, _
                                 ByVal lngSpreadsheetType As Long) As Integer
' ************************************************************
' Created by       : <Frothingslosh>
' Parameters       : Referenced spreadsheeet
' Result           : Integer (0 = Misc Error, 1 = Empty tblColums, 2 = Header Mismatch, 3 = Okay)
' Date             : 6/1/13
' Remarks          : This routine reads the column headers and verifies the headers against values in tblColumns.  It also creates
'                    an array indicating which column contains which header so that it can be imported correctly.
'
'                    This routine will determine column locations - order is irrelevant.
'
'                    *** THIS ROUTINE IS DESIGNED FOR APPLICATIONS WHERE MULTIPLE TYPES OF SPREADSHEETS ARE REGULARLY IMPORTED ***
'
'                    *** BLANK COLUMN HEADERS WILL ABORT THE VERIFICATION - THIS ROUTINE ASSUMES THAT BLANK HEADERS
'                        INDICATE THERE ARE NO MORE COLUMNS TO SEARCH. ***
'
'                    This routine requires a table named tblColumnData in the back end with the following fields:
'                       ColumnID                        Autonumber      (Primary Key field)
'                       SpreadsheetID                   Long Integer    (Determines which spreadsheet header information to use)
'                       ColumnHeader                    Text(255)       (Column header as expected in the spreadsheet)
'                       (UNUSED - ColumnField)          Text(255)       (Associated field name in the table the data is imported into)
'                                                                       (Only used during the import routine)
'                       (OPTIONAL - ColumnDesc)         Memo            (Description of the column)
'                       (UNUSED - ColumnDataType)       Text(50)        (Data type to be expected in the column - only used during import)
'
'                    You will also need a table named tblSpreadsheets in the back end (linked to the above) with the following fields:
'                       SpreadsheetID                   Autonumber      (Primary Key field)
'                       SpreadsheetCat                  Text(75)        (1-2 word description of the spreadsheet category - Invoicing, Contractors, etc)
'                       (OPTIONAL - SpreadsheetDesc)    Memo            (Description of the spreadsheet)
'                       *** This table is not directly referenced by this routine, but is needed for data maintence by the DBA. ***
'
'                    Enter the indicated data into these tables.  (There should be a form for this available to the DBA)
'
'                    This routine requires the use of a number of public declarations as indicated in the declarations section above, as they
'                       will also be used by the subsequent import routine.
'
'                    *IMPORTANT - EXCEL LIBRARY REFERENCE MUST BE ACTIVE TO USE THIS ROUTINE*
' ************************************************************

On Error GoTo funVerifyHeaders_Err

    Dim strProcName As String                       'Procedure name
    Dim strColumnSQL As String                      'SQL string
    Dim intColumnCount As Integer                   'Number of columns to be checked
    Dim intColumnNumber As Integer                  'Column number in the spreadsheet being verified
    Dim intElement As Integer                       'Number to be used as the first element in the array gaintColumnHeaders
    Dim bolHeaderFound As Boolean                   'Setting true indicates header was found and search loop can terminate for that particular header
    Dim varCellValue As Variant                     'Value of the specified cell in the spreadsheet
    
    'Set defaults.
    strProcName = "funVerifyHeaders"
    funVerifyHeaders = 0                            'Defaults to 'Unknown/Miscellaneous Error'
    intElement = 0                                  'Array elements start at 0
    
    'Set the value of strColumnSQL.  This same string will be used in the import routine as well in order to retain consistancy.
    strColumnSQL = "SELECT tblColumnData.ColumnHeader FROM tblColumnData WHERE (((tblColumnData.SpreadsheetID)=" & lngSpreadsheetType & ")) ORDER BY tblColumnData.ColumnID;"
    
    'Open a recordset based on tblColumns to get needed data for searching the designated worksheet.
    Set gdbsCurrDB = CurrentDb
    Set grstColumnList = gdbsCurrDB.OpenRecordset(strColumnSQL, dbOpenSnapshot, dbReadOnly)
    
    With grstColumnList
        
        'Main verification engine.
        If .BOF And .EOF Then                       'Recordset is empty.
            funVerifyHeaders = 1                    'Return code for empty recordset.
        Else                                        'Recordset is not empty.
        
            'Determine number of columns to verify.
            .MoveLast
            intColumnCount = .RecordCount
            
            'Redim global array used to track columns.  For each first element, second element 1 is field name, second element 2 is column in the spreadsheet.
            ReDim gavarColumnHeaders(intColumnCount - 1, 1)     'Subtract 1 because arrays start at 0.
        
            'Move back to the start of the recordset for the actual verification.
            .MoveFirst
            
            'Compare every header in the columns table against the headers in the spreadsheet.
            Do
            
                'Reset defaults for the following loop.
                bolHeaderFound = False
                intColumnNumber = 1
                
                Do
                    
                    'Determine the value of the cell in the selected column.
                    varCellValue = Trim(xlS.Cells(1, intColumnNumber).Value)    'Pull the value
                    
                    'If varCellValue is either Null or "", the header doesn't exist.  Exit the loop with bolHeaderFound = False.
                    If IsNull(varCellValue) Or varCellValue = "" Then Exit Do
                    
                    'Convert from variant to string
                    varCellValue = CStr(varCellValue)
                    
                    'For the selected column, compare the header in the spreadsheet vs the expected value.
                    If varCellValue = .Fields("ColumnHeader") Then
                    
                        'Match found - update global array, then end the loop.
                        gavarColumnHeaders(intElement, 0) = .Fields("ColumnHeader")
                        gavarColumnHeaders(intElement, 1) = intColumnNumber
                        bolHeaderFound = True
                    End If
                    
                    'Increment the column number for the next iteration of the search loop.
                    intColumnNumber = intColumnNumber + 1
                    
                Loop Until bolHeaderFound = True
                
                'If the above loop ended with bolHeaderFound still False, then the header doesn't exist in the spreadsheet.  Exit this loop immediately.
                If bolHeaderFound = False Then Exit Do
                
                'Current header was found.  Increment intElement for the next header, advance the pointer, and loop.
                intElement = intElement + 1
                .MoveNext
            Loop Until .EOF
            
            'If EOF, then all headers matched.  If not EOF, then there was a mismatch.
            If .EOF Then
                funVerifyHeaders = 3    'Return code for header okay.
            Else
                funVerifyHeaders = 2    'Return code for header mismatch.
            End If
        End If
    End With
    
funVerifyHeaders_Exit:
    'If the result was anything other than 3 (Okay) and the recordset is open, then clean them up.  Recordset will not be used.
    If funVerifyHeaders <> 3 And Not grstColumnList Is Nothing Then
        grstColumnList.Close
        Set grstColumnList = Nothing
        Set gdbsCurrDB = Nothing
    End If
    Exit Function
 
funVerifyHeaders_Err:
            
    MsgBox "Error occurred" & vbCrLf & vbCrLf & _
    "In Function:" & vbTab & strProcName & vbCrLf & _
    "Err Number: " & vbTab & Err.Number & vbCrLf & _
    "Description: " & vbTab & Err.Description, vbCritical, _
    "Error in " & Chr$(34) & strProcName & Chr$(34)
    Resume funVerifyHeaders_Exit

End Function

Also, once I'm done, if folks think it might be useful, I can dump the completed procedure(s) into the Code Repository forum, assuming I can make a suitably generic import procedure.

Edit:

Ugh, thought I'd fixed this. I've already replaced

Code:
Set grstColumnList = gdbsCurrDB.OpenRecordset(strColumnSQL, dbOpenSnapshot, dbReadOnly)

with

Code:
Set grstColumnList = gdbsCurrDB.OpenRecordset(strColumnSQL, dbOpenSnapshot)
 

Attachments

  • TestData.zip
    55.9 KB · Views: 100
Last edited:

spikepl

Eledittingent Beliped
Local time
Today, 10:26
Joined
Nov 3, 2010
Messages
6,142
Define "efficient" and "more efficient"
 

Frothingslosh

Premier Pale Stale Ale
Local time
Today, 04:26
Joined
Oct 17, 2012
Messages
3,276
Faster, mainly, although less cpu-intensive would work as well. If possible, something I could re-use in other projects, as my company gets a LOT of spreadsheets from clients as parts of different projects.

The previous dev simply did a separate verification and import procedure for each spreadsheet in each tool, consisting of hard-coded column headers defined as constants in each module and sequential if-thens, one for each column in each procedure. I'm looking to make something more adaptable/flexible because of how often I'm going to be using it.

Some of these spreadsheets have 50+ columns and a couple thousand rows.
 

spikepl

Eledittingent Beliped
Local time
Today, 10:26
Joined
Nov 3, 2010
Messages
6,142
Seriously, again: "faster" means what EXACTLY? Is there a problem right now? If not then polishing it is a waste of effort, because meatware is expensive, and hardware isn't.

I won't look at your code before the operational problem is either defined or killed as a non-problem.
 

Frothingslosh

Premier Pale Stale Ale
Local time
Today, 04:26
Joined
Oct 17, 2012
Messages
3,276
Then don't. I was trying to see if I was on a wrong track or if someone with more experience than I have had a better approach to the task before I got more involved on this path. It's not like I've put much time into this so far.

If the best you can say is "I won't tell you if I know a better method because it's not worth my time", then just move on. I just thought that perhaps someone on this forum may have faced the same situation before and would have some suggestions.
 

spikepl

Eledittingent Beliped
Local time
Today, 10:26
Joined
Nov 3, 2010
Messages
6,142
OK-.

You need to understand that we are not mind readers, but take our clues from the text you supply. If you ask for "faster" or "efficient" we cannot guess what you mean, hence my questions. There is no point in making something run in 0.1 seconds instead of 0.2 seconds.

If you want your approach reviewed for eg,. viability then say so. Don't chase nefarious undefined BS concepts like "better", "faster" etc ... If you say "For 1000 records it takes 2 minutes, and that is unacceptable" then that is something specific to deal with.
 

Users who are viewing this thread

Top Bottom