Complicated Function...Very Slow...

JaedenRuiner

Registered User.
Local time
Today, 05:19
Joined
Jun 22, 2005
Messages
154
okay,

As I've stated in my past collage of questions, i'm parsing an external DBF into an access table. My first stage is to take all fields (or at least the ones i use) of each record and clean them up for easier parsing.

My insert methodology is based on a two-stage action queries for each step of the conversion. The first step as I said, Parses the original DBF table. The Second Step splits the newly parsed fields from the 1st step into recognizable entries, and the third step inserts all the newly calculated fields into a special table which I will eventually use to create my relational database tables.
All these Records are based on a dual-field unique identifier Tape_ID and Title.
The hope, is that once i do the initial 3-stage conversion, my 4th stage will only update the info, not replace it. But since this data is ever changing in the current database system, i have to import the DBF each time, this isn't a one time deal. However every time i work it, it seems to go exceptionally slow, and i'm trying to speed that up.
The reason i'm doing it this way, is because over 33000 records, I want to have visual updates (progress) to let the user know that I am actually doing something over the extensive time that it takes to complete this process.
Code:
action stage 1:
parameter Alpha Text(1);
insert into [table] (fields)
select (CalcFcns(fields)) 
from [table]
where ((tape_id = "00000000") or (tape_id not like "00*")) and (title like [Alpha] & "*");
this is about 5000 or so records, but gets all my irregular IDs out of the way
Code:
action stage 2:
parameter IDStart Text(9), IDEnd Text(9);
insert into [table] (fields)
select (CalcFcns(fields)) 
from [table]
where (tape_ID between [IDStart] and [IDEnd]);
this handles the remain 27000 records that have standard tape_ids. For those, I break my querydef parameter loop down by groups in the 100000's. I get the min and Max of all the tape_id's starting "00#", (currently between 100000 and 500000), and then check for group of hundred thousands for the presence of records. (currently 3 and 4 contain none). I then loop through the thousands and execute my queries from min to max of each group. It is quite efficient for the effect I want, updating the progress bar, with status messages and everything quite smoothly. A normal process only took a minute or so.
However, as I've had to "clean-up" certain fields more and more, the process keeps slowing down because of my module functions that are cleaning up the fields. I will give a few examples of what I have to work with:
  • Title - This Contains the actual title of the entry, unless it contains a ">>" which means it is a look up reference into another title in the DBF. More On this in title_rem field
  • Title_Rem - this is the remarks field which contains multiple pieces of information:
    • Countr(y|ies) of Origin - if multiple countries, they are seperated by a "/" usually.
    • Flags - These consist of specific flags of note per entry, from the type of interset to certain format issues. there are currently 20 or so possibilities, usually all single character but some are predicated or suffixed by a number.
    • Continuation of Title - since the original DBF field length is 40 sometimes those titles that run too long are filled in here, especially if the title is a look up reference (the ">>" thing)
  • Alt_Id_Rem - Can contain repair information, usualy a date and a count, but sometimes contains completely different information that means the item was replaced.

As I am doing this process, I am trying to reduce the amount of module functions called in a given query, to make sure that the query is quick about its business. So in my initial transfer from the DBF i primarily remove any trailing spaces (RTrim()) from any text field, however i have found it necessary to clean some of the fields more than Trim(). I remove the disk count flag from title_rem into a seperate field. And, as you can see by the following CleanTitleRem() function there are a LOT of discrepancies in title_rem format. Sometimes instead of a "/" i get a " /" or a "/ ", and in a couple cases it's a ",". Some flags are predicated with a "*" others with a "." The number of disks is #P, so I have to cycle the "P"'s found in the string to find the one preceeded by a #.

DBF -> My Table query contains:

CleanAlt([alt_id_rem]) as CleanAltId, DiskCount([title_rem]) as NumDisks, CleanTitleRem([title_rem], [NumDisks]) as CleanRem, CountryPart([title_rem]) as Countries

(these are in my select statement of the clause.)

MyTable -> MyIntermediateTable Query contains:

FlagsPart([title_rem], [Countries]) as Flags, RemarksPart([title_rem], [Countries]) as Remarks

And other functions too. FlagsPart() uses the Countries field that was previously calculated to remove them from the resulting value. It then cleans up the flags so that every flag is predicated by a ".". RemarksPart() does the same as well as recalculating the flagspart for the same reason. So you can see i've got these action queries that are executing functions which in effect perform the same operations two and three times each record.

Example:
From DBF -> MyTable
I call Numdisks([Title_rem]) to retrieve the number of disks, then i call CleanTitleRem() to clean the Title_Rem field up so it is understandable by all my other functions that use that field as well as remove the disk count flag from the field. I also call CountryPart, which neads the cleaned title_rem field not the original. I could put CountryPart in the second query, but the FlagsPart and RemarksPart need the countries section of the string in order to remove them from the field and split it up into its remaining sections.

Now I'm not sure how much it has affected the performance, but What I tried to do with the CleanTitleRem() and CountryPart being in the same query, is that CleanTitleRem() is called first in the select list, and as long as the provided diskcount is > 0 it Cleans the title_rem field, and stores the result in a static local variable. The CountryPart function is called immediately after, and it instantly invokes a call to CleanTitleRem() but with a -1 in the disk count parameter, which causes CleanTitleRem() to return the last calculated value instead of recalculating all that it just went through.

So looking at the code below, which are three initial functions that i use in my first action query (from DBF to my table) is there any way to speed up this process of "cleaning" the fields. Was it wise for me to utilize the static local variable, or would a global be more efficient? If i were to save the diskcount or title remarks "cleaned" values as a global variable, how assured can i be that the order of execution will always flow left to right in my select statement, etc, etc, etc. Thanks, any help would be most appreciated.
Code:
Function DiskCount(ByVal sStr As String) As Integer
  Dim i As Integer, _
      n As String
    
    DiskCount = 1
    i = InStr(i + 1, sStr, "P")
    Do While (i > 1)
      n = Mid(sStr, i - 1, 1)
      If IsNumeric(n) Then
        DiskCount = CInt(n)
        Exit Function
      Else: i = InStr(i + 1, sStr, "P")
      End If
    Loop
End Function

Public Function CleanTitleRem(ByVal sRem As String, ByVal lDisks As Long) As String
  Static sLastClean As String

  Dim s As String, _
      v As Variant, _
      p As Integer
    sRem = RTrim(sRem)
    If lDisks = -1 Then
      sRem = sLastClean
      sLastClean = ""
    ElseIf sRem <> "" Then
      If lDisks > 1 Then sRem = Replace(sRem, lDisks & "P", "")
      If sRem <> "" Then
        If InStr(1, sRem, "*P") > 0 Then sRem = Replace(sRem, "*P", "P")
        If sRem Like ".[ACDFGLMNORSUW]* *" Then
          s = Mid(sRem, 1, InStr(1, sRem, " "))
          sRem = Replace(sRem, s, "", 1, 1) & s
        End If
        If InStr(1, sRem, ",") Then
          If InTable(Split(sRem, ",")(0)) Then
            sRem = Replace(sRem, ",", "/")
          End If
        End If
        If InStr(1, sRem, " (") > 1 Then
          v = Split(sRem, " (")
          If InTable(v(0)) And InTable(Left(v(1), Len(v(1)) - 1)) Then
            sRem = Replace(Replace(sRem, ")", "", 1, 1), " (", "/", 1, 1)
          End If
        End If
        If InStr(1, sRem, " [") > 1 Then
          If InTable(Split(sRem, " [")(0)) Then
            sRem = Replace(Replace(sRem, "]", "", 1, 1), " [", ".", 1, 1)
          End If
        End If
        p = InStr(1, sRem, "U.S.")
        If p > 1 Then
          If Mid(sRem, p - 1, 1) <> "/" Then p = 0
        End If
        If p > 0 Then
          sRem = Replace(sRem, "U.S.", "US", 1, 1)
        End If
        If InStr(1, sRem, "*DIR.*") Then sRem = Replace(sRem, "DIR.", "DIR", 1, 1)
        If InStr(1, sRem, ".0") > 0 Then sRem = Replace(sRem, ".0", ".O")
        If InStr(1, sRem, "*") > 0 Then sRem = Replace(sRem, "*", ".")
        If InStr(1, sRem, "  ") > 0 Then sRem = Replace(sRem, "  ", " ")
        If InStr(1, sRem, " /") > 0 Then sRem = Replace(sRem, " /", "/")
        If InStr(1, sRem, "/ ") > 0 Then sRem = Replace(sRem, "/ ", "/")
        If InStr(1, sRem, " .") > 0 Then sRem = Replace(sRem, " .", ".")
        If InStr(1, sRem, "..") > 0 Then sRem = Replace(sRem, "..", ".")
        If InStr(1, sRem, ".RO") > 0 Then sRem = Replace(sRem, ".RO", ".R0")
        sRem = Trim(sRem)
        If sRem = "." Then sRem = ""
      End If
      sLastClean = sRem
    End If
  CleanTitleRem = sRem
End Function

Jaeden "Sifo Dyas" al'Raec Ruiner
 

Users who are viewing this thread

Back
Top Bottom