Array info and duplicates in an array

exaccess

Registered User.
Local time
Today, 14:40
Joined
Apr 21, 2013
Messages
287
Hello Experts,
I have to deal with string arrays that store text. I need info on copying, comparing, appending arrays. Also on passing arrays as parameters to subs or functions. Could someone please point to the right source where I can get to this info quickly without having to browse through many screens.
In addition to this I have some questions:
ı have to find the duplicates of values in an array. Here is the code that I use.
Code:
Sub FindDuplicates()
    Dim I As Integer, J As Integer, IEND As Integer, text() As String
    ReDim text(IEND)
    For I = 1 To IEND - 1
        For J = I + 1 To IEND
            If text(I) = text(J) Then text(J) = ""
        Next J
    Next I
End Sub
It works but is not performing well. Are there more efficient ways of doing this?
 
Do you have some sample data? Array index normally starts at 0.
Where do you load the array? Did you mean this to have parameters?

Please describe not performing well.
 
Why do you have to deal with string arrays? Access is a database. Put this data in a table, and then you can search and sort and count and group by with the greatest of ease.

Create a table with a single field. Then, enumerate your array, and add a row in the table for each array element. Done. Now you use recordsets, not arrays, and everything will be blisteringly fast.

Code:
SELECT Field1, Count(*) As Field1Count
FROM YourTable
GROUP BY Field1;
Code:
SELECT Field1
FROM YourTable
ORDER BY Field1;

To open a recordset . . .
Code:
dim rst as dao.recordset
set rst = currentdb.openrecordset( _
   "SELECT Field1 " & _
   "FROM YourTable " & _
   "ORDER BY Field1;")
 
Not performing well - that algorithm to find duplicates compares every element of the array with every other, so it will perform poorly.

If you use a table, with a single INDEXED field, then the list is sorted as you create it. And with a sorted list, finding dupes is way faster, since they will always all be right next to each other.
 
If you do use an array, I am pretty sure that irrespective of byref/byval, arrays are always passed byref. So you are always working on the real array, even if you try to pass a copy (byval)
 
Why do you have to deal with string arrays? Access is a database. Put this data in a table, and then you can search and sort and count and group by with the greatest of ease.

Create a table with a single field. Then, enumerate your array, and add a row in the table for each array element. Done. Now you use recordsets, not arrays, and everything will be blisteringly fast.

Code:
SELECT Field1, Count(*) As Field1Count
FROM YourTable
GROUP BY Field1;
Code:
SELECT Field1
FROM YourTable
ORDER BY Field1;

To open a recordset . . .
Code:
dim rst as dao.recordset
set rst = currentdb.openrecordset( _
   "SELECT Field1 " & _
   "FROM YourTable " & _
   "ORDER BY Field1;")
Well here is more info. I have a long field of text which can contain any number of words or single characters or numbers. I placed this long text field as a row in an access table. This row is scanned from left to right. When I find 4 or more adjacent blanks I take this as a delimiter for a phrase. In this way the long text string is divided into phrases delimited by blanks. Each row of this table becomes an array of phrases. The array is defined as
Code:
text() as String
The size of the array is not known and has to be redimensioned when the array is formed. Hence
Code:
Redim text(IEND) as String
with IEND being the upper limit of the array. This is the string array that I am talking about. Here is some sample data for the whole row:
Code:
Virement européen     Références de l'opération:    H1030905K00                                      Pièce justificative en annexe                                                                                                                              Paiement en votre faveur de                                   75
I hope this gives a good idea of the problem.
For every row if I define a table and insert an element of the array into each row then perhaps I can sort out the duplicates problem relatively easy, but how am I going to have the original sequence rebuilt. This original sequence is important as these phrases eventually can form meaningful sentences. Yes if I have an automated numbering field and I revert to this original sequence I can rebuild the original row. I shall try this.
 
Last edited:
Do you have some sample data? Array index normally starts at 0.
Where do you load the array? Did you mean this to have parameters?

Please describe not performing well.

I have given sample data in previous post. Program has
Code:
Option Base 1
Previous post also tells where the data is loaded. If I define the array locally then it is difficult to pass it to a function as a parameter. In fact it can be stored in a variant to pass as a parameter, but I could not figure out a way of getting back the array from the variant. If I define it as public or global it can be accessed anywhere in the module, but must be each time or once redefined (I am not sure which one works).
Not performing well is rather based on experience long years of working in this field I would say. Of course one can always argue whether it is correct or not.
 
If you need to preserve the original order, then create a table with two fields.
 
I have given sample data in previous post. Program has
Code:
Option Base 1
Previous post also tells where the data is loaded. If I define the array locally then it is difficult to pass it to a function as a parameter. In fact it can be stored in a variant to pass as a parameter, but I could not figure out a way of getting back the array from the variant. If I define it as public or global it can be accessed anywhere in the module, but must be each time or once redefined (I am not sure which one works).
Not performing well is rather based on experience long years of working in this field I would say. Of course one can always argue whether it is correct or not.

that's what I said. even if you pass the array as a byval parameter (which may be useful as a control feature in your programme) I do not believe it actually passes a "copy" of the array to the sub. It still works as if you used the original array.
 
Not performing well - that algorithm to find duplicates compares every element of the array with every other, so it will perform poorly.

If you use a table, with a single INDEXED field, then the list is sorted as you create it. And with a sorted list, finding dupes is way faster, since they will always all be right next to each other.

As you have suggested I created an indexed table with two fields. The automated number as ID and the Phrase. It worked. I got rid of the duplicates but now the problem is I can not restore the original sorting sequence. I sorted on ID but it does not help because the program deletes always the first occurrence. I tried by sorting from Z to A before deletion but that does not help either. Here is the code and data with all the debugging help included.

Code:
Sub main1()
    Dim Textline As String, intoffset As Integer
    Dim ss As Integer, iend1 As Integer
    ReDim text1(20) As String
    Dim vt1 As Variant
    
    ss = 1
    text1(1) = "Virement européen"
    text1(2) = "De: M+Mme Uldrich - Baes"
    text1(3) = "Avenue de l'Avocat 10"
    text1(4) = "1600"
    text1(5) = "Woluve saint lambert"
    text1(6) = "Alecs Douglas"
    text1(7) = "IBAN: BE3261199999999999999"
    text1(8) = "Communication:"
    text1(9) = "Virement européen"
    text1(10) = "Aloast Yearly Fee for Retirees"
    text1(11) = "Avenue de l'Avocat 10"
    text1(12) = "Belgique"
   
    iend1 = 12
    vt1 = text1
    Call DelDups(ss, iend1, vt1)
End Sub


'DelDups Removes duplicate records in TextArrayTbl matched on Phrase
Public Sub DelDups(KK As Integer, iend, vtext1 As Variant)
    On Error GoTo Err_Msg
    ReDim text1(iend)
    Dim I As Integer
    text1 = vtext1
    For I = 1 To iend
        Debug.Print "text1("; I; ")="; text1(I)
    Next I
    Dim rst As DAO.Recordset, iend1 As Integer, IEND2 As Integer, vtext As Variant, varLastVal As Variant

    Debug.Print "DelDups started"
    On Error Resume Next
    Set rst = CurrentDb.OpenRecordset("TextArrayTbl")
    rst.MoveFirst
    Do While Not rst.EOF
        rst.Delete
        rst.MoveNext
    Loop
    On Error GoTo Err_Msg
If KK = 1 Then
    iend1 = iend
    Debug.Print "Entering loop KK="; KK; "end1="; iend1
    For I = 1 To iend1
        rst.AddNew
        rst.Fields("Phrase").Value = text1(I)
        Debug.Print "text1("; I; ")="; text1(I); " is stored"
        rst.Update
        'rst.MoveNext
    Next I
    rst.Close
    Dim strSQL As String
   'Recordset must be full table but sorted by the field we're checking
    strSQL = "SELECT * FROM [" & "TextArrayTbl" & "] ORDER BY [" & "Phrase" & "] DESc"
    With CurrentDb.OpenRecordset(strSQL, dbOpenDynaset)
        varLastVal = Null
        'For each record, check against previous value in Phrase
        'If same then this is a duplicate so can be removed
        Do Until .EOF
            If .Fields("Phrase") = varLastVal Then
                Debug.Print .Fields("Phrase"); " will be deleted"
                Call .Delete
                
            Else
                varLastVal = .Fields("Phrase")
            End If
            Call .MoveNext
         Loop
         .Close
    End With
    
    Debug.Print "Duplicates eliminated "
ReadBack:
    strSQL = "SELECT * FROM [" & "TextArrayTbl" & "] ORDER BY [" & "ID" & "]"
    With CurrentDb.OpenRecordset(strSQL, dbOpenDynaset)
    I = 0
    .MoveFirst
    Do While Not .EOF
        I = I + 1
        text1(I) = .Fields("Phrase")
        Debug.Print "text1("; I; ")="; text1(I)
        .MoveNext
    Loop
    End With
    Set rst = Nothing
    
    MsgBox "Result " & I & " records sorted"
    
    Exit Sub
Err_Msg:
    MsgBox "Error number = " & Err.Number & " Error = " & Err.Description
    

End If
End Sub

This is the result:
Code:
text1( 1 )=De: M+Mme Uldrich - Baes
text1( 2 )=1600
text1( 3 )=Woluve saint lambert
text1( 4 )=Alecs Douglas
text1( 5 )=IBAN: BE3261199999999999999
text1( 6 )=Communication:
text1( 7 )=Virement européen
text1( 8 )=Aloast Yearly Fee for Retirees
text1( 9 )=Avenue de l'Avocat 10
text1( 10 )=Belgique

As you can see the first occurrences of the two duplicates "Virement.." and "Avenu..." are gone.
 
Sort by both fields here . . .
Code:
    strSQL = "SELECT * FROM [" & "TextArrayTbl" & "] ORDER BY [" & "Phrase" & "] DESc"
. . . and you don't need those "&" chars since those are all hard coded strings . . .
Code:
strSQL = "SELECT * FROM TextArrayTbl ORDER BY Phrase, ID"
. . . and play with that. Maybe you need to do . . .
Code:
strSQL = "SELECT * FROM TextArrayTbl ORDER BY Phrase DESC, ID"
. . . or . . .
Code:
strSQL = "SELECT * FROM TextArrayTbl ORDER BY Phrase, ID DESC"
. . . to delete the second occurance.
Cheers,
 
Try changing this line

For I = 1 To iend1

to

For I = iend1 To 1 Step -1

Good luck

OOooops: I see Markk has responded
 
jdraw, that's a good solution, the principle being: traverse the list backwards.
 
Try changing this line

For I = 1 To iend1

to

For I = iend1 To 1 Step -1

Good luck

OOooops: I see Markk has responded

Changed only one line and it works perfectly. Many thanks to both of you.
 

Users who are viewing this thread

Back
Top Bottom