Solved Remove duplicated values (1 Viewer)

spike_access

New member
Local time
Today, 16:34
Joined
Mar 22, 2022
Messages
14
Hi all,

I've been browsing through the forum and found loads of examples on how to remove duplicates, but I haven't found a solution that applies to my scenario. If it is out there I apologize if making any unnecessary posting. Could of course be my lack of understanding the posts as well...

Background
I have a table (tblQMs) which is updated on a daily basis with a report that contains additive data, and in some cases duplicated data.
The main fields used are titleID, Audio, Sub. All values in Audio and Sub is comma separated.
I know that the structure isn't the preferred (using comma separated values) but it is what I have to live with as it is how the source system provides the reports, also taking into account that it is not a defined set of possible values, it could be any language code in the world.
The update of tblQMs is performed by a report being imported into a temp table (tblQMs_Updates) from which, either, records are appended to tblQMs or values added to existing records (Audio/Sub).

Issue
The part of updating the tblQMs is all resolved. The challenge now is to "clean out" duplicates in the fields Audio and Sub after the update.
The table contains thousands of records and is an ever growing table with new records added on a daily basis and there is no form from which I can trigger a function to clean out duplicated values on a record per record basis. I basically need code that, post-update, will clean the data.
Any alternative solution pre-update would of course also be welcome if deemed a much better solution!

Example data
titleIDAudioSub
123456en-US, pt-PTen-US, pt-PT, ro-RO, hu-HU
156324en-US, hu-HU, en-USen-US, pt-PT, bg-BG, pt-PT, se-SV
163254en-USen-US, se-SV, en-US, se-SV, no-NO, pl-PL

In the above examples;
titleID 156324 should have one en-US removed from the Audio field and one pt-PT removed from the Sub field
titleID 163254 should have one en-US and one se-SV removed from the Sub field.

I have been working with Access and VBA to some extent but all self-learned so there are definitely gaps in my knowledge, most likely even with the basics so please bear with me.

Any help or direction would be much appreciated.

Best
 

June7

AWF VIP
Local time
Today, 06:34
Joined
Mar 9, 2014
Messages
5,490
Last edited:

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 22:34
Joined
May 7, 2009
Messages
19,246
create a function in Module that will clean your string fields:
Code:
'arnelgp
Public Function fnRemoveDup(ByVal p As Variant, Optional ByVal delim As String = ", ")
Dim var As Variant, itm As Variant
Dim dict As Object, i As Long
Dim ret As String
    fnRemoveDup = p
    p = p & ""
    If Len(p) < 1 Then
        Exit Function
    End If
    var = Split(p, delim)
    Set dict = CreateObject("scripting.dictionary")
    On Error Resume Next
    For Each itm In var
        itm = Trim$(itm & "")
        If Len(itm) <> 0 Then
            dict.Add Item:=itm, Key:=UCase(itm)
        End If
    Next
    For i = 0 To dict.Count - 1
        ret = ret & dict.Items()(i) & delim
    Next
    If Len(ret) <> 0 Then
        ret = Left$(ret, Len(ret) - Len(delim))
    End If
    fnRemoveDup = ret
End Function

create an Update Query that will update your table and remove dups from the fields.

UPDATE yourTableName SET yourTableName .Audio = fnRemoveDup([Audio],", "), yourTableName .Sub = fnRemoveDup([Sub],", ");
 

Eugene-LS

Registered User.
Local time
Today, 17:34
Joined
Dec 7, 2018
Messages
481
titleID 163254 should have one en-US and one se-SV removed from the Sub field.
And one more:
Code:
Public Function UnicalValues(vVal) As Variant
'   ?UnicalValues("en-US, hu-HU, en-US")
'   ?UnicalValues("en-US, se-SV, en-US, se-SV, no-NO, pl-PL")
'----------------------------------------------------------------------------------------------
Dim vArr() As String
Dim iVal%, iValRes%, sVal$, bDone As Boolean
Const sPepeatReplace As String = "--"

On Error GoTo UnicalValues_Err
    If Len(vVal & "") = 0 Then GoTo UnicalValues_End
  
    sVal = Replace(vVal, " ", "")
    vArr = Split(sVal, ",")
  
    For iVal = 0 To UBound(vArr)
        sVal = vArr(iVal)
      
        If Not sVal = sPepeatReplace Then
            bDone = False
            For iValRes = iVal To UBound(vArr)
                If sVal = vArr(iValRes) Then
                    vArr(iValRes) = sPepeatReplace
                    If bDone = False Then
                        UnicalValues = UnicalValues & ", " & sVal
                        bDone = True
                    End If
                End If
            Next iValRes
        End If
    Next iVal
    UnicalValues = Mid(UnicalValues, 3)
UnicalValues_End:
    Exit Function

UnicalValues_Err:
    'MsgBox "Error " & Err.Number & " (" & Err.Description & ") in Function" & _
           "UnicalValues - 00_Tests.", vbCritical, "Error!"
    'Debug.Print "UnicalValues_Line: " & Erl & "."
    Err.Clear
    Resume UnicalValues_End
End Function
 
Last edited:

spike_access

New member
Local time
Today, 16:34
Joined
Mar 22, 2022
Messages
14
Actually, if you wouldn't mind, I do have another question related to these languages.

Once I've cleaned the data I need to compare the values to another field which contains language codes.

So basically compare the value in tblQMs.[Audio] to the value in tblContentTracker.[Audio]. If all values in tblContentTracker.[Audio] exists in tblQMs.[Audio] where tblQMs.[titleID] = tblContentTracker.[skinny title].
Depending on if all defined languages exist in tblQMs.[Audio] I want to be able to set a status "COMPLETED" in another field. Furthermore the ultimate fallback if not all exists, instead of the status "COMPLETED" it should list the languages that did exist.

One caveat is that the order of the languages could be different, and the languages in tblContentTracker.[Audio] is only a two letter code as opposed to 4 letter code as in tblQMs.[Audio]. (I can import the tblQMs.Audio data into tblContentTracker if that would make things easier).

Example data:

tblContentTracker.AudiotblQMs.AudioStatus
EN, PT, HUen-US, pt-PT, hu-HU, sl-SLCOMPLETED
EN, CZ, SKen-US, sk-SKEN, SK
EN, BG[keep blank]

My current solution is messy, and requires me to update the code whenever a new language code is added. I basically clean the data in tblQMs.Audio depending on which languages are expected, so one UPDATE per defined language;

1: UPDATE tblContentTracker INNER JOIN tblQMs ON tblContentTracker.[Skinny title] = tblQMs.[Title Id] SET tblContentTracker.[Audio QM status] = [tblQMs].[Audio] WHERE (((tblContentTracker.Audio) Like ""*EN*"") AND ((tblQMs.[Audio]) Like ""*EN*""));

2: UPDATE tblContentTracker INNER JOIN tblQMs ON tblContentTracker.[Skinny title] = tblQMs.[Title Id] SET tblContentTracker.[Audio QM status] = [tblContentTracker].[Audio delivery status] & "", "" & [tblQMs].[Audio] WHERE (((tblContentTracker.Audio) Like ""*IS*"") AND ((tblQMs.[Audio]) Like ""*IS*""));

and so forth...


Any ideas on how to make this more clean and future-proof?

Best
 

Eugene-LS

Registered User.
Local time
Today, 17:34
Joined
Dec 7, 2018
Messages
481
if not all exists, instead of the status "COMPLETED" it should list the languages that did exist.
Code:
Public Function GetStatus(vLeng, vAudio)
'   ?GetStatus("EN, PT, HU", "en-US, pt-PT, hu-HU, sl-SL")
'   ?GetStatus("EN, CZ, SK", "en-US, sk-SK")
'   ?GetStatus("EN, BG",Null)
'----------------------------------------------------------------------------------------------
Dim sArr() As String
Dim sVal$, iVal%, iXVal%, iCount%
On Error GoTo GetStatus_Err
   
    If Len(vLeng & "") = 0 Then GoTo GetStatus_End
    If Len(vAudio & "") = 0 Then GoTo GetStatus_End
   
    sVal = Replace(vLeng, " ", "")
    sArr = Split(sVal, ",")

    iXVal = UBound(sArr)
    For iVal = 0 To iXVal
        If InStr(1, vAudio, sArr(iVal)) > 0 Then
            iCount = iCount + 1
            GetStatus = GetStatus & ", " & sArr(iVal)
        End If
    Next iVal

    If iCount = iXVal + 1 Then
        GetStatus = "COMPLETED"
    Else
        GetStatus = Mid(GetStatus, 3)
    End If

GetStatus_End:
    Exit Function

GetStatus_Err:
    'MsgBox "Error " & Err.Number & " (" & Err.Description & ") in Function" & _
           "GetStatus - 00_Tests.", vbCritical, "Err!"
    'Debug.Print "GetStatus_Line: " & Erl & "."
    Err.Clear
    Resume GetStatus_End
End Function
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 22:34
Joined
May 7, 2009
Messages
19,246
here is another function:
Code:
Public Function IsAudioLangComplete(ByVal a As Variant, B As Variant) As Boolean
    'arnelgp
    'compare if A (audio language) can be found in B
    '
    'Note:
    ' A     =   tblContentTracker.Audio
    ' B     =   tblQMs.Audio
    '
    Dim var As Variant
    Dim dict As Object
    Dim i As Integer, j As Integer
    a = a & ""
    B = B & ""
    If Len(a) = 0 Or Len(B) = 0 Then
        Exit Function
    End If
    Set dict = CreateObject("scripting.dictionary")
    var = Split(A, ",")
    For i = 0 To UBound(var)
        var(i) = UCase(Trim$(var(i)))
        If Len(var(i)) <> 0 Then
            dict.Add Item:=var(i), Key:=var(i)
        End If
    Next
    var = Split(B, ",")
    For i = 0 To UBound(var)
        var(i) = UCase(Trim$(var(i)))
        If Len(var(i)) <> 0 Then
            var(i) = Split(var(i), "-")(0)
            j = j - dict.exists(var(i))
        End If
    Next
    IsAudioLangComplete = (j = dict.Count)
    Set dict = Nothing
End Function

your query:

UPDATE tblContentTracker
INNER JOIN
tblQMs
ON tblContentTracker.[Skinny title] = tblQMs.[Title Id]
SET tblContentTracker.[Audio QM status] =
IIF(IsAudioLangComplete([tblQMS.Audio, tblContentTracker.Audio), "COMPLETED", tblContentTracker.[Audio QM status])
 
Last edited:

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 22:34
Joined
May 7, 2009
Messages
19,246
change the function:
Code:
Public Function IsAudioLangComplete(ByVal a As Variant, B As Variant) As String
    'arnelgp
    'compare if A (audio language) can be found in B
    '
    'Note:
    ' A     =   tblContentTracker.Audio
    ' B     =   tblQMs.Audio
    '
    Dim var As Variant
    Dim dict As Object
    Dim i As Integer, j As Integer
    Dim ret As String
    a = a & ""
    B = B & ""
    If Len(a) = 0 Or Len(B) = 0 Then
        Exit Function
    End If
    Set dict = CreateObject("scripting.dictionary")
    var = Split(a, ",")
    For i = 0 To UBound(var)
        var(i) = UCase(Trim$(var(i)))
        If Len(var(i)) <> 0 Then
            dict.Add Item:=var(i), Key:=UCase(var(i))
        End If
    Next
    var = Split(B, ",")
    For i = 0 To UBound(var)
        var(i) = UCase(Trim$(var(i)))
        If Len(var(i)) <> 0 Then
            var(i) = Split(var(i), "-")(0)
            If dict.exists(UCase(var(i))) Then
                j = j + 1
                ret = ret & dict.Item(UCase(var(i))) & ", "
            End If
        End If
    Next
    If (j = dict.Count) Then
        IsAudioLangComplete = "COMPLETED"
    Else
        If Len(ret) Then
            IsAudioLangComplete = Left$(ret, Len(ret) - 2)
        End If
    End If
    Set dict = Nothing
End Function

the query now:

UPDATE tblContentTracker
INNER JOIN
tblQMs
ON tblContentTracker.[Skinny title] = tblQMs.[Title Id]
SET tblContentTracker.[Audio QM status] =
IsAudioLangComplete(tblContentTracker.Audio , tblQMS.Audio)
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 10:34
Joined
Feb 19, 2002
Messages
43,431
Depending on if all defined languages exist in tblQMs.[Audio]
In your first post, you told us there was NOT a defined set of values for languages.

Just because the data comes to you in an unnormalized format doesn't mean that you need to perpetuate the crime. Normalizing will allow you to use RI to solve the duplication problem and will also sort them consistently which you haven't asked about yet.

If there truly is a definitive list of languages, then you can keep them in a table and joining to that table and counting the results will let you know if the list is complete.
 

spike_access

New member
Local time
Today, 16:34
Joined
Mar 22, 2022
Messages
14
Thank you all for the help! It did the trick

@Pat Hartman the data in the tblContentTracker.Audio is two letter language code(s) based on what is deemed the expected languages for a given region. However, the tblQMs.Audio holds ALL available languages to that content. (Yes we're talking video content with audio/sub tracks). The same content i.e. video package, could be used across multiple regions/countries. But only the necessary languages for a given territory (Romania requires Romanian, Poland requires Polish etc) matters. It becomes more complex since the original language could potentially be any given language in the world, and building that database with all the possible language codes (two letter format, four letter format etc) is a bit too time consuming.
 

Users who are viewing this thread

Top Bottom