Randomizing Data for Posted db's (1 Viewer)

Status
Not open for further replies.

Micron

AWF VIP
Local time
Today, 12:16
Joined
Oct 20, 2018
Messages
3,409
We often ask Original Poster to upload a copy of their db but the issue might be that it contains sensitive data. Complete data removal may not allow us to troubleshoot the OP's issue because queries, forms, etc. need records in order to observe or trouble shoot the problem.

Here's a little function that we could point them to whereby they could, ON A COPY OF THEIR DB, randomize string field data by using an update query that calls the function below. This could solve the issue of revealing sensitive data by randomizing all or part of a string field. The function call goes into the UPDATE TO row of the query design grid and a sample sql statement would look like
Code:
UPDATE tbl1 SET tbl1.field1 = randomizedata(tbl1.field1,3);
The supposition is that it may be desirable to retain n characters at the beginning of the string and randomize the rest, hence the numeric parameter. If not, use 0 (untested). The function call would go into each text field that you want to randomize, using the proper field reference of course. The function being called is:
Code:
Function RandomizeData(strField As String, i As Integer) As String
Dim str1 As String, str2 As String

'i= count of characters from beginning of string that are to be retained. 
'Pass to Left function and assign to str1
 str1 = Left(strField, i)

'make loop counter start at position number of NEXT character which is to be replaced. 
'End value is string length.
For i = i + 1 To Len(strField)

'make str2 equal current value + a random lower case character from a to z 
'and repeat to end of counter. Spaces are retained.

If Mid(strField, i, 1) = " " Then
    str2 = str2 + Mid(strField, i, 1)
Else
    str2 = str2 & Chr((121 - 97 + 1) * Rnd + 97)
    'Debug.Print str2
End If
Next

'piece left and right parts together and return to query
RandomizeData = str1 + str2

'e.g if input value is 'Donald Duck', output string with 1st 2 characters retained might be Dosnoh itat

End Function
***NOTE***
Randomizing data is only practical IF the values are NOT PK (Primary Key) values, or if they are and exist as meaningful data, then Cascade Updates should be set in relationships between a field to be randomized and any other table where those values exist as Foreign Keys. Randomizing one side of a relationship and not the other would be pointless.

Lastly I'd mention that the user should realize that the result of the function cannot be reverse engineered; i.e. once randomized there is no way to determine what the original values were.
 

isladogs

CID VIP
Local time
Today, 17:16
Joined
Jan 14, 2017
Messages
13,623
Using this is certainly a lot quicker than the manual randomising I've used in the past to create DEMO databases for my website.

This should be a very useful reference to suggest as a solution when OPs say they cannot upload data because its confidential.

I've tested it using 0 and all characters are randomised successfully.

It would be even better to expand this idea for text field that contain a mixture of alphanumeric characters using numbers where appropriate and similarly for number & date fields
 

Micron

AWF VIP
Local time
Today, 12:16
Joined
Oct 20, 2018
Messages
3,409
It would be even better to expand this idea for text field that contain a mixture of alphanumeric characters using numbers where appropriate and similarly for number & date fields
Do you mean to randomize the numbers but leave special characters alone (such as - , ( ) ' $ etc) or not randomize numbers and not special characters? Or something else?
 

isladogs

CID VIP
Local time
Today, 17:16
Joined
Jan 14, 2017
Messages
13,623
I was thinking about scrambling numbers as well as letters in fields such as tutor group (11BN), part number (RT23-TH5Q-WA) or date of birth (08/11/1982) but any special characters to stay unchanged

Ideally any scrambled date values would remain valid dates ….:)
 

Micron

AWF VIP
Local time
Today, 12:16
Joined
Oct 20, 2018
Messages
3,409
Sorry for the long reply, but it's a convoluted situation.

Have been playing with this and have done limited testing; not really sure how best to approach the suggestions with respect to dates but I made a decision.

First, I'd say that dates are a non-starter for me given the different formats at play. While one could probably tweak a date based on European or US format, I don't foresee much of a need. My supposition is that the type of data that most likely needs 'protection' is names and addresses, and to a lesser extent, phone numbers (will anyone worry that someone will get their hands on randomized names, dial up the real phone number and asks to speak to "Dybqw Nbplh")? I could say "don't use it on date fields because a date of 66/81/9054 will likely cause issues beyond the one(s) you're posting about" but someone will eventually do that so I have tried to cover that situation.

Changes to look for:
- If Null wasn't captured it balked

- if used on a date field the date had to be updated to what it already was, otherwise the field values became null or "" (I didn't check which). Updating to the same value seemed pointless and raised the confirmation prompt for no reason so I had to find a way to not pass anything back to a query in that case. Answer was to set it to Nothing, which raises error 91, which is handled.

- the fact that inputs could be anything means the variables had to be changed to variants

- former code would have altered Capital Letters and special characters. Should not do so now.

- some characters in the standard set and also extended characters beyond 122 are not altered.

I try not to code in a way that causes errors to be raised only to be ignored or suppressed, but in this case it seems necessary. One thing that just occurred to me is the possibility that this code might alter hidden characters such as those used between rtf tags. I hope that some of you will put this code to the test on different field data types as I am trying to catch up on a lot of other projects lately and don't have time to do a ton of testing.

Code:
Function RandomizeData(varField As Variant, i As Integer) As Variant
Dim var1 As Variant, var2 As Variant

On Error GoTo errHandler

'i= count of characters from beginning of string that are to be retained.
'Pass to Left function and assign to str1
 var1 = Left(varField, i)
 
 If IsDate(varField) Or IsNull(varField) Then
    RandomizeData = Nothing
    Exit Function
End If

'make loop counter start at position number of NEXT character which is to be replaced.
'End value is string length.
For i = i + 1 To Len(varField)

'make str2 equal current value + a random lower case character from a to z
'and repeat to end of counter.

  Select Case Asc(Mid(varField, i, 1))
    Case 0 To 47, 58 To 64, Is > 122 'don't alter these characters
      var2 = var2 + Mid(varField, i, 1)
      
    Case 48 To 57
      var2 = var2 & Chr((56 - 48 + 1) * Rnd + 48)
      
    Case 65 To 90
        var2 = var2 & Chr((89 - 65 + 1) * Rnd + 65)
        
    Case 97 To 121
      var2 = var2 & Chr((120 - 97 + 1) * Rnd + 97)
      
  End Select
Next

'piece left and right parts together and return to query
RandomizeData = var1 + var2
'e.g if input value is 'Donald Duck', output string with 1st 2 characters retained might be Dosnoh itat

exitHere:
Exit Function

errHandler:
If Err.Number = 91 Then
    Exit Function
Else
    MsgBox "Error " & Err.Number & ": " & Err.Description
End If

End Function
 

isladogs

CID VIP
Local time
Today, 17:16
Joined
Jan 14, 2017
Messages
13,623
Hello is that Dibqu Nibpool … apologies if I've mispronounced your name :D
Thanks for doing this.
I realised it would add complications but I've done some limited testing as follows
1. Text including retaining capital letters - perfect. I'm glad you made that change
2. Number or number/text fields such as 11BN - also perfect
3. Date fields are easy enough to handle
Instead of using RandomizeData(DOB,0), I used
Code:
CDate(randomizedata(CLng([DateOfBirth]),0))
This also means no need to worry about US or UK date formats
That could probably be incorporated into the code where you test for IsDate but I just did it in the query as it was a quick test.

No time to test rich text fields. Suggest possibly apply the Plaintext function in such cases before randomising?
 

Micron

AWF VIP
Local time
Today, 12:16
Joined
Oct 20, 2018
Messages
3,409
As written, it will alter rtf tags; e.g. <div> could become <arw> which will mess up the field value. As for dates, I tested with your expression and had a type conversion error for one record. I'm still not convinced that altering dates so dramatically will be important for those who have data they want to randomize, and it may mess up code or sql that uses operators on them. I suppose anyone could use your expression easily enough should they decide to randomize dates.

As for rtf, not sure when I'll have time to do anything but will try to get back to it. Even though it's raining, I still have stuff I'm supposed to be doing. :)
 

moke123

AWF VIP
Local time
Today, 12:16
Joined
Jan 11, 2013
Messages
1,599
I was away for a few days but this thread was on my mind. I was thinking that it might be difficult for some posters to execute code from a module.
I threw together a form that could be imported into a COPY of the database. You can then select the tables and fields to randomize. Currently it works on text, number, and date fields. The primary key field is excluded from the list as is boolean fields, ole, attachments, calculated.etc. Foreign keys are listed and caution should be used not to alter them.
If you update your randomization code, it shouldn't be too difficult to incorporate it into the form.
 

Attachments

isladogs

CID VIP
Local time
Today, 17:16
Joined
Jan 14, 2017
Messages
13,623
I mainly use RC4 encryption which provides a high level of protection though it is reversible … if you know the cipher key.
Without that, it is almost impossible to break.

Anyway, I'm going to reopen this locked thread in case Micron wishes to respond to it
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 12:16
Joined
May 21, 2018
Messages
3,273
Helpful code. Someone PMed me a DB, and I randomized it to post back in case there was Proprietary info. Good Idea!
 
Status
Not open for further replies.

Users Who Are Viewing This Thread (Users: 0, Guests: 1)

Top Bottom