Lightwave
03-30-2009, 02:07 AM
Dear All
I was wondering if anyone can point me in the direction of any algorithms that would convert a name, e.g.
Barack Obama
into a number, e.g.
345
In some of the sports races I've been timing we sometimes number the competitors, and it would be useful if the numbering had some fixed relationship to the person's name rather than existing only in a stored database somewhere.
Might be a bad idea, but I thought I'd investigate it anyway.
adamcoppard
03-30-2009, 02:31 AM
I seem to remember a way of converting alphabetical data into numerical data, but I can't remember what it's called. I'll have a Google. Or could you use each person's UID as their competitor number as well?
gemma-the-husky
03-30-2009, 02:48 AM
What you need is called a hashing function, I think.
You need something that examines and processes each character, one at a time.
The problem is that you need an algorithm that generates a unique result. A simple scheme such as adding up the letter values gives the same result for any anagram of a name, as well as probably for many other names. A scheme that really is unique certainly won't produce a small number such as 345 (although the VideoPlus system seems to produce some surprisingly small numbers at times).
[Edited: I've just checked. A hashing function generally maps its inputs onto a limited number of possible values, and the names that share a value are said to fall into the same bucket. So you then need a way of picking the right entry out of the candidates in a bucket. The Wikipedia article was interesting and thorough.]
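To make the anagram point concrete, here is a minimal Python sketch. The function names, the multiplier 31 and the bucket count of 1000 are all assumptions chosen for illustration, not anything taken from a particular library. It compares an order-blind letter sum with a hash that also depends on letter order, then squeezes the result into three digits.

```python
def letter_sum(name):
    """Order-blind: add up the alphabet positions (a=1 ... z=26) of the letters."""
    return sum(ord(ch) - ord("a") + 1 for ch in name.lower() if "a" <= ch <= "z")


def positional_hash(name, buckets=1000):
    """Order-aware: multiply the running value by 31 before adding each letter.

    The result is reduced modulo `buckets`, so it is always in 0..buckets-1
    and different names can still land in the same bucket.
    """
    value = 0
    for ch in name.lower():
        if "a" <= ch <= "z":
            value = (value * 31 + ord(ch) - ord("a") + 1) % buckets
    return value


# "Amy" and "May" are anagrams: the plain sum cannot tell them apart,
# while the order-aware hash gives them different numbers.
print(letter_sum("Amy"), letter_sum("May"))
print(positional_hash("Amy"), positional_hash("May"))
```

Making the hash order-aware only removes the anagram clashes. With just 1000 buckets, unrelated names will still share a number sooner or later.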
I suppose one way of doing this would be to take just the first two characters of each name (BaOb for Barack Obama) and convert these to a number in some way, e.g.
alphabetposition(firstletter)*1 + alphabetposition(secondletter)*2 + alphabetposition(thirdletter)*3 + alphabetposition(fourthletter)*4
It just depends how often something like this produces the same hashed value for different names, and how you THEN distinguish between them.
i.e. if Barack Obama generates 345, fine, but
if Tommy Smith then also generates 345, how do you resolve the clash?
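The weighted-letter idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every name has at least two parts, each part has at least two plain A-Z letters, and the function names are made up for the example. The clash rule in `assign_numbers` (step to the next free number) is one common option, not something proposed in this thread.

```python
def alphabet_position(ch):
    """1 for 'a' or 'A' up to 26 for 'z' or 'Z' (assumes a plain A-Z letter)."""
    return ord(ch.lower()) - ord("a") + 1


def name_to_number(full_name):
    """Weight the first two letters of the first two name parts by 1, 2, 3, 4."""
    parts = full_name.split()
    key = parts[0][:2] + parts[1][:2]          # e.g. "BaOb"
    return sum(alphabet_position(ch) * weight
               for weight, ch in enumerate(key, start=1))


def assign_numbers(names, limit=1000):
    """Give each name its hashed number, stepping to the next free one on a clash.

    Hypothetical clash rule; assumes there are fewer names than `limit`.
    """
    taken = {}
    for name in names:
        number = name_to_number(name)
        while number in taken:
            number = number % limit + 1        # try the next number, wrapping to 1
        taken[number] = name
    return taken


print(name_to_number("Barack Obama"))   # 2*1 + 1*2 + 15*3 + 2*4 = 57
print(name_to_number("Fay Oakes"))      # 6*1 + 1*2 + 15*3 + 1*4 = 57, a clash
print(assign_numbers(["Barack Obama", "Fay Oakes"]))
```

The formula can only produce values from 10 to 260, so clashes are guaranteed in any field of more than 251 runners. Once a clash is resolved by stepping to the next free number, that competitor's number depends on who registered first, so it can no longer be worked out from the name alone.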
Dennisk
04-02-2009, 04:54 AM
It's called Soundex. Here is an example and a helper function:
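Function Soundex(ByVal LastName As String) As String
    ' Returns a four-character Soundex code: the first letter plus three digits.
    ' ByVal stops the clean-up below from changing the caller's variable.
    Dim i As Integer, j As Integer, Str_Len As Integer
    Dim SCode As String, PrevCode As String, strResult As String, CharTemp As String * 1

    If LastName = "" Then
        Soundex = ""
        Exit Function
    End If
    ' Names shorter than three characters are returned unchanged.
    If Len(LastName) < 3 Then
        Soundex = LastName
        Exit Function
    End If

    LastName = Get_Name(LastName)   ' upper-case, letters A-Z only
    Str_Len = Len(LastName)
    If Str_Len = 0 Then             ' nothing left to encode, e.g. "123"
        Soundex = ""
        Exit Function
    End If

    j = 0                           ' codes collected so far
    i = 0                           ' position in the name
    PrevCode = "0"
    Do While (i < Str_Len And j < 4)
        i = i + 1
        CharTemp = Mid$(LastName, i, 1)
        Select Case CharTemp
            Case "R"
                SCode = "6"
            Case "M", "N"
                SCode = "5"
            Case "L"
                SCode = "4"
            Case "D", "T"
                SCode = "3"
            Case "C", "G", "J", "K", "Q", "S", "X", "Z"
                SCode = "2"
            Case "B", "F", "P", "V"
                SCode = "1"
            Case Else               ' vowels, H, W and Y
                SCode = "0"
        End Select
        ' H and W do not separate letters that share a code.
        If CharTemp = "H" Or CharTemp = "W" Then
            SCode = PrevCode
        End If
        ' Always store the first letter's code; after that, store only
        ' non-zero codes that differ from the previous one.
        If (SCode > "0" Or j = 0) Then
            If (SCode <> PrevCode Or j = 0) Then
                strResult = strResult & SCode
                j = j + 1
            End If
        End If
        PrevCode = SCode
    Loop
    ' Pad with zeros so that three digits are always available.
    i = j
    Do While (i <= 4)
        strResult = strResult & "0"
        i = i + 1
    Loop
    ' First letter of the name, then the codes of the letters that follow it.
    Soundex = Left$(LastName, 1) & Mid$(strResult, 2, 3)
End Function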
'------------------------------------------------
' This function gets the name and cleans it up
' so that it can be soundexed.
'------------------------------------------------
Function Get_Name(inLastName As String) As String
    Dim i As Integer, Str_Len As Integer
    Dim LastName As String, ch As String * 1, inString As String

    inString = UCase$(Trim(inLastName))
    Str_Len = Len(inString)
    ' Expand a leading "ST." or "ST " to "SAINT" (two characters longer).
    If (Mid$(inString, 1, 3) = "ST.") Then
        inString = "SAINT" & Right$(inString, Str_Len - 3)
        Str_Len = Str_Len + 2
    End If
    If (Mid$(inString, 1, 3) = "ST ") Then
        inString = "SAINT" & Right$(inString, Str_Len - 3)
        Str_Len = Str_Len + 2
    End If
    ' Keep the letters A-Z only and stop at the first comma,
    ' so "SMITH, JOHN" is reduced to "SMITH".
    For i = 1 To Str_Len
        ch = Mid$(inString, i, 1)
        If ch = "," Then Exit For
        If (ch >= "A" And ch <= "Z") Then
            LastName = LastName & ch
        End If
    Next i
    Get_Name = LastName
End Function
Atomic Shrimp
04-03-2009, 04:43 AM
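Soundex is really cool - the algorithm is nearly a century old.
It also won't generate unique values though - in fact, there isn't going to be an algorithm that generates unique, shorter output for every possible input. That follows from the pigeonhole principle: there are more possible inputs than possible outputs, so some inputs have to share. Shannon's source coding theorem makes a related statement about the limits of lossless compression.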
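A quick bit of arithmetic shows how unavoidable the sharing is. The Python sketch below is only an illustration and the function name is made up. It assumes a Soundex code is one of 26 letters followed by three digits in the range 0 to 6, which gives at most 26 × 7³ possible codes, and it counts every five-letter string as a possible name.

```python
def min_names_sharing_a_code(num_names, num_codes):
    """Pigeonhole bound: at least this many names must share some code."""
    return (num_names + num_codes - 1) // num_codes   # integer ceiling division


soundex_codes = 26 * 7 ** 3        # upper bound on distinct Soundex codes: 8918
five_letter_strings = 26 ** 5      # 11,881,376 possible five-letter strings

# However the codes are assigned, some Soundex code is shared by at least
# this many five-letter strings.
print(min_names_sharing_a_code(five_letter_strings, soundex_codes))   # 1333

# The same bound for a three-digit competitor number (1000 possible values).
print(min_names_sharing_a_code(five_letter_strings, 1000))            # 11882
```

Real entry lists are far smaller than the set of all possible strings, so clashes in practice will be much rarer than this worst-case count, but no algorithm can rule them out.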