Suggestions for Name Algorithm


Lightwave
03-30-2009, 02:07 AM
Dear All

I was wondering if anyone can point me in the direction of any algorithms that would convert a name, e.g.

Barack Obama

into a number, e.g.

345

In some of the sports races I've been timing, we sometimes number the competitors, and it would be useful if the numbering had some hard relationship to the person rather than existing only in a stored database somewhere.

It might be a bad idea, but I thought I'd investigate it anyway.

adamcoppard
03-30-2009, 02:31 AM
I seem to remember a way of converting alphabetical data into numerical data, but I can't remember what it's called. I'll have a Google. Alternatively, could you use each person's UID as their competitor number as well?

gemma-the-husky
03-30-2009, 02:48 AM
What you need is called a hashing function, I think.

you need something that examines and processes each character, one at a time

The problem is that you need an algorithm that will generate a unique result. A simple scheme, such as adding up the character values, gives the same result for any anagram of a name, and probably for many other names as well. A scheme that avoids this certainly won't produce a small number such as 345 (although the VideoPlus system seems to produce some surprisingly small numbers at times).

[Edited: I've just checked. A hashing function generally maps its inputs onto a limited number of possible values (each termed a bucket), so several names can land on the same value, and you then need a way of distinguishing the one you want from the other candidates. The Wikipedia article was interesting and thorough.]
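To illustrate the bucket idea, here is a minimal VBA sketch. It is not a standard or recommended hash: the function name NameBucket, the multiplier 31 and the default of 1000 buckets are all assumptions made for the example.

```vba
' Hypothetical helper: maps a name onto a bucket number from 0 to bucketCount - 1.
' Assumes bucketCount is positive and small enough that total * 31 fits in a Long.
Public Function NameBucket(ByVal fullName As String, _
                           Optional ByVal bucketCount As Long = 1000) As Long
    Dim i As Long
    Dim total As Long
    Dim cleaned As String

    cleaned = UCase$(Trim$(fullName))
    For i = 1 To Len(cleaned)
        ' Multiply-and-add, so the order of the characters affects the result
        total = (total * 31 + Asc(Mid$(cleaned, i, 1))) Mod bucketCount
    Next i

    NameBucket = total
End Function
```

With these assumptions NameBucket("AB") gives 81 and NameBucket("BA") gives 111, so the order of the letters matters. With only 1000 buckets, though, different names will still share a value sooner or later.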

I suppose one way of doing this would be to take just the first two characters of each name (BaOb) and convert these to a number in some way, e.g.
alphabetposition(firstletter)*1 + alphabetposition(secondletter)*2 + alphabetposition(thirdletter)*3 + alphabetposition(fourthletter)*4
It just depends how often something like this produces the same hashed value for different names, and how you THEN distinguish between them.
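The weighted sum described above could be sketched in VBA roughly as follows. The names AlphabetPosition and WeightedNameHash are made up for the example, and it assumes the first name and surname are supplied separately and contain plain A to Z letters.

```vba
' Position of a letter in the alphabet (A = 1 ... Z = 26); 0 for anything else.
Private Function AlphabetPosition(ByVal ch As String) As Long
    Dim code As Long

    If Len(ch) = 0 Then Exit Function
    code = Asc(UCase$(Left$(ch, 1)))
    If code >= 65 And code <= 90 Then AlphabetPosition = code - 64
End Function

' Hypothetical helper: weighted sum over the first two letters of each name.
Public Function WeightedNameHash(ByVal firstName As String, _
                                 ByVal lastName As String) As Long
    Dim letters As String
    Dim i As Long
    Dim total As Long

    ' "Barack", "Obama" -> "BaOb"
    letters = Left$(Trim$(firstName), 2) & Left$(Trim$(lastName), 2)
    For i = 1 To Len(letters)
        ' Weight each letter by its position in the four-letter string
        total = total + AlphabetPosition(Mid$(letters, i, 1)) * i
    Next i

    WeightedNameHash = total
End Function
```

WeightedNameHash("Barack", "Obama") gives 2*1 + 1*2 + 15*3 + 2*4 = 57, and so does WeightedNameHash("Barry", "Kent") (2*1 + 1*2 + 11*3 + 5*4). The largest possible value is 26*(1+2+3+4) = 260, so clashes are guaranteed once there are more than 260 competitors, and likely well before that.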

i.e. if Barack Obama generates 345, fine, but
if Tommy Smith then also generates 345, how do you resolve the clash?
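One possible way of resolving a clash, shown only as a sketch, is to give the second person the next free number. The function name ClaimNumber is invented for the example, startNumber stands for whatever number the name produced, and the list of numbers already taken is assumed to be held in a Scripting.Dictionary (which needs Windows).

```vba
' Hypothetical helper: returns the first free number at or after startNumber,
' wrapping round from maxNumber to 1, and records it against the competitor.
' Returns 0 if every number from 1 to maxNumber is already taken.
Public Function ClaimNumber(ByVal competitor As String, ByVal startNumber As Long, _
                            ByVal maxNumber As Long, ByVal taken As Object) As Long
    Dim candidate As Long
    Dim tries As Long

    candidate = startNumber
    For tries = 1 To maxNumber
        If Not taken.Exists(candidate) Then
            taken.Add candidate, competitor
            ClaimNumber = candidate
            Exit Function
        End If
        candidate = (candidate Mod maxNumber) + 1   ' try the next number
    Next tries
End Function

Public Sub ClaimNumberDemo()
    Dim taken As Object
    Set taken = CreateObject("Scripting.Dictionary")

    Debug.Print ClaimNumber("Barack Obama", 345, 999, taken)   ' 345
    Debug.Print ClaimNumber("Tommy Smith", 345, 999, taken)    ' 346
End Sub
```

The catch is that the final number now depends on who registered first, so the list of numbers already taken has to be stored somewhere after all.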

Dennisk
04-02-2009, 04:54 AM
It's called Soundex. Here is an example and a helper function.


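Function Soundex(ByVal LastName As String) As String
    ' Returns a Soundex code: the first letter of the name followed by three digits.
    ' ByVal stops the clean-up below from altering the caller's variable.

    Dim i As Integer, j As Integer, Str_Len As Integer
    Dim SCode As String, PrevCode As String, strResult As String, CharTemp As String * 1

    If LastName = "" Then
        Soundex = ""
        Exit Function
    End If

    ' Names shorter than three characters are returned unchanged, not encoded
    If Len(LastName) < 3 Then
        Soundex = LastName
        Exit Function
    End If

    LastName = Get_Name(LastName)
    Str_Len = Len(LastName)

    ' Nothing left to encode once non-letters are stripped (e.g. "123")
    If Str_Len = 0 Then
        Soundex = ""
        Exit Function
    End If

    j = 0              ' number of codes stored so far
    i = 0              ' current position in the name
    PrevCode = "0"
    Do While (i < Str_Len And j < 4)
        i = i + 1

        CharTemp = Mid$(LastName, i, 1)

        Select Case CharTemp
            Case "R"
                SCode = "6"
            Case "M", "N"
                SCode = "5"
            Case "L"
                SCode = "4"
            Case "D", "T"
                SCode = "3"
            Case "C", "G", "J", "K", "Q", "S", "X", "Z"
                SCode = "2"
            Case "B", "F", "P", "V"
                SCode = "1"
            Case Else
                SCode = "0"    ' vowels and other letters are not coded
        End Select

        ' H and W do not separate two letters that share a code
        If CharTemp = "H" Or CharTemp = "W" Then
            SCode = PrevCode
        End If

        ' Always store the first letter's code; after that, store only
        ' non-zero codes that differ from the previous one
        If (SCode > "0" Or j = 0) Then
            If (SCode <> PrevCode Or j = 0) Then
                strResult = strResult & SCode
                j = j + 1
            End If
        End If

        PrevCode = SCode
    Loop

    ' Pad with zeros so there are always three digits after the first letter
    i = j
    Do While (i <= 4)
        strResult = strResult & "0"
        i = i + 1
    Loop

    ' The first stored code belongs to the initial letter, so skip it
    Soundex = Left$(LastName, 1) & Mid$(strResult, 2, 3)

End Function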


'------------------------------------------------
' This function gets the name and cleans it up
' so that it can be soundexed.
'------------------------------------------------

Function Get_Name(ByVal inLastName As String) As String
    Dim i As Integer, Str_Len As Integer
    Dim LastName As String, ch As String * 1, inString As String

    inString = UCase$(Trim$(inLastName))
    Str_Len = Len(inString)

    ' Expand a leading "ST." or "ST " to "SAINT"
    If Left$(inString, 3) = "ST." Or Left$(inString, 3) = "ST " Then
        inString = "SAINT" & Right$(inString, Str_Len - 3)
        Str_Len = Len(inString)
    End If

    For i = 1 To Str_Len
        ch = Mid$(inString, i, 1)

        ' Keep letters only
        If (ch >= "A" And ch <= "Z") Then
            LastName = LastName & ch
        End If

        ' Ignore everything after a comma (e.g. "SMITH, JOHN")
        If ch = "," Then
            Exit For
        End If
    Next i

    Get_Name = LastName

End Function
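As a quick check, the two functions above can be exercised from a small test procedure. SoundexDemo is just a name chosen for the example, and the values in the comments were worked out by hand from the code rather than copied from a run.

```vba
Public Sub SoundexDemo()
    ' Similar-sounding names share a code, so the result is not unique
    Debug.Print Soundex("Robert")   ' R163
    Debug.Print Soundex("Rupert")   ' R163
    Debug.Print Soundex("Obama")    ' O150
End Sub
```

Because Robert and Rupert come out the same, a Soundex code on its own would not be enough to tell two competitors apart.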

Atomic Shrimp
04-03-2009, 04:43 AM
Soundex is really cool - the algorithm is nearly a century old

It won't generate unique values either, though. In fact, no algorithm can generate unique, shorter output for every possible input (this follows from the pigeonhole principle: there are more possible inputs than there are shorter outputs).