extract an email address from a large textfile

KO_BayState

New member
Local time
Today, 07:38
Joined
Feb 18, 2014
Messages
4
I am trying to find a way to extract an email from a large text file that is an output from our email system. I would like to be able to extract the email address using a query or collection of queries. I have been able to extract all of the text that contains the @ symbol. From their I created a query expression:IE: Mid([field1],InStrRev([field1]," ")) that captures some but not everything I need.

well I was going to post some of the outputs but I guess I havent posted enough on this forum to show mock email addresses. anyways If some one has any ideas please let me know.

Thanks

Kevin
 

Attachments

Last edited:
Hello KO_BayState, Welcome to AWF :)

You can add some sample data and the desired result in a TXT file, ZIP the file and upload it here, so we can guide you from there. As seeing what you have, and what you have done so far would be really helpful for us to help you.
 
Thanks for the quick reply and the warm welcome! In addition to the raw text file I have attached on my initial post I have included an example output from the query I posted in my first post. The "testdata" file is the raw text file and "testdata1" is the output after I imported the text file into Access, ran a query to only include lines with the "@" symbol and then ran the expression IE: Mid([field1],InStrRev([field1]," "))
 

Attachments

So I have been looking at another solution to a similar problem and I have a couple of problems but it seems to work slightly better. I've changed the expression to Mid([field1],InStrRev([field1]," ",(,InStr([field1],"@"))+1) and I need to remove <, >, (, ), . and any other miscellaneous characters infront and or behind the email address
 
Looking at the .txt files, I can see numerous email addresses within them. Are you looking for a specific one?
 
I am trying to extract all of the emails within the text file. I just need to get rid of all of the noise around the email address for the most part. whether that means creating a separate query in a process or incorporating additional components into the existing expression.
 
Bit of a Frankenstein's Module this, given it's made of up some code copies from different internet sources and I've written up a bit. Loose edges all round, etc, and can be used as a jumping off point to refine the process. But it does at least return the individual email addresses in a text file (concatenated together with a semi-colon as delimiter).

Copy all the code below into a new module. Now you can just call GetEmails() and pass the full file path in that function and it will return all your email addresses.

Bear in mind that it will still return some of the junkier looking ones, purely because they meet the email address format rules.

Code:
Option Compare Database
Option Explicit

' http://www.regular-expressions.info/email.html
' http://regexlib.com/Search.aspx?k=email&c=-1&m=-1&ps=100
Private Const strRFC2822 = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!" & _
                        "#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:" & _
                        "[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:" & _
                        "[a-z0-9-]*[a-z0-9])?"



Public Function GetEmails(ByVal strFilePath As String) As String

    Dim lngCounter As Long
    Dim astrEmails() As String
    Dim astrUnique() As String
    Dim strEmails As String
    
    astrEmails = Split(GetAllEmails(strFilePath), ";")

    BubbleSortRecords astrEmails()

    GetUnique astrEmails(), astrUnique()

    For lngCounter = LBound(astrUnique()) To UBound(astrUnique())
        strEmails = strEmails & astrUnique(lngCounter) & ";"
    Next lngCounter

    GetEmails = strEmails

End Function 'GetEmails



Private Sub GetUnique(ByRef astrEmails() As String, ByRef astrUnique() As String)

    ' Source: http://stackoverflow.com/questions/3017852/vba-get-unique-values-from-array
    Dim lngCounter As Long
    Dim var As Variant
    Dim objDict As Object
    
    Set objDict = CreateObject("Scripting.Dictionary")
    
    For lngCounter = LBound(astrEmails()) To UBound(astrEmails())
        objDict(astrEmails(lngCounter)) = 1
    Next lngCounter
    
    lngCounter = 0
    
    For Each var In objDict.Keys()
        ReDim Preserve astrUnique(lngCounter)
        astrUnique(lngCounter) = var
        lngCounter = lngCounter + 1
    Next var

End Sub ' GetUnique

Private Sub BubbleSortRecords(ByRef astrText() As String)

    ' This will sort the email addresses in ascending alphabetical order
    ' Not the most efficient way - look up quick sort - but a quick win
    
    Dim lngInner As Long
    Dim lngOuter As Long
    Dim strTemp As String
    
    For lngOuter = LBound(astrText()) To (UBound(astrText()) - 1)
        For lngInner = (lngOuter + 1) To UBound(astrText())
            If astrText(lngOuter) > astrText(lngInner) Then
                strTemp = astrText(lngOuter)
                astrText(lngOuter) = astrText(lngInner)
                astrText(lngInner) = strTemp
            End If
        Next lngInner
    Next lngOuter
    
End Sub ' BubbleSortRecords


Private Function GetAllEmails(ByVal strFilePath As String) As String

    Dim lngFileNumber As Integer
    Dim strData As String
    Dim strText As String
    Dim strEmail As String
    
    lngFileNumber = FreeFile()
    
    Open strFilePath For Input As #lngFileNumber
    
    While Not EOF(lngFileNumber)
        Line Input #lngFileNumber, strData
        strData = Replace(strData, "=", " ")
        strData = Replace(strData, "<", " ")
        strData = Replace(strData, ">", " ")
        strData = Trim(strData)
        strEmail = GetEmailFromLine(strData)
        If strEmail <> vbNullString Then
            strText = strText & strEmail & ";"
        End If
    Wend
    
    Close #lngFileNumber

    GetAllEmails = strText

End Function ' GetAllEmails

Private Function GetEmailFromLine(ByVal strText As String) As String

    Dim astrTemp() As String
    Dim lngCounter As Long
    
    astrTemp() = Split(strText, " ")
    For lngCounter = LBound(astrTemp()) To UBound(astrTemp())
        If IsValidEmail(astrTemp(lngCounter)) Then
            GetEmailFromLine = astrTemp(lngCounter)
            Exit Function
        End If
    Next lngCounter
    
End Function ' GetEmailFromLine


Private Function IsValidEmail(ByVal strEmail As String) As Boolean
    Dim objRegEx As Object
    On Error GoTo Fin
    Set objRegEx = CreateObject("Vbscript.Regexp")
    With objRegEx
        .Pattern = strRFC2822
        .IgnoreCase = True
        IsValidEmail = .Test(strEmail)
    End With
Fin:
    Set objRegEx = Nothing
    If Err.Number <> 0 Then MsgBox "Error: " & _
        Err.Number & " " & Err.Description
End Function ' IsEmailAddress
 

Users who are viewing this thread

Back
Top Bottom