Hey folks,
I'm building a help-desk application that converts incoming emails to trouble tickets, scraping various details about the inquiry from the body and subject of the message.
Some of the emails come directly from clients, making identifying the client fairly simple - I simply compare the domain of a message's SenderEmailAddress property to a table of clients and match it up accordingly.
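A minimal sketch of that domain lookup, in Python for illustration only (this is not the actual code). The client table contents and the names CLIENTS_BY_DOMAIN and client_for_sender are hypothetical:

```python
# Hypothetical client table: domain -> client name (illustrative values only).
CLIENTS_BY_DOMAIN = {
    "acme.example": "Acme Corp",
    "globex.example": "Globex",
}


def client_for_sender(sender_email_address):
    """Return the client name for an address, or None if the domain is unknown."""
    # Take everything after the last "@"; rpartition returns an empty
    # separator when there is no "@" at all.
    _, sep, domain = sender_email_address.strip().rpartition("@")
    if not sep:
        return None
    # Domains are case-insensitive, so normalise before the lookup.
    return CLIENTS_BY_DOMAIN.get(domain.lower())
```

An address from an unknown domain (including our own) simply comes back as None, which is the case the rest of this post is about.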
Unfortunately, some of the emails are forwarded by other employees of our company, meaning the SenderEmailAddress will always return "xxx@ourcompany.com".
As such, I wrote a rather messy bunch of code that looks for email addresses within the body of the email. First, it searches for "From:", as this is generally the first line of the header block of a forwarded or replied-to email. From there, it searches for the "@", then pieces together an email address from the valid characters in front of the "@", the valid characters between the "@" and the next ".", and the three letters after the ".". (Yes, I know that the length of domain suffixes varies - those that aren't 3 characters are so rare, though, that they'll fall within our acceptable "undetermined" range.)
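One alternative to the character-by-character approach is a regular expression applied only to the "From:" lines. The following is a rough sketch in Python, not the code described above: the names EMAIL_RE, FROM_LINE_RE and forwarded_senders are made up for illustration, the pattern is deliberately loose rather than standards-compliant, and it accepts suffixes of two or more letters instead of exactly three.

```python
import re

# Deliberately loose pattern: local part, "@", dotted domain, 2+ letter suffix.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

# A forwarded header line starts with "From:", possibly after quote marks
# such as "> ". Group 1 captures the rest of that line.
FROM_LINE_RE = re.compile(r"^[>\s]*From:(.*)$", re.IGNORECASE | re.MULTILINE)


def forwarded_senders(body):
    """Return addresses found on "From:" lines of the body, in order of appearance."""
    found = []
    for match in FROM_LINE_RE.finditer(body):
        # Only look at the remainder of the "From:" line, so "To:" and
        # "Cc:" addresses are ignored.
        found.extend(EMAIL_RE.findall(match.group(1)))
    return found
```

The first address returned would normally be the most recent forwarder's source, but a long reply chain can contain several "From:" lines, so the caller still has to decide which one to trust.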
It works, mostly, but it's kind of unwieldy - has anyone else done something like this? I'd be interested in any alternatives. Also, while doing research I stumbled upon a few cool checks, like if len(emailAddress) < 8, it's likely an invalid email address. Any other nifty little tricks like that I could use? Again, not looking for perfection, but if there are a few things I can add to the code that will eliminate the bulk of the non-email addresses, it would be very helpful.
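To make the kind of check concrete, here is a small Python sketch that bundles the length test with a few similar cheap heuristics. Everything beyond the length-under-8 rule is an assumption about what might be useful, not something taken from a standard; the names OWN_DOMAIN and looks_like_client_address are hypothetical, and "ourcompany.com" is just the placeholder used above.

```python
OWN_DOMAIN = "ourcompany.com"  # placeholder from the post; substitute the real domain


def looks_like_client_address(candidate, min_length=8):
    """Cheap heuristics only: False for strings unlikely to be a usable client address."""
    # Drop surrounding whitespace and punctuation left over from the message text.
    candidate = candidate.strip().strip("<>\"'.,;:()[]")
    if len(candidate) < min_length or candidate.count("@") != 1:
        return False
    local, domain = candidate.split("@")
    if not local or ".." in candidate:
        return False
    labels = domain.lower().split(".")
    # Need at least "name.suffix" with no empty labels.
    if len(labels) < 2 or any(not label for label in labels):
        return False
    # The suffix should be alphabetic and at least two letters long.
    if not (labels[-1].isalpha() and len(labels[-1]) >= 2):
        return False
    # Addresses at our own company belong to the forwarder, not the client.
    return domain.lower() != OWN_DOMAIN
```

These checks will still let some junk through and will reject a few unusual but valid addresses, which should be fine given the goal is only to filter out the bulk of the non-addresses.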
Thanks!
Will