extracting email addresses from text string using regex

Author
27 Dec 2006 7:53 PM
Khalid Rahaman
I am trying to extract the email addresses from the body of an email. I
download the email to my app and extract the subject etc., but I now also want
to extract all the email addresses from the body.

I have been trying the following code using regex, but I only get results if
I enter an email address alone in the textbox. As soon as I combine it with
other text, I get no results.

My Code so far
----------------------------------
Dim textstring As String = TextBox1.Text
Dim emailpattern As String = _
"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"

Dim r As New Regex(emailpattern, RegexOptions.IgnorePatternWhitespace Or _
RegexOptions.IgnoreCase Or RegexOptions.Multiline)

Dim m As Match = r.Match(textstring.ToString)
While m.Success
  ListBox3.Items.Add(m.ToString())
  m = m.NextMatch()
End While

------------------------------------

I have also used the Split function to split the entire contents of the
textbox into an array, using a single space as the delimiter, and run the
regex on each element of the array. This works great, except when the email
address is at the end of a line: no space comes after it, because the Enter
key was pressed instead.

My Code for this
-----------------------------------

Dim textstring As Array
textstring = Split(TextBox1.Text) ' no delimiter means use space
Dim emailpattern As String = _
"^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"

Dim i As Integer
Dim r As New Regex(emailpattern, RegexOptions.IgnoreCase Or _
RegexOptions.Multiline)
For i = 0 To textstring.GetLength(0) - 1
  Dim m As Match = r.Match(textstring(i))
  ListBox2.Items.Add(textstring(i))
  While m.Success
    ListBox3.Items.Add(m.ToString())
    m = m.NextMatch()
  End While
Next i

------------------------

Any help would really be appreciated.

Thanks in Advance

Khalid Rahaman

Author
27 Dec 2006 8:25 PM
Rad [Visual C# MVP]
Remove the ^ and the $. These mean that the entire text to be matched must
be the email address (^ matches the beginning of the text and $ matches the
end of the text; with RegexOptions.Multiline they match at the beginning and
end of each line instead).
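
For illustration, here is a minimal sketch of that change. It assumes the second pattern from the question with only the leading ^ and trailing $ dropped. The module name EmailExtractDemo, the helper ExtractEmails and the sample text are hypothetical, made up for this example. It is not a complete email validator: the {2,4} part of the pattern would cut a longer top-level domain short.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module EmailExtractDemo
    ' The second pattern from the question, with the ^ and $ anchors removed
    ' so that it can match anywhere inside a longer text.
    Private Const EmailPattern As String = _
        "([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)"

    ' Hypothetical helper: collects every substring of body that matches the pattern.
    Function ExtractEmails(ByVal body As String) As List(Of String)
        Dim found As New List(Of String)
        Dim r As New Regex(EmailPattern, RegexOptions.IgnoreCase)
        ' Regex.Matches returns all non-overlapping matches in the input.
        For Each m As Match In r.Matches(body)
            found.Add(m.Value)
        Next
        Return found
    End Function

    Sub Main()
        ' Sample text with one address mid-line and one at the end of a line.
        Dim body As String = "Contact alice@example.com or" & Environment.NewLine & _
                             "bob@example.org" & Environment.NewLine & "for details."
        For Each address As String In ExtractEmails(body)
            Console.WriteLine(address)
        Next
    End Sub
End Module
```

With the anchors gone there should be no need to split the text on spaces first. An address followed by a line break is found the same way as one followed by a space, so the sample above should list both alice@example.com and bob@example.org.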
Author
27 Dec 2006 8:58 PM
Khalid Rahaman
Excellent, thank you very much. Solved my problem just like that.