Author
21 Feb 2006 1:46 PM
jppop
Hi,

I want to parse strings like "<number><trailing-char><title>", where <number>
is a string containing digits and dots, <trailing-char> is a whitespace
character or a colon, and <title> is any sequence of characters.

I use a regexp (framework 1.1) to parse the strings. Here is the code:

static Regex reNumSepTitle =
    new Regex(@"^(?<number>[\d-\.]+)([:\s])+(?<title>.*)$");

public static string ExtractTitle(string str) {
     string title = str.Trim();
     Match m = reNumSepTitle.Match(title);
     if ( m.Success ) {
           return m.Result("${title}");
     }
     return title;
}

When I call the method with "1.1:\tHeading", it returns "Heading".

With "1.1.1.1:\tHeading", I get "\tHeading" (and the number part is '1.').

What's wrong?

Thanks.

Author
21 Feb 2006 6:38 PM
Kevin Spencer
Try either naming all of your groups or leaving them all unnamed.
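
For context on why mixing named and unnamed groups can matter: in .NET, unnamed groups are numbered first (left to right) and named groups receive the numbers that follow, so numeric indexes shift when the two kinds are mixed. A minimal sketch using the pattern from the question; the class and method names are illustrative only and not part of the original code:

```csharp
using System.Text.RegularExpressions;

public static class GroupNumberingDemo
{
    // Pattern taken verbatim from the question: two named groups, one unnamed group.
    private static readonly Regex NumSepTitle =
        new Regex(@"^(?<number>[\d-\.]+)([:\s])+(?<title>.*)$");

    // Builds a list of "index=name" pairs in numeric order,
    // which shows how the engine numbered each group.
    public static string DescribeGroups()
    {
        int[] numbers = NumSepTitle.GetGroupNumbers();
        string[] parts = new string[numbers.Length];
        for (int i = 0; i < numbers.Length; i++)
        {
            parts[i] = numbers[i] + "=" + NumSepTitle.GroupNameFromNumber(numbers[i]);
        }
        return string.Join(", ", parts);
    }
}
```

With this pattern, `GroupNumberingDemo.DescribeGroups()` returns `0=0, 1=1, 2=number, 3=title`: the unnamed separator group is number 1 even though it appears second in the pattern. Accessing groups by name, as the question does, is unaffected by this ordering.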

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
To a tea you esteem
a hurting back as a wallet.


Author
22 Feb 2006 10:09 AM
jppop
Thank you for your post.
I have tried both naming every group and leaving them all unnamed, but it
still doesn't work.


Author
22 Feb 2006 11:29 AM
jppop
I am sorry.
That was not the exact behaviour: I had not posted the exact code (see
below).

Actually, before trying to parse the string using the regexp
@"^(?<number>[\d-\.]+)([:\s])+(?<title>.*)$", the method tries to parse the
string using a regexp in which the groups are separated by a control char (\x31).

The actual behaviour is the following:
With the string "1.1.\tHeading", the _first_ regexp matches the string and
the method returns ".\tHeading".

If I use a 'printable' ASCII char (like '|' for example), everything works
fine.

Thanks.

--- code ---
private const string reUS = "\\u0031";

// RE used for parsing numbered titles where the unit separator is used
static Regex reNumTitle = new Regex(
    @"^(?<number>.+)" + reUS + @"(?<trailing>.)*" + reUS + @"(?<title>.*)$");

// if not found with the previous RE, try '<num><sep><heading>' where sep is
// a tab or a colon
static Regex reNumSepTitle =
    new Regex(@"^(?<number>[\d-\.]+)([:\s])+(?<title>.*)$");

public static string ExtractTitle(string str) {
        string title = str.Trim();
        Match m = reNumTitle.Match(title);
        if ( !m.Success ) {
                m = reNumSepTitle.Match(title);
        }
        if ( m.Success ) {
                Debug.WriteLine("number:" + m.Result("${number}"));
                Debug.WriteLine("Title:" + m.Result("${title}"));
                if ( m.Groups.Count == 4 ) {
                        title = m.Groups["title"].Value;
                }
        }
        return title;
}



Author
22 Feb 2006 11:48 AM
jppop
Sorry again.

The mistake was a conversion error! The char I used is not a control char:
\u0031 is hexadecimal, and hex 31 is the code of the char '1'. The unit
separator is 31 in decimal, so 1F is the right value.

In the end, everything works fine.
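
For reference, a minimal sketch of what the corrected version might look like, assuming the separator is meant to be the ASCII unit separator (U+001F, decimal 31). The names `TitleParser`, `UnitSep`, `NumTitle` and `NumSepTitle` are illustrative, and the fallback pattern is lightly simplified (`[\d.-]` and an uncaptured `[:\s]+`) compared with the one posted:

```csharp
using System.Text.RegularExpressions;

public static class TitleParser
{
    // U+001F (decimal 31) is the unit separator; \u0031 would be the digit '1'.
    private const string UnitSep = @"\u001F";

    // First attempt: <number> US <trailing> US <title>
    private static readonly Regex NumTitle = new Regex(
        @"^(?<number>.+)" + UnitSep + @"(?<trailing>.)*" + UnitSep + @"(?<title>.*)$");

    // Fallback: <number>, then one or more colons or whitespace chars, then <title>.
    private static readonly Regex NumSepTitle = new Regex(
        @"^(?<number>[\d.-]+)[:\s]+(?<title>.*)$");

    // Returns the title part, or the trimmed input when neither pattern matches.
    public static string ExtractTitle(string str)
    {
        string title = str.Trim();
        Match m = NumTitle.Match(title);
        if (!m.Success)
        {
            m = NumSepTitle.Match(title);
        }
        return m.Success ? m.Groups["title"].Value : title;
    }
}
```

Under these assumptions, `TitleParser.ExtractTitle("1.1.1.1:\tHeading")` falls through to the second pattern and returns `Heading`, because the digit '1' no longer acts as a separator in the first one.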

Thanks.

Author
22 Feb 2006 2:28 PM
Kevin Spencer
Try using '\x1F' in your regular expression for control character 31
(decimal). Example:

^(?<number>.+)\x1F(?<trailing>.)*\x1F(?<title>.*)$

Both '\x' and '\u' escapes take hexadecimal digits, so the sequence you were
using ('\\u0031') denotes the digit '1' (as would '\x31'), not a control
character.
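
As a quick check of how these escapes behave: in .NET regular expressions both `\x` and `\u` are followed by hexadecimal digits, so `\x31` and `\u0031` denote the same character, the digit '1', while the unit separator (decimal 31) is `\x1F` or `\u001F`. A small sketch, with a hypothetical helper named `MatchesChar`:

```csharp
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // True when the pattern matches the one-character string made of c.
    public static bool MatchesChar(string pattern, char c)
    {
        return Regex.IsMatch(c.ToString(), "^" + pattern + "$");
    }
}
```

For example, `EscapeDemo.MatchesChar(@"\x31", '1')` and `EscapeDemo.MatchesChar(@"\u0031", '1')` are both true, whereas `EscapeDemo.MatchesChar(@"\x31", (char)31)` is false; only `\x1F` or `\u001F` matches `(char)31`.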

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
To a tea you esteem
a hurting back as a wallet.

