regexp - need help

I want to parse strings like "<number><trailing-char><title>", where <number> is a string containing digits and dots, <trailing-char> is whitespace or a colon, and <title> is any characters.

I use a regexp (framework 1.1) to parse the strings. Here is the code:

    static Regex reNumSepTitle = new Regex(@"^(?<number>[\d-\.]+)([:\s])+(?<title>.*)$");

    public static string ExtractTitle(string str) {
        string title = str.Trim();
        Match m = reNumSepTitle.Match(title);
        if ( m.Success ) {
            return m.Result("${title}");
        }
        return title;
    }

When I call the method with "1.1:\tHeading", it returns "Heading".

With "1.1.1.1:\tHeading", I get "\tHeading" (and the number part is '1.').

What's wrong?

Thanks.

Try either naming or not naming all of your groups.
--
HTH,
Kevin Spencer
Microsoft MVP
.Net Developer

Thank you for your post.
I have tried both naming all of the groups and leaving them all anonymous, but it still doesn't work.

I am sorry.
This is not the exact behaviour. I didn't post the exact code snippet (see below).

Actually, before trying to parse the string using the regexp @"^(?<number>[\d-\.]+)([:\s])+(?<title>.*)$", the method tries to parse the string using a regexp where the groups are separated by a control char (\x31).

Finally, the behaviour is the following: with the string "1.1.\tHeading", the _first_ regexp matches the string and the method returns ".\tHeading".

If I use a 'printable' ASCII char (like '|' for example), everything works fine.

Thanks.

--- code ---

    private const string reUS = "\\u0031";

    // RE used for parsing numbered title where UnitSep. is used
    static Regex reNumTitle = new Regex(@"^(?<number>.+)" + reUS + @"(?<trailing>.)*" + reUS + @"(?<title>.*)$");

    // if not found with the previous RE, try '<num><sep><heading>' where sep is a tab or a colon
    static Regex reNumSepTitle = new Regex(@"^(?<number>[\d-\.]+)([:\s])+(?<title>.*)$");

    public static string ExtractTitle(string str) {
        string title = str.Trim();
        Match m = reNumTitle.Match(title);
        if ( !m.Success ) {
            m = reNumSepTitle.Match(title);
        }
        if ( m.Success ) {
            Debug.WriteLine("number:" + m.Result("${number}"));
            Debug.WriteLine("Title:" + m.Result("${title}"));
            if ( m.Groups.Count == 4 ) {
                title = m.Groups["title"].Value;
            }
        }
        return title;
    }

Sorry again.
The mistake is a conversion error! The character I used is not a control character: I wrote the decimal code 31 as if it were hexadecimal, and 0x31 is the code of the character '1'. The unit separator is 0x1F. With that fixed, everything works fine.

Thanks.

Try using '\x1F' in your regular expression for control character 31 (decimal).

Example:

    ^(?<number>.+)\x1F(?<trailing>.)*\x1F(?<title>.*)$

The character sequence you were using ('\\u0031') is a hexadecimal Unicode escape, so it denotes the digit '1' (0x31) rather than the unit separator (0x1F).

--
HTH,
Kevin Spencer
Microsoft MVP
.Net Developer
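For readers who land on this thread later, here is a minimal sketch of the method with the separator written as the unit separator (decimal 31, hex 1F). It is not the original poster's final code, which was never posted. The class name `TitleParser`, the constant `UnitSep`, and the simplified fallback pattern (unnamed separator group dropped, hyphen moved to the end of the character class) are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names here are illustrative, not from the thread.
public static class TitleParser
{
    // Unit separator: decimal 31 == hex 1F.
    // "\\u0031" would instead denote the digit '1' (hex 31).
    private const string UnitSep = "\\u001F";

    // First attempt: "<number><US><trailing><US><title>"
    private static readonly Regex NumTitle = new Regex(
        @"^(?<number>.+)" + UnitSep + @"(?<trailing>.)*" + UnitSep + @"(?<title>.*)$");

    // Fallback: "<number><sep><title>", where sep is one or more colons or whitespace chars
    private static readonly Regex NumSepTitle = new Regex(
        @"^(?<number>[\d.-]+)[:\s]+(?<title>.*)$");

    public static string ExtractTitle(string str)
    {
        string title = str.Trim();

        Match m = NumTitle.Match(title);
        if (!m.Success)
        {
            m = NumSepTitle.Match(title);
        }

        // Return the named group when either pattern matched, else the trimmed input.
        return m.Success ? m.Groups["title"].Value : title;
    }
}
```

Under these assumptions, `TitleParser.ExtractTitle("1.1.1.1:\tHeading")` goes through the fallback pattern and `TitleParser.ExtractTitle("1.1\u001F:\u001FHeading")` goes through the unit-separator pattern. Reading the title through `m.Groups["title"]` avoids depending on the total group count, which differs between the two patterns.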