I need a workaround for Regex limitation

I noticed a strange behaviour, strange from my point of view anyway.

Let's say I have the regex expression: "^\w+".
When I do:

Let's say s = "first second";

    Regex r = new Regex(@"^\w+", RegexOptions.IgnoreCase...);
    Match m = r.Match(s);   // here we have m.Success = true

    m = r.Match(s, 6);      // here we have m.Success = false

I would have expected that the ^ would match the beginning of the actual text queried, but that is not the case.
It is possible to work around the fact that ^ does not match in this type of call, but only with a serious performance problem: either match the expression without ^ and verify that the first match starts at offset 0 from the position I query, or copy the string from the position I query, which is not an option for a huge string.

Has anyone got a clue how to make ^ work?

The '^' operator works just fine. It matches only if the word character sequence starts at the beginning of the string. Using Regex.Match(s, 6), you get the first match that exists at or beyond index 6 in the string. Since there is none, you get none back.

You think that it should count the index as the beginning of the string because you are only thinking about your specific problem. The index is *not* the beginning of the string. Consider the following, for example:

    string[] strings = new string[] {"one", "two", "three", "four", "five", "six"};
    string s = "one twothree four fivesix";
    Match newResult;   // Match() returns a Match object, not a string
    Regex r = new Regex("\\w");
    for (int i = 0; i < strings.Length; i++)
    {
        // Start each search at the position where the array item occurs in s.
        newResult = r.Match(s, s.IndexOf(strings[i]));
    }

In this case, you are looking for any match in the string that is found in the array, and your results depend upon the position in the string. If the Match returned is not successful (Success is false; Match never returns null), the item in the array is not in the string. The string "one" would be found, as would the string "four." But the strings "twothree" and "fivesix" would not be found. Now, if you were to use "\w+" you would not be able to find any other Match than the first.

In other words, the beginning of the string is the logical beginning of the string. The index in the string is not relevant or related to the beginning of the string.

Now, if you can state the business rule you're trying to satisfy, I think I can help with a solution.

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

Hard work is a medication for which there is no placebo.

"Liviu Uba" <u**@totalsoft.ro> wrote in message
news:Oeb9Bm8XGHA.4324@TK2MSFTNGP03.phx.gbl...
> Hi,
>
> I noticed a strange behaviour, strange from my point of view anyway.
>
> Let's say I have the regex expression: "^\w+".
> When I do:
>
> Let's say s = "first second";
>
> Regex r = new Regex(@"^\w+", RegexOptions.IgnoreCase...);
> Match m = r.Match(s);   // here we have m.Success = true
>
> m = r.Match(s, 6);      // here we have m.Success = false
>
> I would have expected that the ^ would match the beginning of the actual
> text queried, but that is not the case.
> It is possible to work around the fact that ^ does not match in this type
> of call, but only with a serious performance problem: either match the
> expression without ^ and verify that the first match starts at offset 0
> from the position I query, or copy the string from the position I query,
> which is not an option for a huge string.
>
> Has anyone got a clue how to make ^ work?
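
For anyone landing on this thread later, here is a minimal sketch of two ways to anchor a match at the queried position without copying the string. It assumes the documented behaviour of System.Text.RegularExpressions: the \G anchor is satisfied at the startat position passed to Match(string, int), and the Match(string, int, int) overload restricts the search to a range, so ^ is satisfied at the start of that range. The class and method names (AnchoredMatchSketch, MatchWordAt, MatchWordInRange) are made up for illustration and do not come from the posts above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredMatchSketch
{
    // \G is satisfied only at the position where the search starts.
    private static readonly Regex WordAtSearchStart = new Regex(@"\G\w+");

    // ^ is satisfied at index 0, or at the start of the range when a range is given.
    private static readonly Regex WordAtStringStart = new Regex(@"^\w+");

    // Hypothetical helper: succeeds only if a word begins exactly at startAt.
    public static Match MatchWordAt(string input, int startAt)
    {
        return WordAtSearchStart.Match(input, startAt);
    }

    // Hypothetical helper: searches input[beginning..] as if it were the whole input.
    public static Match MatchWordInRange(string input, int beginning)
    {
        return WordAtStringStart.Match(input, beginning, input.Length - beginning);
    }

    public static void Main()
    {
        string s = "first second";

        // The behaviour reported in the thread: ^ is not satisfied at index 6.
        Console.WriteLine(WordAtStringStart.Match(s, 6).Success);    // False

        Match viaG = MatchWordAt(s, 6);
        Console.WriteLine($"{viaG.Success} {viaG.Value}");           // True second

        // Index 5 is a space, so no word starts exactly there.
        Console.WriteLine(MatchWordAt(s, 5).Success);                // False

        Match viaRange = MatchWordInRange(s, 6);
        Console.WriteLine($"{viaRange.Success} {viaRange.Value}");   // True second
    }
}
```

Neither variant copies the input, which was the concern with huge strings. Which one fits better depends on whether the pattern itself may be changed (\G) or has to stay as "^\w+" (the range overload).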