
strings vs regular expressions

Author
11 Apr 2007 8:26 AM
AVL
hi,
I need to compare or check for substrings in a given string.
Which would give better performance: string comparison functions or
regular expressions?

Author
11 Apr 2007 8:39 AM
Michael Nemtsev
Hello AVL,

AFAIK String.Contains will be the fastest, because it only calls IndexOf,
whereas a regex does a lot of unnecessary processing
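
For illustration, a minimal sketch of the plain string approach. The sample
text and variable names are made up for this example, and the statement that
Contains is an ordinal IndexOf check reflects my understanding of the
framework rather than anything measured here.

```csharp
using System;

// Hypothetical sample data, not taken from the thread.
string text = "The quick brown fox jumps over the lazy dog";

// Ordinal, case-sensitive substring test.
bool hasFox = text.Contains("fox");

// The same check spelled out with IndexOf, which also yields the position.
int index = text.IndexOf("fox", StringComparison.Ordinal);

// Case-insensitive variant via an explicit StringComparison.
bool hasDog = text.IndexOf("DOG", StringComparison.OrdinalIgnoreCase) >= 0;

Console.WriteLine($"{hasFox} {index} {hasDog}");
```

If only a yes/no answer is needed, Contains reads best; IndexOf is the one to
reach for when the position or a specific comparison mode matters.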

---
WBR,  Michael  Nemtsev [.NET/C# MVP]. 
My blog: http://spaces.live.com/laflour
Team blog: http://devkids.blogspot.com/

"The greatest danger for most of us is not that our aim is too high and we
miss it, but that it is too low and we reach it" (c) Michelangelo

Author
11 Apr 2007 8:48 AM
Henning Krause [MVP - Exchange]
Hello,

but string.IndexOf has a very bad implementation. If you want a fast string
search, look for a .NET implementation of the Boyer-Moore algorithm, which
is also used internally by the regular expression engine.

Depending on the length of the text being searched and the frequency, you
might want to consider a precompiled regex.
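
A rough sketch of what that could look like, assuming "precompiled" means a
Regex built once with RegexOptions.Compiled and then reused. The search term
and sample lines are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Build once and reuse. RegexOptions.Compiled costs more up front
// in exchange for faster matching on repeated calls.
// Regex.Escape makes the search term a literal rather than a pattern.
var needle = new Regex(Regex.Escape("a.b"), RegexOptions.Compiled);

// Hypothetical inputs: only the first contains the literal "a.b".
string[] lines = { "xx a.b yy", "xx aXb yy" };
foreach (string line in lines)
{
    Console.WriteLine($"{line}: {needle.IsMatch(line)}");
}
```

Without Regex.Escape the dot would match any character, so the second line
would match as well.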

Anyway, you should perform some performance testing yourself. It really
depends on the circumstances.

Best regards,
Henning Krause

Author
11 Apr 2007 8:54 AM
Michael Nemtsev
Hello Henning Krause [MVP - Exchange],


H> Anyway, you should perform some performance testing yourself. It
H> really depends on the circumstances.

That's the point, because we don't know what the OP is looking for.

Author
11 Apr 2007 10:29 AM
Jon Skeet [C# MVP]
On Apr 11, 9:48 am, "Henning Krause [MVP - Exchange]"
<newsgroups_rem***@this.infinitec.de> wrote:
> but string.IndexOf has a very bad implementation. If you want a fast string
> search, look for a .NET implementation of the Boyer-Moore algorithm - this
> is also used in regular expressions internals.

I wouldn't say that IndexOf has a "very bad" implementation. In *some*
cases it won't be as fast as doing the "pre-work" involved for Boyer-
Moore, but I suspect in the vast majority of cases used in the real
world, it's far quicker to use the "brute force" method, given that
you're only looking for the string once (as far as String.IndexOf is
concerned - you may be calling it multiple times, of course).

I suppose String.IndexOf could apply some heuristics and guess whether
it's worth building the tables (or whatever) for Boyer-Moore, but as I
say, in the vast majority of real cases it won't make any odds.
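
To make the "pre-work" concrete, here is a sketch of the Horspool
simplification of Boyer-Moore (not full Boyer-Moore, and not what
String.IndexOf or the regex engine actually does internally). The name
HorspoolIndexOf is hypothetical.

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(HorspoolIndexOf("The quick brown fox", "fox"));

// Boyer-Moore-Horspool: index of the first ordinal match, or -1.
static int HorspoolIndexOf(string text, string pattern)
{
    if (pattern.Length == 0) return 0;
    if (pattern.Length > text.Length) return -1;

    // Pre-work: for each pattern character except the last, record how far
    // the window may shift when that character sits under the window's end.
    var shift = new Dictionary<char, int>();
    int last = pattern.Length - 1;
    for (int i = 0; i < last; i++)
    {
        shift[pattern[i]] = last - i;
    }

    int pos = 0;
    while (pos <= text.Length - pattern.Length)
    {
        // Compare right to left within the current window.
        int j = last;
        while (j >= 0 && text[pos + j] == pattern[j]) j--;
        if (j < 0) return pos;

        // Shift by the table entry for the window's last character, or by
        // the whole pattern length if that character is not in the table.
        char c = text[pos + last];
        pos += shift.TryGetValue(c, out int s) ? s : pattern.Length;
    }
    return -1;
}
```

The table has to be built on every call unless it is cached, which is the
overhead being weighed against the brute-force scan for a one-off search.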

> Depending on the length of the text being searched and the frequency, you
> might want to consider a precompiled regex.
>
> Anyway, you should perform some performance testing yourself. It really
> depends on the circumstances.

Agreed. If you know you're going to have to search for the same string
lots of times in a performance-critical environment, it may be worth
using regular expressions. I would use Contains until I'd actually
proved it was a bottleneck though :)
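
A crude way to check, sketched with Stopwatch. The haystack, needle and
iteration count are made-up values, and the helper name Time is hypothetical;
real measurements should use your own data and more careful methodology.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical workload: a long haystack with the needle at the very end.
string haystack = new string('a', 100_000) + "needle";
string needle = "needle";
var regex = new Regex(Regex.Escape(needle), RegexOptions.Compiled);
const int iterations = 1_000;

bool found = false;
TimeSpan viaContains = Time(() => found = haystack.Contains(needle), iterations);
TimeSpan viaRegex = Time(() => found = regex.IsMatch(haystack), iterations);

Console.WriteLine($"Contains: {viaContains.TotalMilliseconds:F1} ms");
Console.WriteLine($"Regex:    {viaRegex.TotalMilliseconds:F1} ms");

// Runs the action once as a warm-up, then times `count` further runs.
static TimeSpan Time(Action action, int count)
{
    action(); // warm-up so one-time JIT cost is not measured
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < count; i++) action();
    sw.Stop();
    return sw.Elapsed;
}
```

No timings are quoted here on purpose: the numbers vary with the runtime,
the text and the search term.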

Jon
Author
11 Apr 2007 1:48 PM
Henning Krause [MVP - Exchange]
Hi,

> Agreed. If you know you're going to have to search for the same string
> lots of times in a performance-critical environment, it may be worth
> using regular expressions. I would use Contains until I'd actually
> proved it was a bottleneck though :)
>

"The First Rule of Program Optimization: Don't do it. The Second Rule of
Program Optimization (for experts only!): Don't do it yet." - Michael A.
Jackson
Author
11 Apr 2007 12:01 PM
Kevin Spencer
Hi AVL,

Just to clear things up regarding regular expressions versus string
functions. Use regular expressions when looking for a *pattern* of
characters in a string, which may be different characters in the same
pattern, and string functions for looking for substrings. What I mean by
"patterns" is, for example, a hyperlink in an HTML document.

A hyperlink is a string that must follow certain rules. It must begin with
the character sequence "<a" followed by one or more white space characters,
followed by 0 or more attribute name=value pairs, followed by the ">"
character. This is followed by a string of text that is followed by the
"</a>" character sequence. Note that only some of the characters are
specified, and you don't know what the rest of them will be. So, how do you
look for a string that satisfies these rules? Example:

(?m)(?i)(?<=<a)(?:(?:\s+href=(?<href>[^>]+))|(?:\s+[^=>]+=[^>]+))*>(?<innerHtml>[^<]*)(?=</a>)

The above is a regular expression that identifies substrings that satisfy
those rules. In addition, it captures 2 named groups: "href" for the link
target and "innerHtml" for the text inside the anchor.
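
As a sketch of how that expression might be used from C#: the pattern is
copied unchanged, while the sample HTML and variable names are made up for
illustration. Note that the href group keeps any quotes around the value, so
they are trimmed here.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the post, unchanged, as a verbatim string.
var anchor = new Regex(
    @"(?m)(?i)(?<=<a)(?:(?:\s+href=(?<href>[^>]+))|(?:\s+[^=>]+=[^>]+))*>(?<innerHtml>[^<]*)(?=</a>)");

// Hypothetical input, not taken from the thread.
string html = "<p>See <a href=\"http://example.com\">Example</a> for details.</p>";

foreach (Match m in anchor.Matches(html))
{
    // Named groups are read by name; trim the quotes captured with the href.
    string href = m.Groups["href"].Value.Trim('"');
    string inner = m.Groups["innerHtml"].Value;
    Console.WriteLine($"{href} -> {inner}");
}
```

With several attributes on the anchor, the greedy [^>]+ in the href group
would swallow the ones that follow, so treat this as a demonstration of the
idea rather than a robust HTML parser.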

You could not use a single string function to find this pattern. Generally,
string functions are faster than regular expressions for plain substrings,
but when looking for patterns (groups of characters that satisfy rules),
regular expressions are usually the most practical method.

--
HTH,

Kevin Spencer
Microsoft MVP

Printing Components, Email Components,
FTP Client Classes, Enhanced Data Controls, much more.
DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net
