
Help with Regex.replace

Author
7 Feb 2007 12:21 AM
maheshvd
Hi Group,

I have an HTML document with all sorts of HTML tags. I need to provide
a search-and-replace feature for text in the HTML documents. The user
can enter any phrase to search for and any phrase to replace it with.
While searching, I strip all HTML tags from the HTML document and
search the remaining text. The user can then select the document(s) in
which to replace the desired text.
The problem is the replacing. How do I replace the string with the new
one?
e.g.
The HTML document may contain:

<li>This is a test document</li> All the  <b>articles</b> here are
written for general public. <strong>Tip: <strong>If you do not find
desired articles, please mail <SPAN id="test" style="FONT-WEIGHT:
bold; COLOR: #ff0000">develo***@test.com</SPAN >

User may want to find
"All the articles here"
and replace with
"all the documents here".

The resultant document could be
<li>This is a test document</li> All the  documents here are written
for general public. <strong>Tip: <strong>If you do not find desired
articles, please mail <SPAN id="test" style="FONT-WEIGHT: bold; COLOR:
#ff0000">develo***@test.com</SPAN >

So while replacing the string, can I somehow ignore the HTML tags and
still achieve the replacement? The rest of the HTML tags must be
retained in the HTML doc.
Any thoughts will be appreciated.

Regards,
dev

Author
10 Feb 2007 10:53 PM
Alexey Smirnov
string sourceTxt = "....";

string searchTxt = "All the articles here";
string replaceTxt = "all the documents here";

string searchPattern = searchTxt.replace(" ","(.*?)");
string replaceString = replaceTxt;

int i = 0;

while (replaceString.indexOf(" ") > -1) {
i+=1;
replaceString = Regex.Replace(" ", "$" + i.toString(), 1);
}

string finalTxt = Regex.Replace(sourceTxt, searchTxt, replaceString);
Author
10 Feb 2007 11:46 PM
Alexey Smirnov
A silly typo, sorry:

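string sourceTxt = "....";

string searchTxt = "All the articles here";
string replaceTxt = "all the documents here";

// Let every space in the search phrase match any run of characters,
// captured as a numbered group.
string searchPattern = searchTxt.Replace(" ", "(.*?)");
string replaceString = replaceTxt;

int i = 0;

// Swap each space in the replacement for $1, $2, ... so that whatever
// the matching group captured is put back in the same position.
Regex r = new Regex(@"\s");
while (replaceString.IndexOf(" ") > -1)
{
    i += 1;
    replaceString = r.Replace(replaceString, "$" + i.ToString(), 1);
}

string finalTxt = Regex.Replace(sourceTxt, searchPattern, replaceString);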
Author
13 Feb 2007 6:31 PM
maheshvd
Hey Alexey,
Thanks a ton. That's a great solution.
There is a small hitch though. If the replacement string has more words
than the searched string, the replacement carries extra $3, $4.
I'm counting the words in both strings, and whatever remains goes
in the last replacement.
Hope this is the right way.
Regards,
Mahesh
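
One way to avoid the leftover $3, $4 is to number only as many gaps in the replacement as the search phrase has, and to join any remaining words with plain spaces. The following is a minimal sketch of that idea, not code from the thread: the class and the BuildReplacement helper are hypothetical names, and it assumes both phrases are split on single spaces.

```csharp
using System;
using System.Text;

public static class ReplacementBuilder
{
    // Hypothetical helper: builds a .NET replacement string that refers only
    // to the groups the search pattern actually has (one per gap between
    // search words). Extra replacement words are joined with plain spaces.
    public static string BuildReplacement(string searchText, string replaceText)
    {
        char[] space = { ' ' };
        int groupCount =
            searchText.Split(space, StringSplitOptions.RemoveEmptyEntries).Length - 1;
        string[] words = replaceText.Split(space, StringSplitOptions.RemoveEmptyEntries);

        var sb = new StringBuilder();
        for (int i = 0; i < words.Length; i++)
        {
            if (i > 0)
            {
                // ${n} avoids ambiguity if the next word starts with a digit.
                sb.Append(i <= groupCount ? "${" + i + "}" : " ");
            }
            // A literal '$' in a replacement string must be written as "$$".
            sb.Append(words[i].Replace("$", "$$"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Search phrase has two gaps, replacement has four.
        // Prints: all${1}the${2}documents listed here
        Console.WriteLine(BuildReplacement("All the articles", "all the documents listed here"));
    }
}
```

If the replacement has fewer words than the search phrase, the unused groups (and any tags they captured) are simply left out of the result.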
Author
14 Feb 2007 10:21 PM
Alexey Smirnov
Yup, it could be a problem. Maybe we have to look for a better
approach.
Author
15 Feb 2007 2:14 AM
maheshvd
Moreover, (.*?) will not only skip HTML tags, it may skip whole
sentences. E.g. if I have something like
"This is a test where we need to replace words. Also test words"
and I search for "test words" and try to replace it with "test
sentences", it will replace in 2 places, because in the first sentence
"test" and "words" are separated by many other words, which the
pattern skips over. Is there any way to say: skip the gap only if it's
an HTML tag?
Thanks for all the help. I desperately need a solution to this.
Mahesh
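
The over-matching described above can be reproduced in a few lines. This is only an illustration of the problem; the class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyGapDemo
{
    // Same shape as the pattern built from "test words" by replacing the
    // space with (.*?), and the replacement built from "test sentences".
    public static string ReplaceWithLazyGap(string text)
    {
        return Regex.Replace(text, "test(.*?)words", "test$1sentences");
    }

    public static void Main()
    {
        string text = "This is a test where we need to replace words. Also test words";

        // Prints: 2  (the first match spans most of the first sentence)
        Console.WriteLine(Regex.Matches(text, "test(.*?)words").Count);

        // Prints: This is a test where we need to replace sentences. Also test sentences
        Console.WriteLine(ReplaceWithLazyGap(text));
    }
}
```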
Author
15 Feb 2007 7:45 AM
Alexey Smirnov
Sure, there is a way to do that.

Use this pattern:

test(((<[^>]*>)|\s)*?)words

It will skip HTML tags and spaces between words.
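
For an arbitrary phrase, the same pattern could be generated from the search words. The sketch below is one possible way to do it, under several assumptions. HtmlPhraseReplacer, BuildPattern and ReplacePhrase are hypothetical names. The gap uses + rather than *? so that words must be separated by at least one tag or whitespace character. The whole match is replaced with plain text, so tags inside the matched phrase are dropped, as in the expected output in the first post.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HtmlPhraseReplacer
{
    // One or more HTML tags or whitespace characters between two words.
    // Assumption: '+' instead of the lazy '*?' used in the thread.
    private const string Gap = @"(?:<[^>]*>|\s)+";

    // Escapes each search word and joins the words with the gap pattern.
    public static string BuildPattern(string searchText)
    {
        string[] words = searchText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(Gap, words.Select(w => Regex.Escape(w)));
    }

    // Replaces every occurrence of the phrase, ignoring tags between its words.
    // A MatchEvaluator is used so that '$' in replaceText is taken literally.
    public static string ReplacePhrase(string html, string searchText, string replaceText)
    {
        return Regex.Replace(html, BuildPattern(searchText), m => replaceText);
    }

    public static void Main()
    {
        string html = "<li>This is a test document</li> All the  <b>articles</b> here are written";

        // Prints: <li>This is a test document</li> all the documents here are written
        Console.WriteLine(ReplacePhrase(html, "All the articles here", "all the documents here"));
    }
}
```

Because the matched tags are discarded, a match that covers only an opening or only a closing tag would leave the document unbalanced, so that case needs separate handling.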
Author
22 Feb 2007 12:51 AM
maheshvd
Yes, that's exactly what I was looking for. I tested it with a few
strings and it's working fine. I'll test it thoroughly.
Thanks a ton.
