Help with Regex.replace

I have an HTML document with all sorts of HTML tags. I need to provide a search-and-replace feature for text in the HTML documents. The user can enter any phrase to search for and any phrase to replace it with. While searching, I strip all HTML tags from the HTML document and search the remaining text. The user can then select the document(s) in which s/he wants to replace the text.

The replacement is where I have an issue. How do I replace the string with the new one? E.g., the HTML document may contain:

<li>This is a test document</li> All the <b>articles</b> here are written for general public. <strong>Tip: <strong>If you do not find desired articles, please mail <SPAN id="test" style="FONT-WEIGHT: bold; COLOR: #ff0000">develo***@test.com</SPAN >

The user may want to find
"All the articles here"
and replace it with
"all the documents here".

The resulting document could be

<li>This is a test document</li> All the documents here are written for general public. <strong>Tip: <strong>If you do not find desired articles, please mail <SPAN id="test" style="FONT-WEIGHT: bold; COLOR: #ff0000">develo***@test.com</SPAN >

So while replacing the string, can I somehow ignore the HTML tags and still achieve the replacement? The rest of the HTML tags must be retained in the HTML doc.
Any thoughts will be appreciated.

Regards,
dev

On Feb 7, 1:21 am, mahes***@gmail.com wrote:
> So while replacing the string, can I somehow ignore the HTML tags and
> achieve replacement? Rest of the HTML tags must be retained in the
> HTML doc.
> Any thoughts will be appreciated.

string sourceTxt = "....";
string searchTxt = "All the articles here";
string replaceTxt = "all the documents here";

string searchPattern = searchTxt.replace(" ","(.*?)");
string replaceString = replaceTxt;

int i = 0;

while (replaceString.indexOf(" ") > -1) {
i+=1;
replaceString = Regex.Replace(" ", "$" + i.toString(), 1);
}

string finalTxt = Regex.Replace(sourceTxt, searchTxt, replaceString);

On Feb 10, 11:53 pm, "Alexey Smirnov" <alexey.smir***@gmail.com> wrote:
> string searchPattern = searchTxt.replace(" ","(.*?)");
> string replaceString = replaceTxt;
>
> int i = 0;
>
> while (replaceString.indexOf(" ") > -1) {
> i+=1;
> replaceString = Regex.Replace(" ", "$" + i.toString(), 1);
> }
>
> string finalTxt = Regex.Replace(sourceTxt, searchTxt, replaceString);

A silly typo, sorry:

string sourceTxt = "....";
string searchTxt = "All the articles here";
string replaceTxt = "all the documents here";

// Turn every space in the search phrase into a lazy capture group.
string searchPattern = searchTxt.Replace(" ", "(.*?)");
string replaceString = replaceTxt;

int i = 0;
Regex r = new Regex(@"\s");

// Turn the spaces in the replacement into $1, $2, ... one at a time.
while (replaceString.IndexOf(" ") > -1) {
    i += 1;
    replaceString = r.Replace(replaceString, "$" + i.ToString(), 1);
}

string finalTxt = Regex.Replace(sourceTxt, searchPattern, replaceString);

Hey Alexey,
Thanks a ton. That's a great solution.
There is a small hitch, though. If the replacement string has more words than the search string, the replacement string carries extra $3, $4. I'm counting the words in both strings, and whatever remains goes into the last replacement.
I hope this is the right way.
Regards,
Mahesh

On Feb 13, 7:31 pm, mahes***@gmail.com wrote:
> There is a small hitch though. If the string to be replaced is bigger
> than the searched string, the replacement string carries extra $3,$4.

Yup, it could be a problem. Maybe we have to look for a better approach.
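
For anyone following along, the leftover $3, $4 issue can be reproduced with a small sketch. It wraps the corrected code from earlier in the thread in a helper; the names LeftoverGroupDemo and NaiveReplace are made up for illustration and are not from the thread. In .NET, a $n in the replacement string that refers to a group the pattern does not define is emitted as literal text, which is where the stray tokens come from.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the thread) wrapping the space-to-group approach.
public static class LeftoverGroupDemo
{
    public static string NaiveReplace(string sourceTxt, string searchTxt, string replaceTxt)
    {
        // Every space in the search phrase becomes a lazy capture group.
        string searchPattern = searchTxt.Replace(" ", "(.*?)");

        // Every whitespace character in the replacement becomes $1, $2, ... in order.
        string replaceString = replaceTxt;
        Regex space = new Regex(@"\s");
        int i = 0;
        while (replaceString.IndexOf(" ") > -1)
        {
            i += 1;
            replaceString = space.Replace(replaceString, "$" + i.ToString(), 1);
        }

        // If the replacement has more spaces than the search phrase,
        // the extra $n tokens have no matching group and stay in the output.
        return Regex.Replace(sourceTxt, searchPattern, replaceString);
    }
}
```

Calling NaiveReplace("All the articles here", "the articles", "the new documents") should give "All the new$2documents here": the search phrase has one gap (one group) but the replacement has two, so $2 has nothing to refer to.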

On Feb 14, 2:21 pm, "Alexey Smirnov" <alexey.smir***@gmail.com> wrote:
> Yup, it could be a problem. Maybe we have to look for a better
> approach.

Moreover, (.*?) will not only ignore HTML tags, it may ignore whole sentences. E.g., if I have something like
"This is a test where we need to replace words. Also test words"
and I search for "test words" and try to replace it with "test sentences", it will replace in 2 places, because in the first sentence "test" and "words" are separated by many other words, which the pattern also ignores. Is there any way we can say: ignore the text between the words only if it is an HTML tag?
Thanks for all the help. I desperately need a solution to this.
Mahesh

On Feb 15, 3:14 am, mahes***@gmail.com wrote:
> Is there any way we can say only if its HTML tag,
> replace?

Sure, there is a way to do that. Use this pattern:

test(((<[^>]*>)|\s)*?)words

It will skip HTML tags and spaces between words.
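
To apply the same idea to an arbitrary search phrase, something along these lines might work. This is only a sketch under a few assumptions: the class and method names (HtmlPhraseReplacer, BuildPattern, ReplacePhrase) are invented for illustration, the gap between words is written as (?:<[^>]*>|\s)+ so that at least one tag or whitespace character is required (slightly stricter than the *? in the pattern above), and the whole match, including any tags inside the phrase, is replaced by the plain replacement text, as in the expected output of the first post. It does not add word boundaries and does not try to keep tags balanced.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the thread): phrase replacement that lets
// only HTML tags and whitespace sit between the words of the search phrase.
public static class HtmlPhraseReplacer
{
    // A run of HTML tags and/or whitespace characters.
    private const string Gap = @"(?:<[^>]*>|\s)+";

    public static string BuildPattern(string searchTxt)
    {
        // Split on whitespace and escape each word so regex metacharacters
        // typed by the user are matched literally.
        string[] words = searchTxt
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(Regex.Escape)
            .ToArray();
        if (words.Length == 0)
            throw new ArgumentException("Search phrase is empty.", nameof(searchTxt));
        return string.Join(Gap, words);
    }

    public static string ReplacePhrase(string sourceTxt, string searchTxt, string replaceTxt)
    {
        // A MatchEvaluator inserts replaceTxt verbatim, so a "$" typed by the
        // user is not interpreted as a group reference.
        return Regex.Replace(sourceTxt, BuildPattern(searchTxt), m => replaceTxt);
    }
}
```

With the sentence from the earlier post, "This is a test where we need to replace words. Also test words", the pattern test(.*?)words finds two matches, while BuildPattern("test words") should find only the second one, since ordinary words are no longer allowed in the gap. Because the replacement is inserted as a whole, the leftover $3, $4 problem does not come up either; the trade-off is that tags inside the matched phrase (such as <b>...</b>) are dropped.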