
Regex bug?? Insufficient hexadecimal digits

Author
5 Oct 2005 11:29 PM
Mori
I have a string that contains \", \t, \r and \n sequences.  I need to get the XML out of it.

sample below:
"<?xml version=\"1.0\"?>\r\n<USERS
xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"
xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"
xmlns=\"http://www.slcorp.com\\xml\\slcorp_dtd_schema.xml\">\r\n\t<ACCT>GameTek</ACCT>\r\n\t<USER>\r\n\t\t<USER_ID>Mike</USER_ID></USER>\r\n\t</USERS>\r\n"

I have tried two approaches to get at the XML.
(1)
str = str.Replace("\n", "").Replace("\t","").Replace("\r","").Replace("\"", """);
The last call, Replace("\"", """), does not compile; the rest is okay.
-------------------------------------------------------------------------
(2)
I have also tried using Regex as follows:

str = Regex.Unescape(str);

This time I get the exception "Insufficient hexadecimal digits".


Any ideas?

Author
6 Oct 2005 6:18 AM
Jon Skeet [C# MVP]

""" isn't a valid string. Did you mean ""?

However, I'm not entirely sure what you mean by needing to "get the
XML" - the string *is* the XML. The \r, \n etc are only escapes as far
as C# is concerned.

See http://www.pobox.com/~skeet/csharp/strings.html
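
To make that concrete, here is a minimal sketch (the class and method names are made up for this example, and the XML is shortened) showing that the escape sequences exist only in the C# source. Once compiled, the string holds real quote, CR and LF characters and no backslashes. If the backslashes are only visible in the debugger, that is most likely just how the debugger displays the string.

```csharp
using System;

// Hypothetical demo class; the names are illustrative, not from the thread.
public static class EscapeDemo
{
    // A shortened version of the sample, written with C# escape sequences.
    public static string Sample()
    {
        return "<?xml version=\"1.0\"?>\r\n<USERS />";
    }

    public static void Main()
    {
        string xml = Sample();

        // The compiler has already turned \" into a quote and \r\n into
        // CR + LF, so the string contains no backslash characters at all.
        Console.WriteLine(xml.IndexOf('\\'));     // -1: no backslash present
        Console.WriteLine(xml.Contains("\r\n"));  // True: a real CR LF pair
        Console.WriteLine(xml);
    }
}
```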

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Author
6 Oct 2005 9:48 AM
Oliver Sturm

In addition to what Jon said: as I understand it, you want to strip the
escape sequences from the XML string by replacing \r, \n and \t with
nothing, and \" with ". Right?

In that case, you need to make sure that the escape sequences aren't
recognized as such in the strings you are trying to use in your
replacement. The easiest way to do that is to use verbatim literals, like
this:

   str = str.Replace(@"\n", "").Replace(@"\t", "").Replace(@"\r", "").Replace(@"\""", @"""");

Without verbatim literals, it would have to look like this:

   str = str.Replace("\\n", "").Replace("\\t", "").Replace("\\r", "").Replace("\\\"", "\"");
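
As a self-contained sketch of the replacement chain, assuming the string really does contain literal backslash characters (the class name, method name and test input below are invented for illustration):

```csharp
using System;

// Hypothetical helper; not part of any library.
public static class LiteralEscapeStripper
{
    // Removes the two-character sequences \r, \n and \t,
    // and turns the two-character sequence \" into a plain quote.
    public static string Strip(string s)
    {
        return s.Replace(@"\r", "")
                .Replace(@"\n", "")
                .Replace(@"\t", "")
                .Replace(@"\""", "\"");
    }

    public static void Main()
    {
        // Verbatim literal: the backslashes here are real characters,
        // and "" stands for a single quote character.
        string raw = @"<ACCT name=\""GameTek\"">\r\n\t</ACCT>";
        Console.WriteLine(Strip(raw)); // <ACCT name="GameTek"></ACCT>
    }
}
```

Note that a blind replacement like this also hits a backslash followed by r, n or t anywhere else in the text, for example inside a file path, so it is only safe if you know the input does not contain such cases.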

Using regular expressions is probably not the fastest way to do this,
because you'd still need a second replacement for \" - the only advantage
is that you could replace \r, \n and \t in one go:

   str = Regex.Replace(str, @"\\[rnt]", "");
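
If you do want a single pass, one possible alternative is to match \" as well and let a match evaluator decide what each match becomes. This is only a sketch: the class and method names are invented, it assumes a compiler that supports lambda expressions, and I haven't measured whether it is any faster than the plain Replace calls.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of any library.
public static class RegexEscapeStripper
{
    // Matches a backslash followed by r, n, t or a double quote,
    // capturing the character after the backslash.
    private static readonly Regex Escapes = new Regex(@"\\([rnt""])");

    public static string Strip(string s)
    {
        // Keep the quote, drop the other three sequences entirely.
        return Escapes.Replace(s, m => m.Groups[1].Value == "\"" ? "\"" : "");
    }

    public static void Main()
    {
        string raw = @"a\r\n\tb\""c";      // literal text: a\r\n\tb\"c
        Console.WriteLine(Strip(raw));      // ab"c
    }
}
```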

Using Regex.Unescape doesn't make any sense here; it undoes
regular-expression escaping, which is a completely different purpose.
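
The exception itself is probably explained by the namespace URL. Assuming the runtime string contains a real backslash before "xml", Regex.Unescape reads \x as the start of a hexadecimal escape and then finds "ml" instead of two hex digits. A small sketch (class and method names invented for the example):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
public static class UnescapeDemo
{
    // Returns true if Regex.Unescape rejects the input.
    public static bool Throws(string s)
    {
        try
        {
            Regex.Unescape(s);
            return false;
        }
        catch (ArgumentException)
        {
            // Thrown for malformed escapes, e.g. \x without two hex digits.
            return true;
        }
    }

    public static void Main()
    {
        // \x41 is a well-formed hex escape for 'A'.
        Console.WriteLine(Regex.Unescape(@"\x41"));  // A

        // \xml starts like a hex escape, but 'm' is not a hex digit.
        Console.WriteLine(Throws(@"http://www.slcorp.com\xml\slcorp_dtd_schema.xml")); // True
    }
}
```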

Now, I'm sure I made a mistake somewhere with all this escaping -
someone's going to tell me :-)


                Oliver Sturm
--
Expert programming and consulting services available
See http://www.sturmnet.org (try /blog as well)
