
C# ASP .NET -- UTF-16 encoding to UTF-8

28 Feb 2006 10:26 PM
davidjgonzalez
I have a web application written in ASP.NET (VS 2003) to which an Adobe
Acrobat form posts XML. I am able to get the XML using
Request.InputStream; however, the XML is UTF-16 encoded. This means that
the byte[] I get from Request.InputStream looks like:
[0]: 255
[1]: 254
[2]: 64
[3]: 0
[4]: 56
[5]: 0
....
Essentially every other index in the array holds the value 0.
When I try to convert the byte array to a string, I get <\0x\0m\0l\0
... (every other character is a \0). I also have '\r\n' characters
before the closing tags in the XML.
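For reference, bytes 255, 254 (0xFF 0xFE) are the UTF-16 little-endian byte order mark (BOM), and .NET's UTF-16 support is the System.Text.UnicodeEncoding class, also exposed as Encoding.Unicode. Decoding such bytes with a single-byte encoding is what leaves a \0 after every ASCII character. A minimal sketch, assuming the whole body is already in a byte[]; the class name and the sample bytes are made up for illustration:

```csharp
using System;
using System.Text;

static class Utf16Decoding
{
    // Decodes UTF-16 little-endian bytes, skipping a leading 0xFF 0xFE BOM
    // if present (Encoding.GetString does not strip it on its own).
    public static string DecodeUtf16LE(byte[] bytes)
    {
        int offset = (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) ? 2 : 0;

        // Encoding.Unicode is UTF-16 LE: each ASCII character occupies two
        // bytes, the second of which is 0.
        return Encoding.Unicode.GetString(bytes, offset, bytes.Length - offset);
    }

    static void Main()
    {
        // Hypothetical sample: BOM followed by "<xml/>" encoded as UTF-16 LE.
        byte[] posted = new byte[] { 0xFF, 0xFE, 0x3C, 0, 0x78, 0, 0x6D, 0, 0x6C, 0, 0x2F, 0, 0x3E, 0 };
        Console.WriteLine(DecodeUtf16LE(posted)); // prints <xml/>
    }
}
```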

My question is twofold:

1) How can I elegantly convert the UTF-16 formatted XML to something
more readable, i.e. UTF-8, ASCII, etc., in Visual Studio 2003? (I can't find
any UTF-16 encoding support in VS 2003.)

2) If 1 doesn't get rid of the '\r\n's, how can I get rid of them?
string.Replace("\r\n", "") didn't seem to work.
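One likely (but unconfirmed) reason Replace appears not to work: String.Replace never changes the string it is called on; it returns a new string, which has to be assigned back. Another possibility, if the bytes were decoded with a single-byte encoding, is that each CR and LF is followed by a \0, so the two-character sequence "\r\n" never occurs in the text at all. A minimal sketch of the first point; the class name and sample XML are hypothetical:

```csharp
using System;

static class NewlineRemoval
{
    // Strings are immutable: Replace returns a new string and leaves the
    // original untouched, so the caller must use the return value.
    public static string StripCrLf(string xml)
    {
        return xml.Replace("\r\n", "");
    }

    static void Main()
    {
        string xml = "<root>\r\n  <a>1</a>\r\n</root>";

        xml.Replace("\r\n", "");                 // result discarded: xml is unchanged
        Console.WriteLine(xml.Contains("\r\n")); // True

        xml = StripCrLf(xml);                    // assign the returned string
        Console.WriteLine(xml);                  // <root>  <a>1</a></root>
    }
}
```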

Thanks

28 Feb 2006 11:20 PM
Greg Young
Use the GetString() method of the UnicodeEncoding class:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfSystemTextUnicodeEncodingClassTopic.asp
Then, if you want it in UTF-8, you can use the UTF8Encoding class to convert it:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfSystemTextUTF8EncodingClassTopic.asp

Greg
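A minimal sketch of the two-step conversion described in this reply (UnicodeEncoding.GetString, then UTF8Encoding.GetBytes), next to the single call Encoding.Convert, which does the same thing. The class and method names are illustrative only. Note that if the input still begins with the FF FE byte order mark, GetString returns it as the character U+FEFF, which then comes out as the UTF-8 bytes EF BB BF unless it is stripped first.

```csharp
using System;
using System.Text;

static class Utf16ToUtf8
{
    // Two steps: decode UTF-16 to a .NET string, then encode that string as UTF-8.
    public static byte[] ViaString(byte[] utf16)
    {
        string text = new UnicodeEncoding().GetString(utf16);
        return new UTF8Encoding().GetBytes(text);
    }

    // One step: Encoding.Convert performs the same decode/encode internally.
    public static byte[] Direct(byte[] utf16)
    {
        return Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16);
    }

    static void Main()
    {
        // Encoding.Unicode.GetBytes does not emit a BOM, so this is just "<a/>" in UTF-16 LE.
        byte[] utf16 = Encoding.Unicode.GetBytes("<a/>");
        Console.WriteLine(BitConverter.ToString(Direct(utf16))); // 3C-61-2F-3E
    }
}
```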

1 Mar 2006 4:23 PM
davidjgonzalez
Greg, thanks for the reply, but that didn't seem to work.

My scope has changed slightly: I no longer need UTF-8 encoding
per se, I just need to parse the values from the XML that is being sent
over, so I need to convert the byte[] to a readable string.

Here is my code:

-------------------------------------
Request.InputStream.Read(data, 0,
Convert.ToInt32(Request.InputStream.Length));

UnicodeEncoding encoding = new UnicodeEncoding( );
string decodedString = encoding.GetString(characters);
//at this point decodedString = " ????? ??????????? ? ?????????????????????????????????????????????????????????????????????????????????????????????????????????????"

DataSet ds = new DataSet();
ds = XmlToDataSet(decodedString);
-------------------------------------
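Independent of the encoding question, two details in a snippet like this can produce garbage: Stream.Read is not guaranteed to fill the buffer in a single call, and the array that gets decoded must be the same one that was read into. A minimal sketch under those assumptions: it loops until Read returns 0, then decodes with a StreamReader, which recognizes the FF FE byte order mark and does not return it as a character. A MemoryStream stands in for Request.InputStream, and the class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamReading
{
    // Reads a stream to the end; Read may return fewer bytes than requested,
    // so keep reading until it returns 0.
    public static byte[] ReadAll(Stream input)
    {
        MemoryStream buffer = new MemoryStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = input.Read(chunk, 0, chunk.Length)) > 0)
        {
            buffer.Write(chunk, 0, read);
        }
        return buffer.ToArray();
    }

    // Decodes the bytes that were actually read; a leading BOM selects the
    // encoding and is skipped, otherwise UTF-16 LE is assumed.
    public static string DecodeBody(byte[] data)
    {
        using (StreamReader reader = new StreamReader(new MemoryStream(data), Encoding.Unicode, true))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Stand-in for the posted body: BOM followed by UTF-16 LE XML.
        byte[] bom = Encoding.Unicode.GetPreamble();
        byte[] xml = Encoding.Unicode.GetBytes("<root><a>1</a></root>");
        byte[] body = new byte[bom.Length + xml.Length];
        Buffer.BlockCopy(bom, 0, body, 0, bom.Length);
        Buffer.BlockCopy(xml, 0, body, bom.Length, xml.Length);

        using (Stream input = new MemoryStream(body))
        {
            byte[] data = ReadAll(input);
            Console.WriteLine(DecodeBody(data)); // <root><a>1</a></root>
        }
    }
}
```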

When decodedString is passed to XmlToDataSet, it crashes, I assume
because XmlToDataSet does not like the encoding of
decodedString.

How do I get the byte[] that Request.InputStream yields into a "normal"
encoding?

Thanks
1 Mar 2006 7:06 PM
Joerg Jooss
Thus wrote davidjgonza***@gmail.com,

> -------------------------------------
> Request.InputStream.Read(data, 0,
> Convert.ToInt32(Request.InputStream.Length));
> UnicodeEncoding encoding = new UnicodeEncoding( );
>
> string decodedString = encoding.GetString(characters);

Your code is not really complete. You're reading into a byte array "data",
but decode something called "characters".

Note that you don't really need to perform these steps yourself, if all you
want to do is fill a DataSet.

aDataSet.ReadXml(Request.InputStream);

should do the trick. The XML infrastructure can figure out the encoding by
itself, from the byte order mark or the XML declaration.
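A self-contained sketch of the ReadXml approach. A MemoryStream holding a UTF-16 body with a BOM stands in for Request.InputStream, and the <form>/<field> layout and values are invented for illustration. With no schema present, ReadXml infers one: the repeated <field> element becomes a table and its child elements become columns.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;

static class ReadXmlDemo
{
    // Fills a DataSet straight from the stream; the underlying XML reader
    // detects UTF-16 from the BOM, so no manual decoding is needed.
    public static DataSet Load(Stream input)
    {
        DataSet ds = new DataSet();
        ds.ReadXml(input);
        return ds;
    }

    static void Main()
    {
        // Hypothetical posted form data.
        string xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>" +
                     "<form><field><name>first</name><value>alpha</value></field>" +
                     "<field><name>last</name><value>beta</value></field></form>";

        MemoryStream body = new MemoryStream();
        byte[] bom = Encoding.Unicode.GetPreamble();  // FF FE
        byte[] text = Encoding.Unicode.GetBytes(xml);
        body.Write(bom, 0, bom.Length);
        body.Write(text, 0, text.Length);
        body.Position = 0;                             // rewind before reading

        DataSet ds = Load(body);
        foreach (DataRow row in ds.Tables["field"].Rows)
        {
            Console.WriteLine(row["name"] + " = " + row["value"]); // first = alpha, then last = beta
        }
    }
}
```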

Cheers,
--
Joerg Jooss
news-re***@joergjooss.de
6 Mar 2006 2:23 PM
davidjgonzalez
Oops, thanks for the catch.

string decodedString = encoding.GetString(characters);
is supposed to read
string decodedString = encoding.GetString(data);

Your ReadXml(...) solution, reading straight from the input stream, is just what I needed!
Thanks
