Author
24 Mar 2006 12:54 PM
PL
I'm somewhat confused about Unicode, but I hadn't run into many problems
with it until recently. We started using an SMS gateway that requires a
Unicode message to be sent as a hexadecimal string in which each byte has
been replaced with its hexadecimal value,
for example: 043104AF0442044D044...

According to their documentation, this string must be in UTF-16 before
conversion to the hexadecimal form. We, however, are using UTF-8 on our
website, and all the texts are entered as UTF-8.

When I try to send a Unicode message using content from our website, some
characters are shown correctly but not all of them. I cannot see another
reason for this than the fact that we are using UTF-8 and they require
UTF-16.

Now to the questions:

1. How do I convert between UTF-8 and UTF-16? I was looking at the Decoder
and Encoder classes, but as far as I could see they don't provide a direct
way to convert between encodings.

2. Since all strings are actually UTF-16 in .NET, does this mean that the
conversion has already been made, or does it mean that UTF-8 encoded bytes
are being stored in a UTF-16 string?

Thank you
PL.

Author
24 Mar 2006 1:57 PM
Morten Wennevik
Hi PL,

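You can use the System.Text.Encoding class to convert a byte array to a 
string and a string back to a byte array in another encoding. A .NET 
string is always UTF-16 internally, so decode the UTF-8 bytes first and 
then ask for the bytes in the encoding the gateway expects.

// utf8Bytes: the raw UTF-8 bytes of the message
string text = System.Text.Encoding.UTF8.GetString(utf8Bytes);
byte[] utf16Bytes = System.Text.Encoding.Unicode.GetBytes(text);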

Beware that Encoding.Unicode is little-endian UTF-16. Your sample 
(0431 04AF ...) looks big endian, in which case use BigEndianUnicode to 
get the bytes.
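
To illustrate the endianness point, here is a minimal sketch that turns a string into the hexadecimal form in both byte orders. The class and method names (SmsHex, ToUtf16Hex) are made up for this example, and the assumption that the gateway wants big-endian output is only inferred from the sample string in the question.

```csharp
using System;
using System.Text;

static class SmsHex
{
    // Hypothetical helper: encodes text as UTF-16 and returns the bytes as hex digits.
    public static string ToUtf16Hex(string text, bool bigEndian)
    {
        Encoding enc = bigEndian ? Encoding.BigEndianUnicode : Encoding.Unicode;
        byte[] bytes = enc.GetBytes(text);
        // BitConverter.ToString yields "04-31-04-AF-..."; strip the dashes.
        return BitConverter.ToString(bytes).Replace("-", "");
    }

    static void Main()
    {
        // The first four characters of the sample from the question.
        string text = "\u0431\u04AF\u0442\u044D";
        Console.WriteLine(ToUtf16Hex(text, true));  // 043104AF0442044D
        Console.WriteLine(ToUtf16Hex(text, false)); // 3104AF0442044D04
    }
}
```

Only the big-endian output matches the format shown in the question, which is why the choice between Unicode and BigEndianUnicode matters here.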

As for the second question: yes, all strings are UTF-16 in memory. If the 
text was decoded correctly when it was read (for example from a UTF-8 
request), the conversion has already been made and the string holds 
characters, not UTF-8 bytes. A string only ends up carrying UTF-8 byte 
values when the input was decoded with the wrong encoding, and then the 
text is already garbled.
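
A small sketch of what decoding does, using a single Cyrillic letter as test data (the class name DecodeDemo is hypothetical). It shows that decoding UTF-8 bytes with the matching encoding gives one UTF-16 character, while decoding the same bytes as UTF-16 gives an unrelated one.

```csharp
using System;
using System.Text;

static class DecodeDemo
{
    static void Main()
    {
        // U+0431 encoded as UTF-8 is the two bytes D0 B1.
        byte[] utf8Bytes = { 0xD0, 0xB1 };

        // Correct: decode with the encoding the bytes are actually in.
        string good = Encoding.UTF8.GetString(utf8Bytes);
        Console.WriteLine(good.Length);                    // 1
        Console.WriteLine(((int)good[0]).ToString("X4"));  // 0431

        // Wrong: treating UTF-8 bytes as little-endian UTF-16
        // produces a different, unrelated character.
        string bad = Encoding.Unicode.GetString(utf8Bytes);
        Console.WriteLine(((int)bad[0]).ToString("X4"));   // B1D0
    }
}
```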





--
Happy Coding!
Morten Wennevik [C# MVP]
Author
24 Mar 2006 3:30 PM
PL
Thank you, I was looking at the Encoding class without seeing that simple
solution :-/

PL.


