UTF8 to UTF16 ?

I'm somewhat confused about Unicode, but up until recently I hadn't really seen many issues with using it. We recently started using an SMS gateway that requires a Unicode message to be sent as a hexadecimal string in which each byte has been replaced with its hexadecimal value, for example: 043104AF0442044D044...

According to their documentation, this string must be in UTF-16 before conversion to the hexadecimal form. However, we are using UTF-8 on our website, and all the texts are entered as UTF-8.

When I try to send a Unicode message using content from our website, it shows some characters correctly but not all of them. I cannot see another reason for this than the fact that we are using UTF-8 and they require UTF-16.

Now to the questions:

1. How do I convert between UTF-8 and UTF-16? I was looking at the Decoder and Encoder classes, but as far as I could see they don't really provide a direct way to convert between encodings.

2. Since all strings are actually UTF-16 in .NET, does this mean that the conversion has already been made, or does it mean that UTF-8 encoded bytes are actually being stored in a UTF-16 string?

Thank you
PL.

Hi PL,
You can use the System.Text.Encoding class to decode the UTF-8 bytes into a string and then encode that string as a byte array in another encoding.

    // utf8Bytes is the UTF-8 encoded input as a byte[]
    string text = System.Text.Encoding.UTF8.GetString(utf8Bytes);
    // Encoding.Unicode is UTF-16, little endian
    byte[] data = System.Text.Encoding.Unicode.GetBytes(text);

Beware that UTF-16 can be big endian, in which case use BigEndianUnicode to get the bytes.

As for the second question: yes, all strings are Unicode, but the content of the string does not have to be Unicode encoded. I believe a string can hold UTF-8 encoded data without loss, but if you plan on doing string manipulation I would convert it to Unicode first.

On Fri, 24 Mar 2006 13:54:48 +0100, PL <pbl***@yahoo.se> wrote:

> I'm somewhat confused about Unicode, but up until recently I hadn't really
> seen many issues with using it. We recently started using an SMS gateway
> that requires a Unicode message to be sent as a hexadecimal string in which
> each byte has been replaced with its hexadecimal value,
> for example: 043104AF0442044D044...
>
> According to their documentation, this string must be in UTF-16 before
> conversion to the hexadecimal form. However, we are using UTF-8 on our
> website, and all the texts are entered as UTF-8.
>
> When I try to send a Unicode message using content from our website, it
> shows some characters correctly but not all of them. I cannot see another
> reason for this than the fact that we are using UTF-8 and they require
> UTF-16.
>
> Now to the questions:
>
> 1. How do I convert between UTF-8 and UTF-16? I was looking at the Decoder
> and Encoder classes, but as far as I could see they don't really provide a
> direct way to convert between encodings.
>
> 2. Since all strings are actually UTF-16 in .NET, does this mean that the
> conversion has already been made, or does it mean that UTF-8 encoded bytes
> are actually being stored in a UTF-16 string?
>
> Thank you
> PL.

--
Happy Coding!
Morten Wennevik [C# MVP]

Thank you, I was looking at the Encoding class without seeing that simple solution :-/

PL.
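
For the gateway format described in the question, the conversion discussed above might be sketched roughly as follows. This is only an illustration under assumptions: the class SmsHex and its method names are hypothetical (they are not from the thread or from any gateway SDK), and big-endian byte order is inferred from the sample string 043104AF... in the question, so it should be checked against the gateway's documentation.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class SmsHex
{
    // Encodes a .NET string as UTF-16 bytes and returns them as uppercase hex.
    // Assumption: the gateway expects big-endian UTF-16 without a byte order mark.
    public static string ToUtf16BeHex(string text)
    {
        // GetBytes returns only the encoded characters; it does not add a BOM.
        byte[] utf16 = Encoding.BigEndianUnicode.GetBytes(text);

        var hex = new StringBuilder(utf16.Length * 2);
        foreach (byte b in utf16)
        {
            // "X2" formats each byte as two uppercase hexadecimal digits.
            hex.Append(b.ToString("X2"));
        }
        return hex.ToString();
    }

    // If the input arrives as raw UTF-8 bytes rather than as a string,
    // decode it to a string first and then reuse the method above.
    public static string Utf8BytesToUtf16BeHex(byte[] utf8Bytes)
    {
        string text = Encoding.UTF8.GetString(utf8Bytes);
        return ToUtf16BeHex(text);
    }
}
```

Under those assumptions, SmsHex.ToUtf16BeHex("\u0431\u04AF\u0442\u044D") gives 043104AF0442044D, which matches the start of the sample string in the question. If the text is already held in a .NET string (for example, form input that the framework has decoded from UTF-8), only the first method should be needed.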