Encoding conversions... optimized two-stage table?

Hi,

I want to convert UTF-16 (or Unicode) to ISO-8859-1. The .NET encoding does a pretty good job, but some characters are not converted, like "O", which becomes "?". I want it to become "oe". So what I want to know is the method used by the .NET encoders to convert from one encoding to another: is it an optimized two-stage table, a multistate table, or some other method?

As far as I know, the first 256 characters are the same, so it's easy to convert those 256, but for the others we have to build a correspondence. Would an optimized two-stage table be the best way to go? Does somebody know where I could get such a table so I don't have to type it all myself?

Thanks,
ThunderMusic

ThunderMusic <NoSpAmdanlatathotmaildotcom@NoSpAm.com> wrote:
> I want to convert UTF-16 (or unicode) to ISO-8859-1... The .Net
> encoding does a pretty good job, but some characters are not converted, like
> "O" that becomes "?"... I want it to become "oe"... So, what I want to know
> is the method used by the .NET encoders to convert from one encoding to the
> other... is it using an optimized two-stage table or a multistate table or
> other method?

Converting from one encoding to another is just a matter of decoding from a byte array to Unicode (.NET's "native" UTF-16 format), then encoding from Unicode to a byte array.

> As I know the first 256 characters are the same, it's easy to convert those
> 256, but for the others, we have to make a correspondence, would an
> optimized two-stage table be the best way to go? Does somebody know where I
> could get such a table so I don't have to type it all myself?

The process that .NET encodings are using won't help you much, unfortunately. It sounds like the conversion you need is entirely within text form - from the combined character to the multi-character version.

-- 
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

> I want to convert UTF-16 (or unicode) to ISO-8859-1... The .Net
> encoding does a pretty good job, but some characters are not converted,
> like "O" that becomes "?"... I want it to become "oe"... So, what I want
> to know is the method used by the .NET encoders to convert from one
> encoding to the other... is it using an optimized two-stage table or a
> multistate table or other method?

The real solution is to move everything to Unicode, not trying to "squeeze" the whole of Unicode through some code page hole with a non-standard, patchy conversion.

-- 
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

Actually, I just saw that the "O" character went wrong in the post (it must be plain US-ASCII). What I wanted to post is the one character "oe"... and you just saw that the conversion is not perfect, because only the "O" went through. Well, I'll try my best anyway. Thanks everyone.

ThunderMusic

"Mihai N." <nmihai_year_2***@yahoo.com> wrote in message news:Xns98CA9C99C9B49MihaiN@207.46.248.16...
> The real solution is to move everything to Unicode, not trying to "squeeze"
> the whole of Unicode through some code page hole with a non-standard,
> patchy conversion.
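
For anyone landing on this thread later: Jon's point, that the substitution happens in the text before any encoder is involved, can be sketched roughly as below. This is Python rather than .NET, purely as a language-neutral illustration. It assumes the character that got mangled in the post was the "oe" ligature (U+0153), and the `FALLBACKS` table and the `to_latin1` / `transcode` helpers are hypothetical names with illustrative entries, not a standard mapping and not what the .NET encoders do internally.

```python
# Hypothetical fallback table: characters that ISO-8859-1 cannot represent,
# mapped to Latin-1-safe replacement strings. Entries are illustrative only.
FALLBACKS = {
    "\u0153": "oe",   # LATIN SMALL LIGATURE OE (assumed to be the poster's character)
    "\u0152": "OE",   # LATIN CAPITAL LIGATURE OE
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
}

# str.maketrans accepts single-character keys and string values of any length.
_TABLE = str.maketrans(FALLBACKS)


def to_latin1(text: str) -> bytes:
    """Apply text-level substitutions, then encode to ISO-8859-1.

    Anything still unrepresentable after substitution becomes b"?".
    """
    substituted = text.translate(_TABLE)
    return substituted.encode("iso-8859-1", errors="replace")


def transcode(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes to Unicode text first, then encode via to_latin1."""
    return to_latin1(data.decode(source_encoding))


if __name__ == "__main__":
    print(to_latin1("c\u0153ur"))
    print(transcode("\u201cc\u0153ur\u201d".encode("utf-16"), "utf-16"))
```

A plain dictionary is enough here because only the handful of characters with a sensible multi-character replacement need an entry; everything in the first 256 code points passes through the encoder unchanged.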