DataFormats.Html with foreign character questions

I recently had to process the HTML clipboard format (both retrieval and posting) in C#, and I have run into a problem when the HTML fragment contains foreign characters such as Chinese, that is, real multi-byte UTF-8 sequences.

When I examine the HTML clipboard data from Clipboard.GetData( DataFormats.Html ), the return type is System.String. For the most part it is correct, except for the data between <!--StartFragment--> and <!--EndFragment-->, which should contain UTF-8 sequences. For some reason the returned byte sequences are wrong in certain places.

I then used unmanaged code via C++/CLI to retrieve the same data and compared it byte-for-byte with what Clipboard.GetData() returned. They are indeed different. The data I retrieve using the Win32 API GetClipboardData() is correct, while the data from Clipboard.GetData() is corrupted.

Does anyone know why Microsoft chose to use System.String for this rather than Byte[], which would be more appropriate?

In the end, I wrote my own retrieval function in C++/CLI that returns a Byte[], so that I can use Encoding.UTF8.GetString() to convert the fragment correctly.

When posting with Clipboard.SetData(), I have also observed changes in certain byte sequences, so I am writing my own function for that as well.

Is there a logical explanation for this, or has it been dealt with before? I am using .NET 2.0.

Thanks.
Leon
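
For anyone following the same byte-level approach, here is a minimal sketch in portable C++ of the slicing step only. It assumes the payload has already been copied, unmodified, out of the handle returned by GetClipboardData(); that retrieval code is not shown. The names extract_fragment_bytes and count_code_points are hypothetical helpers made up for this illustration, not part of Win32 or .NET, and this is not the code from the post above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copies the raw bytes found between the
// <!--StartFragment--> and <!--EndFragment--> comments into `fragment`.
// `payload` is assumed to be the unmodified clipboard byte buffer; no
// character-set conversion is performed here, so UTF-8 sequences survive.
// Returns false if either marker is missing.
bool extract_fragment_bytes(const std::string& payload, std::string& fragment) {
    const std::string start_marker = "<!--StartFragment-->";
    const std::string end_marker = "<!--EndFragment-->";

    const std::size_t start = payload.find(start_marker);
    if (start == std::string::npos) return false;
    const std::size_t begin = start + start_marker.size();

    const std::size_t end = payload.find(end_marker, begin);
    if (end == std::string::npos) return false;

    assert(end >= begin);  // find() searched from `begin`, so this must hold
    fragment = payload.substr(begin, end - begin);
    return true;
}

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes (bit pattern 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

With a fragment holding two Chinese characters, the extracted buffer is six bytes long but contains two code points. That gap between bytes and characters is why the fragment needs to stay a byte array until it is explicitly decoded as UTF-8, for example with Encoding.UTF8.GetString() on the managed side.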