
Problem writing a UTF-8 file

Author
26 Dec 2005 11:42 AM
Jaime Stuardo
Hi all...

I need to create a CSV file from an application. The file contains Latin
characters such as á, í, ñ and so on. The CSV separator character is a
semicolon (;).

When I create the file using:
StreamWriter f = new StreamWriter(file_name, false, Encoding.UTF8);

and then close it, I see that the Latin characters are badly encoded
(they show up as strange symbols). In this case, however, I can open the CSV
file directly in Excel, which recognizes the separator character.

To get the Latin characters encoded correctly, I created the file using
Encoding.Unicode instead. This way, the file shows the data with the right
encoding and the Latin characters look fine. In this case, however, if I open
the file in Excel, it doesn't recognize the separator character, so all data
is placed in the first column. As a partial workaround, I have to save the
file with a TXT extension and then use Excel's import wizard to open it.
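To see what the two StreamWriter calls actually write, a minimal sketch like the following can dump the raw bytes. It uses a hypothetical helper class `EncodingComparison` and writes into a `MemoryStream` instead of a file so the bytes are easy to inspect; the hex values in the comments follow from the UTF-8 and UTF-16 encodings of "á" (U+00E1) and ';'.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingComparison
{
    // Hypothetical helper: writes text through a StreamWriter (as in the question)
    // into memory and returns the resulting bytes as hex, e.g. "EF-BB-BF-...".
    public static string BytesWritten(string text, Encoding encoding)
    {
        var buffer = new MemoryStream();
        using (var writer = new StreamWriter(buffer, encoding))
        {
            writer.Write(text);
        }
        // MemoryStream.ToArray still works after the writer has closed the stream.
        return BitConverter.ToString(buffer.ToArray());
    }

    static void Main()
    {
        // Encoding.UTF8: BOM EF-BB-BF, then C3-A1 for "á" and 3B for ';'.
        Console.WriteLine(BytesWritten("á;", Encoding.UTF8));    // EF-BB-BF-C3-A1-3B
        // Encoding.Unicode (UTF-16 little-endian): BOM FF-FE, then two bytes per character.
        Console.WriteLine(BytesWritten("á;", Encoding.Unicode)); // FF-FE-E1-00-3B-00
    }
}
```

Neither result is 8 bits per character for "á": UTF-8 needs two bytes for it, and UTF-16 needs two bytes for every character, including the semicolon.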

How can I create a file whose encoding covers only up to extended ASCII
(8 bits per character)?

Thanks
Jaime

Author
27 Dec 2005 9:24 AM
Jon Skeet [C# MVP]
Jaime Stuardo <JaimeStua***@discussions.microsoft.com> wrote:
> I need to create a CSV file from an application. The file contains Latin
> characters such as á, í, ñ and so on. The CSV separator character is a
> semicolon (;).
>
> When I create the file using:
> StreamWriter f = new StreamWriter(file_name, false, Encoding.UTF8);
>
> and then close it, I see that the Latin characters are badly encoded
> (they show up as strange symbols).

How are you looking at the file? If you're using something which
doesn't understand UTF-8, you would indeed see rubbish.
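The effect of opening UTF-8 data with a reader that doesn't understand UTF-8 can be reproduced in a few lines. This sketch (the `MojibakeDemo` class is hypothetical) assumes the viewer treats the file as ISO-8859-1 (Latin-1); the Windows-1252 code page gives the same result for these bytes, since 0xC3 and 0xA1 map to "Ã" and "¡" in both.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Encodes text as UTF-8, then decodes those bytes as ISO-8859-1 (Latin-1),
    // i.e. one character per byte, as a viewer unaware of UTF-8 would.
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text); // GetBytes adds no BOM
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return latin1.GetString(utf8Bytes);
    }

    static void Main()
    {
        // "á" is stored as the bytes C3 A1, which Latin-1 shows as "Ã¡".
        Console.WriteLine(MisreadUtf8AsLatin1("á") == "\u00C3\u00A1"); // True

        // Decoding the same bytes as UTF-8 recovers the original text.
        byte[] bytes = Encoding.UTF8.GetBytes("á");
        Console.WriteLine(Encoding.UTF8.GetString(bytes) == "á"); // True
    }
}
```

The file itself is fine in this case; only the program reading it is using the wrong encoding.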

> In this case, however, I can open the CSV
> file directly in Excel, which recognizes the separator character.
>
> To get the Latin characters encoded correctly, I created the file using
> Encoding.Unicode instead. This way, the file shows the data with the right
> encoding and the Latin characters look fine. In this case, however, if I open
> the file in Excel, it doesn't recognize the separator character, so all data
> is placed in the first column. As a partial workaround, I have to save the
> file with a TXT extension and then use Excel's import wizard to open it.
>
> How can I create a file whose encoding covers only up to extended ASCII
> (8 bits per character)?

Well, there's no single "extended ASCII" encoding, and UTF-8 isn't what
you're thinking of anyway: it uses a single byte only for ASCII characters
and two or more bytes for characters such as á. You could try using
Encoding.Default (the system's ANSI code page), but that's not a terribly
pleasant solution, since the result depends on the machine's settings. I
don't know whether Excel supports Unicode/UTF-8 CSV files; you might want to
ask in an Excel newsgroup. Once you know which encoding you want to use, the
.NET side is easy.
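Once an 8-bit encoding has been chosen, writing the CSV only means passing it to StreamWriter. This is a minimal sketch that assumes ISO-8859-1 (Latin-1) as that encoding and uses a hypothetical `CsvExport.WriteCsv` helper. On .NET Framework, `Encoding.Default` returns the machine's ANSI code page (for example Windows-1252, which encodes á, í and ñ with the same single bytes as Latin-1). On .NET Core and .NET 5+, `Encoding.Default` is always UTF-8, so naming the encoding explicitly is more predictable. Whether Excel then splits the columns on ';' depends on Excel and is not something this code controls.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class CsvExport
{
    // Hypothetical helper: writes rows as semicolon-separated lines in the chosen encoding.
    // It does no quoting, so fields must not contain ';', '"' or line breaks.
    public static void WriteCsv(string path, IEnumerable<string[]> rows, Encoding encoding)
    {
        using (StreamWriter f = new StreamWriter(path, false, encoding))
        {
            f.NewLine = "\r\n"; // CRLF line endings regardless of platform
            foreach (string[] row in rows)
            {
                f.WriteLine(string.Join(";", row));
            }
        }
    }

    static void Main()
    {
        var rows = new List<string[]>
        {
            new[] { "Name", "City" },
            new[] { "Íñigo", "Málaga" },
        };

        // Assumption: ISO-8859-1 as the 8-bit encoding (one byte per character, no BOM).
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string path = Path.Combine(Path.GetTempPath(), "export.csv"); // hypothetical location

        WriteCsv(path, rows, latin1);

        // "Name;City\r\n" is 11 bytes and "Íñigo;Málaga\r\n" is 14 bytes.
        Console.WriteLine(File.ReadAllBytes(path).Length); // 25
    }
}
```

The same `WriteCsv` call with `Encoding.UTF8` would instead produce a BOM plus two bytes for each accented letter, so only the encoding argument needs to change.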

See http://www.pobox.com/~skeet/csharp/unicode.html and
http://www.pobox.com/~skeet/csharp/debuggingunicode.html for more
information.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
