
System.IO.StreamWriter uses two bytes for ASCII characters with UT

Author
30 Aug 2006 10:11 PM
Joe
I am creating a text file with a StreamWriter set to UTF8 encoding like in
the following example:

        Using writer As New IO.StreamWriter("C:\temp\HelloWorld.txt", False, System.Text.Encoding.UTF8)
            writer.Write("Hello World")
        End Using

It appears that ASCII characters are written using two bytes.  From what I
have read UTF-8 is a variable length character encoding format and standard
ASCII characters are written using only one byte.

Why is it writing two bytes?  Is there a way to change this behavior?

Thanks,
Joe

Author
30 Aug 2006 10:33 PM
Carl Daniel [VC++ MVP]
"Joe" <campesino@community.nospam> wrote in message
news:698796BA-4A72-4D82-96DA-8BA59193FC13@microsoft.com...
>I am creating a text file with a StreamWriter set to UTF8 encoding like in
> the following example:
>
>        Using writer As New IO.StreamWriter("C:\temp\HelloWorld.txt", False, System.Text.Encoding.UTF8)
>            writer.Write("Hello World")
>        End Using
>
> It appears that ASCII characters are written using two bytes.  From what I
> have read UTF-8 is a variable length character encoding format and standard
> ASCII characters are written using only one byte.
>
> Why is it writing two bytes?  Is there a way to change this behavior?

How are you determining that it's writing two bytes?  It shouldn't, and it
doesn't when I just tried it.

Note that the file will have a BOM (byte order mark) prepended: 0xEF, 0xBB,
0xBF, but after that, all ASCII characters are encoded as a single byte
since their code points are all <0x80.

-cd
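
The byte layout described above can be checked in memory, without a hex editor in the loop. This is a minimal sketch, assuming a plain console project; the module name Utf8BytesDemo and the helper ToHex are made up for illustration and are not part of the original posts.

```vbnet
Imports System
Imports System.Text

Module Utf8BytesDemo
    ' Hypothetical helper: formats a byte array as space-separated hex pairs.
    Function ToHex(ByVal bytes As Byte()) As String
        Return BitConverter.ToString(bytes).Replace("-", " ")
    End Function

    Sub Main()
        ' The preamble is the BOM that Encoding.UTF8 asks StreamWriter to emit.
        Console.WriteLine(ToHex(Encoding.UTF8.GetPreamble()))
        ' prints: EF BB BF

        ' Every character here has a code point below 0x80, so each one
        ' encodes to exactly one byte.
        Console.WriteLine(ToHex(Encoding.UTF8.GetBytes("Hello World")))
        ' prints: 48 65 6C 6C 6F 20 57 6F 72 6C 64
    End Sub
End Module
```

Note that GetBytes never includes the BOM; only the writer adds it at the start of a new stream.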
Author
31 Aug 2006 12:22 AM
Joe
I am using UltraEdit to view the HEX version of the text file.  If I execute:

        Using writer As New IO.StreamWriter("C:\temp\HelloWorldUtf8.txt", False, System.Text.Encoding.UTF8)
            writer.Write("Hello World")
        End Using

Then I get the output:
FF FE FF FE 48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00 72 00 6C 00 64 00

If I execute this command:

        Using writer As New IO.StreamWriter("C:\temp\HelloWorldASCII.txt", False, System.Text.Encoding.ASCII)
            writer.Write("Hello World")
        End Using

Then I get this output:

48 65 6C 6C 6F 20 57 6F 72 6C 64

What tool are you using to verify the actual bytes written?
Author
31 Aug 2006 3:10 AM
Carl Daniel [VC++ MVP]
Joe wrote:
> I am using UltraEdit to view the HEX version of the text file.  If I
> execute:
>
>        Using writer As New IO.StreamWriter("C:\temp\HelloWorldUtf8.txt", False, System.Text.Encoding.UTF8)
>            writer.Write("Hello World")
>        End Using
>
> Then I get the output:
> FF FE FF FE 48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00 72 00 6C 00 64 00
>
> If I execute this command:
>
>        Using writer As New IO.StreamWriter("C:\temp\HelloWorldASCII.txt", False, System.Text.Encoding.ASCII)
>            writer.Write("Hello World")
>        End Using
>
> Then I get this output:
>
> 48 65 6C 6C 6F 20 57 6F 72 6C 64
>
> What tool are you using to verify the actual bytes written?

I see you figured it out.  Odd that UltraEdit would do that - apparently it
was converting the file to UCS-2 and then showing you the binary version of
that.

Incidentally, I was using Visual Studio to examine the file as binary.
(Click the little drop-down arrow on the Open button when opening a file in
VS and choose "Binary Editor" to open a text file as binary).

-cd
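
If no binary editor is at hand, the raw bytes can also be dumped from code, which avoids any conversion an editor might apply on load. This is a rough sketch only; the module name HexDumpDemo and the function DumpFile are hypothetical, and the path is the one used earlier in the thread.

```vbnet
Imports System
Imports System.IO

Module HexDumpDemo
    ' Hypothetical helper: reads a file's raw bytes and returns them
    ' as space-separated hex pairs, with no decoding step in between.
    Function DumpFile(ByVal fileName As String) As String
        Return BitConverter.ToString(File.ReadAllBytes(fileName)).Replace("-", " ")
    End Function

    Sub Main()
        ' Path assumed from the earlier post; adjust to wherever the file was written.
        Console.WriteLine(DumpFile("C:\temp\HelloWorldUtf8.txt"))
    End Sub
End Module
```

For a file written with System.Text.Encoding.UTF8 as shown earlier, the dump should begin with EF BB BF and continue with one byte per ASCII character.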
Author
31 Aug 2006 12:32 AM
Joe
Ah ha.  You got me curious so I downloaded TextPad and used that to examine
the binary.  You are correct, it is writing it properly.  This must be a
quirk with UltraEdit.  Other than the BOM, the ASCII and UTF-8 files are
exactly the same.

Thanks for helping me with my silly mistake!!!

-Joe
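
If the BOM itself is unwanted, one option is to pass a UTF8Encoding instance constructed with False instead of the shared Encoding.UTF8 instance. The sketch below is illustrative only: it writes to a MemoryStream rather than a file so the bytes can be compared directly, and the module name NoBomDemo and the helper Encode are made up for the example.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module NoBomDemo
    ' Hypothetical helper: writes content through a StreamWriter using the
    ' given encoding and returns every byte the writer produced.
    Function Encode(ByVal content As String, ByVal enc As Encoding) As Byte()
        Using ms As New MemoryStream()
            Using writer As New StreamWriter(ms, enc)
                writer.Write(content)
            End Using
            ' ToArray still works after the writer has closed the stream.
            Return ms.ToArray()
        End Using
    End Function

    Sub Main()
        ' False tells the encoding not to supply a BOM to the writer.
        Dim utf8NoBom As New UTF8Encoding(False)

        Console.WriteLine(BitConverter.ToString(Encode("Hello World", utf8NoBom)))
        Console.WriteLine(BitConverter.ToString(Encode("Hello World", Encoding.ASCII)))
        Console.WriteLine(BitConverter.ToString(Encode("Hello World", Encoding.UTF8)))
    End Sub
End Module
```

The first two lines should be identical, and the third should differ only by the leading EF-BB-BF. The same UTF8Encoding instance can be passed to the file-based StreamWriter constructor used in the original question.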

Author
30 Aug 2006 10:36 PM
William Stacey [MVP]
I don't see that behavior.  It does add a UTF-8 preamble, but the ASCII
characters are written as plain single-byte ASCII.
            using (StreamWriter sw = new StreamWriter(Console.OpenStandardOutput(), Encoding.UTF8))
            {
                sw.Write("Hello World");
            }

Output:

Hello World


--
William Stacey [MVP]

