System.IO.StreamWriter uses two bytes for ASCII characters with UT...

I am creating a text file with a StreamWriter set to UTF8 encoding like in
the following example:

    Using writer As New IO.StreamWriter("C:\temp\HelloWorld.txt", False, System.Text.Encoding.UTF8)
        writer.Write("Hello World")
    End Using

It appears that ASCII characters are written using two bytes. From what I
have read, UTF-8 is a variable-length character encoding format and standard
ASCII characters are written using only one byte.

Why is it writing two bytes? Is there a way to change this behavior?

Thanks,
Joe

"Joe" <campesino@community.nospam> wrote in message
news:698796BA-4A72-4D82-96DA-8BA59193FC13@microsoft.com...
> I am creating a text file with a StreamWriter set to UTF8 encoding like in
> the following example:
> [...]
> Why is it writing two bytes? Is there a way to change this behavior?

How are you determining that it's writing two bytes? It shouldn't, and it
doesn't when I just tried it.

Note that the file will have a BOM (byte order mark) prepended: 0xEF, 0xBB,
0xBF, but after that, all ASCII characters are encoded as a single byte
since their code points are all <0x80.

-cd


I am using UltraEdit to view the HEX version of the text file. If I execute:

    Using writer As New IO.StreamWriter("C:\temp\HelloWorldUtf8.txt", False, System.Text.Encoding.UTF8)
        writer.Write("Hello World")
    End Using

Then I get the output:

    FF FE FF FE 48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00 72 00 6C 00 64 00

If I execute this command:

    Using writer As New IO.StreamWriter("C:\temp\HelloWorldASCII.txt", False, System.Text.Encoding.ASCII)
        writer.Write("Hello World")
    End Using

Then I get this output:

    48 65 6C 6C 6F 20 57 6F 72 6C 64

What tool are you using to verify the actual bytes written?


Joe wrote:
> I am using UltraEdit to view the HEX version of the text file.
> [...]
> What tool are you using to verify the actual bytes written?

I see you figured it out. Odd that UltraEdit would do that - apparently it
was converting the file to UCS-2 and then showing you the binary version of
that.

Incidentally, I was using Visual Studio to examine the file as binary.
(Click the little drop-down arrow on the Open button when opening a file in
VS and choose "Binary Editor" to open a text file as binary).

-cd


Ah Ha. You got me curious so I downloaded TextPad and used that to examine
the binary. You are correct, it is writing it properly. This must be a quirk
with UltraEdit. Other than the BOM the ASCII and UTF8 are exactly the same.

Thanks for helping me with my silly mistake!!!

-Joe

"Carl Daniel [VC++ MVP]" wrote:
> How are you determining that it's writing two bytes? It shouldn't, and it
> doesn't when I just tried it.
> [...]


I don't see that behavior. It does add a utf8 preamble, but the ascii is
ascii.

    using (StreamWriter sw = new StreamWriter(Console.OpenStandardOutput(), Encoding.UTF8))
    {
        sw.Write("Hello World");
    }

Output:

    Hello World

--
William Stacey [MVP]

"Joe" <campesino@community.nospam> wrote in message
news:698796BA-4A72-4D82-96DA-8BA59193FC13@microsoft.com...
| I am creating a text file with a StreamWriter set to UTF8 encoding like in
| the following example:
| [...]
| Why is it writing two bytes? Is there a way to change this behavior?
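
For anyone who wants to cross-check the byte values discussed in this thread without relying on a hex editor, here is a minimal sketch. It is written in Python rather than .NET, so it does not exercise StreamWriter itself; it only shows what the three encodings look like on the wire for the same "Hello World" string. The helper name hex_dump is made up for this illustration.

```python
import codecs

text = "Hello World"

# UTF-8 preceded by the 3-byte BOM (EF BB BF), the layout Carl describes.
utf8_with_bom = text.encode("utf-8-sig")

# Plain ASCII: one byte per character, no preamble.
ascii_bytes = text.encode("ascii")

# UTF-16 little-endian with its BOM (FF FE): two bytes per ASCII character,
# which is roughly what a UCS-2 view of the text would show.
utf16_with_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs (hypothetical helper)."""
    return " ".join(f"{b:02X}" for b in data)


print(hex_dump(utf8_with_bom))   # EF BB BF 48 65 6C 6C 6F 20 57 6F 72 6C 64
print(hex_dump(ascii_bytes))     # 48 65 6C 6C 6F 20 57 6F 72 6C 64
print(hex_dump(utf16_with_bom))  # FF FE 48 00 65 00 6C 00 ...
```

If a viewer shows the 48 00 65 00 pattern for a file that was written as UTF-8, the likely explanation is the one given above: the tool converted the text before displaying it. Reading the raw bytes directly (for example with open(path, "rb").read() in Python) sidesteps that.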