
Text Files, Encoding and NewLine character

Author
4 Jun 2005 6:00 PM
Roby Eisenbraun Martins

hi,

    I'm creating a component to read text files.
    I'm using TextReader with the System.Text Encoding component, because
there are several different types of text files and the encoding translates
the bytes of the file into the correct characters.
    For almost all of these file types '\r\n' is the new line sequence, but
sometimes it is not. That's why I'm using the NewLine property, to know
exactly what the new line characters are in each case.
    So far this all works fine with TextWriter, but hell knows why
TextReader doesn't have a NewLine property.
    Is there any way to identify the new line characters of a given
encoding? Or should I just stop using NewLine and work only with '\r\n'?
    Another question: does TextReader copy the whole file from the hard disk
as soon as it is created, or does it copy part of the file each time I call
Read()?

   Thank you,
   Roby Eisenbraun Martins
Author
4 Jun 2005 7:55 PM
Jon Skeet [C# MVP]
Roby Eisenbraun Martins
<RobyEisenbraunMart***@discussions.microsoft.com> wrote:
>     I'm creating a component to read text files.
>     I'm using TextReader with the System.Text Encoding component, because
> there are several different types of text files and the encoding translates
> the bytes of the file into the correct characters.
>     For almost all of these file types '\r\n' is the new line sequence, but
> sometimes it is not. That's why I'm using the NewLine property, to know
> exactly what the new line characters are in each case.
>     So far this all works fine with TextWriter, but hell knows why
> TextReader doesn't have a NewLine property.
>     Is there any way to identify the new line characters of a given
> encoding? Or should I just stop using NewLine and work only with '\r\n'?

Character encoding is a separate concern from which character or
sequence of characters represents a new line.
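
One practical consequence, sketched here on the assumption that the file is read line by line: TextReader.ReadLine already treats "\r\n", "\n" and "\r" as line terminators, so the reader often does not need to be told which newline sequence a file uses. The names LineEndingDemo and ReadAllLines are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineEndingDemo
{
    // Hypothetical helper: collects every line from a reader.
    // ReadLine accepts "\r\n", "\n" and "\r" as line terminators.
    public static List<string> ReadAllLines(TextReader reader)
    {
        var lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }

    public static void Main()
    {
        // One text with three different line endings mixed together.
        string text = "first\r\nsecond\nthird\rfourth";
        foreach (string line in ReadAllLines(new StringReader(text)))
        {
            Console.WriteLine(line);
        }
    }
}
```

A StreamReader over a file behaves the same way, since it inherits the ReadLine contract from TextReader.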

>     Another question: does TextReader copy the whole file from the hard disk
> as soon as it is created, or does it copy part of the file each time I call
> Read()?

It buffers some input, but certainly not all of it. Some of the
StreamReader constructors allow you to specify the size of the buffer.
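
A rough way to observe the buffering, assuming an in-memory stream stands in for the file: read a single character and then check how far the underlying stream has advanced. BufferDemo and BytesConsumedAfterOneRead are hypothetical names used only for this sketch.

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferDemo
{
    // Hypothetical helper: reads one character through a StreamReader and
    // reports how many bytes the reader has pulled from the stream so far.
    public static long BytesConsumedAfterOneRead(Stream stream, int bufferSize)
    {
        // The reader is deliberately not disposed here, because disposing it
        // would also close the caller's stream.
        var reader = new StreamReader(stream, Encoding.UTF8, true, bufferSize);
        reader.Read();
        return stream.Position;
    }

    public static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes(new string('x', 100000));
        using (var stream = new MemoryStream(data))
        {
            long consumed = BytesConsumedAfterOneRead(stream, 4096);
            Console.WriteLine("Stream length: " + stream.Length);
            Console.WriteLine("Bytes consumed after one Read(): " + consumed);
        }
    }
}
```

The second number should come out far smaller than the first, which is the point: the reader fills its buffer on demand rather than loading the whole stream up front.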

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Author
5 Jun 2005 4:09 PM
Roby Eisenbraun Martins
"Jon Skeet [C# MVP]" wrote:

> Roby Eisenbraun Martins
> <RobyEisenbraunMart***@discussions.microsoft.com> wrote:
> >     I'm creating a component to read text files.
> >     I'm using TextReader with the System.Text Encoding component, because
> > there are several different types of text files and the encoding translates
> > the bytes of the file into the correct characters.
> >     For almost all of these file types '\r\n' is the new line sequence, but
> > sometimes it is not. That's why I'm using the NewLine property, to know
> > exactly what the new line characters are in each case.
> >     So far this all works fine with TextWriter, but hell knows why
> > TextReader doesn't have a NewLine property.
> >     Is there any way to identify the new line characters of a given
> > encoding? Or should I just stop using NewLine and work only with '\r\n'?
>
> Character encoding is a separate concern from which character or
> sequence of characters represents a new line.

   OK, then why do we have NewLine in TextWriter? And why doesn't the
Encoding class have a NewLine for each case?

>
> >     Another question: does TextReader copy the whole file from the hard disk
> > as soon as it is created, or does it copy part of the file each time I call
> > Read()?
>
> It buffers some input, but certainly not all of it. Some of the
> StreamReader constructors allow you to specify the size of the buffer.

  Thank you

>
> --
> Jon Skeet - <sk***@pobox.com>
> http://www.pobox.com/~skeet
> If replying to the group, please do not mail me too
>
Author
5 Jun 2005 4:50 PM
Jon Skeet [C# MVP]
Roby Eisenbraun Martins
<RobyEisenbraunMart***@discussions.microsoft.com> wrote:
> > Character encoding is a separate concern from which character or
> > sequence of characters represents a new line.
>
>    OK, then why do we have NewLine in TextWriter?

Because it makes sense to be writing a file with a particular newline
string/character, whatever the encoding is.
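
As a minimal sketch of that idea, the following writes the same lines with two different newline strings by setting TextWriter.NewLine. A StringWriter is used so the result can be inspected in memory; NewLineDemo and WriteLines are hypothetical names for this example.

```csharp
using System;
using System.IO;

public static class NewLineDemo
{
    // Hypothetical helper: writes each line with WriteLine, using the
    // given newline string as the terminator.
    public static string WriteLines(string newLine, params string[] lines)
    {
        using (var writer = new StringWriter())
        {
            writer.NewLine = newLine;
            foreach (string line in lines)
            {
                writer.WriteLine(line);
            }
            return writer.ToString();
        }
    }

    public static void Main()
    {
        string unix = WriteLines("\n", "a", "b");
        string windows = WriteLines("\r\n", "a", "b");
        Console.WriteLine("Unix-style length: " + unix.Length);
        Console.WriteLine("Windows-style length: " + windows.Length);
    }
}
```

The same property exists on StreamWriter, so the choice of newline is made on the writer and does not depend on which Encoding the writer was constructed with.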

> And why doesn't the Encoding class have a NewLine for each case?

Because as I said, they're separate concerns - what you use for a
newline is entirely separate from the encoding used to represent
characters.
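
To make the separation concrete, here is a small sketch that encodes the same "\r\n" string with a few encodings and prints the resulting bytes in hex. The characters are identical in every case and only their byte representation changes. EncodingBytesDemo and HexBytes are names invented for this example.

```csharp
using System;
using System.Text;

public static class EncodingBytesDemo
{
    // Hypothetical helper: encodes the text and formats the bytes as hex,
    // for example "0D-0A".
    public static string HexBytes(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string newLine = "\r\n";
        Console.WriteLine("UTF-8:     " + HexBytes(Encoding.UTF8, newLine));
        Console.WriteLine("UTF-16 LE: " + HexBytes(Encoding.Unicode, newLine));
        Console.WriteLine("UTF-16 BE: " + HexBytes(Encoding.BigEndianUnicode, newLine));
    }
}
```

The encoding decides how each character becomes bytes; which characters mark the end of a line is a property of the file's content, not of the encoding.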

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Author
5 Jun 2005 4:25 PM
Roby Eisenbraun Martins
"Jon Skeet [C# MVP]" wrote:
> >     Another question: does TextReader copy the whole file from the hard disk
> > as soon as it is created, or does it copy part of the file each time I call
> > Read()?
>
> It buffers some input, but certainly not all of it. Some of the
> StreamReader constructors allow you to specify the size of the buffer.

   Yes, there is a constructor that lets me specify the buffer size, but
with it I also have to specify the encoding of the file, which I really
don't know. That's why I am using StreamReader in the first place: it is the
only way I have to detect the text encoding. In this case, how can I set the
buffer size if I don't know the other parameters?
Author
5 Jun 2005 4:52 PM
Jon Skeet [C# MVP]
Roby Eisenbraun Martins
<RobyEisenbraunMart***@discussions.microsoft.com> wrote:
> "Jon Skeet [C# MVP]" wrote:
> > >     Another question: does TextReader copy the whole file from the hard disk
> > > as soon as it is created, or does it copy part of the file each time I call
> > > Read()?
> >
> > It buffers some input, but certainly not all of it. Some of the
> > StreamReader constructors allow you to specify the size of the buffer.
>   
>    Yes, there is a constructor that lets me specify the buffer size, but
> with it I also have to specify the encoding of the file, which I really
> don't know. That's why I am using StreamReader in the first place: it is the
> only way I have to detect the text encoding. In this case, how can I set the
> buffer size if I don't know the other parameters?

Even if you don't set the other parameters, they have default values
effectively - if you don't set the encoding, UTF-8 is used. In other
words, using

new StreamReader (stream)

is equivalent to

new StreamReader (stream, Encoding.UTF8, true, 1024)
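
A sketch of what that means in practice, assuming the input starts with a byte order mark: passing Encoding.UTF8 together with detectEncodingFromByteOrderMarks = true still lets the reader switch to the encoding indicated by the BOM, so the buffer size can be chosen without knowing the encoding up front. The reader's CurrentEncoding property reports what was picked once some data has been read. DetectDemo, ReadWithDetection and the 4096 buffer size are illustrative choices, not anything from the thread.

```csharp
using System;
using System.IO;
using System.Text;

public static class DetectDemo
{
    // Hypothetical helper: reads all text using UTF-8 as the fallback,
    // BOM detection switched on, and a caller-chosen buffer size.
    public static string ReadWithDetection(Stream stream, int bufferSize, out Encoding detected)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, bufferSize))
        {
            string text = reader.ReadToEnd();
            // CurrentEncoding is only meaningful after the first read.
            detected = reader.CurrentEncoding;
            return text;
        }
    }

    public static void Main()
    {
        // Build UTF-16 little-endian data that starts with its BOM.
        byte[] bom = Encoding.Unicode.GetPreamble();
        byte[] body = Encoding.Unicode.GetBytes("hi");
        byte[] data = new byte[bom.Length + body.Length];
        bom.CopyTo(data, 0);
        body.CopyTo(data, bom.Length);

        Encoding detected;
        string text = ReadWithDetection(new MemoryStream(data), 4096, out detected);
        Console.WriteLine(text + " (" + detected.WebName + ")");
    }
}
```

Note that this detection relies on the BOM only. A file without one is decoded with the encoding passed to the constructor.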

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
