
Reading and writing a large binary file fails

Author
15 May 2006 5:21 PM
TrinityPete
Hi all,

I am reading a 240MB+ binary file, making some changes, and writing it
back out. For now I have removed the code that makes the changes, so in its
simplest form the code just reads a large binary file and writes it back out.

After about 4MB I receive an exception:

The output char buffer is too small to contain the decoded characters,
encoding 'Unicode' fallback 'System.Text.DecoderReplacementFallback'.
Parameter name: chars

I really can't figure this one out. The code is below (I put the flushes in
hoping that was the problem):

    try
    {
        //Open the source file
        sourcestream = SourceFile.Open(FileMode.Open, FileAccess.Read, FileShare.None);

        //open the output file
        targetstream = TargetFile.Open(FileMode.CreateNew, FileAccess.Write, FileShare.None);

        breader = new BinaryReader(sourcestream);
        bwriter = new BinaryWriter(targetstream);

        for (long i = 0; i < SourceFile.Length; i++)
        {
            bwriter.Write(breader.Read());
            if (i % 2000 == 0)
            {
                bwriter.Flush();
                targetstream.Flush();
            }
        }

        breader.Close();
        sourcestream.Close();

        bwriter.Close();
        targetstream.Close();

        //Delete the original source file
        SourceFile.Delete();

Any help would be greatly appreciated
Regards, Pete.

Author
15 May 2006 5:34 PM
Vadym Stetsyak
Hello, TrinityPete!

T> I am reading a 240MB+ binary file performing some changes and writing it
T> back out. For now I have removed the code that performs changes so in
T> its simplistic form reading a large binary and then writing it back out.

T> After about 4MB I receive an exception:

T> The output char buffer is too small to contain the decoded characters,
T> encoding 'Unicode' fallback 'System.Text.DecoderReplacementFallback'.
T> Parameter name: chars

Which operation threw the exception:
breader.Read() or bwriter.Write()? Or was it something else?

--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
Author
15 May 2006 5:50 PM
TrinityPete
It looks like it was on the read.

I have just been experimenting some more with the code, and if I change
bwriter.Write(breader.Read());
to
bwriter.Write(breader.ReadByte());

it works fine. That still doesn't help me understand what's happening with
the original statement.

Full stack trace:

   at System.Text.Encoding.ThrowCharsOverflow()
   at System.Text.Encoding.ThrowCharsOverflow(DecoderNLS decoder, Boolean nothingDecoded)
   at System.Text.UTF8Encoding.GetChars(Byte* bytes, Int32 byteCount, Char* chars, Int32 charCount, DecoderNLS baseDecoder)
   at System.Text.DecoderNLS.GetChars(Byte* bytes, Int32 byteCount, Char* chars, Int32 charCount, Boolean flush)
   at System.Text.DecoderNLS.GetChars(Byte[] bytes, Int32 byteIndex, Int32 byteCount, Char[] chars, Int32 charIndex, Boolean flush)
   at System.Text.DecoderNLS.GetChars(Byte[] bytes, Int32 byteIndex, Int32 byteCount, Char[] chars, Int32 charIndex)
   at System.IO.BinaryReader.InternalReadOneChar()
   at System.IO.BinaryReader.Read()
   at TCS.Utilities.TCSDirMonClasses.tcsDirMonitor.MoveFile(FileInfo SourceFile, FileInfo TargetFile) in D:\DOTNETDEV VS2005\TCS.Utilities.TCSDirMon\TCS.Utilities.TCSDirMon\TCS.Utilities.TCSDirMonClasses\TCSDirMonClasses.cs:line 797

Pete.


Author
15 May 2006 5:40 PM
Barry Kelly
TrinityPete <TrinityP***@discussions.microsoft.com> wrote:

> I am reading a 240MB+ binary file performing some changes and writing it
> back out.

So it's a binary file - and doesn't contain meaningful text.

>                         bwriter.Write(breader.Read());

The documentation for BinaryReader.Read() states:

---8<---
Reads characters from the underlying stream and advances the current
position of the stream in accordance with the Encoding used and the
specific character being read from the stream.
--->8---

This reads *characters* according to the encoding associated with the
BinaryReader (which defaults to UTF8Encoding).

Having read a character (which may occupy more than one byte, since UTF-8
is a multibyte encoding), you pass it to the BinaryWriter.Write(Int32)
overload, which always writes exactly 4 bytes, the size of an int.

As near as I can make out, you should be using something more like:

---8<---
  bwriter.Write(breader.ReadInt32());
--->8---
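
To make the difference concrete, here is a minimal sketch that runs both
loops over an in-memory buffer instead of a file. The class and method names
(ReadVersusReadByte, CopyWithRead, CopyWithReadByte) are made up for
illustration and are not from the original code. It only assumes the
documented behaviour: Read() decodes one character using the reader's
encoding and returns it as an int (or -1 at the end), while ReadByte()
returns the next raw byte.

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class ReadVersusReadByte
{
    // Mirrors the original loop: Read() decodes a UTF-8 character and
    // returns it as an int, so Write(int) emits 4 bytes per character.
    public static byte[] CopyWithRead(byte[] input)
    {
        using (var source = new MemoryStream(input))
        using (var target = new MemoryStream())
        using (var reader = new BinaryReader(source))   // default encoding: UTF-8
        using (var writer = new BinaryWriter(target))
        {
            int c;
            while ((c = reader.Read()) != -1)
            {
                writer.Write(c);                         // Write(Int32) overload
            }
            writer.Flush();
            return target.ToArray();
        }
    }

    // The variant that worked: ReadByte() returns a raw byte, so the
    // Write(Byte) overload is chosen and exactly one byte is written.
    public static byte[] CopyWithReadByte(byte[] input)
    {
        using (var source = new MemoryStream(input))
        using (var target = new MemoryStream())
        using (var reader = new BinaryReader(source))
        using (var writer = new BinaryWriter(target))
        {
            while (source.Position < source.Length)
            {
                writer.Write(reader.ReadByte());         // Write(Byte) overload
            }
            writer.Flush();
            return target.ToArray();
        }
    }
}
```

For the three input bytes 0x41 0xC3 0xA9 (the UTF-8 encoding of "A" followed
by "é"), the Read() version decodes two characters and writes 8 bytes, while
the ReadByte() version writes the same 3 bytes back. Note also that
ReadInt32() consumes 4 bytes per call, so the original loop bound of
SourceFile.Length would have to become Length / 4 with any trailing bytes
handled separately; ReadByte() leaves the loop as it is.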

Don't forget that you can open an existing file and seek within it, and
make changes in place - you won't be able to insert easily, though.
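
A rough sketch of that in-place approach, assuming the change overwrites
existing bytes without altering the file length. InPlacePatch and
OverwriteAt are hypothetical names chosen for this example.

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class InPlacePatch
{
    // Overwrites replacement.Length bytes starting at the given offset.
    // The file keeps its length; nothing is inserted or removed.
    public static void OverwriteAt(string path, long offset, byte[] replacement)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
        {
            if (offset < 0 || offset + replacement.Length > stream.Length)
            {
                throw new ArgumentOutOfRangeException(nameof(offset));
            }

            stream.Seek(offset, SeekOrigin.Begin);
            stream.Write(replacement, 0, replacement.Length);
        }
    }
}
```

This avoids rewriting the whole 240MB file when only a few bytes change, at
the cost of modifying the original directly, so there is no untouched copy
to fall back on if something goes wrong part-way through.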

BTW: To reduce fragmentation, you may want to extend your target file by
calling SetLength on the output stream before doing your stream-based
editing. If you don't, Windows will end up being too optimistic and try
to squeeze the increasingly long file in all the fragmented bits of free
space on your drive. I've seen files with >1000 fragments easily created
because of this, taking a significant first-time-read hit next time
they're accessed.
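
Something along these lines, as a sketch only. PreallocatedCopy and Copy are
hypothetical names, and the code assumes the output will end up the same
size as the input, which may not hold once the real changes are added back.

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class PreallocatedCopy
{
    public static void Copy(string sourcePath, string targetPath)
    {
        using (var source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var target = new FileStream(targetPath, FileMode.CreateNew, FileAccess.Write, FileShare.None))
        {
            // Reserve the final size up front, before any data is written.
            // Assumption: the output has the same length as the input.
            target.SetLength(source.Length);

            var buffer = new byte[64 * 1024];
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                target.Write(buffer, 0, read);
            }
        }
    }
}
```

If the final size is not known in advance, one option is to call SetLength
with an upper estimate and call it again with the real length once writing
has finished.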

-- Barry
Author
15 May 2006 5:57 PM
TrinityPete
Thanks Barry, that makes sense now. If I change the Read() to
ReadByte() it works OK.

Thank you.
Pete.
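
For anyone finding this thread later: the ReadByte() loop above is what
fixed it. A variation that is often used for large files, and which is not
what was posted in this thread, is to skip BinaryReader and BinaryWriter
altogether and move raw bytes in chunks with Stream.Read and Stream.Write.
A minimal sketch, with RawFileCopy and CopyBytes as made-up names:

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class RawFileCopy
{
    // Copies all remaining bytes from source to target in chunks and
    // returns the number of bytes copied. No text decoding is involved.
    public static long CopyBytes(Stream source, Stream target, int bufferSize = 64 * 1024)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Any changes to the data would be applied to buffer[0..read) here.
            target.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

Because the bytes are never interpreted as characters, byte sequences that
are not valid UTF-8 pass through unchanged.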
