Author
17 Oct 2007 6:58 AM
Benjamin Fallar III
Does anybody have a better idea for parsing a very large text file faster?

And how do I determine the number of lines in a text file?

Author
17 Oct 2007 7:54 AM
Vadym Stetsiak
Hello, Benjamin!

Your parser can read the file in fixed-size chunks. The parser processes each
chunk and updates its state.
For instance:
1) read the first 512 bytes from the file
2) parse the data with the parser; its state will change (depending on the
data the parser is searching for)
3) read another 512 bytes from the file and repeat step 2
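
The steps above can be sketched roughly as follows. This is only an illustration in Python, under some assumptions: the function name `iter_records` is made up for the example, and the parser's "state" is taken to be the unfinished line left over at the end of a chunk, so that records split across a chunk boundary are reassembled.

```python
import io

def iter_records(stream, chunk_size=512):
    """Yield newline-delimited records from a binary stream, read in chunks.

    Hypothetical helper: `tail` is the parser state carried between chunks.
    """
    tail = b""
    while True:
        chunk = stream.read(chunk_size)   # steps 1 and 3: read the next chunk
        if not chunk:
            break
        # step 2: parse; everything before the last "\n" is complete,
        # the remainder becomes the new state
        *complete, tail = (tail + chunk).split(b"\n")
        for line in complete:
            yield line.rstrip(b"\r")      # tolerate "\r\n" line endings
    if tail:
        yield tail.rstrip(b"\r")          # last record without a trailing newline

# Tiny in-memory example with a deliberately small chunk size
sample = io.BytesIO(b"a,1\r\nb,2\r\nc,3")
records = list(iter_records(sample, chunk_size=4))
print(records)
```

For a real file, the stream would come from something like `open(path, "rb")`. Because this is a generator, only one chunk plus the current partial record is held in memory at a time.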

If you're searching for specific constructs in the text you can use regular
expressions for parsing.

To count the number of lines, the same approach can be taken:
1) read a data chunk
2) scan for newline symbols (\r\n)
3) each time one is found, increment the newline counter.
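
A minimal sketch of that counting loop, again in Python for illustration only. The name `count_lines` and the 64 KB default chunk size are assumptions. It counts `\n` bytes, which covers both `\n` and `\r\n` endings, and adds one if the file does not end with a newline.

```python
import io

def count_lines(stream, chunk_size=64 * 1024):
    """Count lines in a binary stream by scanning fixed-size chunks."""
    count = 0
    last_byte = b""
    while True:
        chunk = stream.read(chunk_size)   # 1) read a data chunk
        if not chunk:
            break
        count += chunk.count(b"\n")       # 2) and 3) count newline bytes
        last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a line
    if last_byte and last_byte != b"\n":
        count += 1
    return count

sample = io.BytesIO(b"one\r\ntwo\r\nthree")
print(count_lines(sample, chunk_size=4))  # prints 3
```

With a file on disk, the call would look like `with open(path, "rb") as f: count_lines(f)`.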
--
With best regards, Vadym Stetsiak.
Blog: http://vadmyst.blogspot.com

You wrote  on Wed, 17 Oct 2007 14:58:24 +0800:

BFI> anybody who has a better idea on parsing a very large text file
BFI> faster...

BFI> and how to determine the number of lines in a textfile?
Author
22 Oct 2007 2:41 PM
Benjamin Fallar III
I'm not interested in chunks of data (in terms of byte size).
In my text file, each line is a record. My problem is that the file is
more than 28 MB, so reading it, parsing it, and updating the CRM database
took almost one day to finish.

Is there a better solution?

