parsing text file

Does anybody have a better idea for parsing a very large text file faster?
And how can I determine the number of lines in a text file?

Hello, Benjamin!
Your parser can read the file in fixed-size chunks. The parser processes each chunk and updates its state. For instance:

1) read the first 512 bytes from the file
2) parse the data with the parser; its state will change (depending on the data the parser is searching for)
3) read another 512 bytes from the file and repeat step 2

If you're searching for specific constructs in the text, you can use regular expressions for parsing.

To count the number of lines, the same approach can be taken:

1) read a data chunk
2) scan for newline symbols (\r\n)
3) if found, increment the newline counter.

--
With best regards, Vadym Stetsiak.
Blog: http://vadmyst.blogspot.com

You wrote on Wed, 17 Oct 2007 14:58:24 +0800:

BFI> anybody who has a better idea on parsing a very large text file
BFI> faster...
BFI> and how to determine the number of lines in a textfile?

I'm not interested in chunks of data (in terms of byte size);
in my text file, each line is a record. My problem is that the file is roughly 28 MB or larger, so reading, parsing, and updating the CRM database took almost 1 day to finish. Is there a better solution?
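
For illustration, the two ideas in this thread (counting lines by scanning fixed-size chunks for newlines, and treating each line as one record) might be sketched roughly as follows. This is only a sketch in Python, not code from either poster: the names `count_lines`, `iter_records`, and `CHUNK_SIZE` are hypothetical, the 64 KB chunk size is an assumption (the reply used 512 bytes as its example), and UTF-8 is assumed as the file encoding.

```python
from typing import Iterator

# Assumption: 64 KB per read; the reply above used 512 bytes as its example.
CHUNK_SIZE = 64 * 1024


def count_lines(path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Count lines by reading fixed-size binary chunks and counting newlines."""
    lines = 0
    last_byte = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # "\r\n" ends in "\n", so counting "\n" covers both conventions.
            lines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A final line with no trailing newline still counts as a line.
    if last_byte and last_byte != b"\n":
        lines += 1
    return lines


def iter_records(path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Yield one record per line without loading the whole file into memory."""
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            # Text mode translates "\r\n" to "\n"; strip the line terminator.
            yield line.rstrip("\n")
```

A caller could loop over `iter_records(path)` and hand each record to whatever parsing and database code it already has. Timing the parse step separately from the database updates would show which of the two dominates the total run time.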