|
dev
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Splitting a 31gb text filebulkinsert it into sql but there seems to be something wrong with the file. In order to isolate the issue I have tried splitting it into smaller files. I get an out of memory on the 7th chunk of 10,000,000 lines using the code below. The exe only seems to be consuming about 20Mb of memory. I have tried several techniques but all yield the same results. Any suggestions? I know the best answer is to not have a 31gb text file to begin with but then I have to deal with the unix guys knocking windows inabilities. Sub processfile(ByVal ReadFile As String) Dim oread As IO.StreamReader Dim owrite As IO.StreamWriter Dim x As Int32 Dim readline As String oread = IO.File.OpenText(ReadFile) owrite = IO.File.CreateText("c:\output\" & System.IO.Path.GetFileName(ReadFile) & "_0") While oread.Peek <> -1 readline = oread.ReadLine owrite.WriteLine(readline) x = x + 1 If x Mod 10000000 = 0 Then owrite.Close() owrite.Dispose() owrite = IO.File.CreateText("c:\output\" & System.IO.Path.GetFileName(ReadFile) & "_" & (x / 10000000).ToString) End If If x Mod 1000000 = 0 Then owrite.Flush() oread.DiscardBufferedData() TextBox1.Text = x TextBox1.Refresh() End If End While End Sub web33 <we***@discussions.microsoft.com> wrote:
> I have a 31gb file that was exported from a unix box. I was trying to Is it possible that there's a "line" at about that point which is > bulkinsert it into sql but there seems to be something wrong with the file. > In order to isolate the issue I have tried splitting it into smaller files. > I get an out of memory on the 7th chunk of 10,000,000 lines using the code > below. The exe only seems to be consuming about 20Mb of memory. I have > tried several techniques but all yield the same results. Any suggestions? actually vast (bigger than can fit into memory)? Try writing a loop which *just* reads the lines, and doesn't write them anywhere, maybe writing out a warning if the line is greater than a few thousand characters in length. Are you doing all of this within the UI thread, by the way? If so, it's a bit nasty - and if not, you should be using Control.Invoke to access the text box. I'm also very unclear why you're calling DiscardBufferedData on the StreamReader. That will mean you skip some of the data in the file. -- Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too I have worked with files that were about this large on a project and had no
problems. The first step you have to take is determine where the problem is. Is the line too long? Are certain fields too long so they would truncate? Did the Unix output miss a line terminator (yes, it happens, despite what the Unix guys might tell you)? Until you know why the SQL load failed, you will not be able to continue. I am not against splitting files, as you can set up Bulk Insert to multi-thread by handling multiple files. ANother option is BCP, where you can specifiy a number of errors to bypass. If the file stopped adding line terms, you will still overload the pipeline and it will fail, however, so it is still wise to find out what is going wrong. Finally, why are you putting the information in a textbox. You can split the file without putting the data in the UI. -- Show quoteGregory A. Beamer MVP; MCP: +I, SE, SD, DBA ************************************************* Think outside of the box! ************************************************* "web33" <we***@discussions.microsoft.com> wrote in message news:194BC612-972A-4909-B8B7-8C4B8E8B430C@microsoft.com... >I have a 31gb file that was exported from a unix box. I was trying to > bulkinsert it into sql but there seems to be something wrong with the > file. > In order to isolate the issue I have tried splitting it into smaller > files. > I get an out of memory on the 7th chunk of 10,000,000 lines using the code > below. The exe only seems to be consuming about 20Mb of memory. I have > tried several techniques but all yield the same results. Any suggestions? > > I know the best answer is to not have a 31gb text file to begin with but > then I have to deal with the unix guys knocking windows inabilities. > > Sub processfile(ByVal ReadFile As String) > > Dim oread As IO.StreamReader > Dim owrite As IO.StreamWriter > Dim x As Int32 > Dim readline As String > > oread = IO.File.OpenText(ReadFile) > owrite = IO.File.CreateText("c:\output\" & > System.IO.Path.GetFileName(ReadFile) & "_0") > > While oread.Peek <> -1 > readline = oread.ReadLine > owrite.WriteLine(readline) > x = x + 1 > If x Mod 10000000 = 0 Then > owrite.Close() > owrite.Dispose() > owrite = IO.File.CreateText("c:\output\" & > System.IO.Path.GetFileName(ReadFile) & "_" & (x / 10000000).ToString) > End If > > If x Mod 1000000 = 0 Then > owrite.Flush() > oread.DiscardBufferedData() > TextBox1.Text = x > TextBox1.Refresh() > End If > > End While > > End Sub > |
|||||||||||||||||||||||