Home All Groups Group Topic Archive Search About

What is the quickest/most efficient way to insert 100,000 Rows

Author
16 Feb 2006 6:50 PM
Fiddelm3742
Well here is what I want to do.  I'm writting an aplication that will run
through a data file, and either insert the records if they do not exist in
the database, or update them if they do exist.  I'm inserting/updating over
100,000 records.  The way I'm currently doing this is creating a temp table,
Checking if a row exists in the Actual database, if it does I am bringing it
to my temp table via a Fill.  I'm then running a
LoadDataRow(strTokens,false); to add the row to my temporary table.  If it's
an insert I'm just directly running LoadDataRow etc. to add it to my table. 
Once all of my data has been added/updated to my temp table i run, the table
update.
da.Update(dsInfo,strLoadTable);

This process takes a long time to run.  Is there a more efficient way i
should be doing this?

Author
16 Feb 2006 8:22 PM
Li-fan Chen
Hi Fiddel3742,

A DTS Package might serve your interest. You will have to add a DTS Task
pre- and post- work to remove and add the indexes as needed.

Another way is to import the table using DTS/BCP, and then doing a set join
to produce a new table. Again this can be done from within DTS to add ease
of customization for future jobs.

Best personal regards,
-- Li-fan

--
Li-fan Chen
Software analyst/developer, Entrepreneur
Markham, Ontario, Canada
Show quote
"Fiddelm3742" <Fiddelm3***@discussions.microsoft.com> wrote in message
news:56C0144C-AEBC-4F63-9078-9D07F51010E7@microsoft.com...
> Well here is what I want to do.  I'm writting an aplication that will run
> through a data file, and either insert the records if they do not exist in
> the database, or update them if they do exist.  I'm inserting/updating
> over
> 100,000 records.  The way I'm currently doing this is creating a temp
> table,
> Checking if a row exists in the Actual database, if it does I am bringing
> it
> to my temp table via a Fill.  I'm then running a
> LoadDataRow(strTokens,false); to add the row to my temporary table.  If
> it's
> an insert I'm just directly running LoadDataRow etc. to add it to my
> table.
> Once all of my data has been added/updated to my temp table i run, the
> table
> update.
> da.Update(dsInfo,strLoadTable);
>
> This process takes a long time to run.  Is there a more efficient way i
> should be doing this?
> --
> www.Fiddelke.org
Author
16 Feb 2006 9:20 PM
Matt Fiddelke
Not sure that is going to work for me.  What I'm developing is a generic
application to Load data files (semicolon delimited files) into our tables. 
Our datafiles contain BOTH rows to insert and update intermingled.  From what
I know BCP will not load if there are updates (same as bulk insert)  Another
problem with this idea is that these programs need to run once the data files
become available.  Not on a timed schedule.  When our data file shows up then
it will run my program which will insert/update the data from the file to the
database for whichever table/datafile becomes available.  Maybe i'm totally
missing your point.  If so please correct me.
Thanks

Show quote
"Li-fan Chen" wrote:

> Hi Fiddel3742,
>
> A DTS Package might serve your interest. You will have to add a DTS Task
> pre- and post- work to remove and add the indexes as needed.
>
> Another way is to import the table using DTS/BCP, and then doing a set join
> to produce a new table. Again this can be done from within DTS to add ease
> of customization for future jobs.
>
> Best personal regards,
> -- Li-fan
>
> --
> Li-fan Chen
> Software analyst/developer, Entrepreneur
> Markham, Ontario, Canada
> "Fiddelm3742" <Fiddelm3***@discussions.microsoft.com> wrote in message
> news:56C0144C-AEBC-4F63-9078-9D07F51010E7@microsoft.com...
> > Well here is what I want to do.  I'm writting an aplication that will run
> > through a data file, and either insert the records if they do not exist in
> > the database, or update them if they do exist.  I'm inserting/updating
> > over
> > 100,000 records.  The way I'm currently doing this is creating a temp
> > table,
> > Checking if a row exists in the Actual database, if it does I am bringing
> > it
> > to my temp table via a Fill.  I'm then running a
> > LoadDataRow(strTokens,false); to add the row to my temporary table.  If
> > it's
> > an insert I'm just directly running LoadDataRow etc. to add it to my
> > table.
> > Once all of my data has been added/updated to my temp table i run, the
> > table
> > update.
> > da.Update(dsInfo,strLoadTable);
> >
> > This process takes a long time to run.  Is there a more efficient way i
> > should be doing this?
> > --
> > www.Fiddelke.org
>
>
>
Author
16 Feb 2006 9:31 PM
Robert Simpson
The fastest thing you can possibly do is bulk insert all the data into a
temporary table, and then merge both tables together afterwards.

Show quote
"Matt Fiddelke" <Matt Fidde***@discussions.microsoft.com> wrote in message
news:E9E4ACAD-7D57-4033-BBBA-207C5FABEF16@microsoft.com...
> Not sure that is going to work for me.  What I'm developing is a generic
> application to Load data files (semicolon delimited files) into our
> tables.
> Our datafiles contain BOTH rows to insert and update intermingled.  From
> what
> I know BCP will not load if there are updates (same as bulk insert)
> Another
> problem with this idea is that these programs need to run once the data
> files
> become available.  Not on a timed schedule.  When our data file shows up
> then
> it will run my program which will insert/update the data from the file to
> the
> database for whichever table/datafile becomes available.  Maybe i'm
> totally
> missing your point.  If so please correct me.
> Thanks
>
> "Li-fan Chen" wrote:
>
>> Hi Fiddel3742,
>>
>> A DTS Package might serve your interest. You will have to add a DTS Task
>> pre- and post- work to remove and add the indexes as needed.
>>
>> Another way is to import the table using DTS/BCP, and then doing a set
>> join
>> to produce a new table. Again this can be done from within DTS to add
>> ease
>> of customization for future jobs.
>>
>> Best personal regards,
>> -- Li-fan
>>
>> --
>> Li-fan Chen
>> Software analyst/developer, Entrepreneur
>> Markham, Ontario, Canada
>> "Fiddelm3742" <Fiddelm3***@discussions.microsoft.com> wrote in message
>> news:56C0144C-AEBC-4F63-9078-9D07F51010E7@microsoft.com...
>> > Well here is what I want to do.  I'm writting an aplication that will
>> > run
>> > through a data file, and either insert the records if they do not exist
>> > in
>> > the database, or update them if they do exist.  I'm inserting/updating
>> > over
>> > 100,000 records.  The way I'm currently doing this is creating a temp
>> > table,
>> > Checking if a row exists in the Actual database, if it does I am
>> > bringing
>> > it
>> > to my temp table via a Fill.  I'm then running a
>> > LoadDataRow(strTokens,false); to add the row to my temporary table.  If
>> > it's
>> > an insert I'm just directly running LoadDataRow etc. to add it to my
>> > table.
>> > Once all of my data has been added/updated to my temp table i run, the
>> > table
>> > update.
>> > da.Update(dsInfo,strLoadTable);
>> >
>> > This process takes a long time to run.  Is there a more efficient way i
>> > should be doing this?
>> > --
>> > www.Fiddelke.org
>>
>>
>>
Author
16 Feb 2006 10:55 PM
Pablo Castro [MS]
If you're using ADO.NET 2.0, you can use the SqlBulkCopy class to
bulk-insert rows into SQL Server. If you also have updates, you can copy
updated values to a temp table and then do a set-based update to move the
changes to the actual table as Robert suggested below.

I walked in detail through a technique to do that in the last PDC. You can
see the talk and get the sample code here:
http://blogs.msdn.com/dataaccess/archive/2006/01/09/510083.aspx

btw - that sample code also contains a very small query processor that shows
how to do queries (including joins) over DataSets :)

--
Pablo Castro
Program Manager - ADO.NET Team
Microsoft Corp.

This posting is provided "AS IS" with no warranties, and confers no rights.


Show quote
"Robert Simpson" <rmsimpson@noemail.noemail> wrote in message
news:et3PfB0MGHA.1032@TK2MSFTNGP11.phx.gbl...
> The fastest thing you can possibly do is bulk insert all the data into a
> temporary table, and then merge both tables together afterwards.
>
> "Matt Fiddelke" <Matt Fidde***@discussions.microsoft.com> wrote in message
> news:E9E4ACAD-7D57-4033-BBBA-207C5FABEF16@microsoft.com...
>> Not sure that is going to work for me.  What I'm developing is a generic
>> application to Load data files (semicolon delimited files) into our
>> tables.
>> Our datafiles contain BOTH rows to insert and update intermingled.  From
>> what
>> I know BCP will not load if there are updates (same as bulk insert)
>> Another
>> problem with this idea is that these programs need to run once the data
>> files
>> become available.  Not on a timed schedule.  When our data file shows up
>> then
>> it will run my program which will insert/update the data from the file to
>> the
>> database for whichever table/datafile becomes available.  Maybe i'm
>> totally
>> missing your point.  If so please correct me.
>> Thanks
>>
>> "Li-fan Chen" wrote:
>>
>>> Hi Fiddel3742,
>>>
>>> A DTS Package might serve your interest. You will have to add a DTS Task
>>> pre- and post- work to remove and add the indexes as needed.
>>>
>>> Another way is to import the table using DTS/BCP, and then doing a set
>>> join
>>> to produce a new table. Again this can be done from within DTS to add
>>> ease
>>> of customization for future jobs.
>>>
>>> Best personal regards,
>>> -- Li-fan
>>>
>>> --
>>> Li-fan Chen
>>> Software analyst/developer, Entrepreneur
>>> Markham, Ontario, Canada
>>> "Fiddelm3742" <Fiddelm3***@discussions.microsoft.com> wrote in message
>>> news:56C0144C-AEBC-4F63-9078-9D07F51010E7@microsoft.com...
>>> > Well here is what I want to do.  I'm writting an aplication that will
>>> > run
>>> > through a data file, and either insert the records if they do not
>>> > exist in
>>> > the database, or update them if they do exist.  I'm inserting/updating
>>> > over
>>> > 100,000 records.  The way I'm currently doing this is creating a temp
>>> > table,
>>> > Checking if a row exists in the Actual database, if it does I am
>>> > bringing
>>> > it
>>> > to my temp table via a Fill.  I'm then running a
>>> > LoadDataRow(strTokens,false); to add the row to my temporary table.
>>> > If
>>> > it's
>>> > an insert I'm just directly running LoadDataRow etc. to add it to my
>>> > table.
>>> > Once all of my data has been added/updated to my temp table i run, the
>>> > table
>>> > update.
>>> > da.Update(dsInfo,strLoadTable);
>>> >
>>> > This process takes a long time to run.  Is there a more efficient way
>>> > i
>>> > should be doing this?
>>> > --
>>> > www.Fiddelke.org
>>>
>>>
>>>
>
>
Author
16 Feb 2006 11:37 PM
Li-fan Chen
Good call Pablo,

Just to clarify, the solution I proposed is basically the same as Pablos but
intended for older SQL Servers. The whole exercise of building a table using
a set JOIN with the updated value is in the interest of uptime.

Master Table = 50,000,000 records
Updates = 2,750,000 records

You don't want to do a set update if you are serving Master Table to the
web.

You make join of the Updates table and the Master table and make an off-line
table from it.
And then you swap the online Master with the new updated off-line Master.

Fast. Everyone wins.

Am I wrong? Please let me know.

If you are running this at 4am, it's a small update, and no one will miss
the site? Do what Pablo showed you.

Best personal regards,
-- Li-fan

--
Li-fan Chen
Software analyst/developer, Entrepreneur
Markham, Ontario, Canada
Show quote
"Pablo Castro [MS]" <pablo***@online.microsoft.com> wrote in message
news:Ovfojw0MGHA.3064@TK2MSFTNGP10.phx.gbl...
> If you're using ADO.NET 2.0, you can use the SqlBulkCopy class to
> bulk-insert rows into SQL Server. If you also have updates, you can copy
> updated values to a temp table and then do a set-based update to move the
> changes to the actual table as Robert suggested below.
>
> I walked in detail through a technique to do that in the last PDC. You can
> see the talk and get the sample code here:
> http://blogs.msdn.com/dataaccess/archive/2006/01/09/510083.aspx
>
> btw - that sample code also contains a very small query processor that
> shows how to do queries (including joins) over DataSets :)
>
> --
> Pablo Castro
> Program Manager - ADO.NET Team
> Microsoft Corp.
>
> This posting is provided "AS IS" with no warranties, and confers no
> rights.
>
>
> "Robert Simpson" <rmsimpson@noemail.noemail> wrote in message
> news:et3PfB0MGHA.1032@TK2MSFTNGP11.phx.gbl...
>> The fastest thing you can possibly do is bulk insert all the data into a
>> temporary table, and then merge both tables together afterwards.
>>
>> "Matt Fiddelke" <Matt Fidde***@discussions.microsoft.com> wrote in
>> message news:E9E4ACAD-7D57-4033-BBBA-207C5FABEF16@microsoft.com...
>>> Not sure that is going to work for me.  What I'm developing is a generic
>>> application to Load data files (semicolon delimited files) into our
>>> tables.
>>> Our datafiles contain BOTH rows to insert and update intermingled.  From
>>> what
>>> I know BCP will not load if there are updates (same as bulk insert)
>>> Another
>>> problem with this idea is that these programs need to run once the data
>>> files
>>> become available.  Not on a timed schedule.  When our data file shows up
>>> then
>>> it will run my program which will insert/update the data from the file
>>> to the
>>> database for whichever table/datafile becomes available.  Maybe i'm
>>> totally
>>> missing your point.  If so please correct me.
>>> Thanks
>>>
>>> "Li-fan Chen" wrote:
>>>
>>>> Hi Fiddel3742,
>>>>
>>>> A DTS Package might serve your interest. You will have to add a DTS
>>>> Task
>>>> pre- and post- work to remove and add the indexes as needed.
>>>>
>>>> Another way is to import the table using DTS/BCP, and then doing a set
>>>> join
>>>> to produce a new table. Again this can be done from within DTS to add
>>>> ease
>>>> of customization for future jobs.
>>>>
>>>> Best personal regards,
>>>> -- Li-fan
>>>>
>>>> --
>>>> Li-fan Chen
>>>> Software analyst/developer, Entrepreneur
>>>> Markham, Ontario, Canada
>>>> "Fiddelm3742" <Fiddelm3***@discussions.microsoft.com> wrote in message
>>>> news:56C0144C-AEBC-4F63-9078-9D07F51010E7@microsoft.com...
>>>> > Well here is what I want to do.  I'm writting an aplication that will
>>>> > run
>>>> > through a data file, and either insert the records if they do not
>>>> > exist in
>>>> > the database, or update them if they do exist.  I'm
>>>> > inserting/updating
>>>> > over
>>>> > 100,000 records.  The way I'm currently doing this is creating a temp
>>>> > table,
>>>> > Checking if a row exists in the Actual database, if it does I am
>>>> > bringing
>>>> > it
>>>> > to my temp table via a Fill.  I'm then running a
>>>> > LoadDataRow(strTokens,false); to add the row to my temporary table.
>>>> > If
>>>> > it's
>>>> > an insert I'm just directly running LoadDataRow etc. to add it to my
>>>> > table.
>>>> > Once all of my data has been added/updated to my temp table i run,
>>>> > the
>>>> > table
>>>> > update.
>>>> > da.Update(dsInfo,strLoadTable);
>>>> >
>>>> > This process takes a long time to run.  Is there a more efficient way
>>>> > i
>>>> > should be doing this?
>>>> > --
>>>> > www.Fiddelke.org
>>>>
>>>>
>>>>
>>
>>
>
>
Author
17 Feb 2006 5:50 PM
Fiddelm3742
Interesting approach that I hadn't looked at yet.  Probably because I'm not
on ADO.Net 2.0 as of yet.  That means I'll be making a quick trip into
managments office to make my plea for it ;)
Also, Great examples/presentation.
Thanks for all of the advice, all of you!
Matt Fiddelke
Show quote
"Pablo Castro [MS]" wrote:

> If you're using ADO.NET 2.0, you can use the SqlBulkCopy class to
> bulk-insert rows into SQL Server. If you also have updates, you can copy
> updated values to a temp table and then do a set-based update to move the
> changes to the actual table as Robert suggested below.
>
> I walked in detail through a technique to do that in the last PDC. You can
> see the talk and get the sample code here:
> http://blogs.msdn.com/dataaccess/archive/2006/01/09/510083.aspx
>
> btw - that sample code also contains a very small query processor that shows
> how to do queries (including joins) over DataSets :)
>
> --
> Pablo Castro
> Program Manager - ADO.NET Team
> Microsoft Corp.
>
> This posting is provided "AS IS" with no warranties, and confers no rights.
>
>
> "Robert Simpson" <rmsimpson@noemail.noemail> wrote in message
> news:et3PfB0MGHA.1032@TK2MSFTNGP11.phx.gbl...
> > The fastest thing you can possibly do is bulk insert all the data into a
> > temporary table, and then merge both tables together afterwards.
> >
> > "Matt Fiddelke" <Matt Fidde***@discussions.microsoft.com> wrote in message
> > news:E9E4ACAD-7D57-4033-BBBA-207C5FABEF16@microsoft.com...
> >> Not sure that is going to work for me.  What I'm developing is a generic
> >> application to Load data files (semicolon delimited files) into our
> >> tables.
> >> Our datafiles contain BOTH rows to insert and update intermingled.  From
> >> what
> >> I know BCP will not load if there are updates (same as bulk insert)
> >> Another
> >> problem with this idea is that these programs need to run once the data
> >> files
> >> become available.  Not on a timed schedule.  When our data file shows up
> >> then
> >> it will run my program which will insert/update the data from the file to
> >> the
> >> database for whichever table/datafile becomes available.  Maybe i'm
> >> totally
> >> missing your point.  If so please correct me.
> >> Thanks
> >>
> >> "Li-fan Chen" wrote:
> >>
> >>> Hi Fiddel3742,
> >>>
> >>> A DTS Package might serve your interest. You will have to add a DTS Task
> >>> pre- and post- work to remove and add the indexes as needed.
> >>>
> >>> Another way is to import the table using DTS/BCP, and then doing a set
> >>> join
> >>> to produce a new table. Again this can be done from within DTS to add
> >>> ease
> >>> of customization for future jobs.
> >>>
> >>> Best personal regards,
> >>> -- Li-fan
> >>>
> >>> --
> >>> Li-fan Chen
> >>> Software analyst/developer, Entrepreneur
> >>> Markham, Ontario, Canada
> >>> "Fiddelm3742" <Fiddelm3***@discussions.microsoft.com> wrote in message
> >>> news:56C0144C-AEBC-4F63-9078-9D07F51010E7@microsoft.com...
> >>> > Well here is what I want to do.  I'm writting an aplication that will
> >>> > run
> >>> > through a data file, and either insert the records if they do not
> >>> > exist in
> >>> > the database, or update them if they do exist.  I'm inserting/updating
> >>> > over
> >>> > 100,000 records.  The way I'm currently doing this is creating a temp
> >>> > table,
> >>> > Checking if a row exists in the Actual database, if it does I am
> >>> > bringing
> >>> > it
> >>> > to my temp table via a Fill.  I'm then running a
> >>> > LoadDataRow(strTokens,false); to add the row to my temporary table.
> >>> > If
> >>> > it's
> >>> > an insert I'm just directly running LoadDataRow etc. to add it to my
> >>> > table.
> >>> > Once all of my data has been added/updated to my temp table i run, the
> >>> > table
> >>> > update.
> >>> > da.Update(dsInfo,strLoadTable);
> >>> >
> >>> > This process takes a long time to run.  Is there a more efficient way
> >>> > i
> >>> > should be doing this?
> >>> > --
> >>> > www.Fiddelke.org
> >>>
> >>>
> >>>
> >
> >
>
>
>

AddThis Social Bookmark Button