|
dev
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
What is the quickest/most efficient way to insert 100,000 RowsWell here is what I want to do. I'm writting an aplication that will run
through a data file, and either insert the records if they do not exist in the database, or update them if they do exist. I'm inserting/updating over 100,000 records. The way I'm currently doing this is creating a temp table, Checking if a row exists in the Actual database, if it does I am bringing it to my temp table via a Fill. I'm then running a LoadDataRow(strTokens,false); to add the row to my temporary table. If it's an insert I'm just directly running LoadDataRow etc. to add it to my table. Once all of my data has been added/updated to my temp table i run, the table update. da.Update(dsInfo,strLoadTable); This process takes a long time to run. Is there a more efficient way i should be doing this? Hi Fiddel3742,
A DTS Package might serve your interest. You will have to add a DTS Task pre- and post- work to remove and add the indexes as needed. Another way is to import the table using DTS/BCP, and then doing a set join to produce a new table. Again this can be done from within DTS to add ease of customization for future jobs. Best personal regards, -- Li-fan -- Show quoteLi-fan Chen Software analyst/developer, Entrepreneur Markham, Ontario, Canada "Fiddelm3742" <Fiddelm3***@discussions.microsoft.com> wrote in message news:56C0144C-AEBC-4F63-9078-9D07F51010E7@microsoft.com... > Well here is what I want to do. I'm writting an aplication that will run > through a data file, and either insert the records if they do not exist in > the database, or update them if they do exist. I'm inserting/updating > over > 100,000 records. The way I'm currently doing this is creating a temp > table, > Checking if a row exists in the Actual database, if it does I am bringing > it > to my temp table via a Fill. I'm then running a > LoadDataRow(strTokens,false); to add the row to my temporary table. If > it's > an insert I'm just directly running LoadDataRow etc. to add it to my > table. > Once all of my data has been added/updated to my temp table i run, the > table > update. > da.Update(dsInfo,strLoadTable); > > This process takes a long time to run. Is there a more efficient way i > should be doing this? > -- > www.Fiddelke.org Not sure that is going to work for me. What I'm developing is a generic
application to Load data files (semicolon delimited files) into our tables. Our datafiles contain BOTH rows to insert and update intermingled. From what I know BCP will not load if there are updates (same as bulk insert) Another problem with this idea is that these programs need to run once the data files become available. Not on a timed schedule. When our data file shows up then it will run my program which will insert/update the data from the file to the database for whichever table/datafile becomes available. Maybe i'm totally missing your point. If so please correct me. Thanks Show quote "Li-fan Chen" wrote: > Hi Fiddel3742, > > A DTS Package might serve your interest. You will have to add a DTS Task > pre- and post- work to remove and add the indexes as needed. > > Another way is to import the table using DTS/BCP, and then doing a set join > to produce a new table. Again this can be done from within DTS to add ease > of customization for future jobs. > > Best personal regards, > -- Li-fan > > -- > Li-fan Chen > Software analyst/developer, Entrepreneur > Markham, Ontario, Canada > "Fiddelm3742" <Fiddelm3***@discussions.microsoft.com> wrote in message > news:56C0144C-AEBC-4F63-9078-9D07F51010E7@microsoft.com... > > Well here is what I want to do. I'm writting an aplication that will run > > through a data file, and either insert the records if they do not exist in > > the database, or update them if they do exist. I'm inserting/updating > > over > > 100,000 records. The way I'm currently doing this is creating a temp > > table, > > Checking if a row exists in the Actual database, if it does I am bringing > > it > > to my temp table via a Fill. I'm then running a > > LoadDataRow(strTokens,false); to add the row to my temporary table. If > > it's > > an insert I'm just directly running LoadDataRow etc. to add it to my > > table. > > Once all of my data has been added/updated to my temp table i run, the > > table > > update. > > da.Update(dsInfo,strLoadTable); > > > > This process takes a long time to run. Is there a more efficient way i > > should be doing this? > > -- > > www.Fiddelke.org > > > The fastest thing you can possibly do is bulk insert all the data into a
temporary table, and then merge both tables together afterwards. Show quote "Matt Fiddelke" <Matt Fidde***@discussions.microsoft.com> wrote in message news:E9E4ACAD-7D57-4033-BBBA-207C5FABEF16@microsoft.com... > Not sure that is going to work for me. What I'm developing is a generic > application to Load data files (semicolon delimited files) into our > tables. > Our datafiles contain BOTH rows to insert and update intermingled. From > what > I know BCP will not load if there are updates (same as bulk insert) > Another > problem with this idea is that these programs need to run once the data > files > become available. Not on a timed schedule. When our data file shows up > then > it will run my program which will insert/update the data from the file to > the > database for whichever table/datafile becomes available. Maybe i'm > totally > missing your point. If so please correct me. > Thanks > > "Li-fan Chen" wrote: > >> Hi Fiddel3742, >> >> A DTS Package might serve your interest. You will have to add a DTS Task >> pre- and post- work to remove and add the indexes as needed. >> >> Another way is to import the table using DTS/BCP, and then doing a set >> join >> to produce a new table. Again this can be done from within DTS to add >> ease >> of customization for future jobs. >> >> Best personal regards, >> -- Li-fan >> >> -- >> Li-fan Chen >> Software analyst/developer, Entrepreneur >> Markham, Ontario, Canada >> "Fiddelm3742" <Fiddelm3***@discussions.microsoft.com> wrote in message >> news:56C0144C-AEBC-4F63-9078-9D07F51010E7@microsoft.com... >> > Well here is what I want to do. I'm writting an aplication that will >> > run >> > through a data file, and either insert the records if they do not exist >> > in >> > the database, or update them if they do exist. I'm inserting/updating >> > over >> > 100,000 records. The way I'm currently doing this is creating a temp >> > table, >> > Checking if a row exists in the Actual database, if it does I am >> > bringing >> > it >> > to my temp table via a Fill. I'm then running a >> > LoadDataRow(strTokens,false); to add the row to my temporary table. If >> > it's >> > an insert I'm just directly running LoadDataRow etc. to add it to my >> > table. >> > Once all of my data has been added/updated to my temp table i run, the >> > table >> > update. >> > da.Update(dsInfo,strLoadTable); >> > >> > This process takes a long time to run. Is there a more efficient way i >> > should be doing this? >> > -- >> > www.Fiddelke.org >> >> >> If you're using ADO.NET 2.0, you can use the SqlBulkCopy class to
bulk-insert rows into SQL Server. If you also have updates, you can copy updated values to a temp table and then do a set-based update to move the changes to the actual table as Robert suggested below. I walked in detail through a technique to do that in the last PDC. You can see the talk and get the sample code here: http://blogs.msdn.com/dataaccess/archive/2006/01/09/510083.aspx btw - that sample code also contains a very small query processor that shows how to do queries (including joins) over DataSets :) -- Show quotePablo Castro Program Manager - ADO.NET Team Microsoft Corp. This posting is provided "AS IS" with no warranties, and confers no rights. "Robert Simpson" <rmsimpson@noemail.noemail> wrote in message news:et3PfB0MGHA.1032@TK2MSFTNGP11.phx.gbl... > The fastest thing you can possibly do is bulk insert all the data into a > temporary table, and then merge both tables together afterwards. > > "Matt Fiddelke" <Matt Fidde***@discussions.microsoft.com> wrote in message > news:E9E4ACAD-7D57-4033-BBBA-207C5FABEF16@microsoft.com... >> Not sure that is going to work for me. What I'm developing is a generic >> application to Load data files (semicolon delimited files) into our >> tables. >> Our datafiles contain BOTH rows to insert and update intermingled. From >> what >> I know BCP will not load if there are updates (same as bulk insert) >> Another >> problem with this idea is that these programs need to run once the data >> files >> become available. Not on a timed schedule. When our data file shows up >> then >> it will run my program which will insert/update the data from the file to >> the >> database for whichever table/datafile becomes available. Maybe i'm >> totally >> missing your point. If so please correct me. >> Thanks >> >> "Li-fan Chen" wrote: >> >>> Hi Fiddel3742, >>> >>> A DTS Package might serve your interest. You will have to add a DTS Task >>> pre- and post- work to remove and add the indexes as needed. >>> >>> Another way is to import the table using DTS/BCP, and then doing a set >>> join >>> to produce a new table. Again this can be done from within DTS to add >>> ease >>> of customization for future jobs. >>> >>> Best personal regards, >>> -- Li-fan >>> >>> -- >>> Li-fan Chen >>> Software analyst/developer, Entrepreneur >>> Markham, Ontario, Canada >>> "Fiddelm3742" <Fiddelm3***@discussions.microsoft.com> wrote in message >>> news:56C0144C-AEBC-4F63-9078-9D07F51010E7@microsoft.com... >>> > Well here is what I want to do. I'm writting an aplication that will >>> > run >>> > through a data file, and either insert the records if they do not >>> > exist in >>> > the database, or update them if they do exist. I'm inserting/updating >>> > over >>> > 100,000 records. The way I'm currently doing this is creating a temp >>> > table, >>> > Checking if a row exists in the Actual database, if it does I am >>> > bringing >>> > it >>> > to my temp table via a Fill. I'm then running a >>> > LoadDataRow(strTokens,false); to add the row to my temporary table. >>> > If >>> > it's >>> > an insert I'm just directly running LoadDataRow etc. to add it to my >>> > table. >>> > Once all of my data has been added/updated to my temp table i run, the >>> > table >>> > update. >>> > da.Update(dsInfo,strLoadTable); >>> > >>> > This process takes a long time to run. Is there a more efficient way >>> > i >>> > should be doing this? >>> > -- >>> > www.Fiddelke.org >>> >>> >>> > > Good call Pablo,
Just to clarify, the solution I proposed is basically the same as Pablos but intended for older SQL Servers. The whole exercise of building a table using a set JOIN with the updated value is in the interest of uptime. Master Table = 50,000,000 records Updates = 2,750,000 records You don't want to do a set update if you are serving Master Table to the web. You make join of the Updates table and the Master table and make an off-line table from it. And then you swap the online Master with the new updated off-line Master. Fast. Everyone wins. Am I wrong? Please let me know. If you are running this at 4am, it's a small update, and no one will miss the site? Do what Pablo showed you. Best personal regards, -- Li-fan -- Show quoteLi-fan Chen Software analyst/developer, Entrepreneur Markham, Ontario, Canada "Pablo Castro [MS]" <pablo***@online.microsoft.com> wrote in message news:Ovfojw0MGHA.3064@TK2MSFTNGP10.phx.gbl... > If you're using ADO.NET 2.0, you can use the SqlBulkCopy class to > bulk-insert rows into SQL Server. If you also have updates, you can copy > updated values to a temp table and then do a set-based update to move the > changes to the actual table as Robert suggested below. > > I walked in detail through a technique to do that in the last PDC. You can > see the talk and get the sample code here: > http://blogs.msdn.com/dataaccess/archive/2006/01/09/510083.aspx > > btw - that sample code also contains a very small query processor that > shows how to do queries (including joins) over DataSets :) > > -- > Pablo Castro > Program Manager - ADO.NET Team > Microsoft Corp. > > This posting is provided "AS IS" with no warranties, and confers no > rights. > > > "Robert Simpson" <rmsimpson@noemail.noemail> wrote in message > news:et3PfB0MGHA.1032@TK2MSFTNGP11.phx.gbl... >> The fastest thing you can possibly do is bulk insert all the data into a >> temporary table, and then merge both tables together afterwards. >> >> "Matt Fiddelke" <Matt Fidde***@discussions.microsoft.com> wrote in >> message news:E9E4ACAD-7D57-4033-BBBA-207C5FABEF16@microsoft.com... >>> Not sure that is going to work for me. What I'm developing is a generic >>> application to Load data files (semicolon delimited files) into our >>> tables. >>> Our datafiles contain BOTH rows to insert and update intermingled. From >>> what >>> I know BCP will not load if there are updates (same as bulk insert) >>> Another >>> problem with this idea is that these programs need to run once the data >>> files >>> become available. Not on a timed schedule. When our data file shows up >>> then >>> it will run my program which will insert/update the data from the file >>> to the >>> database for whichever table/datafile becomes available. Maybe i'm >>> totally >>> missing your point. If so please correct me. >>> Thanks >>> >>> "Li-fan Chen" wrote: >>> >>>> Hi Fiddel3742, >>>> >>>> A DTS Package might serve your interest. You will have to add a DTS >>>> Task >>>> pre- and post- work to remove and add the indexes as needed. >>>> >>>> Another way is to import the table using DTS/BCP, and then doing a set >>>> join >>>> to produce a new table. Again this can be done from within DTS to add >>>> ease >>>> of customization for future jobs. >>>> >>>> Best personal regards, >>>> -- Li-fan >>>> >>>> -- >>>> Li-fan Chen >>>> Software analyst/developer, Entrepreneur >>>> Markham, Ontario, Canada >>>> "Fiddelm3742" <Fiddelm3***@discussions.microsoft.com> wrote in message >>>> news:56C0144C-AEBC-4F63-9078-9D07F51010E7@microsoft.com... >>>> > Well here is what I want to do. I'm writting an aplication that will >>>> > run >>>> > through a data file, and either insert the records if they do not >>>> > exist in >>>> > the database, or update them if they do exist. I'm >>>> > inserting/updating >>>> > over >>>> > 100,000 records. The way I'm currently doing this is creating a temp >>>> > table, >>>> > Checking if a row exists in the Actual database, if it does I am >>>> > bringing >>>> > it >>>> > to my temp table via a Fill. I'm then running a >>>> > LoadDataRow(strTokens,false); to add the row to my temporary table. >>>> > If >>>> > it's >>>> > an insert I'm just directly running LoadDataRow etc. to add it to my >>>> > table. >>>> > Once all of my data has been added/updated to my temp table i run, >>>> > the >>>> > table >>>> > update. >>>> > da.Update(dsInfo,strLoadTable); >>>> > >>>> > This process takes a long time to run. Is there a more efficient way >>>> > i >>>> > should be doing this? >>>> > -- >>>> > www.Fiddelke.org >>>> >>>> >>>> >> >> > > Interesting approach that I hadn't looked at yet. Probably because I'm not
on ADO.Net 2.0 as of yet. That means I'll be making a quick trip into managments office to make my plea for it ;) Also, Great examples/presentation. Thanks for all of the advice, all of you! Matt Fiddelke Show quote "Pablo Castro [MS]" wrote: > If you're using ADO.NET 2.0, you can use the SqlBulkCopy class to > bulk-insert rows into SQL Server. If you also have updates, you can copy > updated values to a temp table and then do a set-based update to move the > changes to the actual table as Robert suggested below. > > I walked in detail through a technique to do that in the last PDC. You can > see the talk and get the sample code here: > http://blogs.msdn.com/dataaccess/archive/2006/01/09/510083.aspx > > btw - that sample code also contains a very small query processor that shows > how to do queries (including joins) over DataSets :) > > -- > Pablo Castro > Program Manager - ADO.NET Team > Microsoft Corp. > > This posting is provided "AS IS" with no warranties, and confers no rights. > > > "Robert Simpson" <rmsimpson@noemail.noemail> wrote in message > news:et3PfB0MGHA.1032@TK2MSFTNGP11.phx.gbl... > > The fastest thing you can possibly do is bulk insert all the data into a > > temporary table, and then merge both tables together afterwards. > > > > "Matt Fiddelke" <Matt Fidde***@discussions.microsoft.com> wrote in message > > news:E9E4ACAD-7D57-4033-BBBA-207C5FABEF16@microsoft.com... > >> Not sure that is going to work for me. What I'm developing is a generic > >> application to Load data files (semicolon delimited files) into our > >> tables. > >> Our datafiles contain BOTH rows to insert and update intermingled. From > >> what > >> I know BCP will not load if there are updates (same as bulk insert) > >> Another > >> problem with this idea is that these programs need to run once the data > >> files > >> become available. Not on a timed schedule. When our data file shows up > >> then > >> it will run my program which will insert/update the data from the file to > >> the > >> database for whichever table/datafile becomes available. Maybe i'm > >> totally > >> missing your point. If so please correct me. > >> Thanks > >> > >> "Li-fan Chen" wrote: > >> > >>> Hi Fiddel3742, > >>> > >>> A DTS Package might serve your interest. You will have to add a DTS Task > >>> pre- and post- work to remove and add the indexes as needed. > >>> > >>> Another way is to import the table using DTS/BCP, and then doing a set > >>> join > >>> to produce a new table. Again this can be done from within DTS to add > >>> ease > >>> of customization for future jobs. > >>> > >>> Best personal regards, > >>> -- Li-fan > >>> > >>> -- > >>> Li-fan Chen > >>> Software analyst/developer, Entrepreneur > >>> Markham, Ontario, Canada > >>> "Fiddelm3742" <Fiddelm3***@discussions.microsoft.com> wrote in message > >>> news:56C0144C-AEBC-4F63-9078-9D07F51010E7@microsoft.com... > >>> > Well here is what I want to do. I'm writting an aplication that will > >>> > run > >>> > through a data file, and either insert the records if they do not > >>> > exist in > >>> > the database, or update them if they do exist. I'm inserting/updating > >>> > over > >>> > 100,000 records. The way I'm currently doing this is creating a temp > >>> > table, > >>> > Checking if a row exists in the Actual database, if it does I am > >>> > bringing > >>> > it > >>> > to my temp table via a Fill. I'm then running a > >>> > LoadDataRow(strTokens,false); to add the row to my temporary table. > >>> > If > >>> > it's > >>> > an insert I'm just directly running LoadDataRow etc. to add it to my > >>> > table. > >>> > Once all of my data has been added/updated to my temp table i run, the > >>> > table > >>> > update. > >>> > da.Update(dsInfo,strLoadTable); > >>> > > >>> > This process takes a long time to run. Is there a more efficient way > >>> > i > >>> > should be doing this? > >>> > -- > >>> > www.Fiddelke.org > >>> > >>> > >>> > > > > > > > |
|||||||||||||||||||||||