|
dev
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
DataAdapter.Update performance?I have a major performance problem using a SqlDataAdapter's Update() to record a datatable into our database. And it's no easy feat. We basically record data for about 2-3 minutes, generating about 4000 records on a particular table. When it comes time to update the dataset, it can take between 1-2 minutes for the Update() routine to run. The table contains only three fields (unique id of parent, position, and data value). In the database, we can already have on the order of 4 million records or so, easily. the unique id of parent field has an index on it. Basically, I'm stuck on how to solve this. One option would be to log each piece of data as it appears, but we're optimizing that to the millisecond because this is dealing with radiation detection and our process loop needs to be as spry as possible. Without an index on the parent ID, it noticeably decreases, but then retrieving the data becomes problematic when it searches through 5 million records for a set of 36 matching a particular id. We need it quick on the processing/reporting end too. So, I'm stuck. Does anybody have any suggestions on how to optimize something like this? We'd like it to be under 10 seconds, and even that may be annoying in the field. Appreciate any help I can get. Are you using .NET 2.0? Try using a bulk insert.
Cheers, Greg Young MVP - C# http://codebetter.com/blogs/gregyoung Show quote "David Harris" <DavidHar***@discussions.microsoft.com> wrote in message news:AF4BD2C0-A2F5-4F48-BE40-2D693097C264@microsoft.com... > Please redirect me to a better newsgroup if this one isn't correct. > > I have a major performance problem using a SqlDataAdapter's Update() to > record a datatable into our database. And it's no easy feat. We basically > record data for about 2-3 minutes, generating about 4000 records on a > particular table. When it comes time to update the dataset, it can take > between 1-2 minutes for the Update() routine to run. The table contains > only > three fields (unique id of parent, position, and data value). In the > database, we can already have on the order of 4 million records or so, > easily. the unique id of parent field has an index on it. > > Basically, I'm stuck on how to solve this. One option would be to log each > piece of data as it appears, but we're optimizing that to the millisecond > because this is dealing with radiation detection and our process loop > needs > to be as spry as possible. Without an index on the parent ID, it > noticeably > decreases, but then retrieving the data becomes problematic when it > searches > through 5 million records for a set of 36 matching a particular id. We > need > it quick on the processing/reporting end too. > > So, I'm stuck. Does anybody have any suggestions on how to optimize > something like this? We'd like it to be under 10 seconds, and even that > may > be annoying in the field. Appreciate any help I can get. I'm sorry, I should have specified. We're forced to use the 1.1 framework for
now. Only have Visual Studio 2003. We may be upgrading in about 5-6 months, but this one can't wait that long... Show quote "Greg Young" wrote: > Are you using .NET 2.0? Try using a bulk insert. > > Cheers, > > Greg Young > MVP - C# > http://codebetter.com/blogs/gregyoung Then you will have problems ... it is not indexing etc as suggested but the
sheer number of round trips being used. For 1.x I would recommend a strategy similar to http://www.codeproject.com/cs/database/generic_OpenXml.asp There is example code there. If this is too slow you will probably have to write out a file and use bulk insert from the db. Cheers, Greg Young MVP - C# http://codebetter.com/blogs/gregyoung Show quote "David Harris" <DavidHar***@discussions.microsoft.com> wrote in message news:8DA83289-3A1E-4992-A567-059962A7EE97@microsoft.com... > I'm sorry, I should have specified. We're forced to use the 1.1 framework > for > now. Only have Visual Studio 2003. We may be upgrading in about 5-6 > months, > but this one can't wait that long... > > "Greg Young" wrote: > >> Are you using .NET 2.0? Try using a bulk insert. >> >> Cheers, >> >> Greg Young >> MVP - C# >> http://codebetter.com/blogs/gregyoung > Actually something I just read about was to also try putting things in a
transaction as will batch all of the transaction log writes until the transaction is completed. Info from http://www.eggheadcafe.com/articles/20050405.asp Cheers, Greg Show quote "David Harris" <DavidHar***@discussions.microsoft.com> wrote in message news:8DA83289-3A1E-4992-A567-059962A7EE97@microsoft.com... > I'm sorry, I should have specified. We're forced to use the 1.1 framework > for > now. Only have Visual Studio 2003. We may be upgrading in about 5-6 > months, > but this one can't wait that long... > > "Greg Young" wrote: > >> Are you using .NET 2.0? Try using a bulk insert. >> >> Cheers, >> >> Greg Young >> MVP - C# >> http://codebetter.com/blogs/gregyoung > David -
You are bottlenecking on your index. Try sorting the records before inserting them into the database. This could improve the cache coherency of the index's B+-tree pages. It sounds like your having lots of cache misses on the index to me. You probably need a lot more memory so that the database can cache more index pages. Look at how B+-trees are implemented, because in all likelihood this is what your index is. At 5 million records, you've got several levels of keys. At some point, each add requires several physical disk seeks which take a few milliseconds to complete. If your cache is so small that it has to seek over 5 levels, which is entirely likely, do the math: 4000 * 6ms * 5 = 120 seconds. David Harris wrote: Show quote > Please redirect me to a better newsgroup if this one isn't correct. > > I have a major performance problem using a SqlDataAdapter's Update() to > record a datatable into our database. And it's no easy feat. We basically > record data for about 2-3 minutes, generating about 4000 records on a > particular table. When it comes time to update the dataset, it can take > between 1-2 minutes for the Update() routine to run. The table contains only > three fields (unique id of parent, position, and data value). In the > database, we can already have on the order of 4 million records or so, > easily. the unique id of parent field has an index on it. > > Basically, I'm stuck on how to solve this. One option would be to log each > piece of data as it appears, but we're optimizing that to the millisecond > because this is dealing with radiation detection and our process loop needs > to be as spry as possible. Without an index on the parent ID, it noticeably > decreases, but then retrieving the data becomes problematic when it searches > through 5 million records for a set of 36 matching a particular id. We need > it quick on the processing/reporting end too. > > So, I'm stuck. Does anybody have any suggestions on how to optimize > something like this? We'd like it to be under 10 seconds, and even that may > be annoying in the field. Appreciate any help I can get. Well, I redid the "with index, without index" test, with different results.
With an index, it takes about 3 more seconds on average than without. The data is already sorted as it is inserted by nature of how the data is taken. Thanks for the post though, it inspired me to go through and optimize all the indexes we currently have in the DB, which has given us several extra ms in speed in the data acquisition process. David Show quote "LaserMan" wrote: > David - > > You are bottlenecking on your index. Try sorting the records before > inserting them into the database. This could improve the cache > coherency of the index's B+-tree pages. > > It sounds like your having lots of cache misses on the index to me. You > probably need a lot more memory so that the database can cache more > index pages. > > Look at how B+-trees are implemented, because in all likelihood this is > what your index is. At 5 million records, you've got several levels of > keys. At some point, each add requires several physical disk seeks > which take a few milliseconds to complete. If your cache is so small > that it has to seek over 5 levels, which is entirely likely, do the > math: 4000 * 6ms * 5 = 120 seconds. David -
You are bottlenecking on your index. Try sorting the records before inserting them into the database. This could improve the cache coherency of the index's B+-tree pages. It sounds like your having lots of cache misses on the index to me. You probably need a lot more memory so that the database can cache more index pages. Look at how B+-trees are implemented, because in all likelihood this is what your index is. At 5 million records, you've got several levels of keys. At some point, each add requires several physical disk seeks which take a few milliseconds to complete. If your cache is so small that it has to seek over 5 levels, which is entirely likely, do the math: 4000 * 6ms * 5 = 120 seconds. David Harris wrote: Show quote > Please redirect me to a better newsgroup if this one isn't correct. > > I have a major performance problem using a SqlDataAdapter's Update() to > record a datatable into our database. And it's no easy feat. We basically > record data for about 2-3 minutes, generating about 4000 records on a > particular table. When it comes time to update the dataset, it can take > between 1-2 minutes for the Update() routine to run. The table contains only > three fields (unique id of parent, position, and data value). In the > database, we can already have on the order of 4 million records or so, > easily. the unique id of parent field has an index on it. > > Basically, I'm stuck on how to solve this. One option would be to log each > piece of data as it appears, but we're optimizing that to the millisecond > because this is dealing with radiation detection and our process loop needs > to be as spry as possible. Without an index on the parent ID, it noticeably > decreases, but then retrieving the data becomes problematic when it searches > through 5 million records for a set of 36 matching a particular id. We need > it quick on the processing/reporting end too. > > So, I'm stuck. Does anybody have any suggestions on how to optimize > something like this? We'd like it to be under 10 seconds, and even that may > be annoying in the field. Appreciate any help I can get.
Show quote
"David Harris" <DavidHar***@discussions.microsoft.com> wrote in message Forget about DataAdapter.Update and write your own code.news:AF4BD2C0-A2F5-4F48-BE40-2D693097C264@microsoft.com... > Please redirect me to a better newsgroup if this one isn't correct. > > I have a major performance problem using a SqlDataAdapter's Update() to > record a datatable into our database. And it's no easy feat. We basically > record data for about 2-3 minutes, generating about 4000 records on a > particular table. When it comes time to update the dataset, it can take > between 1-2 minutes for the Update() routine to run. The table contains > only > three fields (unique id of parent, position, and data value). In the > database, we can already have on the order of 4 million records or so, > easily. the unique id of parent field has an index on it. > > Basically, I'm stuck on how to solve this. One option would be to log each > piece of data as it appears, but we're optimizing that to the millisecond > because this is dealing with radiation detection and our process loop > needs > to be as spry as possible. Without an index on the parent ID, it > noticeably > decreases, but then retrieving the data becomes problematic when it > searches > through 5 million records for a set of 36 matching a particular id. We > need > it quick on the processing/reporting end too. > > So, I'm stuck. Does anybody have any suggestions on how to optimize > something like this? We'd like it to be under 10 seconds, and even that > may > be annoying in the field. Appreciate any help I can get. I've found that the best performance comes using SQL like this: -- assume the following table definition -- create table t(c1 int not null primary key, c2 int not null, c3 int not null); insert into t (c1,c2,c3) exec(' select 1 as c1, 2 as c2, 3 as c3; select 4,5,6; select 7,8,9; -- ... '); insert into t(c1,c2,c3) exec(' -- ... ); Batch up 10-20 rows in each insert/exec block, and 10-20 insert/exec blocks into a single SQL batch. Commit the transaction after each batch, unless you need the entire insert to go in one transaction. Experiment with different block/batch sizes to find the one that gives best performance for your table. You should easily be able to insert 4000 small records in just a few seconds. You ARE running on a full-strength version of SQL Server, right? (Standard Ed. or better over a network connection). -cd "Carl Daniel [VC++ MVP]" wrote: This may well be the route I take if Greg's links don't fan out. My first > Forget about DataAdapter.Update and write your own code. > <snip> language was C, so loop unrolling was an obvious optimization. I'd much rather have a "maintainable" solution though, if possible. Or rather, code that someone else could look at and know what it's doing easily. Or I might suck it up and comment. > -- create table t(c1 int not null primary key, c2 int not null, c3 int not Ok, the table I am using has no primary key. The table definition is > null); (ParentID varchar(60) not null, Position int not null, DataValue int not null). Would putting a primary key speed anything up at all? I'm mainly concerned with space, because this can easily grow to several hundred million records, which would be a lot of BigInts (scared to use ints in case we exceed 2 billion records). Our hard drive size is limited to 10GB total (see below for why). > You ARE running on a full-strength version of SQL Server, right? (Standard Unfortunately, no. We're stuck using SQL Express 2005, which has shown to be > Ed. or better over a network connection). fairly fast but obviously not like full strength. The reason for this is that we have 12-15 machines, each which need to be able to enter secure government facilities (we deal with radiation detection). Thus, they need their own data storage on the box itself. Our computers for it are handhelds currently limited to 10GB. I would love to have a network and a central repository, but most sites do not allow wireless or even infrared or bluetooth, much less actual networking. I currently have to use a sneakernet using a USB drive to sync data between the central server and the handheld machines. So, we don't have a lot working for us in all this. :) Hence half my frustration into trying to make a decent app under those conditions... :) Thanks though, if nothing else works I'll definitely be making custom batches.
Show quote
"David Harris" <DavidHar***@discussions.microsoft.com> wrote in message Adding a primary key wouldn't help at all. At best it would have no effect. news:A77DB242-4450-4D4D-B0EA-E691330570B5@microsoft.com... > "Carl Daniel [VC++ MVP]" wrote: >> Forget about DataAdapter.Update and write your own code. >> <snip> > > This may well be the route I take if Greg's links don't fan out. My first > language was C, so loop unrolling was an obvious optimization. I'd much > rather have a "maintainable" solution though, if possible. Or rather, code > that someone else could look at and know what it's doing easily. Or I > might > suck it up and comment. > >> -- create table t(c1 int not null primary key, c2 int not null, c3 int >> not >> null); > > Ok, the table I am using has no primary key. The table definition is > (ParentID varchar(60) not null, Position int not null, DataValue int not > null). Would putting a primary key speed anything up at all? At worst it would significantly slow things down. Show quote > I'm mainly Ouch! One thing you might want to check into (or likely have already?) - > concerned with space, because this can easily grow to several hundred > million > records, which would be a lot of BigInts (scared to use ints in case we > exceed 2 billion records). Our hard drive size is limited to 10GB total > (see > below for why). > >> You ARE running on a full-strength version of SQL Server, right? >> (Standard >> Ed. or better over a network connection). > > Unfortunately, no. We're stuck using SQL Express 2005, which has shown to > be > fairly fast but obviously not like full strength. The reason for this is > that > we have 12-15 machines, each which need to be able to enter secure > government > facilities (we deal with radiation detection). Thus, they need their own > data > storage on the box itself. Our computers for it are handhelds currently > limited to 10GB. I would love to have a network and a central repository, > but > most sites do not allow wireless or even infrared or bluetooth, much less > actual networking. I currently have to use a sneakernet using a USB drive > to > sync data between the central server and the handheld machines. So, we > don't > have a lot working for us in all this. :) Hence half my frustration into > trying to make a decent app under those conditions... :) abandon the database entirely for the data collection app and just write out flat files. You can then load the files into your central repository using a tool like bcp. > Good luck!> Thanks though, if nothing else works I'll definitely be making custom > batches. -cd |
|||||||||||||||||||||||