Memory usage increases with ADO.NET v2.0 RTM

Firstly, the index performance improvement is awesome, I've seen a 75x improvement in test cases.

Using the RTM version of Whidbey and the code from the November 2005 MSDN article (http://msdn.microsoft.com/msdnmag/issues/05/11/DataPoints/default.aspx), I ran the same tests in my environment with ADO.NET 1.1 and 2.0. I'd like to raise the point that the memory usage is significantly higher (2.3x) than under 2003 (.NET 1.1) for loading the same data.

Tested load of 1,000,000 rows using code from this article. Made two modifications: Unique = false (to speed up the ADO v1.1 load, since it takes 30 minutes), and a Console.ReadLine at the end.

Results (using Process Explorer v9.25 for memory usage):

.NET v1.1

Time to load: 6.8s
Mem Usage (Private, Working, Virtual): 73,592K, 72,828K, 147,368K

- - - - - - -

.NET v2.0

Time to load: 11.3s
Mem Usage (Private, Working, Virtual): 168,220K, 161,264K, 241,792K

When digging a little deeper using .NET Memory Profiler v2.5, I found these major differences:

ADO.NET 1.1 (top 5 classes by Total Bytes):

Class                     Total Instances   Total Bytes
-------------------------------------------------------
DataRow                           500,000    20,000,000
Int32[]                            10,293     8,717,856
Object[]                           10,551     3,379,952
DataRow[]                               2     3,145,760
ArrayListEnumerator...             20,530       497,720
                                             ----------
                                             35,741,288

ADO.NET 2.0 (top 5 classes by Total Bytes):

Class                     Total Instances   Total Bytes
-------------------------------------------------------
DataRow                           500,000    32,000,000
RBTree<int>.Node[]                    225    16,095,884
RBTree<DataRow>.Node[]                225    16,095,884
Int32[]                               472     4,457,268
DataRow[]                               2     2,097,184
                                             ----------
                                             70,746,220

The instance size of DataRow has increased by 60%.

Two new objects were introduced, both RBTree node arrays. For the massive performance improvements, I'm sure these binary trees are necessary, and it appears they hold references to all the rows in the data set: each instance of Node is about 32 bytes, and dividing 16,095,884 by 32 gives a figure close enough to 500,000.

Anyways, I just wanted to bring this up, as it could have an impact for some, if memory is tight.
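The ratios quoted above can be re-derived from the raw figures. The following is only a minimal sketch for checking the arithmetic; the class and method names (ProfilerFigures, BytesPerInstance, Ratio) are made up for illustration, and every input is copied from the tables in this post.

```csharp
using System;

public static class ProfilerFigures
{
    // Average bytes per instance, using integer division.
    public static long BytesPerInstance(long totalBytes, long instances)
    {
        return totalBytes / instances;
    }

    // How many times larger 'after' is than 'before'.
    public static double Ratio(long after, long before)
    {
        return (double)after / before;
    }

    public static void Main()
    {
        // DataRow instance size in each version (500,000 instances each).
        Console.WriteLine("DataRow bytes, 1.1: " + BytesPerInstance(20000000, 500000)); // 40
        Console.WriteLine("DataRow bytes, 2.0: " + BytesPerInstance(32000000, 500000)); // 64
        Console.WriteLine("DataRow growth:     " + Ratio(64, 40).ToString("F2"));

        // Node count implied by one RBTree node array total, assuming 32 bytes per node.
        Console.WriteLine("Nodes at 32 bytes:  " + BytesPerInstance(16095884, 32));     // 502996

        // Process-level and class-level growth between 1.1 and 2.0.
        Console.WriteLine("Private bytes:      " + Ratio(168220, 73592).ToString("F2"));
        Console.WriteLine("Top-5 class bytes:  " + Ratio(70746220, 35741288).ToString("F2"));
    }
}
```

The 32 bytes per node is the estimate from the post, not a documented size, so the node count is only as good as that guess.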
Cheers,

Stuart

Just wanted to add that the memory pressure in 2.0 was quite a bit higher:

ADO.NET v1.1:

Gen #0 GCs: 39
Gen #1 GCs: 30
Gen #2 GCs: 3

ADO.NET v2.0:

Gen #0 GCs: 269
Gen #1 GCs: 99
Gen #2 GCs: 4

Thoughts?

Cheers,

Stu

The Collection mechanism inside ADO.NET 2.0 is much superior in terms of
performance - but to gain something you gotta give up something. Since it is a bit more sophisticated than ArrayList (as in .NET 1.1), it may result in higher memory usage (it uses a red-black tree).

Frankly, considering the benefits, I'd much rather go with the tradeoff.

- Sahil Malik [MVP]
ADO.NET 2.0 book - http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx

Don't get me wrong, I am not complaining of the additional memory usage - I
too am in favour of the performance and understand the trade-offs, which I clearly point out in my second-to-last paragraph by referencing the fact that a new RBTree structure is used, for performance reasons.

I am merely raising the point that people working with large datasets will potentially see increased memory usage, and I've provided them a first place to look.

Cheers,

Stuart

Stuart,

Process Explorer is really not a good tool to measure memory usage. You should use one of the memory profilers out there instead.

--
Miha Markic [MVP C#]
RightHand .NET consulting & development www.rthand.com
Blog: http://cs.rthand.com/blogs/blog_with_righthand/

Why would you say Process Explorer isn't a good indicator of overall process
memory usage? It shows an increase in Private and Working set bytes.

If you read my entire email, I posted all the numbers from ".NET Memory Profiler v2.5" too (which is a .NET-specific memory profiler tool), and internally the class usage shows a 2-fold increase in raw memory usage.

Cheers,

Stu

Hi Stuart,

Sorry, you are right - I missed the bottom part. As Sahil said, this is probably the cost of performance. Anyway, I don't see it as a problem, since it is a bad habit to load that many rows in the first place.

--
Miha Markic [MVP C#]
RightHand .NET consulting & development www.rthand.com
Blog: http://cs.rthand.com/blogs/blog_with_righthand/

The way we store data is now different and our structures take more space
and "maintenance" than on Everett. Those structures are needed so that we can provide better performance and scalability. We are looking for ways to improve memory consumption.

The case in question is almost a worst case for Whidbey because the data stored per row is minimal and the operation is a sequential insert, which on Whidbey requires more time to rearrange the RBTree and on Everett is just sequential.

If you run it with random insert IDs [1] instead of sequential ones, you'll notice that Whidbey is faster and behaves almost linearly, whereas Everett becomes much slower as the number of rows increases. If you insert and delete data from the table, the difference is even larger.

Thanks for taking the time to investigate dataset perf and report your results, and if you find other scenarios please let us know.

--
--VV [MS]

[1]
    using System;
    using System.Data;

    // Linear feedback shift register used to generate pseudo-random IDs.
    // Starting from a seed below 2^31, every value it returns fits in 31 bits.
    public class LFSR
    {
        uint _n;

        public LFSR(uint n)
        {
            _n = n;
        }

        public uint Next()
        {
            // Shift right by one and feed (bit 0 XOR bit 3) back into bit 30.
            _n = _n >> 1 | (((_n ^ (_n >> 3)) & 1) << 30);

            return _n;
        }
    }

    public static class InsertTest
    {
        public static void RandomInsert()
        {
            // Two Int32 columns; "ID" is the primary key, so it is indexed.
            DataTable dt = new DataTable("foo");
            DataColumn pkCol = new DataColumn("ID", typeof(int));
            dt.Columns.Add(pkCol);
            dt.PrimaryKey = new DataColumn[] { pkCol };
            dt.Columns.Add("SomeNumber", typeof(int));

            LFSR _seq = new LFSR(1);

            int limit = 50000;
            int someNumber = limit;
            DateTime startTime = DateTime.Now;
            for (int i = 1; i <= limit; i++)
            {
                DataRow row = dt.NewRow();

                row["ID"] = _seq.Next();          // key arrives in scrambled order
                row["SomeNumber"] = someNumber--;
                dt.Rows.Add(row);
            }

            TimeSpan elapsedTime = DateTime.Now - startTime;
            Console.WriteLine(dt.Rows.Count.ToString() + " rows loaded in "
                + elapsedTime.TotalSeconds + " seconds.");
        }
    }

Thanks Vasco - I realise the example I gave is only one case, and admittedly
very contrived - I profile a lot of code, and generally create numerous scenarios to prove out overall performance. I figured if I posted 10 pages of results, people wouldn't read it :)

I did post that index perf was immensely faster first ;-)

Still, given this is the worst case, we're sitting in a great position, I feel. I understand that RBTrees certainly get stressed the most with sequential adds of key values; they're one of my preferred data structures.

Anyway, great job with regards to the improved perf, and again, my posting was merely observations - I'll take the perf over the mem usage too.

Cheers,

Stu
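
For anyone wanting to repeat the comparison without an external tool, the numbers discussed in this thread (load time, managed heap growth, GC counts per generation, private and working set bytes) can also be read from inside the process. The following is a rough sketch only, not the harness used for the results above: LoadMeasurement, LoadSequential and GcCounts are names invented here, the table layout is borrowed from the RandomInsert sample, and the 50,000 row count is arbitrary. GC.CollectionCount and Stopwatch need .NET 2.0 or later, so this will not compile against 1.1.

```csharp
using System;
using System.Data;
using System.Diagnostics;

public static class LoadMeasurement
{
    // Hypothetical helper: fills a two-column table with sequential keys.
    public static DataTable LoadSequential(int rowCount)
    {
        DataTable dt = new DataTable("foo");
        DataColumn pkCol = dt.Columns.Add("ID", typeof(int));
        dt.PrimaryKey = new DataColumn[] { pkCol };
        dt.Columns.Add("SomeNumber", typeof(int));

        for (int i = 1; i <= rowCount; i++)
        {
            DataRow row = dt.NewRow();
            row["ID"] = i;
            row["SomeNumber"] = rowCount - i;
            dt.Rows.Add(row);
        }
        return dt;
    }

    // Snapshot of the collection count for every GC generation.
    public static int[] GcCounts()
    {
        int[] counts = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen < counts.Length; gen++)
        {
            counts[gen] = GC.CollectionCount(gen);
        }
        return counts;
    }

    public static void Main()
    {
        // GetTotalMemory(true) forces a collection first, so take the heap
        // baseline before the GC-count baseline.
        long heapBefore = GC.GetTotalMemory(true);
        int[] gcBefore = GcCounts();
        Stopwatch timer = Stopwatch.StartNew();

        DataTable dt = LoadSequential(50000);

        timer.Stop();
        int[] gcAfter = GcCounts();               // read before forcing another GC
        long heapAfter = GC.GetTotalMemory(true);

        Console.WriteLine(dt.Rows.Count + " rows loaded in "
            + timer.Elapsed.TotalSeconds + " seconds.");
        Console.WriteLine("Managed heap growth: " + (heapAfter - heapBefore) + " bytes");
        for (int gen = 0; gen < gcAfter.Length; gen++)
        {
            Console.WriteLine("Gen #" + gen + " GCs: " + (gcAfter[gen] - gcBefore[gen]));
        }

        // Process-level counters, comparable to what Process Explorer reports.
        using (Process self = Process.GetCurrentProcess())
        {
            Console.WriteLine("Private bytes: " + self.PrivateMemorySize64);
            Console.WriteLine("Working set:   " + self.WorkingSet64);
        }

        GC.KeepAlive(dt); // keep the table reachable until measurements are done
    }
}
```

Swapping LoadSequential for a loader that takes its IDs from the LFSR class gives the random-insert case, so both orderings can be measured the same way.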