Memory usage increases with ADO.NET v2.0 RTM

Author
2 Nov 2005 6:10 PM
Stuart Carnie
Firstly, the index performance improvement is awesome; I've seen a 75x
improvement in test cases.

Using the RTM version of Whidbey and the code from the November 2005 MSDN
article
(http://msdn.microsoft.com/msdnmag/issues/05/11/DataPoints/default.aspx), I
ran the same tests in my environment with ADO.NET 1.1 and 2.0.  I'd like to
raise the point that memory usage is significantly higher (2.3x) than under
2003 (ADO.NET 1.1) for loading the same data.

I tested a load of 1,000,000 rows using the code from this article, with two
modifications: Unique = false (to speed up the ADO.NET v1.1 load, which
otherwise takes 30 minutes) and a Console.ReadLine at the end.

Results (using Process Explorer v9.25 for memory usage):

.NET v1.1

Time to load: 6.8s
Mem Usage (Private, Working, Virtual) : 73,592K, 72,828K, 147,368K

- - - - - - -

.NET v2.0

Time to load: 11.3s
Mem Usage (Private, Working, Virtual) : 168,220K, 161,264K, 241,792K


When digging a little deeper using .NET Memory Profiler v2.5, I found these
major differences:

ADO.NET 1.1 (top 5 classes by Bytes):

Class                  Total Instances    Total Bytes
-----------------------------------------------------
DataRow                500,000            20,000,000
Int32[]                10,293              8,717,856
Object[]               10,551              3,379,952
DataRow[]              2                   3,145,760
ArrayListEnumerator... 20,530                497,720
                                          ----------
                                          35,741,288


ADO.NET 2.0 (top 5 classes by Total Bytes):

Class                  Total Instances    Total Bytes
-----------------------------------------------------
DataRow                500,000            32,000,000
RBTree<int>.Node[]     225                16,095,884
RBTree<DataRow>.Node[] 225                16,095,884
Int32[]                472                 4,457,268
DataRow[]              2                   2,097,184
                                          ----------
                                          70,746,220


The instance size of DataRow has increased by 60% (from 40 to 64 bytes per
row, going by the totals above).

Two new classes show up, both RBTree node arrays.  For the massive
performance improvements, I'm sure these binary trees are necessary, and it
appears they hold references to all the rows in the data set: each Node
instance looks to be about 32 bytes, and dividing 16,095,884 by 32 gives
roughly 503,000, which is close enough to 500,000.
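
The arithmetic behind those two observations can be checked with a few lines of C#. This is only a sketch: the class and method names are made up for illustration, and the 32-byte node size is the estimate from the post, not a figure confirmed by the profiler.

```csharp
using System;

static class ProfilerArithmetic
{
    // Average bytes per instance, from a profiler's total bytes and instance count.
    public static double BytesPerInstance(long totalBytes, long instances)
    {
        return (double)totalBytes / instances;
    }

    // Percentage growth from an old size to a new size.
    public static double PercentIncrease(double oldSize, double newSize)
    {
        return (newSize - oldSize) / oldSize * 100.0;
    }

    // Whole nodes that fit in totalBytes, assuming a fixed node size (an estimate).
    public static long EstimatedNodes(long totalBytes, int assumedNodeSize)
    {
        return totalBytes / assumedNodeSize;
    }

    public static void Main()
    {
        double rowV1 = BytesPerInstance(20000000, 500000); // ADO.NET 1.1 DataRow
        double rowV2 = BytesPerInstance(32000000, 500000); // ADO.NET 2.0 DataRow

        Console.WriteLine("DataRow bytes per instance: {0} -> {1}", rowV1, rowV2);
        Console.WriteLine("Increase: {0}%", PercentIncrease(rowV1, rowV2));
        Console.WriteLine("Estimated RBTree nodes: {0}", EstimatedNodes(16095884, 32));
    }
}
```

The node estimate lands a little above 500,000, which fits the idea of one node per row plus some spare capacity in the node arrays.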

Anyways, I just wanted to bring this up, as it could have an impact for
some, if memory is tight.

Cheers,

Stuart

Author
2 Nov 2005 9:43 PM
Stuart Carnie
Just wanted to add that the memory pressure in 2.0 was quite a bit higher:

ADO.NET v1.1:

Gen #0 GCs: 39
Gen #1 GCs: 30
Gen #2 GCs: 3


ADO.NET v2.0:

Gen #0 GCs: 269
Gen #1 GCs: 99
Gen #2 GCs: 4
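
For anyone who wants to collect similar counts without a profiler, here is a minimal sketch using GC.CollectionCount (available from .NET 2.0). The GcCounter class and the stand-in workload are hypothetical, and the non-generic Action delegate needs a later framework than 2.0; the numbers above were not necessarily gathered this way.

```csharp
using System;

static class GcCounter
{
    // Returns, per generation, how many collections happened while 'work' ran.
    public static int[] CountCollections(Action work)
    {
        int generations = GC.MaxGeneration + 1;
        int[] before = new int[generations];
        for (int g = 0; g < generations; g++)
        {
            before[g] = GC.CollectionCount(g);
        }

        work();

        int[] delta = new int[generations];
        for (int g = 0; g < generations; g++)
        {
            delta[g] = GC.CollectionCount(g) - before[g];
        }
        return delta;
    }

    public static void Main()
    {
        // Stand-in workload: replace with the DataTable load being measured.
        int[] counts = CountCollections(delegate
        {
            for (int i = 0; i < 1000000; i++)
            {
                object o = new byte[64];
                GC.KeepAlive(o);
            }
        });

        for (int g = 0; g < counts.Length; g++)
        {
            Console.WriteLine("Gen #{0} GCs: {1}", g, counts[g]);
        }
    }
}
```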


Thoughts?

Cheers,

Stu

Author
2 Nov 2005 11:12 PM
Sahil Malik [MVP]
The collection mechanism inside ADO.NET 2.0 is much superior in terms of
performance - but to gain something you gotta give up something. Since it is
a bit more sophisticated than the ArrayList used in .NET 1.1 (it uses a
red-black tree), it may result in higher memory usage.

Frankly, considering the benefits, I'd much rather go with the tradeoff.
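
To make the tradeoff concrete, here is a rough sketch contrasting an array-backed sorted index with a tree-backed one. It does not use ADO.NET's internal RBTree; SortedDictionary (documented as a binary search tree) stands in for it, and the IndexSketch class is purely illustrative.

```csharp
using System;
using System.Collections.Generic;

static class IndexSketch
{
    // Array-backed sorted index: binary search finds the slot quickly,
    // but Insert has to shift every later element, so each insert is O(n).
    public static List<int> BuildSortedList(IEnumerable<int> keys)
    {
        List<int> list = new List<int>();
        foreach (int key in keys)
        {
            int pos = list.BinarySearch(key);
            if (pos < 0)
            {
                pos = ~pos; // bitwise complement gives the insertion point
            }
            list.Insert(pos, key);
        }
        return list;
    }

    // Tree-backed index: each insert is O(log n), at the cost of
    // a separate node object (with child links) per key.
    public static SortedDictionary<int, object> BuildTree(IEnumerable<int> keys)
    {
        SortedDictionary<int, object> tree = new SortedDictionary<int, object>();
        foreach (int key in keys)
        {
            tree[key] = null;
        }
        return tree;
    }

    public static void Main()
    {
        int[] keys = { 5, 1, 4, 2, 3 };

        List<int> list = BuildSortedList(keys);
        SortedDictionary<int, object> tree = BuildTree(keys);

        Console.WriteLine("List order: " + string.Join(",", list));
        Console.WriteLine("Tree order: " + string.Join(",", tree.Keys));
    }
}
```

Both structures end up holding the keys in sorted order; the difference is where the cost goes, time per insert for the list versus bytes per entry for the tree.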

- Sahil Malik [MVP]
ADO.NET 2.0 book -
http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
-------------------------------------------------------------------------------------------



Author
3 Nov 2005 6:30 PM
Stuart Carnie
Don't get me wrong, I am not complaining about the additional memory usage - I
too am in favour of the performance and understand the trade-offs, which I
clearly point out in my second-to-last paragraph by referencing the fact that
a new RBTree structure is used for performance reasons.

I am merely raising the point that people working with large datasets will
potentially see increased memory usage, and I've given them a first place to
look.

Cheers,

Stuart

Author
3 Nov 2005 7:34 AM
Miha Markic [MVP C#]
Stuart,

Process Explorer is really not a good tool to measure memory usage. You
should use one of the memory profilers out there instead.

--
Miha Markic [MVP C#]
RightHand .NET consulting & development www.rthand.com
Blog: http://cs.rthand.com/blogs/blog_with_righthand/

Author
3 Nov 2005 6:28 PM
Stuart Carnie
Why would you say Process Explorer isn't a good indicator of overall process
memory usage?  It shows an increase in both Private Bytes and Working Set.

If you read my entire email, I posted all the numbers from ".NET Memory
Profiler v2.5" too (which is a .NET-specific memory profiler tool), and
internally the class usage shows a twofold increase in raw memory usage.
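
The two views can also be read side by side from inside the process. The sketch below is an assumption about how one might do that, not how the numbers in this thread were taken: the MemorySnapshot class is hypothetical, and it uses the Process memory properties for the OS-level figures and GC.GetTotalMemory for the managed heap.

```csharp
using System;
using System.Diagnostics;

static class MemorySnapshot
{
    // Returns { private bytes, working set, virtual size, managed heap }, all in bytes.
    public static long[] Read()
    {
        long[] values = new long[4];
        using (Process p = Process.GetCurrentProcess())
        {
            values[0] = p.PrivateMemorySize64;  // process-wide, includes native allocations
            values[1] = p.WorkingSet64;         // pages currently resident in RAM
            values[2] = p.VirtualMemorySize64;  // reserved plus committed address space
        }
        // Managed heap only; 'true' forces a collection first so garbage is not counted.
        values[3] = GC.GetTotalMemory(true);
        return values;
    }

    public static void Main()
    {
        long[] m = Read();
        Console.WriteLine("Private:      {0:N0} K", m[0] / 1024);
        Console.WriteLine("Working set:  {0:N0} K", m[1] / 1024);
        Console.WriteLine("Virtual:      {0:N0} K", m[2] / 1024);
        Console.WriteLine("Managed heap: {0:N0} K", m[3] / 1024);
    }
}
```

The process-level numbers include the runtime itself and any native allocations, so they will normally be larger than the managed heap figure; a per-class breakdown still needs a profiler.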

Cheers,

Stu

Author
3 Nov 2005 9:27 PM
Miha Markic [MVP C#]
Hi Stuart,

Sorry, you are right - I missed the bottom part.
As Sahil said, this is probably the cost of performance.
Anyway, I don't see it as a problem, since it is a bad habit to load that many
rows in the first place.

--
Miha Markic [MVP C#]
RightHand .NET consulting & development www.rthand.com
Blog: http://cs.rthand.com/blogs/blog_with_righthand/

Author
3 Nov 2005 10:56 PM
Vasco Veiga [MS]
The way we store data is now different, and our structures take more space
and "maintenance" than on Everett. Those structures are needed so that we
can provide better performance and scalability. We are looking for ways to
improve memory consumption.

The case in question is almost a worst case for Whidbey because the data
stored per row is minimal and the operation is a sequential insert, which on
Whidbey requires more time to rearrange the RBTree, while on Everett it's
just a sequential append.

If you run it with random insert IDs [1] instead of sequential ones, you'll
notice that Whidbey is faster and behaves almost linearly, whereas Everett
becomes much slower as the number of rows increases. If you insert and
delete data from the table, the difference is even larger.

Thanks for taking the time to investigate DataSet perf and report your
results, and if you find other scenarios please let us know.

--
--VV [MS]

[1]
    using System;
    using System.Data;

    // Linear feedback shift register: produces a pseudo-random sequence of
    // 31-bit values, used here as scattered primary key values.
    public class LFSR
    {
        uint _n;

        public LFSR(uint n)
        {
            _n = n;
        }

        public uint Next()
        {
            // Shift right by one and feed the XOR of bits 0 and 3 back in
            // at bit 30.
            _n = (_n >> 1) | (((_n ^ (_n >> 3)) & 1) << 30);

            return _n;
        }
    }

    public class Program
    {
        static void RandomInsert()
        {
            DataTable dt = new DataTable("foo");
            DataColumn pkCol = new DataColumn("ID", typeof(int));
            dt.Columns.Add(pkCol);
            dt.PrimaryKey = new DataColumn[] { pkCol };
            dt.Columns.Add("SomeNumber", typeof(int));

            LFSR _seq = new LFSR(1);

            int limit = 50000;
            int someNumber = limit;
            DateTime startTime = DateTime.Now;
            for (int i = 1; i <= limit; i++)
            {
                DataRow row = dt.NewRow();

                // Values stay below 2^31, so the cast to int cannot overflow.
                row["ID"] = (int)_seq.Next();
                row["SomeNumber"] = someNumber--;
                dt.Rows.Add(row);
            }

            TimeSpan elapsedTime = DateTime.Now - startTime;
            Console.WriteLine(dt.Rows.Count + " rows loaded in "
                + elapsedTime.TotalSeconds + " seconds.");
        }

        static void Main()
        {
            RandomInsert();
        }
    }
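
To compare the two insert orders directly, something along these lines could be used. It is a sketch under assumptions, not the test the team ran: the InsertOrderBenchmark class is hypothetical, a seeded shuffle stands in for the LFSR so the keys are guaranteed unique, and Stopwatch replaces DateTime.Now for timing. No timings are shown because they depend entirely on the machine and framework version.

```csharp
using System;
using System.Data;
using System.Diagnostics;

static class InsertOrderBenchmark
{
    // Keys 1..count in ascending order (the sequential case).
    public static int[] SequentialKeys(int count)
    {
        int[] keys = new int[count];
        for (int i = 0; i < count; i++)
        {
            keys[i] = i + 1;
        }
        return keys;
    }

    // The same keys in a shuffled order (Fisher-Yates with a fixed seed).
    public static int[] ShuffledKeys(int count, int seed)
    {
        int[] keys = SequentialKeys(count);
        Random rng = new Random(seed);
        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            int tmp = keys[i];
            keys[i] = keys[j];
            keys[j] = tmp;
        }
        return keys;
    }

    // Loads the keys into a two-column table with a primary key and times it.
    public static TimeSpan Load(int[] keys, out int rowCount)
    {
        DataTable dt = new DataTable("foo");
        DataColumn pkCol = dt.Columns.Add("ID", typeof(int));
        dt.PrimaryKey = new DataColumn[] { pkCol };
        dt.Columns.Add("SomeNumber", typeof(int));

        Stopwatch timer = Stopwatch.StartNew();
        foreach (int key in keys)
        {
            dt.Rows.Add(key, key);
        }
        timer.Stop();

        rowCount = dt.Rows.Count;
        return timer.Elapsed;
    }

    public static void Main()
    {
        int limit = 50000;
        int rows;

        TimeSpan sequential = Load(SequentialKeys(limit), out rows);
        Console.WriteLine("Sequential: {0} rows in {1} s", rows, sequential.TotalSeconds);

        TimeSpan shuffled = Load(ShuffledKeys(limit, 1), out rows);
        Console.WriteLine("Shuffled:   {0} rows in {1} s", rows, shuffled.TotalSeconds);
    }
}
```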


Author
4 Nov 2005 6:26 PM
Stuart Carnie
Thanks Vasco - I realise the example I gave is only one case, and admittedly
a very contrived one - I profile a lot of code, and generally create numerous
scenarios to prove out overall performance.  I figured if I posted 10 pages
of results, people wouldn't read it :)   I did post that index perf was
immensely faster first ;-)

Still, given this is the worst case, we're sitting in a great position, I feel.

I understand that RBTrees certainly get stressed the most with sequential
adds of key values; they're one of my preferred data structures.

Anyway, great job with regards to the improved perf and again, my posting
was merely observations - I'll take the perf over the mem usage too.

Cheers,

Stu

Show quote
"Vasco Veiga [MS]" <vas***@online.microsoft.com> wrote in message
news:u2tKlnM4FHA.400@TK2MSFTNGP09.phx.gbl...
> The way we store data is now different and our structures take more space
> and "maintenance" than on everett. Those structures are needed so that we
> can provide better performance and scalability. We are looking for ways to
> improve mem consumption.
>
> The case in question is almost a worst case for whidbey because the data
> stored per row is minimal and the operation is a sequential insert, which
> on whidbey requires more time to rearrange the RBTree and on everett it's
> just sequential.
>
> If you run it with random insert ids [1] instead of sequential ones, you'll
> notice that whidbey is faster and behaves almost linearly, whereas everett
> becomes much slower as the number of rows increases. If you insert & delete
> data from the table, the difference is even larger.
>
> Thanks for taking the time to investigate dataset perf and report your
> results, and if you find other scenarios please let us know.
>
> --
> --VV [MS]
>
> [1]
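>    // Linear feedback shift register used to produce scattered ids.
>    public class LFSR
>    {
>        uint _n;
>
>        public LFSR(uint n)
>        {
>            _n = n;
>        }
>
>        public uint Next()
>        {
>            // Shift right one bit; the new bit 30 is bit 0 XOR bit 3.
>            _n = (_n >> 1) | (((_n ^ (_n >> 3)) & 1) << 30);
>
>            return _n;
>        }
>    }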
>
>
>        static void RandomInsert()
>        {
>            DataTable dt = new DataTable("foo");
>            DataColumn pkCol = new DataColumn("ID", typeof(int));
>            dt.Columns.Add(pkCol);
>            dt.PrimaryKey = new DataColumn[] { pkCol };
>            dt.Columns.Add("SomeNumber", typeof(int));
>
>            // The seed must be non-zero, or the register stays at zero.
>            LFSR _seq = new LFSR(1);
>
>            int limit = 50000;
>            int someNumber = limit;
>            DateTime startTime = DateTime.Now;
>            for (int i = 1; i <= limit; i++)
>            {
>                DataRow row = dt.NewRow();
>
>                // Scattered key instead of the sequential loop counter.
>                row["ID"] = _seq.Next();
>                row["SomeNumber"] = someNumber--;
>                dt.Rows.Add(row);
>            }
>
>            TimeSpan elapsedTime = DateTime.Now - startTime;
>            Console.WriteLine(dt.Rows.Count + " rows loaded in "
>                + elapsedTime.TotalSeconds + " seconds.");
>
>        }
>
>
> "Stuart Carnie" <stuart.carnie@nospam.nospam> wrote in message
> news:%23JIzTRK4FHA.268@TK2MSFTNGP10.phx.gbl...
>> Why would you say Process Explorer isn't a good indicator of overall
>> process memory usage?  It shows an increase in Private and Working set
>> bytes.
>>
>> If you read my entire email, I posted all the numbers from ".NET Memory
>> Profiler v2.5" too (which is a .NET specific memory profiler tool), and
>> internally the class usage shows a 2 fold increase in raw memory usage.
>>
>> Cheers,
>>
>> Stu
>>
>> "Miha Markic [MVP C#]" <miha at rthand com> wrote in message
>> news:e5A8GkE4FHA.3684@TK2MSFTNGP10.phx.gbl...
>>> Stuart,
>>>
>>> Process Explorer is really not a good tool to measure memory usage. You
>>> should use one of the memory profilers out there instead.
>>>
>>> --
>>> Miha Markic [MVP C#]
>>> RightHand .NET consulting & development www.rthand.com
>>> Blog: http://cs.rthand.com/blogs/blog_with_righthand/
>>>
>>> "Stuart Carnie" <stuart.carnie@nospam.nospam> wrote in message
>>> news:%23RI5ri93FHA.3588@TK2MSFTNGP15.phx.gbl...
>>>> Firstly, the index performance improvement is awesome, I've seen a 75x
>>>> improvement in test cases.
>>>>
>>>> Using the RTM version of Whidbey and the code from November 2005 MSDN
>>>> article
>>>> (http://msdn.microsoft.com/msdnmag/issues/05/11/DataPoints/default.aspx),
>>>> I ran the same tests in my environment with ADO.NET 1.1 and 2.0.  I'd
>>>> like to raise as point that the memory usage is significantly higher
>>>> (2.3x) than 2003 for loading the same data.
>>>>
>>>> Tested load of 1,000,000 rows using code from this article. Made two
>>>> modifications, Unique = false (to speed up the ADO v1.1 load, since it
>>>> takes 30 minutes), and a Console.ReadLine at the end.
>>>>
>>>> Results (using Process Explorer v9.25 for memory usage):
>>>>
>>>> .NET v1.1
>>>>
>>>> Time to load: 6.8s
>>>> Mem Usage (Private, Working, Virtual) : 73,592K, 72,828K, 147,368K
>>>>
>>>> - - - - - - -
>>>>
>>>> .NET v2.0
>>>>
>>>> Time to load: 11.3s
>>>> Mem usage: 168,220K, 161,264K, 241,792K
>>>>
>>>>
>>>> When digging a little deeper using .NET Memory Profiler v2.5, I found
>>>> these major differences:
>>>>
>>>> ADO.NET 1.1 (top 5 classes by Bytes):
>>>>
>>>> Class                  Total Instances    Total Bytes
>>>> -----------------------------------------------------
>>>> DataRow                500,000            20,000,000
>>>> Int32[]                10,293              8,717,856
>>>> Object[]               10,551              3,379,952
>>>> DataRow[]              2                   3,145,760
>>>> ArrayListEnumerator... 20,530                497,720
>>>>                                          ----------
>>>>                                          35,741,288
>>>>
>>>>
>>>> ADO.NET 2.0 (top 5 classes by Total Bytes):
>>>>
>>>> Class                  Total Instances    Total Bytes
>>>> -----------------------------------------------------
>>>> DataRow                500,000            32,000,000
>>>> RBTree<int>.Node[]     225                16,095,884
>>>> RBTree<DataRow>.Node[] 225                16,095,884
>>>> Int32[]                472                 4,457,268
>>>> DataRow[]              2                   2,097,184
>>>>                                          ----------
>>>>                                          70,746,220
>>>>
>>>>
>>>> The instance size of DataRow has increased by 60%
>>>>
>>>> Introduced 2 new objects, RBTree.  For the massive performance
>>>> improvements, I'm sure these binary trees are necessary, and it appears
>>>> they hold references to all the rows in the data set, as they are about
>>>> 32 bytes in size for each instance of Node, and amount to a figure
>>>> close enough to 500000 if you divide 16,095,884 by 32.
>>>>
>>>> Anyways, I just wanted to bring this up, as it could have an impact for
>>>> some, if memory is tight.
>>>>
>>>> Cheers,
>>>>
>>>> Stuart
>>>>
>>>
>>>
>>
>>
>
>
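
The profiler and Process Explorer figures quoted above can be sanity-checked with a few divisions. The sketch below does only that arithmetic. The inputs are copied from the tables in the original post, the 32-byte node size is Stuart's estimate rather than a measured value, and the class and method names are placeholders chosen for illustration.

```csharp
using System;

public static class ProfilerArithmetic
{
    // Average size of one instance, given the profiler's totals.
    public static double BytesPerInstance(long totalBytes, long instances)
    {
        return (double)totalBytes / instances;
    }

    // How many times larger "after" is than "before".
    public static double Ratio(long after, long before)
    {
        return (double)after / before;
    }

    public static void Main()
    {
        // DataRow totals from the two "top 5 classes" tables.
        double rowV1 = BytesPerInstance(20000000, 500000);
        double rowV2 = BytesPerInstance(32000000, 500000);
        Console.WriteLine("DataRow bytes per instance: {0} -> {1}", rowV1, rowV2);
        Console.WriteLine("DataRow growth factor: {0}", rowV2 / rowV1);

        // Node count implied by one RBTree node array total, assuming
        // 32 bytes per node as estimated in the post.
        Console.WriteLine("Implied nodes per tree: {0}",
            BytesPerInstance(16095884, 32));

        // Top-5 class totals and private bytes, 2.0 versus 1.1.
        Console.WriteLine("Top-5 bytes ratio: {0}", Ratio(70746220, 35741288));
        Console.WriteLine("Private bytes ratio: {0}", Ratio(168220, 73592));
    }
}
```

The results line up with the claims in the thread: DataRow goes from 40 to 64 bytes per instance (the 60% increase), each tree works out to roughly 503,000 nodes at 32 bytes apiece (close to one node per row), the top-5 class totals differ by a factor of about 1.98, and private bytes by about 2.29.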
