High volume data access architecture

Hi,

We currently have a very high volume ASP.NET application. The web server is processing anywhere between 500 and 750 web hits per second. These hits include web services and .aspx pages. For performance reasons, we cache almost everything in a bunch of in-memory object caches. We are now looking to scale out to multiple servers, so we need to start pushing some of the data to a database (for reasons I can't get into, use of a persisted session object is not an option; it is not session information that will need to be shared across servers). Anyway, the model we are moving to is as follows:

1. We are going to start storing some core data in the database, primarily in 3 tables.

2. These tables are going to be updated directly via our data layer/ADO.NET calls.

3. We are going to keep an in-memory snapshot of each of these tables on each server.

4. We are going to have a background thread update this snapshot every 10 seconds. (The server app can actually survive without real-time data; near-time data, 10 seconds old, works fine.)

5. Because these tables can hold upwards of 10,000 rows, the update thread is going to be doing optimized updates that query only for new data since the last update, then merge that information with the in-memory table. Then, every 60 seconds, we are going to do a refresh of the full table and reload the entire thing. Again, this process works fine for us.

So, here's the question and the rub: updates to the data aren't going to be a problem. The application will never update the in-memory table directly. However, my concern is in the reads. While we don't do a lot of updates, the servers are going to be accessing these in-memory tables on pretty much every one of the 750 calls per second.

What I need some advice on is the best practices for retrieving information from the table to manage concurrency, and to ensure that rows don't get deleted or updated while the application tries to read information from these in-memory tables.

Should we create an interim layer that lets the app query the in-memory table and return a DataRow? Should it be a copy of the DataRow or the original? Should it return plain values (not in a DataRow format) so we don't have to worry about access? Which portions of the table access, searching, and DataRow information gathering should we place lock() around? Do we have to lock the entire table? Etc., etc.

Any suggestions would be incredibly helpful.

thanks,
Jasen
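
The incremental refresh in step 5 can be sketched roughly as follows. This is a minimal illustration in Python rather than the poster's ADO.NET code, and it rests on assumptions that are not in the post: rows are plain dictionaries keyed by a hypothetical `id` column, and each row carries a hypothetical `modified` timestamp that serves as the "since the last update" watermark.

```python
from datetime import datetime, timedelta


def merge_delta(snapshot, changed_rows):
    """Return a new dict with changed_rows applied on top of snapshot."""
    merged = dict(snapshot)  # shallow copy; the old snapshot is left untouched
    for row in changed_rows:
        merged[row["id"]] = row  # insert a new row or replace a stale one
    return merged


def next_watermark(current, changed_rows):
    """Advance the 'last seen' timestamp to the newest row in the batch."""
    return max([current] + [row["modified"] for row in changed_rows])


# Hypothetical sample data standing in for one 10-second delta query.
t0 = datetime(2005, 1, 1, 12, 0, 0)
snapshot = {1: {"id": 1, "name": "alpha", "modified": t0}}
delta = [
    {"id": 1, "name": "alpha-v2", "modified": t0 + timedelta(seconds=4)},
    {"id": 2, "name": "beta", "modified": t0 + timedelta(seconds=7)},
]

merged = merge_delta(snapshot, delta)
watermark = next_watermark(t0, delta)
```

A delta query of this kind does not notice deleted rows unless deletions are recorded somewhere, which is something a periodic full reload would correct.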
"Jasen" <jasen_f***@hotmail.com> wrote in message news:96F9DC77-8EE5-4C32-AB9F-C50FFCA17B6F@microsoft.com...
> What I need some advice on, is what are the best practices for retrieving
> information from the table to manage concurrency, and assure that rows don't
> get deleted or updated while the application tries to read information from
> these in-memory tables.

Instead of updating the existing dataset, copy it (or create a new one), update that, and then switch it out for the one the clients are reading. That way sessions continue to have read access to the old data until the new data is ready, and you cut over with a simple, atomic variable assignment.

David
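
The copy-and-swap idea can be sketched as below. This is an illustration in Python, not the thread's ADO.NET code: `SnapshotCache`, `current`, `refresh` and the `loader` callback are hypothetical names, and the "table" is a plain dictionary. The point it shows is that readers take one reference to the published snapshot and never need a lock, because a published snapshot is never modified afterwards. In C# the same shape would presumably be a reference-typed field that the refresh thread reassigns.

```python
import threading


class SnapshotCache:
    """Holds a read-only snapshot that is replaced wholesale, never edited."""

    def __init__(self, loader):
        self._loader = loader        # hypothetical callback that loads the table
        self._snapshot = loader()    # published snapshot; treated as immutable

    def current(self):
        # Readers grab the reference once and use it for the whole request.
        return self._snapshot

    def refresh(self):
        fresh = self._loader()       # build the replacement off to the side
        self._snapshot = fresh       # cut over with one reference assignment


# Hypothetical loader that returns a newer version on each call.
versions = iter([{1: "old"}, {1: "new", 2: "added"}])
cache = SnapshotCache(lambda: next(versions))

view = cache.current()               # a request that started before the refresh

worker = threading.Thread(target=cache.refresh)  # stands in for the background thread
worker.start()
worker.join()

old_value = view[1]                  # the in-flight request still sees its snapshot
new_value = cache.current()[1]       # later requests see the replacement
```

The old snapshot stays alive for as long as some request still holds a reference to it and is garbage collected afterwards, so nothing is deleted out from under a reader.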
> Should we create an interim layer that lets the app query the in-mem
> table and return a DataRow? should it be a copy of the DataRow or the
> original? should it return variable data (not in a DataRow format)
> so we don't have to worry about access? What portions of the table
> access, searching, DataRow information gathering should we place
> lock() around? do we have to lock the entire table? etc. etc.

Why are you caching at such a low level? Isn't it more efficient to cache at a higher level?

Take for example this approach: say your hardware/software can render the whole site in 3 seconds. If you render it every 4 seconds and cache everything, it will be responsive no matter what. The cached results are served to the visitors; the rendering is done at a scheduled interval. If the site can't keep up, you enlarge this interval, say to every 5 or 10 seconds. If you do this modularly, so that you cache elements inside a page, you can also decide which elements to render every 10 seconds and which to render every minute.

The reason why this is way more efficient is that by caching the end result you also save the data processing time. This is, for example, how high-profile sites like Slashdot work (partly).

The main issue you'll run into with caching data in a multi-server setup is that the cache intervals have to run in step, otherwise you can have different values for the same entity in different cached sets on different servers. When you cache data in the middle tier, you effectively make the middle tier the habitat of the application state, which is a bit cumbersome as the application state is then scattered across multiple systems.

Most websites read a lot and write rarely. If you have such a website, caching the processed data in the form of rendered elements is much more efficient than any data caching scenario you will use, because you won't perform processing actions on out-of-sync data and you'll save the time spent processing the same data over and over again, as that would result in the same rendered output anyway.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
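
The scheduled rendering described above, with a separate interval per page element, could look something like this. It is a sketch in Python under stated assumptions, not anything from the thread: `FragmentCache`, `register`, `tick` and the two render functions are hypothetical, and time is passed in explicitly as a number of seconds so the example stays deterministic.

```python
class FragmentCache:
    """Re-renders each page element on its own schedule and serves the result."""

    def __init__(self):
        self._entries = {}

    def register(self, name, render, interval_seconds):
        self._entries[name] = {
            "render": render,
            "interval": interval_seconds,
            "rendered_at": None,   # None means "never rendered yet"
            "html": None,
        }

    def tick(self, now):
        # Called by a scheduler; re-renders only the elements that are due.
        for entry in self._entries.values():
            last = entry["rendered_at"]
            if last is None or now - last >= entry["interval"]:
                entry["html"] = entry["render"]()
                entry["rendered_at"] = now

    def get(self, name):
        # Requests only ever read the stored output; they never render.
        return self._entries[name]["html"]


calls = {"headlines": 0, "sidebar": 0}


def render_headlines():
    calls["headlines"] += 1
    return "<ul>headlines v%d</ul>" % calls["headlines"]


def render_sidebar():
    calls["sidebar"] += 1
    return "<div>sidebar v%d</div>" % calls["sidebar"]


cache = FragmentCache()
cache.register("headlines", render_headlines, 10)  # re-render every 10 seconds
cache.register("sidebar", render_sidebar, 60)      # re-render every minute

for now in range(0, 61, 10):                       # simulate one minute of ticks
    cache.tick(now)
```

However many requests arrive in that minute, the data processing behind each element runs only as often as its interval allows; enlarging an interval is the knob to turn if the server can't keep up.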