
Splitting large datasets for parallel processing

Author
20 Apr 2006 3:38 PM
Andy Furnival
Hi people,

I have a problem: I'm trying to find a fast, effective way of continually
polling a database for messages to process, grouping them by content, and
then processing these groups in multiple threads without any waiting.
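
Grouping the polled messages by content before handing each group to a worker could look roughly like the following. This is a minimal Python sketch, not the poster's code: it assumes each message is an (id, content) pair and that a hypothetical `group_key` function derives the group from the content, since the thread does not say how groups are defined.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed message shape: (message id, content). Not specified in the thread.
Message = Tuple[int, str]


def group_messages(
    messages: Iterable[Message],
    group_key: Callable[[str], str],  # hypothetical: maps content to a group name
) -> Dict[str, List[Message]]:
    """Bucket messages by a key derived from their content.

    Groups appear in the order their first message was seen, and messages
    keep their original order inside each group.
    """
    groups: Dict[str, List[Message]] = defaultdict(list)
    for msg in messages:
        groups[group_key(msg[1])].append(msg)
    return dict(groups)
```

For example, with a prefix such as `SMS:` or `EMAIL:` as the (assumed) grouping key, `group_messages([(1, "SMS:hi"), (2, "EMAIL:x"), (3, "SMS:yo")], lambda c: c.split(":", 1)[0])` puts messages 1 and 3 in the `SMS` group and message 2 in the `EMAIL` group; each group can then be queued to its own worker.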

The processing is essentially: pick up all messages from the database, split
them into batches and post each batch via an HTTP call to an end service that
will deal with the messages. Each post returns a unique result code, which I
then store against all messages in that batch for tracking purposes.

The recordsets returned from the database at any one time can contain from 1
to 1,000,000 messages, which are processed in batches of at most 250.
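
The split / post / record cycle described above might be sketched like this. It is a hedged illustration in Python rather than .NET: `post_batch` is a hypothetical stand-in for the HTTP call to the end service, and results are collected in a dictionary where real code would write them back to the database. A thread pool posts batches concurrently, and every message in a batch is tagged with that batch's result code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Sequence

BATCH_SIZE = 250  # maximum batch size given in the post


def chunk(ids: Sequence[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` message ids."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def process_all(
    ids: Sequence[int],
    post_batch: Callable[[List[int]], str],  # hypothetical HTTP post returning a result code
    workers: int = 4,
) -> Dict[int, str]:
    """Post every batch on a thread pool and map each message id to its batch's result code."""
    results: Dict[int, str] = {}
    batches = list(chunk(ids))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order the batches were submitted,
        # so zip pairs each batch with the code returned for it.
        for batch, code in zip(batches, pool.map(post_batch, batches)):
            for msg_id in batch:
                results[msg_id] = code
    return results
```

With batches of 250, the largest recordset of 1,000,000 messages becomes 4,000 posts, and a single message becomes one post. The worker count of 4 is an arbitrary placeholder; the useful value depends on how many concurrent posts the end service can actually absorb.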

My problem is that I don't want to update a recordset every time I load new
data to give it a processing state, so that it will not be loaded more than
once. I've tried this before and it can be a lengthy process; also, if my
service terminates, I lose track of which messages have actually been processed.
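
One common pattern for this (not something proposed in the thread) is to claim a whole batch with a single set-based UPDATE that stamps a batch id on up to 250 unclaimed rows, then store the result code against that batch id once the post returns. After a crash, rows that were claimed but never got a result code can be released and picked up again. The sketch below uses Python's built-in `sqlite3` purely so it runs standalone; the `messages` table and its columns are hypothetical. On SQL Server 2000 the subquery would presumably need `TOP` instead of `LIMIT`, plus locking hints such as `UPDLOCK`/`READPAST` so concurrent pollers do not claim the same rows; check that against the SQL Server documentation.

```python
import sqlite3
from typing import List


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical table: batch_id marks a claim, result_code marks completion."""
    with conn:
        conn.execute(
            "CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT, "
            "batch_id INTEGER, result_code TEXT)"
        )


def claim_batch(conn: sqlite3.Connection, batch_id: int, size: int = 250) -> List[int]:
    """Stamp up to `size` unclaimed rows with batch_id in one statement; return their ids."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE messages SET batch_id = ? WHERE id IN "
            "(SELECT id FROM messages WHERE batch_id IS NULL ORDER BY id LIMIT ?)",
            (batch_id, size),
        )
    rows = conn.execute(
        "SELECT id FROM messages WHERE batch_id = ? ORDER BY id", (batch_id,)
    ).fetchall()
    return [r[0] for r in rows]


def record_result(conn: sqlite3.Connection, batch_id: int, code: str) -> None:
    """Store the end service's result code against every message in the batch."""
    with conn:
        conn.execute(
            "UPDATE messages SET result_code = ? WHERE batch_id = ?", (code, batch_id)
        )


def release_unfinished(conn: sqlite3.Connection) -> int:
    """On restart, return claimed-but-unfinished rows to the pool; returns rows released."""
    with conn:
        cur = conn.execute(
            "UPDATE messages SET batch_id = NULL "
            "WHERE batch_id IS NOT NULL AND result_code IS NULL"
        )
    return cur.rowcount
```

The design trade-off: one UPDATE per batch of 250 should cost far less than marking rows one at a time, though the thread gives no timings. Releasing unfinished batches gives at-least-once delivery, so a batch that reached the end service just before a crash may be posted again. Also, `release_unfinished` should only run while no other worker is mid-batch.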

Thanks for any help

Author
20 Apr 2006 4:27 PM
William (Bill) Vaughn
I would investigate Notification Services or SQL Dependency.

--
____________________________________
William (Bill) Vaughn
Author, Mentor, Consultant
Microsoft MVP
INETA Speaker
www.betav.com/blog/billva
www.betav.com
Please reply only to the newsgroup so that others can benefit.
This posting is provided "AS IS" with no warranties, and confers no rights.
__________________________________

Author
20 Apr 2006 4:46 PM
Andy Furnival
Thanks for that. Unfortunately I'm using SQL Server 2000 Enterprise, which
doesn't support Notification Services. Also, SQL Dependency is .NET 2.0, which
I'm not using yet either. I'm reduced to finding a suitable custom solution
for my problem.

Thanks again.

Andy

Author
20 Apr 2006 6:51 PM
Cor Ligthert [MVP]
Andy,

Did you know that a data process is only as fast as the narrowest pipe it has
to pass through?

If you have one server, one line and one client computer, you can create 20
threads, but the effect will probably be that the process takes 5 times longer
because of all the overhead of managing those threads.
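
The "narrowest pipe" point can be made concrete: end-to-end throughput of a staged pipeline cannot exceed the rate of its slowest stage, so threads only help if they add capacity to that stage. The Python sketch below is illustrative only; the stage names and messages-per-second figures are made-up assumptions, not measurements from the thread.

```python
from typing import Dict, Tuple


def bottleneck(stage_rates: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest stage and its rate (messages per second)."""
    name = min(stage_rates, key=lambda stage: stage_rates[stage])
    return name, stage_rates[name]


def estimate_seconds(messages: int, stage_rates: Dict[str, float]) -> float:
    """Lower bound on total time: every message must pass the slowest stage."""
    _, rate = bottleneck(stage_rates)
    return messages / rate
```

With hypothetical rates of 5,000 msg/s for the database read, 2,000 for the network and 800 for the HTTP endpoint, the endpoint is the bottleneck and 1,000,000 messages need at least 1,250 seconds however many threads feed it. Extra threads pay off only if the endpoint can genuinely serve concurrent posts; otherwise they just add the overhead described above.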

Cor
