Data Feed architecture

Hi, we have some datafeeds which pull info from external sources.
Unfortunately, we have to use screen scraping as there are no XML feeds. The data feeds are located in a variety of different applications on different servers. I have to design a new architecture. I have a fair idea of how I would do it, but if anyone has any pointers to a good existing architecture design or *things not to do*, please post.

TIA
Markus
===================
googlenews2006markusj

The Microsoft Patterns & Practices website has some good guidelines for architecture design: http://msdn.microsoft.com/practices/

Adapters, agents, and messageware.
I've done this a couple of times so far. I'll need to know more about the technologies you are working with to help more, though.

What does your environment look like? Do you have BizTalk or an ESB running yet? What time requirements do you have for the data?

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not representative of my employer. I do not answer questions on behalf of my employer. I'm just a programmer helping programmers.
--

Hi Nick, we do not have BizTalk and I'm not too sure what you mean by ESB, sorry.

Basically we have a number of distributed applications on a variety of platforms (Classic ASP, .NET 1.1/2.0 and a Python script). These applications are run by a scheduling program to go away and "screen scrape" information at a specified time.

All information is then logged into a centralized database so the data can be used at a later date.

Database-wise we are using MSSQL 2005.

TIA
Markus

Hi Markus,
From an architectural perspective, you have applications that draw data using screen scraping. They interpret that data and store it in a database. Part of what I need to know: how up to date does the data need to be?

Example:
Contoso Marine Supply is a catalog provider of small parts and fittings for boaters. They have a mainframe application, written in CICS, that is used to enter catalog orders that arrive via a mail processing center.

At any time, the company employees can see the list of invoices that need to be sent to the customer via a CICS screen on an IBM 3270 terminal.

If the system that prints and sends the invoices is on the Windows platform, then it makes sense that the data is pulled periodically (perhaps nightly?) and if a new invoice is found, then the necessary data is stored for printing. We could also say that we print invoices twice a week.

In this scenario, the data needs to get to the Windows application twice a week. We pull the data more often, which adds a level of *reliability* (because if the mainframe or the Windows server app is not running on Tuesday at midnight, you can still pull the data on Wednesday for Thursday's print run... this serves the reliable delivery of data).

A different scenario may be if the Windows server application is a Partner Relationship Management system. In that case, the PRM system needs to know about the orders as soon as they are entered, because a salesman may be about to call on a particular supplier, and they need accurate and up-to-date information about the orders that are coming through for their parts. In this case, the time requirements would be pretty much 'as soon as humanly possible' (I like the term "near real time").

So I'm asking about the time requirements. You've got some of the picture... you have apps that pull data. Cool. What data do they pull and why do they pull it? That's pretty important info if I'm going to be helpful.

ESB = Enterprise Service Bus.
Please tell me what type of app you are screen scraping (CICS, UNIX, AS/400, what?).

--
--- Nick Malik [Microsoft]

Hi Nick, thanks for your help
Please see below

Nick Malik [Microsoft] wrote:
> Part of what I need to know: how up to date does the data need to be?

The import is done on a daily basis, so information only needs to be updated once a day from the existing data sources. Reports etc. are viewed against this information all day long from many different sources (web pages, applications etc.).

> Please tell me what type of app you are screen scraping (CICS, UNIX, AS/400, what?).

It's just an external website. We just parse the HTML, retrieve the information we need and update the database.

<Markus***@gmail.com> wrote in message
news:1159756708.209852.235000@k70g2000cwa.googlegroups.com...
> The import is done on a daily basis, so information only needs to be updated once a day from the existing data sources.
> It's just an external website. We just parse the HTML, retrieve the information we need and update the database.

My prior responses were overkill.

For your architecture, I would suggest that you create an app with two basic abilities:

1. The ability to specify as many target data pages as you want in an XML file. That way, if you want to expand the list of pages you want to pull data from, or if the information provider decides to break the information up onto multiple pages, you can adapt quickly.

2. The ability to define what data you want from your target page, and how to find it on the target page, using an XML description. That way, when the target page changes in formatting or coding, you don't have to change your C# code to get your data again.

I would suggest that you run your app as a service that runs nightly. I notice that you posted your question to the ASP.Net newsgroup, so it is possible that you are familiar only with creating web apps. Writing a service is different, but not terribly difficult.

Suggestion: Create a command line utility that will do the work of pulling the data. Then either write a service to call your command line utility, or simply schedule your command line utility with the scheduling service in Windows. That makes it easier to write and debug your code.

Keep in mind that your app needs to run without calling a user interface of any kind. No input from console, no output to console (except debugging messages).

Using a service will make it much easier to reliably get the data you want, and you can change the frequency at which you pull data by simply changing the scheduler or your service code.

Hope this helps.

--
--- Nick Malik [Microsoft]

Thanks for your help Nick
Regards
Markus
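For anyone reading this thread later, Nick's two XML suggestions (a list of target pages, plus a description of what to pull from each) could look roughly like the sketch below. It is in Python since one of the existing scrapers is a Python script. The config layout, the element and attribute names, the example URL and the use of regular expressions as the "how to find it" description are all assumptions made for illustration; none of them come from the thread.

```python
import re
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Hypothetical config format (not from the thread): one <page> per target URL,
# one <field> per value to extract. Each pattern is a regular expression with a
# single capture group; "<" and ">" must be escaped inside XML attributes.
CONFIG_XML = r"""
<feeds>
  <page name="rates" url="http://example.com/rates.html">
    <field name="currency" pattern='&lt;td class="ccy"&gt;(.*?)&lt;/td&gt;'/>
    <field name="rate" pattern='&lt;td class="rate"&gt;([\d.]+)&lt;/td&gt;'/>
  </page>
</feeds>
"""


def load_config(xml_text):
    """Parse the XML into a list of page descriptions with compiled patterns."""
    pages = []
    for page in ET.fromstring(xml_text).findall("page"):
        fields = {
            f.get("name"): re.compile(f.get("pattern"), re.DOTALL | re.IGNORECASE)
            for f in page.findall("field")
        }
        pages.append({"name": page.get("name"), "url": page.get("url"), "fields": fields})
    return pages


def fetch(url, timeout=30):
    """Download one page and decode it, falling back to UTF-8."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract(html, fields):
    """Return {field name: captured text}, with None where a pattern found nothing."""
    result = {}
    for name, pattern in fields.items():
        match = pattern.search(html)
        result[name] = match.group(1).strip() if match else None
    return result


if __name__ == "__main__":
    # Offline demo on canned HTML; a real run would call fetch(page["url"]).
    sample = '<tr><td class="ccy">EUR</td><td class="rate">1.27</td></tr>'
    for page in load_config(CONFIG_XML):
        print(page["name"], extract(sample, page["fields"]))
```

Adding a page or adjusting to a changed layout then means editing the XML only. A field that comes back as None is a useful signal that the target page has changed and the pattern needs attention.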
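The command line utility Nick describes might be as small as the following Python sketch: it takes everything from arguments, never reads from the console, writes only log messages, and reports the outcome through its return code so a scheduler can tell a good run from a bad one. The option names, the default file name `feeds.xml`, the exit code values and the `pull_feeds` placeholder are hypothetical, chosen only to make the shape concrete.

```python
import argparse
import logging


def pull_feeds(config_path):
    """Stand-in for the real scrape-and-store step (hypothetical).

    Returns the number of pages that could not be processed.
    """
    logging.info("pulling feeds described in %s", config_path)
    return 0


def main(argv=None, pull=pull_feeds):
    """Run one unattended pull and return an exit code (0 ok, 1 partial, 2 aborted)."""
    parser = argparse.ArgumentParser(
        description="Pull the configured data feeds once, then exit.")
    parser.add_argument("--config", default="feeds.xml",
                        help="XML file listing the target pages")
    parser.add_argument("--log", default=None,
                        help="log file; omit to log to stderr while debugging")
    args = parser.parse_args(argv)

    # With no --log, messages go to stderr, which is only useful when debugging.
    logging.basicConfig(filename=args.log, level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    try:
        failures = pull(args.config)
    except Exception:
        # Record the traceback instead of letting the scheduled job die silently.
        logging.exception("feed pull aborted")
        return 2
    if failures:
        logging.error("%d page(s) failed", failures)
        return 1
    logging.info("feed pull finished")
    return 0
```

To run it as a script, finish the file with the usual `if __name__ == "__main__":` guard calling `sys.exit(main())`, then point the Windows scheduler (or a service wrapper) at it. Changing how often data is pulled is then a scheduler setting rather than a code change.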