Document Management

I hope I can get some advice on a design/technology question here.

The app is a medical claims case management system. It is written in VB.NET and it works great. It is a Winforms app primarily; there is a web component, but that is not relevant to this discussion.

A new requirement has recently surfaced which would require the management of a large repository of document files. This repository contains several hundred thousand files of various types and sizes. The files arrive by a variety of means, such as scanned documents, fax images, file uploads, etc. Two file types are TIF image files and PDFs, both of which have files which can become 200MB or more in size. Most files are considerably smaller, though.

Desktop users have the need to view, print, edit (e.g. Word docs) or otherwise access one or more files from this repository. They will also add files to it. For security and audit reasons, the access to these files must be tightly controlled. All operations are logged. For certain files, editing of their contents is allowed. A user must "check out" the file for editing and then check it back in when they are done. The modified file will be added to the repository as a new version. While a file is checked out, no other user may check out or otherwise access the file, though they may view prior versions.

Thus, there is a need to be able to efficiently transfer files both to and from a user's desktop. These transfers would be mediated, presumably, by some kind of file server/service which would authenticate the user, validate the operation being performed, create any log data, and transfer the file to/from the appropriate directory on the server. These requirements seem to suggest that business objects running on the client will cooperate with objects on a server somewhere to record the information as appropriate as well as effect the transfer of the files themselves.
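The check-out rules described above can be modelled in a few lines. This is only a sketch under stated assumptions, not the actual design: the names `Document`, `AccessDenied`, `check_out`, `check_in` and `view` are hypothetical, the store is in memory, and "prior versions" is read as "every version except the current one". Python is used purely for brevity; the app itself is VB.NET.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class AccessDenied(Exception):
    """Raised when an operation breaks the check-out rules."""


@dataclass
class Document:
    versions: List[bytes]                  # oldest first; the last entry is current
    checked_out_by: Optional[str] = None   # user holding the check-out, if any
    log: List[str] = field(default_factory=list)

    def _deny(self, user: str, action: str) -> None:
        # Denied attempts are logged too, since all operations must be audited.
        self.log.append(f"DENIED {action} by {user}")
        raise AccessDenied(f"{action} refused for {user}")

    def check_out(self, user: str) -> bytes:
        if self.checked_out_by is not None:
            self._deny(user, "check-out")
        self.checked_out_by = user
        self.log.append(f"{user} checked out v{len(self.versions)}")
        return self.versions[-1]

    def check_in(self, user: str, content: bytes) -> int:
        if self.checked_out_by != user:
            self._deny(user, "check-in")
        self.versions.append(content)      # the edit becomes a new version
        self.checked_out_by = None
        self.log.append(f"{user} checked in v{len(self.versions)}")
        return len(self.versions)

    def view(self, user: str, version: Optional[int] = None) -> bytes:
        latest = len(self.versions)
        version = latest if version is None else version
        if not 1 <= version <= latest:
            raise ValueError("no such version")
        # Assumption: while someone else holds the check-out, only versions
        # older than the current one may be viewed.
        locked = self.checked_out_by not in (None, user)
        if locked and version == latest:
            self._deny(user, "view")
        self.log.append(f"{user} viewed v{version}")
        return self.versions[version - 1]
```

In a real system the lock and the version list would live in the database, so that the check and the update happen in one transaction.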
We discarded the idea of making the files available via direct access to a network share, since that would violate the security requirements - we can't have users messing around, outside of the app, in the repository directory tree. Though I think that would be by far the simplest (and fastest?) approach, we cannot allow direct access to the files; all access must be monitored and controlled. Indeed the users aren't really aware that there are files at all - they deal with cases and the case's supporting documents. They don't know or care what the filenames are.

We have toyed with the idea of using a Web Service for this. The idea is that web service methods could be called with appropriate arguments for authentication as well as the operation being performed. For the file involved, a byte array by reference could be used as an argument to the service call. The byte array would "be" the file, and it would then be written as a temporary file on the user's local machine, or in the case of an upload by a user to the server, written to the appropriate server directory.

We have developed some proof of concept code and it seems quite straightforward.

But, the problem with this approach is, I think, the large files. While there aren't many of them, there are enough of them to force us to deal with them. Using Web Services means the byte array is serialized into an XML stream, increasing the size by, what, 50%? That is a significant overhead. Also, that would mean that the web site running the service would require that 200MB byte array to be resident in memory while being serialized and transferred, and if we had more than a few users doing that I suspect the web server would be overwhelmed. Indeed, in some of our tests we have had "Insufficient Resource" errors on the server when using a BinaryReader to load a large file into a byte array in preparation for returning that array to a caller.

Does anyone have any thoughts on how to do this?
Perhaps some sort of custom remoting to transfer the file? If the remoting were hosted in IIS (like the dataportal), then wouldn't the same resource problems exist with the large files? I saw an article somewhere (in the MS KB?) that showed how to write a service which would host the remote object, but isn't there still a problem with transferring 200MB in one big chunk? How would breaking a file into smaller chunks work using single-call remoting, and how would that file be reassembled on the user's system?

Or maybe somebody has an idea for some other approach entirely?

Thanks for any help or insight anyone can offer.

- jeff

--
Jeff

Jeff Mason <je.ma***@comcast.net> wrote in news:jundf29c3qr4jdtflha21e9ttkeapos5lt@4ax.com:

> But, the problem with this approach is, I think, the large files.
> While there aren't many of them, there are enough of them to force us
> to deal with them. Using Web Services means the byte array is
> serialized into an xml stream, increasing the size by, what, 50%?
> That is a significant overhead. Also, that would mean that the web
> site running the service would require that 200mb byte array to be
> resident in memory while being serialized and transferred, and if we
> had more than a few users doing that I suspect the web server would be
> overwhelmed. Indeed, in some of our tests we have had "Insufficient
> Resource" errors on the server when using a Binary Reader to load a
> large file into a byte array in preparation for returning that array
> to a caller.

A simple solution to your problem would be to chunk the file - send the file in 10 - 20MB increments. This will allow you to restart a failed transfer too.

Otherwise, with Microsoft's WSE I think you can do direct data transfer using WS-DIME:
http://msdn.microsoft.com/msdnmag/issues/02/12/DIME/default.aspx

Uploading large attachments with DIME:
http://www.aspnetworld.com/articles/2004110301.aspx
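To make the chunking suggestion concrete, here is a minimal sketch of a download that also answers the "how is it reassembled" question. It rests on assumptions: `read_chunk` and `download` are hypothetical names, `read_chunk` stands in for the remote web service or remoting call, and authentication and logging are left out. Python is used only for illustration, not because the app uses it.

```python
import os

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB, the low end of the range suggested above


def read_chunk(path: str, offset: int, size: int = CHUNK_SIZE) -> bytes:
    """Server side: return at most `size` bytes of the file starting at `offset`.

    The call is stateless (the caller supplies the offset every time), so it
    suits a single-call service and never holds more than one chunk in memory.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


def download(server_path: str, local_path: str, size: int = CHUNK_SIZE) -> int:
    """Client side: append chunks to `local_path` until the server has no more.

    If a partial local file already exists, the transfer resumes from its
    current length. Returns the number of calls made to `read_chunk`.
    """
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    calls = 0
    with open(local_path, "ab") as out:  # append mode: reassembly is just writing in order
        while True:
            chunk = read_chunk(server_path, offset, size)  # stand-in for the remote call
            calls += 1
            if not chunk:  # an empty chunk means end of file
                break
            out.write(chunk)
            offset += len(chunk)
    return calls
```

An upload would work the same way in reverse, with the server appending each received chunk to a temporary file and moving it into the repository after the last one. If the chunks travel as base64 text inside the XML, each one grows by about a third (4 characters for every 3 bytes), so the overhead is closer to 33% than 50%.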
"Jeff Mason" <je.ma***@comcast.net> wrote in message news:jundf29c3qr4jdtflha21e9ttkeapos5lt@4ax.com...

> I hope I can get some advice on a design/technology question here.
>
> The app is a medical claims case management system. It is written in
> VB.NET and it works great. It is a Winforms app primarily; there is a
> web component, but that is not relevant to this discussion.
>
> . . .
>
> Does anyone have any thoughts on how to do this? Perhaps some sort of
> custom remoting to transfer the file? If the remoting were hosted in
> IIS (like the dataportal), then wouldn't the same resource problems
> exist with the large files? I saw an article somewhere (in the MS KB?)
> that showed how to write a service which would host the remote object,
> but isn't there still a problem with transferring 200MB in one big
> chunk? How would breaking a file into smaller chunks work using
> single-call remoting and how would that file be reassembled on the
> user's system?
>
> Or maybe somebody has an idea for some other approach entirely?
>
> Thanks for any help or insight anyone can offer.

WSS is part of the Windows Server 2003 OS, and it does everything you described.

Windows SharePoint Services
http://www.microsoft.com/technet/windowsserver/sharepoint/V2/default.mspx

V3 is late in beta now.
http://www.microsoft.com/office/preview/technologies/sharepointtechnology/highlights.mspx

David

You have posed a technical problem. There are two solutions:
1) chunk the data. I discussed this idea in detail (gory detail) on my blog quite some time ago.
http://blogs.msdn.com/nickmalik/archive/2004/11/01/250883.aspx

2) don't solve the problem with web services that you write, but rather by using packaged software. There are literally dozens of applications that will handle document management for you, especially given the complex and varied data management requirements you describe.

A very good backgrounder on this topic can be found at:
http://en.wikipedia.org/wiki/Content_management_system (incomplete and not well formatted)

Lists of products:
http://en.wikipedia.org/wiki/Comparison_of_content_management_systems

I suggest that you look into a couple of different products:

Windows SharePoint Server:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;830320

Documentum:
http://software.emc.com/products/content_management/content_management.htm

There are some open source products in this space as well. I haven't used any of them and cannot comment on their capabilities, but some are well-liked, like Alfresco.

Good luck

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not representative of my employer. I do not answer questions on behalf of my employer. I'm just a programmer helping programmers.
--

"Jeff Mason" <je.ma***@comcast.net> wrote in message news:jundf29c3qr4jdtflha21e9ttkeapos5lt@4ax.com...

> I hope I can get some advice on a design/technology question here.
>
> The app is a medical claims case management system. It is written in
> VB.NET and it works great. It is a Winforms app primarily; there is a
> web component, but that is not relevant to this discussion.
>
> A new requirement has recently surfaced which would require the
> managment of a large repository of document files. This repository
> contains several 100,000 files of various types and sizes.