[OT?] download Wikipedia....

I would like to use some of the data in Wikipedia in one of my (.NET!) programs. I'm still at the stage of trying to figure out how to download the data. Any tips on:
- how to download Wikipedia data?
- how to use the data once downloaded?

Do you mean scraping the Wikipedia webpages? If that is the case, then you want to take a look at the System.Net namespace, in particular the WebClient class or the HttpWebRequest class, for downloading the content; then use a parser to extract the data from the webpage content.

Does that help?
Alex

No, no...
I found the URL; you can download Wikipedia's books at: http://download.wikimedia.org/
Now I am at the stage of trying to figure out what to do with this 136MB XML file.
Obviously basic XML tools that simply load it into memory are inappropriate....

"Alex Li" <likw***@gmail.com> wrote in message news:1147581137.948357.286610@y43g2000cwc.googlegroups.com...
> Do you mean scraping the Wikipedia webpages? If that is the case,
> then you want to take a look at the System.Net namespace, in particular
> the WebClient class or the HttpWebRequest class, for downloading the
> content; then use a parser to extract the data from the webpage content.
>
> Does that help?
> Alex

Hi Lloyd,

> Now I am at the stage of trying to figure out what to do with this 136MB
> XML file.
What you are trying to achieve will define a lot of things.

> Obviously basic XML tools that simply load it into memory are
> inappropriate....
Maybe you need to upgrade to a 2GB machine to load all the XML in memory.
Maybe you need serial access (XmlTextReader) and can do with only 512MB of RAM.

>> Now I am at the stage of trying to figure out what to do with this 136MB
>> XML file.
>> Obviously basic XML tools that simply load it into memory are
>> inappropriate....
>
> What you are trying to achieve will define a lot of things.
> Maybe you need to upgrade to a 2GB machine to load all the XML in memory.
> Maybe you need serial access (XmlTextReader) and can do with only
> 512MB of RAM.

I tried something very simple with XmlTextReader:

XmlTextReader xml = new XmlTextReader("theBigFile.xml");
while(!xml.EOF)
    xml.Skip();

It took ages..... so I am kind of dubious that I could use it for anything useful....
But that's kind of surprising, as I found some other Wikipedia tools which didn't seem to have any trouble.. mhh....

> I tried something very simple with XmlTextReader:
> XmlTextReader xml = new XmlTextReader("theBigFile.xml");
> while(!xml.EOF)
>     xml.Skip();
>
> It took ages.....
Probably that demonstrates the difference between efficient and inefficient parsing logic...?

Hu.. doesn't demonstrate much to me.
Anyway, interestingly, this:
===
XmlTextReader xml = new XmlTextReader("theBigFile.xml");
xml.ReadStartElement(); <<== new
while(!xml.EOF)
    xml.Skip();
===
works much better!...

"Gaurav Vaish (EduJini.IN)" <gaurav.vaish.nospam@nospam.gmail.com> wrote in message news:%23ulewSWeGHA.2188@TK2MSFTNGP04.phx.gbl...
>> I tried something very simple with XmlTextReader:
>> XmlTextReader xml = new XmlTextReader("theBigFile.xml");
>> while(!xml.EOF)
>>     xml.Skip();
>>
>> It took ages.....
>
> Probably that demonstrates the difference between efficient and
> inefficient parsing logic...?
>
> --
> Cheers,
> Gaurav Vaish
> http://www.mastergaurav.org
> http://www.edujini.in
> -------------------

Ha ha ha ha.
That tells me that we should be given access to the source code of the application, to check and report the code that results in these issues.
Let me also try it out.. should be interesting to work with :D

"Lloyd Dupont" <net.galador@ld> wrote in message news:%23T59ouWeGHA.3556@TK2MSFTNGP02.phx.gbl...
> Hu.. doesn't demonstrate much to me.
> Anyway, interestingly, this:
> ===
> XmlTextReader xml = new XmlTextReader("theBigFile.xml");
> xml.ReadStartElement(); <<== new
> while(!xml.EOF)
>     xml.Skip();
> ===
> works much better!...
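
For anyone picking this thread up later, here is a minimal sketch of the serial-access idea discussed above, doing something slightly more useful than Skip(): it streams through the file and collects the text of each title element without loading the whole document. It is only an illustration, not what any poster actually ran. The class name WikiDumpSketch and the method ReadTitles are made up for this example, and the assumption that the dump stores page names in elements called "title" is not confirmed anywhere in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration only (name not from the thread).
public static class WikiDumpSketch
{
    // Streams forward through the XML and returns the text of every
    // element whose local name is "title" (assumed element name).
    // Only the current node is held by the reader at any time.
    public static List<string> ReadTitles(TextReader input)
    {
        List<string> titles = new List<string>();
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;

        using (XmlReader xml = XmlReader.Create(input, settings))
        {
            while (!xml.EOF)
            {
                if (xml.NodeType == XmlNodeType.Element && xml.LocalName == "title")
                {
                    // Reads the text content and moves past the end tag,
                    // so no extra Read() is needed on this branch.
                    titles.Add(xml.ReadElementContentAsString());
                }
                else
                {
                    // Advance one node; returns false and sets EOF at the end.
                    xml.Read();
                }
            }
        }
        return titles;
    }
}
```

To run it against the real file, something like WikiDumpSketch.ReadTitles(new StreamReader("theBigFile.xml")) should work. The returned list still grows with the number of titles, so for a very large dump it may be better to process each title inside the loop instead of collecting them.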