
[OT?] download Wikipedia....

Author
14 May 2006 2:40 AM
Lloyd Dupont
I would like to use some of the data in Wikipedia in one of my (.NET!)
programs.
I'm still at the stage of trying to figure out how to download the data.

Any tips on:
- how to download Wikipedia data?
- how to use the data once downloaded?
Author
14 May 2006 4:32 AM
Alex Li
Do you mean scraping the Wikipedia webpages?  If that is the case,
then you want to take a look at the System.Net namespace; in particular
the WebClient class or the HttpWebRequest class for downloading the
content; then use a parser to extract the data from the webpage content.

Does that help?
Alex

Author
14 May 2006 11:06 AM
Lloyd Dupont
Nono...
I found the URL; you can download Wikipedia's books at:
http://download.wikimedia.org/

Now I am at the stage of trying to figure out what to do with this 136MB
XML file.
Obviously, basic XML tools which simply load it into memory are
inappropriate....

Author
14 May 2006 10:50 PM
Gaurav Vaish (EduJini.IN)
Hi Lloyd,

> Now I am at the stage of trying to figure out what to do with this 136MB
> XML file.
> Obviously, basic XML tools which simply load it into memory are
> inappropriate....

    What you are trying to achieve will determine a lot of things.
        Maybe you need to upgrade to a 2GB machine to load all the XML in memory.
        Maybe you need serial access (XmlTextReader) and can make do with only
512MB of RAM.


--
Cheers,
Gaurav Vaish
http://www.mastergaurav.org
http://www.edujini.in
-------------------
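
As a rough illustration of the "serial access" option mentioned above, here is a minimal sketch that streams through the XML once and counts start tags by name, so only the current node is held in memory. The class and method names (DumpStats, CountElements) are hypothetical, and it uses XmlReader.Create from .NET 2.0 rather than constructing XmlTextReader directly; treat it as a starting point, not a tuned implementation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the original thread.
public static class DumpStats
{
    // Streams the document once, counting element start tags by local name.
    public static Dictionary<string, int> CountElements(TextReader input)
    {
        var counts = new Dictionary<string, int>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element)
                    continue; // ignore text, whitespace, end tags, etc.

                int seen;
                counts.TryGetValue(reader.LocalName, out seen); // seen stays 0 if absent
                counts[reader.LocalName] = seen + 1;
            }
        }
        return counts;
    }
}
```

For the real file, something like DumpStats.CountElements(new StreamReader("theBigFile.xml")) would be the call; the result gives a quick overview of which elements the dump contains before writing any real extraction code.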
Author
15 May 2006 12:23 AM
Lloyd Dupont
>> Now I am at the stage of trying to figure out what to do with this 136MB
>> XML file.
>> Obviously, basic XML tools which simply load it into memory are
>> inappropriate....
>
>    What you are trying to achieve will determine a lot of things.
>        Maybe you need to upgrade to a 2GB machine to load all the XML in memory.
>        Maybe you need serial access (XmlTextReader) and can make do with only
> 512MB of RAM.
>
>
I tried something very simple with XmlTextReader:
XmlTextReader xml = new XmlTextReader("theBigFile.xml");
while(!xml.EOF)
    xml.Skip();

It took ages.....
So I am rather doubtful I could use it for anything useful....

But that's rather surprising, as I found some other Wikipedia tools which
didn't seem to have any trouble.. mhh....
Author
17 May 2006 3:47 AM
Gaurav Vaish (EduJini.IN)
> I tried something very simple with XmlTextReader:
> XmlTextReader xml = new XmlTextReader("theBigFile.xml");
> while(!xml.EOF)
>    xml.Skip();
>
> It took ages.....

Probably that demonstrates the difference between efficient and inefficient
parsing logic...?



--
Cheers,
Gaurav Vaish
http://www.mastergaurav.org
http://www.edujini.in
-------------------
Author
17 May 2006 4:37 AM
Lloyd Dupont
Huh.. that doesn't demonstrate much to me.
Anyway, interestingly, this:
===
XmlTextReader xml = new XmlTextReader("theBigFile.xml");
xml.ReadStartElement(); // <== new
while(!xml.EOF)
   xml.Skip();
===
works much better!...
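
For anyone who wants to do more than skip: the added ReadStartElement() call moves the reader past the root start tag, so each Skip() then steps over one child of the root. Below is a minimal sketch that builds on that pattern to collect page titles. It assumes the dump wraps each article in a "page" element containing a "title" element; those element names, and the DumpTitles/ReadTitles names, are assumptions for illustration, so check them against the actual file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the original thread.
public static class DumpTitles
{
    // Walks the children of the root element one at a time and collects
    // the first <title> found inside each <page> (assumed element names).
    public static List<string> ReadTitles(TextReader input)
    {
        var titles = new List<string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();    // move to the root element
            reader.ReadStartElement(); // step inside the root element

            while (reader.MoveToContent() == XmlNodeType.Element)
            {
                if (reader.LocalName != "page")
                {
                    reader.Skip(); // step over any other child in one call
                    continue;
                }

                // A subtree reader is confined to this one <page> element.
                using (XmlReader page = reader.ReadSubtree())
                {
                    if (page.ReadToFollowing("title"))
                        titles.Add(page.ReadElementContentAsString());
                }
                reader.Read(); // leave the <page> element and move on
            }
        }
        return titles;
    }
}
```

Calling DumpTitles.ReadTitles(new StreamReader("theBigFile.xml")) would be the equivalent for the real file. Holding every title in a list is only reasonable for a quick experiment; for the full dump it is probably better to process each title as it is read.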


Author
17 May 2006 1:26 PM
Gaurav Vaish (EduJini.IN)
Ha ha ha ha.
That tells me that we should be given access to the source code of the
application to check and report the code that results in these issues.

Let me try it out too.. should be interesting to work with :D

--
Happy Hacking,
Gaurav Vaish
http://www.mastergaurav.org
http://www.edujini.in
-------------------

