Download Script

GammaLeak
Download Script

Hi!

Someone (not me) apparently put together a download script to scrape nwvault.ign.com:

http://pastebin.com/2nC1iFWy

I'm curious whether the writer of that script is here on these forums, and whether this is the "final" version of the script. (For what it's worth, the Perl calls to open/write/close the comment/html files didn't work on my system.)

Also, Rolo: would you consider use of an automated download script like this abuse of the site?

Thanks!

-GammaLeak

Rolo Kipp

Heh. That's an old version, for sure.

And that could be a loaded question. Let me look down the barrel to make sure it's not loaded.

Would I consider it abuse? It depends on who's doing it, for what purpose, and how they do it.

It's like loading/unloading a truck. If the stevedores do it, it's not abuse. If the guys from the hood do it, it's abuse. If they overload it or load it wrong so it tips over or jack-knifes, it's abuse.

For that script, it's abuse if many people run it often for personal or commercial purposes.

It's not abuse if designated people (i.e. me, tyvm) run it in update mode once a week to ensure a valid, up-to-date archive.

The above is my considered opinion :-)

I *did* think about this a few times over the last nine months...


Rolo Kipp
 
From Dreamguard on Needlespire, The Gemworld of Amethyst
Pain Eternal

I believe it was me who posted that script. Rolo has a more up-to-date version (I do not have it on me). I was sharing it with other programmers at ArchiveTeam.org: they were having issues parsing the various sites under IGN, and I was sharing what we are doing. The Wayback Machine, the Internet Archive and the like work to preserve content like this, and it's also being seeded on torrents. They are a fine group of people, whom you can find here: http://www.archiveteam.org/index.php?title=Main_Page (more info on why they decided they need to back it up is also listed on the project page). We are going to compare notes and files (rsync data as needed) to ensure we have a complete backup of the vault.

When I left, I think those fine people were taking a look at that code to fix the comments, so there probably is a better version. Rolo would have to post a link to it. I have another program, NeverLauncher, in which I just got the comments working. But pastebin is a place to compare notes on code; it should not be relied on for working code, since what gets posted there tends to be broken and in need of review.

(Technically, what this script does is the same thing Googlebot and every other search engine do. There is nothing wrong with it as long as it does not hammer the server the way a certain Chinese search engine does. Whether it's abuse really depends on what you are doing.)
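
To make "not hammering the server" concrete, here is a minimal Python sketch of a throttle that enforces a minimum gap between requests. It is not taken from the pastebin script (which is Perl); the class name `PoliteThrottle` and the delay value are hypothetical, chosen only for illustration.

```python
import time


class PoliteThrottle:
    """Enforce a minimum gap between successive requests to one server.

    Hypothetical helper for illustration; not part of the original script.
    """

    def __init__(self, min_delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay_seconds
        self._clock = clock      # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until min_delay has passed since the previous call.

        Returns the number of seconds slept (0.0 on the first call).
        """
        pause = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            pause = max(0.0, self.min_delay - elapsed)
            if pause > 0:
                self._sleep(pause)
        self._last = self._clock()
        return pause
```

A crawler would call `throttle.wait()` immediately before each download, so that however fast the loop runs, the server sees at most one request per `min_delay_seconds`.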

GammaLeak

Thanks for the clarification. I didn't mean to pitch you a loaded question :) I just wanted to know: would it be acceptable for random Joe Blow (i.e., me) to use it for creating my own personal archive?

I'm not sure from your answer whether or not I qualify as stevedore or hood. ;)

Rolo Kipp

LOL, I think we need to drive away from that metaphor fast and furious :-)

That script is not a very good one for generally storing an archive off the Vault. It does most of what I needed in a hurry, and I really do appreciate it. Now, once a week, it downloads new or updated files for the archive... a lot better than crawling nearly 500gigs worth of stuff!
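
For anyone wondering what "update mode" amounts to, the idea can be sketched roughly like this in Python. This is only an illustration under assumptions, not the script's actual logic: the function name `files_to_fetch` and the two dictionaries (path mapped to a last-modified timestamp) are hypothetical.

```python
def files_to_fetch(remote_listing, local_manifest):
    """Return the paths that are new or updated since the last run.

    remote_listing: dict of path -> last-modified timestamp as seen on the site
    local_manifest: dict of path -> timestamp recorded when we last downloaded it
    Both structures are assumptions made for this sketch.
    """
    todo = []
    for path, remote_mtime in sorted(remote_listing.items()):
        seen = local_manifest.get(path)
        # Fetch if we have never seen the file, or the site's copy is newer.
        if seen is None or remote_mtime > seen:
            todo.append(path)
    return todo


remote = {"a.zip": 10, "b.zip": 20, "c.zip": 5}
local = {"a.zip": 10, "b.zip": 15}
print(files_to_fetch(remote, local))  # ['b.zip', 'c.zip']
```

Only the changed and never-seen files are requested, which is why a weekly run is so much lighter than crawling everything again.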

NeverLauncher does a better job and benefits the community as well. That said, it would still be rather bad form (not to mention violating most sites' terms of service) to have a lot of random fellows downloading over and over.

In that respect, I don't think you should scrape on your own, apart from an effort that is done for a purpose and with consideration for the site's servers.

I think the best idea for anyone wanting a copy of the Vault repository is simply to become a part of the vault's repository system itself. This would give you a complete repository that is actively maintained and it would benefit the community as a whole. It would also hit any particular server *once* and distribute both upload and download loads.

Or you could mail me a 500gb external and I'll copy what I have ;-)

 

Rolo Kipp

Was that the copy I gave you yesterday?

The update mode stuff isn't in there, and some other things...

How'd I give you an old version?

*looks very confused*

GammaLeak
"I think the best idea for anyone wanting a copy of the Vault repository is simply become a part of the vault's repository system itself. This would give you a complete repository that is actively maintained and it would benefit the community as a whole. It would also hit any particular server *once* and distribute both upload and download loads."

Sounds cool. What are the requirements for becoming a part of the repository system? Excuse my n00biness. :^)

Rolo Kipp

Nah, you're not a n00b, we're still working this out.

Requirements... a bit of a commitment - some gigs (500gb preferred ;-), a server online more often than not. That's the core of it, I think.

A determination to *not* just fade away after a couple months... 

Common sense things.

The actual *details*, as I said, are being worked on :-)

Is that something you're interested in? Also, what region of the world are you in? It'd be nice to spread things out...

Pain Eternal

That was prior to your giving me the current version; the one on pastebin is kind of old. (It was mainly to help the discussion; they have other tech they are using.)

GammaLeak

Hi! Sent you a private message. :^)
