StankDawg

DocDroppers needs your help!


There is a phenomenon taking place over on Wikipedia where a small handful of security professionals have been slowly removing or trimming back articles related to hacking. I am not even talking about mine specifically; there seems to be a general drift away from treating hackers and hacker culture as "notable" at all. They removed the hacking categories from the system a while back as well.

Instead of fighting over on Wikipedia to save these entries, I am more inclined to bring them over to a site that WE control. I believe that site is DocDroppers.org, but I cannot do it alone. I have been doing my best to keep DocDroppers alive, but I cannot do it anymore without your help. It takes a lot of our time just cleaning up spam and keeping the site running, and no major content has been added since we launched the site many years ago. Now, with our past being removed from Wikipedia, this is our last chance to save our history. This is where you come in!

Check out our to-do list on DocDroppers and help out where you can.

The two easiest things that need to be done are:

1) Copy any and all hacking-related topics from Wikipedia over to DocDroppers. It can be as easy as a cut and paste, since both sites use the MediaWiki engine.

2) Add new entries and articles from zines such as Phrack, Blacklisted, 2600, and more. These may take more time since they will need to be formatted. Just get them in for now and we can all help format them over time.
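For step 1, a minimal sketch, assuming the article is still live on Wikipedia: MediaWiki returns a page's raw wikitext when you add action=raw to its index.php URL, and pasting that wikitext (rather than the rendered page) keeps [[internal links]] and headings as markup. The helper name raw_url and the en.wikipedia.org/w/index.php path are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: URL that returns an article's raw wikitext via
# MediaWiki's action=raw (assumes the usual /w/index.php path).
sub raw_url {
    my ($title) = @_;
    (my $t = $title) =~ s/ /_/g;    # MediaWiki titles use underscores
    utf8::encode($t);               # percent-encode bytes, not characters
    $t =~ s/([^A-Za-z0-9_.~\-])/sprintf('%%%02X', ord $1)/ge;
    return "http://en.wikipedia.org/w/index.php?title=$t&action=raw";
}

# Usage: perl rawurl.pl "Phone phreaking" "AT&T"
print raw_url($_), "\n" for @ARGV;
```

Any templates the wikitext uses (for example {{cite web}}) will only render on DocDroppers if templates with the same names exist there.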

This may be our last chance to save our history. Please help. I don't ask you for money, and I don't ask you for excuses, complaints, or attacks on Wikipedia or its members. I simply ask you to contribute your time to DocDroppers as described above. Too much of our history is already lost. Please donate your time.


Indeed, I will dump what I can in my free time.


Oh crap, I totally forgot about fixing the skin for DocDroppers.

I've been a little too busy at uni. Finals are almost over though.


Need a bit of advice guys,

When I copy and paste, all the links in the original text point back to Wikipedia.

Don't they need to be removed or altered?

And what about the [edit] tags? Do they need to be modified?

Thanks.

EDIT: OK, it seems a few have already been done today. How are you doing this?!

Edited by Swerve
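One possible answer to both questions, as a sketch that assumes the text was copied from the rendered Wikipedia page: strip the "[edit]" section markers and rewrite absolute links to en.wikipedia.org/wiki/... as internal [[wiki links]], so they point at DocDroppers pages instead. The function names clean_pasted, wikilink, and page_title are hypothetical, and the title decoding only handles ASCII percent-escapes.

```perl
use strict;
use warnings;

# Turn a /wiki/ URL path segment back into a page title (ASCII escapes only).
sub page_title {
    my ($path) = @_;
    $path =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    $path =~ tr/_/ /;
    return $path;
}

# Build an internal link, with an optional display label.
sub wikilink {
    my ($path, $label) = @_;
    my $title = page_title($path);
    return defined $label ? "[[$title|$label]]" : "[[$title]]";
}

# Clean text copied from a rendered Wikipedia page.
sub clean_pasted {
    my ($text) = @_;
    $text =~ s/\s*\[edit\]//g;    # drop the section "[edit]" markers
    # [http://en.wikipedia.org/wiki/Foo_Bar label]  ->  [[Foo Bar|label]]
    $text =~ s{\[https?://en\.wikipedia\.org/wiki/([^\s\]]+)\s+([^\]]+)\]}
              {wikilink($1, $2)}ge;
    # bare http://en.wikipedia.org/wiki/Foo_Bar  ->  [[Foo Bar]]
    $text =~ s{https?://en\.wikipedia\.org/wiki/([^\s\]]+)}{wikilink($1)}ge;
    return $text;
}
```

Links to pages that don't exist on DocDroppers yet will show up as red links, which doubles as a to-do list of articles still to copy.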

I have plenty of time to spare these days. I've started copying from Wikipedia, and if anything else is needed, I can help. Note: I'm doing my best to carry over the Wikipedia templates.

Edited by fiyaskofiyasko

I don't really have the time or resources to devote to it, but if someone does: one could get a good foundation started by grabbing a dump of Wikipedia and sorting out articles with a script that looks for keywords (such as "hacking", "computer security", etc.), possibly following links to related articles as well. This would probably pull in some weird unrelated things, but I think it would be easier to pick out the false positives than to manually hunt for pages in the current Wikipedia.
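A rough sketch of that keyword-plus-links idea, assuming the articles have already been loaded into a hash of title => wikitext (a real run would stream them from the dump instead). The keyword list, the hop-by-hop link following, and the helper names is_candidate, linked_titles, and collect are all illustrative.

```perl
use strict;
use warnings;

# Illustrative keywords; a real run would use a much longer list.
my @keywords = ('hacking', 'computer security');

# Case-insensitive substring test, so expect false positives to weed out by hand.
sub is_candidate {
    my ($text) = @_;
    my $lc = lc $text;
    for my $kw (@keywords) {
        return 1 if index($lc, lc $kw) >= 0;
    }
    return 0;
}

# Titles of [[internal links]] in a page's wikitext (drops any |label or #section).
sub linked_titles {
    my ($text) = @_;
    my @titles;
    while ($text =~ /\[\[([^\]|#]+)/g) {
        push @titles, $1;
    }
    return @titles;
}

# Start from keyword matches, then follow links outward $hops times,
# keeping only titles that exist in %$pages (title => wikitext).
sub collect {
    my ($pages, $hops) = @_;
    my %seen = map { ($_ => 1) } grep { is_candidate($pages->{$_}) } keys %$pages;
    my @frontier = keys %seen;
    for (1 .. $hops) {
        my @next;
        for my $title (@frontier) {
            for my $link (linked_titles($pages->{$title})) {
                next if $seen{$link} || !exists $pages->{$link};
                $seen{$link} = 1;
                push @next, $link;
            }
        }
        @frontier = @next;
    }
    return sort keys %seen;
}
```

Each extra hop pulls in more pages and more false positives, so one hop is probably a sensible starting point.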

The bad news is that the generation and availability of Wikipedia dumps is a mess. The ideal would be to grab "pages-meta-history.xml.bz2", a database dump of every page with every revision (this way you could recover content that folks have removed but that you would want for DocDroppers). There are a few problems with this, though: 1) It's going to be *massive*; the last complete one was, I believe, 1 terabyte uncompressed. 2) It's hard for the Wikimedia folks to even generate; it's been months and months since a complete dump finished successfully. 3) They started the current dumping process on October 25, and it's not expected to finish generating the file until December 19.

If you want to keep an eye on that, cross your fingers and check up here:

http://download.wikimedia.org/enwiki/20071018/

Another option would be to grab "pages-articles.xml.bz2", which has the current revision of every article. It's "only" 3 GB compressed (I don't know how large it is uncompressed). Unfortunately, it may be missing some things you want for DocDroppers that have already been taken down. You can try hunting down older dumps, or work with some of the older static HTML downloads, like this one from April:

http://static.wikipedia.org/downloads/April_2007/en/
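Whichever dump is used, it is far too large to load into memory, so it has to be streamed. Here is a minimal sketch of a streaming reader for pages-articles.xml.bz2, assuming the dump's usual line layout and that the bzip2 command-line tool is installed. It is a line-based shortcut rather than a full XML parser, and each_page and scan.pl are hypothetical names.

```perl
use strict;
use warnings;

# Decode the few XML entities the dumps use in titles and text.
sub unescape {
    my ($s) = @_;
    $s =~ s/&lt;/</g;
    $s =~ s/&gt;/>/g;
    $s =~ s/&quot;/"/g;
    $s =~ s/&#0?39;/'/g;
    $s =~ s/&amp;/&/g;    # must come last
    return $s;
}

# Call $callback->($title, $wikitext) once per <page>, reading line by line.
# Assumes <title> sits on one line and <text ...> opens a (possibly
# multi-line) body that ends with </text>.
sub each_page {
    my ($fh, $callback) = @_;
    my ($title, $text, $in_text) = ('', '', 0);
    while (my $line = <$fh>) {
        if ($in_text) {
            $text .= $line;
        }
        elsif ($line =~ m{<title>(.*?)</title>}) {
            $title = unescape($1);
            next;
        }
        elsif ($line =~ m{<text\b[^>]*/>}) {    # self-closing: empty page
            $callback->($title, '');
            next;
        }
        elsif ($line =~ m{<text\b[^>]*>(.*)}s) {
            ($text, $in_text) = ($1, 1);
        }
        if ($in_text && $text =~ s{</text>.*\z}{}s) {
            $in_text = 0;
            $callback->($title, unescape($text));
        }
    }
}

# Usage: perl scan.pl enwiki-pages-articles.xml.bz2   (prints every title)
if (@ARGV) {
    open(my $fh, '-|', 'bzip2', '-dc', $ARGV[0]) or die "cannot run bzip2: $!";
    each_page($fh, sub { print "$_[0]\n" });
    close $fh;
}
```

The callback is where a keyword filter would go, so the 3 GB file only has to be decompressed once per pass.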


Sifting through the dump is an interesting idea, but will anybody here have the kind of hardware to tackle that sort of thing? Also, I wonder how much of this stuff has ended up on archive.org. As far as the templates go, it's like Stank said--it's better to grab the data now and we can make it pretty later.


I think this is a great initiative :voteyes:

Remember that when collecting Wikipedia entries it's very important to preserve the copyright notice. All the text on Wikipedia is under the GNU Free Documentation License, while DocDroppers defaults to Creative Commons Attribution-NonCommercial-ShareAlike. Relicensing the text is a violation of the copyright.


I've got 2 or 3 computers I can devote some processing power to. All have processors over 2 GHz and over 1 GB of RAM, and they can go through the articles all day long. I would just need to get the dumps, split them to run across the 3 computers, and have a way of searching through them. I'm in my second week of learning C++, so maybe I can figure out a way of going through the articles. Any help figuring it out would be great.
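One low-tech way to split the work, sketched in Perl to match the search-term code used elsewhere in this thread (the same approach carries over to C++): deal whole <page> blocks round-robin into one shard file per machine, so each computer scans only its own shard. It assumes <page> and </page> sit on their own lines, as they do in the XML dumps; split_pages, split.pl, and the shardN.xml names are hypothetical. Each shard is just a run of <page> blocks, not a complete XML document.

```perl
use strict;
use warnings;

# Deal whole <page>...</page> blocks round-robin into the given output
# handles; lines outside any page (the <siteinfo> header) are skipped.
# Returns the number of pages written.
sub split_pages {
    my ($in, @outs) = @_;
    my ($n, $cur) = (0, undef);
    while (my $line = <$in>) {
        if (!defined $cur && $line =~ /^\s*<page>/) {
            $cur = $outs[$n++ % @outs];    # next machine's shard
        }
        print {$cur} $line if defined $cur;
        $cur = undef if defined $cur && $line =~ m{^\s*</page>};
    }
    return $n;
}

# Usage sketch, one shard per machine:
#   bzip2 -dc pages-articles.xml.bz2 | perl split.pl 3
if (@ARGV) {
    my @outs = map {
        open(my $fh, '>', "shard$_.xml") or die "shard$_.xml: $!";
        $fh;
    } 0 .. $ARGV[0] - 1;
    split_pages(\*STDIN, @outs);
    close $_ for @outs;
}
```

Round-robin keeps the shards roughly equal in page count, though not necessarily in bytes, since article lengths vary a lot.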


Thanks for pointing that out, snow. I will look into this topic. We may change our copyright policy on DocDroppers or figure out some other way to make this work.

What is the policy if the article gets deleted from Wikipedia, though? There seems to be a bit of a grey area here.

And thanks to everyone who is contributing! I was shocked when I came home from work today to see so many articles added! I will try to go behind everyone and clean them up as best as I can this weekend. If anyone has direct knowledge of MediaWiki templates, contact me and we can work together on getting the template system in place.

Text in Wikipedia, excluding quotations, has been released under the GNU Free Documentation License (or is in the public domain), and can therefore be reused only if you release any derived work under the GFDL... If you are unwilling or unable to use the GFDL for your work, use of Wikipedia content is unauthorized.

To me it seems like there are no grey areas. Deleted or not, in every situation Wikipedia can easily show that the material originally belonged to them.

Thanks, guys, for all your help. We do appreciate it. Let Stank or me know about any issues that you run into.

-Enigma


Could someone please post a definitive statement on the copyright issue once it's sorted?

Thanks :)

If an article gets deleted from Wikipedia it still remains under copyright. The content is the property of the authors, not Wikipedia, so relicensing it would actually require getting permission from all the contributors:

http://en.wikipedia.org/wiki/Wikipedia:Abo..._and_copyrights

I think it's a good idea to change the copyright policy of DocDroppers, but should you decide to change the copyright policy of articles that are already on DocDroppers (under the creative commons license), you must also get permission from the authors first.

If you want the least amount of work it is not a problem to keep the old content under the old license, and require any new content to be under the GNU Free Documentation License. In other words no previous copyright policy is changed, and you don't need to get any permissions other than what is granted by the authors through the copyright to begin with.


The problem will be differentiating between the two types of content. I don't think a generic statement on the main page saying that content is under one of these licenses would be good enough. I would imagine that we have to (or at least should) identify each individual article. To me, the easiest way I see is to make a CATEGORY TAG for each of the two licenses and add it to each individual article/entry. By doing this, MediaWiki would build a dynamic page listing all topics under each license, plus each page would be individually marked at the bottom.

This is just my first thought though...I am open to any other suggestions/solutions. I certainly want to give proper credit and copyright to the proper people. Let's make sure that is clear to everyone.
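A minimal sketch of that category-tag approach, assuming hypothetical category names (use whatever names get agreed on): append the license category to the bottom of an article's wikitext unless it is already there. MediaWiki then lists every tagged page on the category's own page automatically and shows the category at the bottom of each article.

```perl
use strict;
use warnings;

# Hypothetical category names, one per license.
my %license_category = (
    gfdl   => 'GFDL content',
    ccncsa => 'CC BY-NC-SA content',
);

# Append the license category to an article's wikitext, once.
sub tag_license {
    my ($wikitext, $license) = @_;
    my $cat = $license_category{$license}
        or die "unknown license '$license'\n";
    my $tag = "[[Category:$cat]]";
    return $wikitext if index($wikitext, $tag) >= 0;    # already tagged
    $wikitext =~ s/\s*\z/\n\n$tag\n/;                    # add at the bottom
    return $wikitext;
}
```

Running it twice on the same article leaves it unchanged, so it is safe to re-run over everything already imported.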


Well, while you guys are working on moving content over, I've started writing some. Hopefully by tomorrow I'll have one piece done and added to the site. It's my first paper on the subject, so please scrutinize it and comment.

The problem will be differentiating between the two types of content. I don't think a generic statement on the main page that says that content is one of these licenses would be good enough. I would imagine that we have to (or at least should) identify each individual article.

This is true. What license a particular article is under must be clear.

To me, the easiest way that I see is to make a CATEGORY TAG for the 2 license and add it to each individual article/entry. By doing this, it would build a dynamic page that would list all topics for each license plus it would individually mark each page at the bottom.

Sounds like a good solution.

I was thinking that if there are articles on Wikipedia that have already been deleted, or are deleted before anyone grabs them, there may still be hope. The admins have access to view deleted content, so if we ask politely they may be willing to cooperate. Perhaps there's even an admin here on the forum?

Archive.org is another place where the old information lives. I've been using that a bit as well. Most of the hacking-related articles are on DocDroppers as of now; it's just a matter of formatting them to meet DocDroppers' general standards, such as article title names. Wikipedia and DocDroppers have a few differences, as I have noticed while copying and pasting. The templates aren't right, and I have been editing those as well. I can create the categories easily, but I just need to know about the copyright situation in general.

PS: the spammers from China sure love to hit the site.


Alright, playing off of the earlier idea about downloading a wiki dump I have done just that.

The parser is ready to go and the dump is downloaded, but the XML file is about 13 GB, so I'd like to parse it as few times as possible. To that end, I'm trying to compile a list of terms to search for. To make the human sorting easier, at the expense of some CPU cycles, I am using exact word matching. The current list contains...

```perl
my @searchTerms = (
    "hack",    "hacks",    "hacker",   "hackers",  "hacking",
    "phreak",  "phreaks",  "phreaker", "phreakers", "phreaking",
    "binrev",  "binary revolution", "ddp", "digital dawgpound", "digital dawg pound",
    "def con", "h.o.p.e.", "interzone", "radio freak america", "infonomicon",
);
```
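A sketch of exact word matching over a list like this one, assuming Perl 5 regexes. Plain \b boundaries do not work for a term ending in punctuation such as "h.o.p.e." (there is no word boundary between a trailing dot and a following space), so this version uses lookarounds instead, escapes each term with quotemeta, and lets multi-word terms match any run of whitespace. build_matcher and matched_terms are hypothetical names.

```perl
use strict;
use warnings;

# Compile one case-insensitive regex that matches any term as a whole word.
sub build_matcher {
    my @terms = @_;
    my $alt = join '|',
        map { join '\s+', map { quotemeta } split / /, $_ }   # "def con" -> def\s+con
        sort { length($b) <=> length($a) } @terms;            # longest first
    return qr/(?<!\w)(?:$alt)(?!\w)/i;
}

# Distinct terms found in a text, lowercased and sorted.
sub matched_terms {
    my ($re, $text) = @_;
    my %seen;
    while ($text =~ /($re)/g) {
        (my $m = lc $1) =~ s/\s+/ /g;
        $seen{$m} = 1;
    }
    return sort keys %seen;
}
```

Sorting longer terms first means "hacker" is tried before "hack", and the lookarounds keep "hack" from matching inside "shacks". Recording which terms matched also tells the human reviewers why a page was pulled in.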

And I'm going to open things up for a day or two for anyone who has more words to add to the list. So please, if you have any ideas, feel free to drop 'em!

I'm going to try to make this interface directly with DocDroppers and, if it's OK with the administration, possibly create a new category so people can go through it and speedy-delete things that don't belong or recategorize those that do.
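A sketch of how each imported article could be marked for that review pass, assuming a hypothetical category name: append the review category and leave a hidden HTML comment recording which terms pulled the page in, so reviewers can see why it was flagged. Removing the category (or deleting the page) then takes it out of the review list. prepare_import is a hypothetical helper.

```perl
use strict;
use warnings;

my $REVIEW_CATEGORY = 'Wikipedia import - needs review';    # hypothetical name

# Wrap an imported article: hidden note for reviewers + review category.
sub prepare_import {
    my ($title, $wikitext, @matched) = @_;
    my $why = @matched ? join(', ', @matched) : 'linked from a match';
    return "<!-- Imported from Wikipedia article \"$title\"; matched: $why -->\n"
         . $wikitext
         . "\n\n[[Category:$REVIEW_CATEGORY]]\n";
}
```

HTML comments are not shown on the rendered page, so the note only appears to people who open the edit view.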

Thoughts?

-Dr^ZigMan

Less directly related to culture, but still useful to have in there, would be things like: "computer security", "information security", "web security", "social engineering", "penetration test(ing)".
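Since the matching is exact-word, a shorthand like "penetration test(ing)" needs to be expanded into both spellings before it goes into the list. A small sketch; expand_terms is a hypothetical helper, and it handles any number of "(suffix)" groups by recursion.

```perl
use strict;
use warnings;

# Expand "word(suffix)" shorthand into both spellings, recursively,
# so "penetration test(ing)" yields "penetration test" and "penetration testing".
sub expand_terms {
    my @out;
    for my $term (@_) {
        if ($term =~ /^(.*?)\(([^()]+)\)(.*)$/) {
            push @out, expand_terms("$1$3", "$1$2$3");
        }
        else {
            push @out, $term;
        }
    }
    return @out;
}
```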
