  1. #1
    Linux Guru
    Join Date
    Apr 2003
    Location
    London, UK
    Posts
    3,284

    A note on "downloading linuxforums.org"


    It has come to my attention that some people are using "download a website" or "offline website synchronisation" programs designed to "download" all or large parts of websites to your hard disk for offline browsing.

    The "download a website" programs I am specifically referring to are different from local browser or proxy caching, as they often send thousands of automated HTTP requests per minute that are not in line with "normal" web surfing activity.
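    For illustration, the kind of traffic described above (thousands of requests per minute from one address, far more than a person reading threads generates) can be spotted by counting requests per client IP over a sliding time window. This is a hypothetical sketch: the class name, the 60-second window and the 300-request limit are assumptions chosen for the example, not the thresholds or tooling actually used on this server.

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Flag client IPs whose request count in a sliding time window exceeds a limit.

    Hypothetical example: window length and limit are illustrative assumptions,
    not values taken from any real server configuration.
    """

    def __init__(self, window_seconds=60.0, max_requests=300):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        # ip -> timestamps (in seconds) of that IP's requests inside the window
        self._hits = defaultdict(deque)

    def record(self, ip, timestamp):
        """Record one request from `ip`; return True if it is now over the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

    Feeding it one (IP, timestamp) pair per logged request would flag a site-mirroring tool quickly. The limit would need tuning against real logs, since a single ordinary page view can also pull in several images and stylesheets.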

    These programs/scripts cause serious performance issues on my server and sometimes render this site totally inaccessible to other users.

    Anyone using such applications against this site is asked to stop immediately.

    As of right now, any sign of these types of programs being run against this site will result in your IP address being permanently firewalled off and an official Denial of Service complaint being lodged with your ISP.

    I would like to clarify that local browser and proxy caching (where you have authorisation to use that proxy server) as a result of normal web surfing are NOT a problem. Automated website download scripts are the issue here; these are totally different.

    If any part of this is not clear feel free to reply to this post, PM me, or email me.

    Jason

  2. #2
    Linux Enthusiast scientica
    Join Date
    Sep 2003
    Location
    South- or "Mid-" Sweden
    Posts
    742
    Jason, how about compiling the message board as a tarball? I've seen this on other forums: a sort of site archive, compiled by the webmaster, that is simply all threads and their contents rendered as static files (i.e. just the threads and attachments, and no user info besides username, avatar, etc., meaning only what is seen when browsing threads).
    That way users could download all the useful (and less useful ;) threads in a (somewhat) controllable way. tbz2 (tar.bz2) would be preferred (thinking of size). It would also be a sort of backup of the site (think: "real men don't use backups, they share it" ;)
    I think there's some tool for phpBB to do this (or maybe it's just a feature of vBulletin).
    It could be a monthly or quarterly snapshot of the threads.
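    A rough sketch of just the packaging step for such a snapshot, using Python's standard `tarfile` module with bz2 compression. It assumes the threads have already been exported as static HTML files into one directory by some other tool; that export step, the function name and the `threads-YYYY-MM.tar.bz2` naming scheme are all hypothetical.

```python
import tarfile
from datetime import date
from pathlib import Path


def build_thread_archive(export_dir, output_dir, snapshot_date=None):
    """Pack a directory of pre-rendered static thread pages into a .tar.bz2 snapshot.

    Hypothetical helper: assumes `export_dir` already holds the exported pages.
    """
    export_dir = Path(export_dir)
    snapshot_date = snapshot_date or date.today()
    # One archive per month, e.g. threads-2004-09.tar.bz2 (naming is an assumption)
    archive_path = Path(output_dir) / f"threads-{snapshot_date:%Y-%m}.tar.bz2"
    with tarfile.open(archive_path, "w:bz2") as tar:
        # tar.add recurses into the directory; arcname keeps paths relative
        # to the export directory instead of embedding the full server path.
        tar.add(export_dir, arcname=export_dir.name)
    return archive_path
```

    Run monthly or quarterly (for example from cron), this would produce one dated archive per snapshot.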

    So, I guess the crawlers are the source of the slowdown of lfo - let's hope it ends with this thread :)
    Regards Scienitca (registered user #335819 - http://counter.li.org )
    --
    A master is nothing more than a student who knows something of which he can teach to other students.

  3. #3
    Linux Guru
    Join Date
    Apr 2003
    Location
    London, UK
    Posts
    3,284
    That could create all sorts of legal implications that I don't really want to get into, e.g. redistributing what people have written without their permission, etc. Unfortunately there are some sad people in this world who think it is better to make a living by suing people instead of going to get a job.

  4. #4
    Linux Guru lakerdonald
    Join Date
    Jun 2004
    Location
    St. Petersburg, FL
    Posts
    5,035
    I have noticed in the past week that this site seems bogged down, and it takes a good five minutes to load on my dedicated T1 line. I have written scripts to download my favorite webcomics every day, but that's different: I'm downloading just one image file per day. But downloading an entire site? That's totally different, and I'd hate to see such a great website get pulled into a mess like this. I think that if you can identify the member accounts of the people doing this, you should ban their usernames as well as their IPs.
    Keep up the good work!
    -lakerdonald

  5. #5
    Linux User
    Join Date
    Jul 2004
    Location
    USA, Michigan, Detroit
    Posts
    329
    It might be a leech reposting the content as an original site, with their own advertisers paying them for eyeballs. A similar thing happened about 4 or 5 months ago to the forums of another site that I regularly visit.
    Long live the revolution!
    Have a nice day.
    If you want real change vote Libertarian!

  6. #6
    Linux Guru
    Join Date
    Apr 2003
    Location
    London, UK
    Posts
    3,284
    Quote Originally Posted by lakerdonald
    I have noticed in the past week that this site seems bogged-down, and it takes a good five minutes to load it on my dedicated T1 line.
    We have also been having some issues with the database tables getting locked, which causes the site to come to a standstill for 5 minutes at a time. This is a separate issue from the download-a-website programs. I'm working on resolving it by optimising some of the SQL in phpBB, but this is taking time to do. Let me know if this starts happening regularly, though.

    Since I started this topic I've blocked 3 IP addresses. Tonight I have the pleasure of sitting down to start writing the relevant emails.

    It is a shame, though, as the time I spend dealing with silly issues like this is time I'm not spending on adding more features to linuxforums.

    Quote Originally Posted by copycon
    It might be a leech reposting the content as an original site, with their own advertisers paying them for eyeballs. A similar thing happened about 4 or 5 months ago to the forums of another site that I regularly visit.
    It could well be. What site was that, btw?

  7. #7
    Linux Enthusiast scientica
    Join Date
    Sep 2003
    Location
    South- or "Mid-" Sweden
    Posts
    742
    Quote Originally Posted by jasonlambert
    That could create all sorts of legal implications that I don't really want to get into, e.g. redistributing what people have written without their permission, etc. Unfortunately there are some sad people in this world who think it is better to make a living by suing people instead of going to get a job.
    *sigh* Sometimes a "legal" system is not justice... :/
    The legal aspect is quite easy to fix; just add a line like one of these:
    "By accepting this agreement I also agree that anything and everything that I post/write/submit or otherwise include in my posts may be republished in a downloadable archive from this site. (Aside from this, the contents of the post still remain the author's.)"
    or
    "By agreeing to this agreement I grant linuxforums.org permission to republish all (or selected) of my articles in a downloadable archive."
    to the agreement all users have to accept when registering. Then inform all users that this change has been made, and state clearly that any member who doesn't accept the new conditions should contact the admin and have their account closed (and, if they explicitly demand it, have their posts deleted/blanked). (IIRC there is a clause stating that the agreement can be changed like this.) Those who accept the new agreement continue as usual.
    Regards Scienitca (registered user #335819 - http://counter.li.org )
    --
    A master is nothing more than a student who knows something of which he can teach to other students.

  8. #8
    Linux User
    Join Date
    Jul 2004
    Location
    USA, Michigan, Detroit
    Posts
    329
    It was the UHACC (Unix, Hobbyists, Administrators, Coders, Club) web site. Their site is at www.uhacc.org. The thread talking about the incident is here:

    http://uhacc.org/forums/index.php?bo...;threadid=1131

    If for some reason that link does not work the tree of the thread is

    UHACC forums
    ----UHACC foo
    --------Services (Moderators: mongoose, dfu)
    ------------An LDP Mirror site is glomming on UHACC (RESOLVED)

    After rereading the thread, it looks like it might be a little different, but it's still worth looking at.
    Long live the revolution!
    Have a nice day.
    If you want real change vote Libertarian!

  9. #9
    Linux Guru lakerdonald
    Join Date
    Jun 2004
    Location
    St. Petersburg, FL
    Posts
    5,035
    status update:
    linuxforums has slowed to a crawl. I am typing about 45 seconds ahead of what the text box can keep up with. It's taking me about five minutes to post something of this length. These downloaders are really becoming a problem.
    -lakerdonald

  10. #10
    Linux Guru
    Join Date
    Apr 2003
    Location
    London, UK
    Posts
    3,284
    Quote Originally Posted by lakerdonald
    status update:
    linuxforums has slowed to a crawl. I am typing about 45 seconds ahead of what the text box can keep up with. It's taking me about five minutes to post something of this length. These downloaders are really becoming a problem.
    -lakerdonald
    No downloaders were on the site at that time, so the speed issue you experienced there was something else. AFAIK it wasn't a server issue.
