Anti SOPA - How To Black Out
Written by Alex Armstrong
Tuesday, 17 January 2012
The news that Wikipedia has announced that it will go dark might encourage others to protest in the same way - but be careful how you do it. There is a right way and a wrong way to black out a website.

First, the latest update on the progress of the proposed strike. Despite the shelving of SOPA until a consensus is reached, and some presidential opposition to the whole idea, PIPA is still on the table. Most observers think that this is just a temporary truce - the war will continue when the odds are better for a SOPA-like bill. As a result Wikipedia, with an estimated 25 million visitors per day, is going dark on the 18th. Wikipedia is by far the largest site to join in and, more importantly, it serves "normal" users and not just geeks and nerds like us.

Interestingly, Twitter probably isn't joining in, as indicated by a Tweet from its CEO Dick Costolo: "That's just silly. Closing a global business in reaction to single-issue national politics is foolish." Single-issue national politics that has a global effect might be a different matter. Perhaps Twitter, well known for its "fail whale" icon, fears that it might not be able to restart after a blackout, in the style of the mainframes of the past.

For more information see the further reading at the end of this news item.
Political and social protest by removing websites that provide a service to users who might otherwise be unaware of what is going on seems to be catching on. However, if you want to join in, it is important to do it right, to avoid consequences that last a lot longer than the blackout period.

The problem is all to do with the way bots and spiders scan a site to build and keep an index up-to-date. Google's Pierre Far thinks that this is so important that he has written a Google+ (where else) post on the subject. The advice applies even to the situation where you need to take a site down for a day or two for maintenance reasons. If you put up a replacement page then the site will be reindexed, with that page being treated as new content.

The best solution is to return a 503 HTTP header for all URLs that are being blacked out. This signals to the bots that the page is not the real content and that the state is temporary. If you are using PHP this is just a matter of:

header('HTTP/1.1 503 Service Temporarily Unavailable');
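If you are not using PHP, the same technique - answer every request with a 503 - can be sketched in any server environment. Here is a minimal illustrative example in Python using the standard WSGI interface (the function name, page text and the optional Retry-After hint are ours, not from Pierre Far's post):

```python
# Sketch: a WSGI app that blacks out a whole site by returning 503 for
# every URL, so crawlers treat the outage as temporary rather than
# reindexing the protest page as the site's new content.

def blackout_app(environ, start_response):
    """Serve every request with 503 Service Unavailable."""
    start_response("503 Service Unavailable", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Retry-After", "86400"),  # optional standard HTTP hint: retry in ~24h
    ])
    return [b"<h1>This site is blacked out in protest - back tomorrow</h1>"]

# To try it locally:
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, blackout_app).serve_forever()
```

The key point is that the status line, not the page body, is what the bots act on - the protest message itself can say anything you like.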
If a lot of pages report the 503 status then the Googlebot will reduce the frequency with which it scans the site. When you return to normal service the bot will eventually notice that you are back and continue where it left off. Pierre Far also offers some small suggestions to make things even easier.
The final advice from Pierre Far is to keep it simple: don't change DNS or crawl-frequency settings, for example. Also don't use 302 redirects - 503, or any of the 5xx status codes, will do the job reliably. If you have any questions about the Google crawl bot then visit:
http://www.google.com/support/forum/p/Webmasters?hl=en

Related News
SOPA Shelved - But What About Protect IP
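The rule of thumb above - 5xx is safe for a blackout, a 302 or a plain 200 replacement page is not - can be captured in a one-line check. A hypothetical helper (our name, not from the article) for sanity-checking your server's configured response:

```python
# Sketch: decide whether a status code tells crawlers the outage is
# temporary (any 5xx) or risks the blackout page being indexed or
# followed (200, 302, etc.).

def safe_for_blackout(status_code: int) -> bool:
    """Return True if this status signals a temporary outage to bots."""
    return 500 <= status_code <= 599

# 503 is the ideal choice; 302 or 200 would let the blackout page
# replace your real content in the index.
```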
Last Updated ( Tuesday, 17 January 2012 )