Carbon Dating The Web
Written by Ian Elliot   
Monday, 03 June 2013

While surfing the web, you find something really interesting. But is it of current interest, or is it already long out of date? One of the problems with the web is that dead material is rarely removed, and who ever adds an accurate date of posting? Now, however, we have a way to discover how old a web page is.

It is often impossible to find a date for the origination of a webpage. And of course you cannot always trust a claim such as "serving the web since 1902" - well if such a claim were made you would know it was an exaggeration since the web is not much more than 20 years old.

Carbon Dating the Web is a utility that retrieves various items of evidence to provide an estimated creation date for any page. All the user has to do is enter the page's URL and, after a short delay, the utility gives a report.


The utility has been provided as part of a research project being undertaken by Hany SalahEldeen and Michael Nelson of the Department of Computer Science at Old Dominion University in Norfolk, Virginia.  

According to the abstract of a paper presented at the WWW 2013 conference held in Rio de Janeiro, Brazil in May:

To establish a likely datetime, we poll Bitly for the first time someone shortened the URI, Topsy for the first time someone tweeted the URI, a Memento aggregator for the first time it appeared in a public web archive, Google’s time of last crawl, and the Last-Modified HTTP response header of the resource itself. We also examine the backlinks of the URI as reported by Google and apply the same techniques for the resources that link to the URI.
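The polling strategy described in the abstract can be sketched in a few lines of Python. This is only a sketch, assuming that the estimate is simply the earliest timestamp any source reports; the function names and the sample values are invented for illustration and are not taken from the CarbonDate code. No network calls are made: each source's answer is passed in as an already-parsed timestamp, or None where the source knows nothing about the page.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional, Tuple


def parse_last_modified(header_value: Optional[str]) -> Optional[datetime]:
    """Parse a Last-Modified header value; return None if absent or malformed."""
    if not header_value:
        return None
    try:
        parsed = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return None
    # Treat a timestamp without zone information as UTC so values stay comparable.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def estimate_creation_date(
    evidence: Dict[str, Optional[datetime]],
) -> Optional[Tuple[str, datetime]]:
    """Return (source, timestamp) for the earliest timestamp, or None if no source answered.

    Assumption: every timestamp is timezone-aware, and the earliest one wins.
    """
    found = {name: ts for name, ts in evidence.items() if ts is not None}
    if not found:
        return None
    source = min(found, key=found.get)
    return source, found[source]


# Hypothetical answers from each source (sample values, not real measurements).
evidence = {
    "bitly": None,  # nobody shortened the URI
    "first_tweet": datetime(2009, 7, 20, 9, 0, tzinfo=timezone.utc),
    "web_archive": datetime(2009, 9, 1, 0, 0, tzinfo=timezone.utc),
    "last_modified": parse_last_modified("Sat, 10 Apr 2010 12:00:00 GMT"),
}
print(estimate_creation_date(evidence))
```

Sources that return nothing are skipped rather than treated as errors, so a page that has never been tweeted or shortened can still be dated from the remaining evidence.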

The paper also includes a timeline of the resources used in the carbon dating process.

 

Hany SalahEldeen, writing on the Web Science and Digital Libraries Research Group blog, explained how the researchers tested the accuracy of the model on 1200 resources for which they were able to manually extract a creation date. The model was able to estimate a creation date in 75% of cases, with 33% being the exact creation date. The model was then used to build the utility. The page we tested was for I Programmer's most popular book review, Beautiful Architecture.
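The two figures quoted here, the share of pages that get any estimate and the share that get the exact date, can be computed as in the sketch below. It rests on assumptions not stated in the article: that "exact" means the same calendar day, and that both percentages are taken over all tested resources. The function name and the sample data are hypothetical.

```python
from datetime import date
from typing import Optional, Sequence, Tuple


def score_estimates(
    pairs: Sequence[Tuple[date, Optional[date]]],
) -> Tuple[float, float]:
    """Return (coverage, exact_rate) for (actual, estimated) date pairs.

    coverage   = fraction of resources that received any estimate
    exact_rate = fraction of resources whose estimate equals the actual date
    Both fractions use the total number of resources as the denominator.
    """
    total = len(pairs)
    if total == 0:
        raise ValueError("at least one resource is required")
    estimated = sum(1 for _, est in pairs if est is not None)
    exact = sum(1 for actual, est in pairs if est == actual)
    return estimated / total, exact / total


# Made-up sample: four resources, one with no estimate and one exact match.
sample = [
    (date(2009, 6, 12), date(2009, 7, 20)),
    (date(2011, 1, 5), date(2011, 1, 5)),
    (date(2012, 3, 9), None),
    (date(2010, 8, 30), date(2010, 9, 2)),
]
coverage, exact_rate = score_estimates(sample)
print(f"coverage={coverage:.0%} exact={exact_rate:.0%}")
```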

This page was actually created on June 12, 2009 and the date shown on its page is April 10, 2010. It is carbon dated to July 20, 2009, which the utility also gives as the date it was first tweeted. This fits in with I Programmer's history: the site started to tweet its articles in July 2010.
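To put a number on how close the estimate is, the gap between the actual creation date and the carbon date can be worked out directly. This short calculation uses only the two dates given above; the variable names are illustrative.

```python
from datetime import date

actual_creation = date(2009, 6, 12)   # when the page was actually created
carbon_dated = date(2009, 7, 20)      # the utility's estimate

# Subtracting two dates yields a timedelta; .days is the gap in whole days.
gap = carbon_dated - actual_creation
print(gap.days)  # 38
```

The estimate is therefore a little over five weeks later than the true creation date, and much closer to it than the date displayed on the page itself.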

Dating web pages is something many developers would find useful and the code for the utility is available on GitHub. Anyone who registers with Bitly and Topsy to obtain API keys can set up a service.

More Information

Carbon Dating the Web

Carbon Dating The Web: Estimating the Age of Web Resources (arXiv pdf)

CarbonDate on GitHub

Related Articles

Google Flu Prediction - Beware The Media Effect

1 Billion Web Pages = 1 Million Dollars?

Microsoft's New Research Center into Social Data

Social Networks, Suicide and Statistics

 


Last Updated ( Monday, 03 June 2013 )