Guardian moves on from Java and Oracle
Written by Kay Ewbank
Thursday, 07 April 2011
The development team at online news site guardian.co.uk has made another interesting decision to adopt emerging technology for its site, and this time its choice is Scala. The Guardian website has the highest readership of any online news site apart from the New York Times.
Over recent months developers working on the site have been revealing plans to move from the current Java and Oracle based system to one based on Scala and MongoDB. The changeover is starting with the Content API, which is used to retrieve content from the online newspaper. The API was developed in Java, but the team has decided to switch to Scala, which also runs on the JVM. The decision was driven by the need to reduce the time taken to deliver new features. Scala, although it runs on the JVM and interoperates with Java, is very much a modern language that combines functional programming with object orientation. It is arguably a more advanced language than Java, but it is also a less tried and trusted tool.
Graham Tackley, the Web Platform Development Team Lead for guardian.co.uk, explained at Eurocon 2010 how the team represents its 50-table relational database model in Apache Solr for media storage and uses Scala for real-time content searching, indexing and updating. He said that moving to Scala reduced the time needed to build the search index from 20 hours to one hour. The slides from this Solr talk, which include details of the system architecture, are available online. Tackley has also written some interesting posts on his blog about Scala and why he likes it.
Meanwhile, another Guardian developer, Mat Wall, spoke at QCon London about the decisions behind moving from Oracle to MongoDB. MongoDB stores documents in a JSON-like (JavaScript Object Notation) format rather than using a relational structure, which means that documents with new attributes can be added to the database at runtime. The Guardian team was being held back by the fact that when developers wanted to update the code that runs the site, they often had to update the database schema as well, and so had to put content updates on hold while the schema was changed.
Both MongoDB and Oracle are in use at the moment, and this is being handled by a custom API layer that acts as a wrapper for database access to the two very different data structures. Mat Wall also said the team is looking to move its data to cloud hosting. The slides from the session can be downloaded from the QCon website.
You have to admire the team for making decisions to use tools that are "non-safe". After all, it is an old saying that no one ever got fired for buying into the mainstream. You also can't help but think that they are deliberately setting out to be trailblazers. Whether they or the Guardian will regret it in the longer term is a different matter. Adopting new technologies on such a scale is an interesting experiment and one we can all learn from.
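As a rough illustration of the two ideas described above for the Oracle to MongoDB move (documents that can gain new attributes at runtime, and a wrapper API sitting in front of two very different stores), here is a minimal Python sketch. It is not the Guardian's code: the class names FixedSchemaStore, DocumentStore and ContentRepository, the field names, and the policy of reading from the document store first are all assumptions made for this example, and plain in-memory dictionaries stand in for Oracle and MongoDB.

```python
from typing import Any, Dict, Iterable, Optional


class FixedSchemaStore:
    """Hypothetical stand-in for a relational table with fixed columns."""

    def __init__(self, columns: Iterable[str]) -> None:
        self.columns = set(columns)
        self.rows: Dict[str, Dict[str, Any]] = {}

    def save(self, content_id: str, fields: Dict[str, Any]) -> None:
        # A new attribute cannot be stored until the schema is changed.
        unknown = set(fields) - self.columns
        if unknown:
            raise ValueError(f"schema change needed for: {sorted(unknown)}")
        self.rows[content_id] = dict(fields)

    def load(self, content_id: str) -> Optional[Dict[str, Any]]:
        return self.rows.get(content_id)


class DocumentStore:
    """Hypothetical stand-in for a document database: records are free-form."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}

    def save(self, content_id: str, fields: Dict[str, Any]) -> None:
        # Any set of attributes is accepted; no schema update is required.
        self.docs[content_id] = dict(fields)

    def load(self, content_id: str) -> Optional[Dict[str, Any]]:
        return self.docs.get(content_id)


class ContentRepository:
    """Hypothetical wrapper that hides which store holds a piece of content."""

    def __init__(self, legacy: FixedSchemaStore, documents: DocumentStore) -> None:
        self.legacy = legacy
        self.documents = documents

    def save(self, content_id: str, fields: Dict[str, Any]) -> None:
        # Assumption: new writes go to the document store only.
        self.documents.save(content_id, fields)

    def load(self, content_id: str) -> Optional[Dict[str, Any]]:
        # Assumption: prefer the document store, fall back to the legacy store.
        doc = self.documents.load(content_id)
        return doc if doc is not None else self.legacy.load(content_id)
```

Code that calls ContentRepository never needs to know which backend answered, which is the general appeal of a wrapper layer while two stores run side by side. How the real system routes reads and writes is not described in the talk summary above.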
Last Updated (Thursday, 07 April 2011)