A team from Microsoft Research has taken the lead in the MinuteSort data-sorting benchmark using a specially devised technology, Flat Datacenter Storage.
MinuteSort measures how much data can be sorted in one minute, and Jeremy Elson's team at Microsoft sorted three times as much data as the previous record holder, a Yahoo team that set the mark in 2009.
The figures are impressive: 1,401 gigabytes sorted in 60 seconds, using 1,033 disks across 250 machines. That is not only three times the previous record; according to a Microsoft blog post about the test, it was also achieved with only one-sixth of the hardware resources.
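To put those numbers in perspective, the small Python sketch below divides the reported volume by the time, the machine count, and the disk count. It assumes decimal units (1 GB = 1,000 MB), and the function name `sort_rates` is purely illustrative. It measures only the rate at which data was sorted, not raw disk traffic: a sort that starts and ends on disk reads and writes every byte at least once, so the disks actually moved more data than these rates suggest.

```python
# Figures reported for the record MinuteSort run.
DATA_GB = 1401
SECONDS = 60
MACHINES = 250
DISKS = 1033


def sort_rates(data_gb: float, seconds: float, machines: int, disks: int) -> dict[str, float]:
    """Return sorted-data throughput in MB/s (decimal units: 1 GB = 1,000 MB)."""
    total_mb_s = data_gb * 1000 / seconds  # whole cluster
    return {
        "aggregate": total_mb_s,
        "per_machine": total_mb_s / machines,  # average share of each machine
        "per_disk": total_mb_s / disks,        # average share of each disk
    }


rates = sort_rates(DATA_GB, SECONDS, MACHINES, DISKS)
for name, mb_s in rates.items():
    print(f"{name:>11}: {mb_s:,.1f} MB/s")
```

Run as-is, this works out to roughly 23,350 MB/s for the whole cluster, about 93 MB/s per machine, and about 23 MB/s per disk.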
MinuteSort is a measure of data-crunching speed devised by the late Jim Gray, the well-known Microsoft Research scientist. The award for the team’s achievement will be presented during the 2012 SIGMOD/PODS Conference, an international forum for database researchers, practitioners, developers, and users, which is taking place this week in Scottsdale, Arizona.
The MinuteSort team: (from left) Jon Howell, Jeremy Elson, Ed Nightingale, Yutaka Suzue, Jinliang Fan, Johnson Apacible, and Rich Draves.
What is most interesting about the result is the technology behind it. While solutions such as Hadoop and MapReduce are traditionally used for working with large data sets, Microsoft Research created its own technology, called “Flat Datacenter Storage,” or FDS for short.
Microsoft Research's Jeremy Elson, Ed Nightingale, and Jon Howell came up with the idea behind FDS to tackle problems that traditional solutions struggle with, such as joining two big data sets. They realized that increases in network bandwidth made possible a simpler model for data sorting, one in which every computer sees all of the data.
FDS relies on Microsoft Research’s full bisection bandwidth networks. In such a network, if you draw an imaginary line through a collection of connected computers, every computer on one side of the line can send data at full speed to every computer on the other side, and vice versa, no matter where the line is drawn. Elson compares this to a company’s organizational chart, which shows who each employee reports to. In a hierarchical company, employees report to a superior, who reports to another superior, and so on. In a “flat” organization, everyone essentially reports to everyone else. FDS mimics a flat organization: all the computers report to each other.
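The difference between a hierarchical and a flat network can be illustrated with a toy model. This is not a description of Microsoft's actual network. It assumes a hypothetical two-tier topology in which each rack of machines hangs off a top-of-rack switch, and each rack has a single uplink into a non-blocking core. The sketch checks whether a line splitting the racks in half still lets every machine on one side send at full speed. In this simplified model, a line that separates whole racks is the hardest case, because it forces all cross-line traffic through the uplinks. The function names and the example link speeds are illustrative assumptions.

```python
def bisection_gbps(racks: int, hosts_per_rack: int, nic_gbps: float, uplink_gbps: float) -> float:
    """Bandwidth (one direction) across a line that puts half the racks on each side.

    Hypothetical two-tier model: hosts connect to a top-of-rack switch, and each
    rack has one uplink of `uplink_gbps` into a non-blocking core.
    """
    if racks < 2 or racks % 2:
        raise ValueError("need an even number of racks so the line splits them evenly")
    # Traffic leaving a rack is capped by its hosts' NICs or its uplink, whichever is smaller.
    per_rack = min(hosts_per_rack * nic_gbps, uplink_gbps)
    return (racks // 2) * per_rack


def has_full_bisection(racks: int, hosts_per_rack: int, nic_gbps: float, uplink_gbps: float) -> bool:
    """True if every host on one side can send at full NIC speed across the line."""
    hosts = racks * hosts_per_rack
    needed = (hosts / 2) * nic_gbps  # half the hosts, each sending at line rate
    return bisection_gbps(racks, hosts_per_rack, nic_gbps, uplink_gbps) >= needed


# "Hierarchical": 25 hosts at 10 Gb/s share one 40 Gb/s uplink per rack.
print(has_full_bisection(10, 25, 10, 40))   # False
# "Flat": each rack's uplink matches its hosts' combined NIC speed.
print(has_full_bisection(10, 25, 10, 250))  # True
```

In the hierarchical case, only 200 Gb/s can cross the line, far short of the 1,250 Gb/s that 125 machines sending at full speed would need. That shortfall is why a flat network lets any machine read from any other without funneling traffic through a few congested links.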
This isn’t just academic research, of course. The team from Microsoft Research has already been working with the Bing team to help Bing accelerate its search results, and there are plans to use it in other Microsoft technologies.