New MinuteSort Record Set by Microsoft Research
Written by Kay Ewbank   
Wednesday, 23 May 2012

A team from Microsoft Research has taken the lead in the MinuteSort data sorting test using a specially devised technology, Flat Datacenter Storage.

MinuteSort is a test of how much data can be sorted in a minute, and Jeremy Elson's team from Microsoft sorted three times the data of the previous record holder (a team from Yahoo in 2009).

The figures are impressive - 1401 gigabytes in 60 seconds, using 1033 disks across 250 machines. This is not only three times as much data as the previous record, but it was also achieved with only one sixth of the hardware resources, according to a blog post about the test from Microsoft.
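To put those figures in perspective, the short calculation below works out the sorted-data rate they imply. It is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 1000 MB) and an even spread of work across disks and machines, neither of which is stated in the blog post.

```python
# Figures quoted in the article.
DATA_SORTED_GB = 1401
ELAPSED_SECONDS = 60
DISKS = 1033
MACHINES = 250

# Assumption: decimal units, so 1 GB = 1000 MB.
MB_PER_GB = 1000

# Rate at which sorted data was produced by the whole cluster.
aggregate_gb_per_s = DATA_SORTED_GB / ELAPSED_SECONDS

# Assumption: the work is spread evenly over machines and disks.
per_machine_mb_per_s = aggregate_gb_per_s * MB_PER_GB / MACHINES
per_disk_mb_per_s = aggregate_gb_per_s * MB_PER_GB / DISKS

print(f"aggregate:   {aggregate_gb_per_s:.2f} GB/s")
print(f"per machine: {per_machine_mb_per_s:.1f} MB/s")
print(f"per disk:    {per_disk_mb_per_s:.1f} MB/s")
```

These are rates of sorted output only. A sort has to read every byte and write it back at least once, so the raw disk and network traffic behind the record would have been higher than these numbers.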

MinuteSort is a measure of data-crunching speed devised by the late Jim Gray, the well-known Microsoft Research scientist. The award for the team’s achievement will be presented during the 2012 SIGMOD/PODS Conference, an international forum for database researchers, practitioners, developers, and users, which is taking place this week in Scottsdale, Arizona.

 


The MinuteSort team: (from left) Jon Howell, Jeremy Elson, Ed Nightingale, Yutaka Suzue, Jinliang Fan, Johnson Apacible, and Rich Draves.

One interesting aspect of the success is the technology used. While solutions such as Hadoop and MapReduce are traditionally used for working with large data sets, Microsoft Research created its own technology, called “Flat Datacenter Storage,” or FDS for short.

Microsoft Research's Jeremy Elson, Ed Nightingale, and Jon Howell came up with the idea behind FDS to tackle problems that traditional solutions struggle with, such as joining two big data sets. They worked out that increases in network bandwidth could be used to build a simpler model for data sorting, in which every computer sees all of the data.

FDS relies on what Microsoft Research calls a full bisection bandwidth network. If you were to draw an imaginary line through a collection of computers connected by such a network, every computer on one side of the line could send data at full speed to every computer on the other side, and vice versa, no matter where the line is drawn. Elson compares this to an organizational chart showing who employees report to in a company. In a hierarchical company, employees report to a superior, who reports to another superior, and so on. In a “flat” organization, everyone effectively reports to everyone else. FDS mimics a flat organization, as all the computers report to each other.
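The imaginary-line test can be illustrated with a toy model. The sketch below compares the one-way capacity across a cut in a full bisection bandwidth network with the same cut in a hierarchical network whose racks share a slower uplink. The function names, the two-tier rack layout, and all the numbers are hypothetical and chosen only for illustration; they do not describe the cluster Microsoft Research actually used.

```python
def full_bisection_capacity(hosts_a, hosts_b, nic_gbps):
    """One-way capacity across a cut when the network itself is never
    the bottleneck: only the network cards on the smaller side limit it."""
    return min(hosts_a, hosts_b) * nic_gbps


def oversubscribed_capacity(racks_a, racks_b, hosts_per_rack,
                            nic_gbps, uplink_gbps):
    """One-way capacity across a cut between whole racks in a two-tier
    tree, where each rack reaches the rest of the network through a
    single shared uplink."""
    # A rack cannot send more than its hosts produce or its uplink carries.
    per_rack_gbps = min(hosts_per_rack * nic_gbps, uplink_gbps)
    return min(racks_a, racks_b) * per_rack_gbps


# Hypothetical cluster: 8 racks of 32 hosts, 1 Gbps network cards,
# and a 4 Gbps uplink per rack in the hierarchical case.
RACKS, HOSTS_PER_RACK, NIC_GBPS, UPLINK_GBPS = 8, 32, 1, 4

# Draw the line down the middle: 4 racks on each side.
half_racks = RACKS // 2
half_hosts = half_racks * HOSTS_PER_RACK

flat = full_bisection_capacity(half_hosts, half_hosts, NIC_GBPS)
tree = oversubscribed_capacity(half_racks, half_racks, HOSTS_PER_RACK,
                               NIC_GBPS, UPLINK_GBPS)

print(f"full bisection: {flat} Gbps across the cut")
print(f"hierarchical:   {tree} Gbps across the cut")
```

In this toy setup the flat network carries 128 Gbps across the line while the hierarchical one carries 16 Gbps, which is the kind of gap that makes it practical for every computer to see all of the data.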

This isn’t just academic research, of course. The team from Microsoft Research has already been working with the Bing team to help Bing accelerate its search results, and there are plans to use FDS in other Microsoft technologies.


More Information

Data in the Fast Lane - Microsoft Research Blog

MinuteSort with Flat Datacenter Storage

 


Last Updated ( Tuesday, 22 January 2013 )