GraphLab Create
Written by Kay Ewbank   
Tuesday, 29 July 2014

Software that its makers claim lets data science teams gain insights into big data up to 10,000 times faster than rival products has been announced by GraphLab and is downloadable for free.



The software, GraphLab Create, simplifies big data analysis by combining all phases of the prototype-to-production process, allowing a single data scientist to do the job of many, according to the creators. The company says that there is a current shortage of data scientists, who have to derive value from a company's data by integrating a range of highly complicated, disparate tools and datasets. By using machine learning, GraphLab Create simplifies this task.

The software started life as a research project on graph analysis at Carnegie Mellon University. This was extended to add the ability to process tables and text, and the GraphLab company was created to improve on the open source project (PowerGraph) and create commercial software.

GraphLab Create 1.0 was officially shown off at GraphLab’s conference in San Francisco, where the developers said the software is between 100 and 10,000 times faster at analytics and model training than other products. GraphLab Create has been benchmarked against MLlib (part of the Apache Spark project), scikit-learn and Mahout.

The keynote presentation at the conference showed GraphLab Create v1.0 being used to analyze one terabyte of data or more, at interactive speeds, on desktop systems. Its use on distributed systems, running on a Hadoop YARN or EC2 cluster, was also demonstrated.

GraphLab Create lets you switch between analysis of data as graph or table, and can be incorporated into data products that make use of the software’s machine learning, text analytics and graph analytics capabilities. GraphLab Create 1.0 includes GraphLab Canvas, the company’s new visualization platform for big data.
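To give a feel for what switching between a table view and a graph view of the same data means, here is a minimal sketch in plain Python. It does not use the GraphLab Create API; the edge rows and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical edge table: each row is (source, destination).
EDGE_ROWS = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
]


def out_degree_from_table(rows):
    """Table view: group rows by the source column and count them."""
    counts = defaultdict(int)
    for src, _dst in rows:
        counts[src] += 1
    return dict(counts)


def table_to_graph(rows):
    """Graph view: build an adjacency list from the same rows."""
    adjacency = defaultdict(list)
    for src, dst in rows:
        adjacency[src].append(dst)
        adjacency.setdefault(dst, [])  # keep vertices with no outgoing edges
    return dict(adjacency)


def reachable(adjacency, start):
    """Graph-style query: vertices reachable from start (depth-first search)."""
    seen = set()
    stack = [start]
    while stack:
        vertex = stack.pop()
        if vertex in seen:
            continue
        seen.add(vertex)
        stack.extend(adjacency.get(vertex, []))
    return seen


if __name__ == "__main__":
    print(out_degree_from_table(EDGE_ROWS))
    graph = table_to_graph(EDGE_ROWS)
    print(sorted(reachable(graph, "bob")))
```

Counting rows per source is a natural table operation, while reachability is easier to express once the same rows are arranged as a graph.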



The software is designed to work with the same code on different platforms, so you can prototype on a single machine then move the completed project to production on distributed systems. It has been certified as interoperable on Cloudera Hadoop distributions.

The package can be used via a Python API, which gives you access to two scalable data structures called SFrame and SGraph for analysis of tabular and graph data sets. The product details say:

“the machine learning engine provides access to the latest ML algorithms which are foundational inputs to many data products like recommenders, fraud detection systems, text and sentiment analyzers. Data inputs can be taken in any form and from any location, whether local to the platform or in common stores like Amazon’s cloud, relational and graph databases or Hadoop distributions. Connectors for additional data types and stores can be easily added.”

The scalable frame is what lets GraphLab Create work with very large data sets. The data is treated as a series of frames, each of them a scalable data structure. The software holds a single frame in memory at a time and, if you’re working on a desktop or laptop machine, iterates over the data on the hard disk frame by frame.
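The frame-by-frame idea can be sketched with nothing but the Python standard library. This illustrates the general out-of-core technique, not GraphLab's actual implementation; the helper names, the CSV layout and the frame size are all assumptions made for the example.

```python
import csv
import os
import tempfile
from itertools import islice


def iter_frames(path, frame_size):
    """Yield lists of at most frame_size rows (dicts) read from a CSV on disk."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            frame = list(islice(reader, frame_size))
            if not frame:
                break
            yield frame


def column_mean(path, column, frame_size=1000):
    """Compute a column mean while holding only one frame in memory at a time."""
    total = 0.0
    count = 0
    for frame in iter_frames(path, frame_size):
        total += sum(float(row[column]) for row in frame)
        count += len(frame)
    return total / count if count else float("nan")


def write_demo_file(n_rows):
    """Write a small CSV whose 'value' column holds 1..n_rows; return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["value"])
        for i in range(1, n_rows + 1):
            writer.writerow([i])
    return path


if __name__ == "__main__":
    demo_path = write_demo_file(10)
    try:
        print(column_mean(demo_path, "value", frame_size=3))
    finally:
        os.remove(demo_path)
```

Only running totals survive between frames, so memory use depends on the frame size rather than on the size of the file.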


