GraphLab Create
Written by Kay Ewbank   
Tuesday, 29 July 2014

Software designed to let data science teams gain insights into big data up to 10,000 times faster than rival products has been announced by GraphLab and is downloadable for free.


The software, GraphLab Create, simplifies big data analysis by combining all phases of the prototype-to-production process, allowing a single data scientist to do the job of many, according to the creators. The company says that there is a current shortage of data scientists, who have to derive value from a company's data by integrating a range of highly complicated, disparate tools and datasets. By using machine learning, GraphLab Create simplifies this task.

The software started life as a research project on graph analysis at Carnegie Mellon University. This was extended to add the ability to process tables and text, and the GraphLab company was created to improve on the open source project (PowerGraph) and create commercial software.

GraphLab Create 1.0 was officially shown off at GraphLab’s conference in San Francisco, where the developers said the software is between 100 and 10,000 times faster at analytics and model training than other products. GraphLab Create has been benchmarked against MLlib (part of the Apache Spark project), scikit-learn and Mahout.

The keynote presentation at the conference showed GraphLab Create v1.0 being used to analyze one terabyte of data or more, at interactive speeds, on desktop systems. Its use on distributed systems, running on a Hadoop YARN or EC2 cluster, was also demonstrated.

GraphLab Create lets you switch between analysis of data as graph or table, and can be incorporated into data products that make use of the software’s machine learning, text analytics and graph analytics capabilities. GraphLab Create 1.0 includes GraphLab Canvas, the company’s new visualization platform for big data.


The software is designed to work with the same code on different platforms, so you can prototype on a single machine then move the completed project to production on distributed systems. It has been certified as interoperable on Cloudera Hadoop distributions.

The package can be used via a Python API, which gives you access to two scalable data structures called SFrame and SGraph for analysis of tabular and graph data sets. The product details say:

“the machine learning engine provides access to the latest ML algorithms which are foundational inputs to many data products like recommenders, fraud detection systems, text and sentiment analyzers. Data inputs can be taken in any form and from any location, whether local to the platform or in common stores like Amazon’s cloud, relational and graph databases or Hadoop distributions. Connectors for additional data types and stores can be easily added.”
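To make the distinction between tabular and graph views of the same data concrete, here is a minimal sketch in plain Python. It does not use the GraphLab Create API, and the names `edges_table`, `to_adjacency` and `out_degree_table` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Tabular view: each row is one edge record (hypothetical sample data).
edges_table = [
    {"src": "alice", "dst": "bob", "weight": 3},
    {"src": "alice", "dst": "carol", "weight": 1},
    {"src": "bob", "dst": "carol", "weight": 2},
]


def to_adjacency(rows):
    """Graph view: map each source vertex to its (neighbour, weight) pairs."""
    adjacency = defaultdict(list)
    for row in rows:
        adjacency[row["src"]].append((row["dst"], row["weight"]))
    return dict(adjacency)


def out_degree_table(adjacency):
    """Back to a tabular view: one row per source vertex with its out-degree."""
    return [
        {"vertex": vertex, "out_degree": len(neighbours)}
        for vertex, neighbours in sorted(adjacency.items())
    ]


graph = to_adjacency(edges_table)
degrees = out_degree_table(graph)
```

The same records are read once as rows and once as vertices with neighbours, which is the kind of switch between table and graph analysis the article describes, shown here at toy scale.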

The scalable frame is what lets GraphLab Create work with very large data sets. The data is treated as a series of frames, which are scalable data structures. The software holds a single frame in memory at a time and, if you’re working on a desktop or laptop machine, iterates over the data on the hard disk frame by frame.
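The frame-by-frame idea can be sketched with nothing but the Python standard library. This is an illustration of out-of-core iteration in general, not GraphLab Create's implementation; `iter_frames`, `column_mean`, `write_sample_csv` and the sample data are all hypothetical.

```python
import csv
import os
import tempfile
from itertools import islice


def iter_frames(path, frame_size):
    """Yield lists of at most frame_size rows, so only one frame is in memory at a time."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            frame = list(islice(reader, frame_size))
            if not frame:
                return
            yield frame


def column_mean(path, column, frame_size=1000):
    """Compute the mean of a numeric column by accumulating over frames read from disk."""
    total = 0.0
    count = 0
    for frame in iter_frames(path, frame_size):
        total += sum(float(row[column]) for row in frame)
        count += len(frame)
    return total / count if count else float("nan")


def write_sample_csv(n_rows):
    """Write a small hypothetical data set to a temporary file and return its path."""
    handle = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="")
    with handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "value"])
        for i in range(n_rows):
            writer.writerow([i, i * 2])
    return handle.name


sample_path = write_sample_csv(10)
mean_value = column_mean(sample_path, "value", frame_size=3)
os.remove(sample_path)
```

Because only running totals are kept between frames, memory use depends on the frame size rather than on the size of the file on disk.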

 


Last Updated ( Tuesday, 29 July 2014 )