Grail Open Sources Bigslice Cluster Computing For Golang
Written by Kay Ewbank   
Thursday, 17 October 2019

GRAIL has open sourced two projects, Bigslice and Bigmachine, which enable distributed computation across large datasets using simple Go programs.

Bigslice is a system for fast, large-scale, serverless data processing using Go. It exposes a composable API that lets the user express data processing tasks in terms of a series of data transformations that invoke user code.

The developers describe Bigslice as similar to data processing systems like Apache Spark and FlumeJava, but with different aims. Bigslice is built for Go and is used as an ordinary Go package. Users can reuse their existing Go code, and Bigslice binaries are compiled like ordinary Go binaries.

It is also serverless, and the team says that with nothing more than cloud credentials, Bigslice can be used to process large datasets without the use of any other external infrastructure.


Bigslice programs are regular Go programs, providing users with a familiar environment and tools. A Bigslice program can be run on a single node like any other program, but it is also capable of transparently distributing itself across an ad hoc cluster, managed entirely by the program itself.

The data processing features of Bigslice come in the form of a coherent set of operators that can be used to work with large datasets using ordinary Go code. The operators are familiar data transformation primitives such as map, filter, reduce, and join. The user writes the computation sequentially, specifying how a dataset is to be transformed, step by step, into the desired result. Bigslice then parallelizes the computation and can distribute it across many processors and over large compute clusters.
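
To give a feel for what "sequential" means here, the following is a minimal sketch in plain Go. It does not use the Bigslice API; the helper names Map, Filter, Reduce and SumOfEvenSquares are hypothetical and exist only for this illustration. It assumes Go 1.18 or later for generics.

```go
package main

import "fmt"

// Map applies f to every element of in and returns the results.
// Hypothetical helper for illustration, not part of Bigslice.
func Map[T, U any](in []T, f func(T) U) []U {
	out := make([]U, 0, len(in))
	for _, v := range in {
		out = append(out, f(v))
	}
	return out
}

// Filter keeps only the elements for which keep returns true.
func Filter[T any](in []T, keep func(T) bool) []T {
	var out []T
	for _, v := range in {
		if keep(v) {
			out = append(out, v)
		}
	}
	return out
}

// Reduce folds the elements of in into a single value, starting from init.
func Reduce[T, A any](in []T, init A, f func(A, T) A) A {
	acc := init
	for _, v := range in {
		acc = f(acc, v)
	}
	return acc
}

// SumOfEvenSquares expresses a computation as a step-by-step
// series of transformations: square, keep the even values, sum.
func SumOfEvenSquares(nums []int) int {
	squares := Map(nums, func(n int) int { return n * n })
	evens := Filter(squares, func(n int) bool { return n%2 == 0 })
	return Reduce(evens, 0, func(acc, n int) int { return acc + n })
}

func main() {
	fmt.Println(SumOfEvenSquares([]int{1, 2, 3, 4})) // prints 20
}
```

The pipeline reads top to bottom like ordinary code. A system such as Bigslice takes a description of this kind and decides how to run it in parallel.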

Bigslice achieves this by splitting the datasets into many smaller pieces and performing the transformations on each piece individually, so that each piece fits in memory and the pieces can be processed in parallel across many machines. When a transformation requires the data to be rearranged (for operations like join or reduce), Bigslice reshuffles the data accordingly.
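
The split, transform and reshuffle pattern can be sketched with a word count in plain Go, using goroutines to stand in for machines. This is an illustration of the general technique under simplifying assumptions, not Bigslice's implementation: the names WordCount and shardOf are hypothetical, everything runs in one process, and nshard is assumed to be at least 1.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
	"sync"
)

// shardOf picks the destination shard for a key by hashing it,
// so every occurrence of the same word lands on the same shard.
func shardOf(key string, nshard int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(nshard))
}

// WordCount counts words in lines using nshard pieces (nshard >= 1).
func WordCount(lines []string, nshard int) map[string]int {
	// Step 1: split the input into nshard smaller pieces.
	pieces := make([][]string, nshard)
	for i, line := range lines {
		pieces[i%nshard] = append(pieces[i%nshard], line)
	}

	// Step 2: process each piece independently and in parallel.
	// partial[src][dst] holds counts produced by piece src whose
	// words hash to shard dst.
	partial := make([][]map[string]int, nshard)
	var wg sync.WaitGroup
	for src := range pieces {
		partial[src] = make([]map[string]int, nshard)
		for dst := range partial[src] {
			partial[src][dst] = make(map[string]int)
		}
		wg.Add(1)
		go func(src int) {
			defer wg.Done()
			for _, line := range pieces[src] {
				for _, w := range strings.Fields(line) {
					partial[src][shardOf(w, nshard)][w]++
				}
			}
		}(src)
	}
	wg.Wait()

	// Step 3: shuffle and reduce. Each destination shard gathers
	// its partition from every source piece and sums the counts.
	reduced := make([]map[string]int, nshard)
	for dst := 0; dst < nshard; dst++ {
		reduced[dst] = make(map[string]int)
		wg.Add(1)
		go func(dst int) {
			defer wg.Done()
			for src := 0; src < nshard; src++ {
				for w, n := range partial[src][dst] {
					reduced[dst][w] += n
				}
			}
		}(dst)
	}
	wg.Wait()

	// Merge the per-shard results into one map for convenience.
	total := make(map[string]int)
	for _, m := range reduced {
		for w, n := range m {
			total[w] += n
		}
	}
	return total
}

func main() {
	fmt.Println(WordCount([]string{"a b a", "b c", "a"}, 2))
}
```

Because a word always hashes to the same shard, no two shards ever hold counts for the same key, which is what makes the reduce step safe to run in parallel.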

Bigslice uses Bigmachine to manage an ad hoc cluster of compute nodes to support distribution. Bigmachine is a toolkit for building self-managing serverless applications in Go. It provides an API that lets a driver process form an ad hoc cluster of machines to which user code is transparently distributed. User code is exposed through services, which are stateful Go objects associated with each machine.
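
The idea of a service as a stateful Go object that a driver calls remotely can be approximated with Go's standard net/rpc package. The sketch below is not the Bigmachine API: Counter and StartWorker are hypothetical names, and the "machine" is simulated by an in-process pipe rather than a real cluster node.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
	"sync"
)

// Counter is a hypothetical service: a Go object whose state
// lives on the worker that hosts it and persists between calls.
type Counter struct {
	mu    sync.Mutex
	total int
}

// Add adds n to the worker's running total and reports the new total.
// The signature follows net/rpc's rules for exported methods.
func (c *Counter) Add(n int, total *int) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.total += n
	*total = c.total
	return nil
}

// StartWorker simulates starting a remote machine: it registers a
// Counter service and returns a client the driver can call through.
func StartWorker() (*rpc.Client, error) {
	srv := rpc.NewServer()
	if err := srv.Register(&Counter{}); err != nil {
		return nil, err
	}
	driverSide, workerSide := net.Pipe()
	go srv.ServeConn(workerSide)
	return rpc.NewClient(driverSide), nil
}

func main() {
	worker, err := StartWorker()
	if err != nil {
		panic(err)
	}
	defer worker.Close()

	var total int
	for _, n := range []int{2, 3} {
		if err := worker.Call("Counter.Add", n, &total); err != nil {
			panic(err)
		}
	}
	fmt.Println(total) // prints 5: the state persisted across calls
}
```

In a real deployment the worker would run on a separate machine and the driver would also be responsible for launching it, which is the part a toolkit like Bigmachine automates.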


More Information

Bigslice Home Page



Last Updated ( Thursday, 17 October 2019 )