Apache Flink ML 2.0 Released
Written by Kay Ewbank   
Thursday, 27 January 2022

Flink ML 2.0.0 has been released. Flink ML is a library that provides APIs and infrastructure for building stream-batch unified machine learning algorithms that are easy to use and performant, with (near) real-time latency.

Apache Flink is an open source platform for distributed stream and batch data processing, with a streaming dataflow engine for data distribution and distributed computations over data streams.


The updated version of Flink ML is described as a major refactor of the earlier Flink ML library. Its main new features extend the Flink ML API and the iteration runtime: support for stages with multiple inputs and multiple outputs, graph-based stage composition, and a new stream-batch unified iteration library.

The developers have also added five algorithm implementations in this release, which is the start of a long-term initiative to provide a large number of off-the-shelf algorithms in Flink ML.

The new support for stages with multiple inputs and multiple outputs means that algorithm developers can assemble a machine learning workflow as a directed acyclic graph (DAG) of pre-defined stages. This workflow can then be configured and deployed without users needing to know the implementation details of the graph. This improvement could considerably expand the applicability and usability of Flink ML.
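To make the idea of a DAG of multi-input, multi-output stages concrete, here is a minimal sketch in plain Python. It is not Flink ML's API: the `StageGraph` class, the `add_stage` and `run` methods, and the `"source"` label are all hypothetical names invented for illustration. Each stage is simply a function that takes one or more lists of rows and returns a list of output lists.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class StageGraph:
    """Toy DAG of stages (hypothetical; not the Flink ML API)."""

    def __init__(self):
        # stage name -> (function, list of (producer name, output index))
        self.stages = {}

    def add_stage(self, name, fn, inputs):
        # The producer name "source" refers to the inputs given to run().
        self.stages[name] = (fn, inputs)

    def run(self, sources):
        results = {"source": list(sources)}
        # Map every stage to the set of stages it depends on.
        deps = {
            name: {producer for producer, _ in inputs if producer != "source"}
            for name, (_, inputs) in self.stages.items()
        }
        # Execute the stages in dependency order.
        for name in TopologicalSorter(deps).static_order():
            fn, inputs = self.stages[name]
            args = [results[producer][index] for producer, index in inputs]
            results[name] = fn(*args)
        return results


def split(rows):
    # One input, two outputs: even values and odd values.
    return [[r for r in rows if r % 2 == 0], [r for r in rows if r % 2 == 1]]


def double(rows):
    # One input, one output.
    return [[r * 2 for r in rows]]


def merge(left, right):
    # Two inputs, one output.
    return [sorted(left + right)]


def build_example():
    graph = StageGraph()
    graph.add_stage("split", split, [("source", 0)])
    graph.add_stage("double", double, [("split", 0)])
    graph.add_stage("merge", merge, [("double", 0), ("split", 1)])
    return graph
```

A caller only supplies the input and reads the final output, for example `build_example().run([[1, 2, 3, 4]])["merge"][0]`, without having to know how the stages are wired together inside the graph.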

The next improvement is support for online learning, with APIs that expose model data. This handles situations where a long-running job keeps processing training data and updating a machine learning model. The traditional Estimator/Transformer paradigm does not provide APIs to expose this model data in a streaming manner, so users have to call fit() repeatedly to update model data, which is very inefficient. With the new release, model data can be exposed as an unbounded stream, and algorithm users can then transfer the model data to web servers in real time and use the up-to-date model data for online inference.
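The difference between refitting and streaming model data can be sketched in a few lines of plain Python. This is an illustration only and does not use Flink ML: the function names `model_data_stream` and `is_above_average` are hypothetical, and the "model" is just a running mean that is updated incrementally as each training record arrives.

```python
from itertools import count, islice


def model_data_stream(training_records):
    """Yield updated model data after every training record.

    The input may be unbounded; no repeated refit over past data is needed.
    """
    seen, mean = 0, 0.0
    for x in training_records:
        seen += 1
        mean += (x - mean) / seen  # incremental mean update
        yield {"count": seen, "mean": mean}


def is_above_average(model, x):
    # Online inference against whatever model data is current.
    return x > model["mean"]


def first_models(n):
    # count(1) is an endless source 1, 2, 3, ...; islice takes the first n updates.
    return list(islice(model_data_stream(count(1)), n))
```

A consumer such as a web server would keep only the most recent item from the stream and pass it to `is_above_average` when a request arrives.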

Other improvements include simpler parameter handling for algorithms, and new tools for composing a DAG of stages into a new stage. There's also a new stream-batch unified iteration library that can transmit records back to preceding operators and track the progress of rounds inside the iteration.
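The two ideas behind the iteration library, feeding records back to an earlier operator and tracking rounds, can be illustrated with a small plain-Python loop. This is a sketch under stated assumptions rather than the library's API: `iterate`, `step`, `is_done` and `max_rounds` are hypothetical names, and the records are assumed to be distinct so they can serve as dictionary keys.

```python
def iterate(records, step, is_done, max_rounds=100):
    """Run `step` over records round by round, feeding unfinished ones back.

    Returns a dict mapping each original record to the round in which it
    left the loop, plus the total number of rounds executed.
    """
    pending = [(r, r) for r in records]  # (original record, current value)
    finished = {}
    round_no = 0
    while pending and round_no < max_rounds:
        round_no += 1  # progress tracking: one more round has started
        feedback = []
        for original, value in pending:
            value = step(value)
            if is_done(value):
                finished[original] = round_no
            else:
                # Feedback edge: the record re-enters the loop next round.
                feedback.append((original, value))
        pending = feedback
    return finished, round_no


def halve_until_zero(records):
    # Example: integer-halve each value until it reaches zero.
    return iterate(records, lambda v: v // 2, lambda v: v == 0)
```

Calling `halve_until_zero([3, 8])` shows records leaving the loop in different rounds, while the loop itself ends once nothing is fed back.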

Flink ML 2.0 is available now.


More Information

Flink website

Related Articles

Apache Flink 1.9 Adds New Query Engine

Apache Flink 1.5.0 Adds Support For Broadcast State

Flink Gets Event-time Streaming

Flink Reaches Top Level Status

 

 


Last Updated ( Thursday, 27 January 2022 )