Apache Flink 1.5.0 Adds Support For Broadcast State
Written by Kay Ewbank   
Friday, 08 June 2018

The latest version of Apache Flink has been released with a rewritten deployment and process model, and support for broadcast state.

Apache Flink is an open source platform for distributed stream and batch data processing. It consists of a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink includes several APIs, including the DataSet API for static data embedded in Java, Scala, and Python; the DataStream API for unbounded streams embedded in Java and Scala; the Table API with a SQL-like expression language embedded in Java and Scala; and the streaming SQL API that enables SQL queries to be executed on streaming and batch tables, with a syntax is based on Apache Calcite.

One addition to the latest release is a new SQL CLI client that is the first step to adding a service to execute streaming and batch SQL queries. The client provides a SQL shell to run exploratory queries on data streams.

flinkcli

 

The developers have redesigned and reimplemented large parts of Flink’s process model, and say that while there is still work to do on this reworking, the changes already implemented enable more natural Kubernetes deployments, and mean that all requests to the JobManager now happen through REST. The improvements also add support for dynamic resource allocation and dynamic release of resources on YARN and Mesos schedulers for better resource utilization. In a later version it will be possible to dockerize jobs and deploy them in a natural way as part of the container deployment.

The new support for broadcast state is something that developers using Flink have requested. Broadcast state is replicated across all parallel instances of a function, and might typically be used where you have two streams, a regular data stream alongside a control stream that serves rules, patterns, or other configuration messages. Changes to the processing of the regular stream are configured by the messages of the control stream. By broadcasting rules or patterns to all parallel instances of a function, they can be applied to all events of the regular stream.

The next improvement is aimed at improving the efficiency of failure recovery. While in normal use, Flink writes copies of an application’s state to a remote, persistent storage and loads it back in case of a failure. This ensures state information is not lost, but recovering the state takes time. The new feature, task-local state recovery, adds the ability to also keep a copy on the local disk of each machine of the application’s state, so recovery is faster. 

Other improvements include extended join support for SQL and Table API with the addition of support for windowed outer equi-joins; and improved reading and writing JSON messages from and to connectors.

flinklogo

More Information

Flink website

Related Articles

Flink Gets Event-time Streaming

FLink Reaches Top Level Status

 

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Banner


Apache Releases Tomcat 11
07/11/2024

Apache has announced the release of Tomcat 11, as well as marking the 25th anniversary of the first commit to the Apache Tomcat source code repository since becoming an ASF project.



Google Opensources Privacy Library
08/11/2024

Google is making a new differential privacy library available as open source. PipelineDP4J is a Java-based library that can be used to analyse data sets while preserving privacy.


More News

espbook

 

Comments




or email your comment to: comments@i-programmer.info

Last Updated ( Friday, 08 June 2018 )