Apache Flink 1.5.0 Adds Support For Broadcast State
Written by Kay Ewbank   
Friday, 08 June 2018

The latest version of Apache Flink has been released with a rewritten deployment and process model, and support for broadcast state.

Apache Flink is an open source platform for distributed stream and batch data processing. It consists of a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink includes several APIs: the DataSet API for static data, embedded in Java, Scala, and Python; the DataStream API for unbounded streams, embedded in Java and Scala; the Table API, with a SQL-like expression language embedded in Java and Scala; and the streaming SQL API, which enables SQL queries to be executed on streaming and batch tables, with syntax based on Apache Calcite.

One addition in the latest release is a new SQL CLI client, the first step towards a service for executing streaming and batch SQL queries. The client provides a SQL shell for running exploratory queries on data streams.

The developers have redesigned and reimplemented large parts of Flink's process model. They say that while there is still work to do on this reworking, the changes already in place enable more natural Kubernetes deployments and mean that all requests to the JobManager now go through REST. The improvements also add support for dynamic allocation and release of resources on YARN and Mesos schedulers for better resource utilization. In a later version it will be possible to dockerize jobs and deploy them naturally as part of the container deployment.
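
As a rough illustration of what the REST-based control path means in practice, the sketch below queries a running JobManager for its list of jobs using nothing but the JDK. It assumes a cluster whose REST endpoint is on the default port 8081 and uses the /jobs path from the monitoring API; treat it as an informal example rather than a definitive client.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ListFlinkJobs {
    public static void main(String[] args) throws Exception {
        // Assumes a JobManager whose REST endpoint is on the default port 8081.
        URL url = new URL("http://localhost:8081/jobs");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON listing of jobs and their status
            }
        }
    }
}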

The new support for broadcast state is something that developers using Flink have requested. Broadcast state is replicated across all parallel instances of a function, and might typically be used where you have two streams: a regular data stream alongside a control stream that serves rules, patterns, or other configuration messages. Changes to the processing of the regular stream are configured by the messages of the control stream. Broadcasting the rules or patterns to all parallel instances of a function means they can be applied to all events of the regular stream.
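
The following is a minimal sketch of how the pattern looks in the DataStream API. The socket sources, port numbers and string-valued patterns are placeholders chosen purely for illustration; the broadcast state itself is declared with a MapStateDescriptor and the two streams are connected through a BroadcastProcessFunction.

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

import java.util.Map;

public class BroadcastStateSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A regular data stream and a control stream of pattern strings
        // (both read from sockets here purely for illustration).
        DataStream<String> events = env.socketTextStream("localhost", 9000);
        DataStream<String> patterns = env.socketTextStream("localhost", 9001);

        // Descriptor for the broadcast (map) state that holds the current patterns.
        MapStateDescriptor<String, String> patternsDesc = new MapStateDescriptor<>(
                "patterns", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

        // Broadcast the control stream so every parallel instance sees every pattern.
        BroadcastStream<String> broadcastPatterns = patterns.broadcast(patternsDesc);

        events.connect(broadcastPatterns)
              .process(new BroadcastProcessFunction<String, String, String>() {
                  @Override
                  public void processElement(String event, ReadOnlyContext ctx,
                                             Collector<String> out) throws Exception {
                      // Check the event against every broadcast pattern.
                      for (Map.Entry<String, String> e :
                              ctx.getBroadcastState(patternsDesc).immutableEntries()) {
                          if (event.contains(e.getValue())) {
                              out.collect(e.getKey() + " matched: " + event);
                          }
                      }
                  }

                  @Override
                  public void processBroadcastElement(String pattern, Context ctx,
                                                      Collector<String> out) throws Exception {
                      // Update the replicated state on every parallel instance.
                      ctx.getBroadcastState(patternsDesc).put(pattern, pattern);
                  }
              })
              .print();

        env.execute("Broadcast state example");
    }
}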

The next improvement is aimed at making failure recovery more efficient. In normal use, Flink writes copies of an application's state to remote, persistent storage and loads it back in case of a failure. This ensures state information is not lost, but recovering the state takes time. The new feature, task-local state recovery, adds the ability to also keep a copy of the application's state on the local disk of each machine, so recovery is faster.
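
Local recovery is a cluster-level option that would normally be switched on in flink-conf.yaml rather than in job code. The sketch below simply shows the configuration keys documented for 1.5 being set programmatically on a local environment; the directory path is a made-up example.

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalRecoverySketch {
    public static void main(String[] args) throws Exception {
        // These keys normally go into flink-conf.yaml on the cluster; they are set
        // programmatically here only to show the option names documented for 1.5.
        Configuration conf = new Configuration();
        conf.setString("state.backend.local-recovery", "true");
        // Where the local state copies should live (path is a made-up example).
        conf.setString("taskmanager.state.local.root-dirs", "/tmp/flink-local-recovery");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(2, conf);

        // ... build and execute a checkpointed job with env as usual ...
    }
}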

Other improvements include extended join support in SQL and the Table API, with the addition of windowed outer equi-joins; and improved reading and writing of JSON messages from and to connectors.
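
As a hedged example of what a windowed outer equi-join looks like, the snippet below issues a time-windowed LEFT OUTER JOIN through the Table API's SQL entry point. The Orders and Shipments tables, their columns and their rowtime attributes are hypothetical and assumed to have been registered elsewhere.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class WindowedOuterJoinSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        // "Orders" and "Shipments" are assumed to be registered elsewhere, each with
        // an event-time (rowtime) attribute; table and column names are placeholders.
        Table lateShipments = tEnv.sqlQuery(
            "SELECT o.o_id, s.s_id " +
            "FROM Orders o LEFT OUTER JOIN Shipments s " +
            "ON o.o_id = s.o_id " +
            "AND s.s_rowtime BETWEEN o.o_rowtime AND o.o_rowtime + INTERVAL '4' HOUR");

        // The result can be converted back to a DataStream or written to a table sink.
    }
}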

More Information

Flink website

Related Articles

Flink Gets Event-time Streaming

Flink Reaches Top Level Status

 

Last Updated ( Friday, 08 June 2018 )