Apache Flink 1.9 Adds New Query Engine

Written by Kay Ewbank

Tuesday, 27 August 2019

Significant features on this path are batch-style recovery for batch jobs and a preview of the new Blink-based query engine for Table API and SQL queries.

Apache Flink is an open source platform for distributed stream and batch data processing. It consists of a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink includes several APIs, including the DataSet API for static data embedded in Java, Scala, and Python; the DataStream API for unbounded streams embedded in Java and Scala; the Table API with a SQL-like expression language embedded in Java and Scala; and the streaming SQL API that enables SQL queries to be executed on streaming and batch tables, with a syntax is based on Apache Calcite.

flinklogo

The main improvements to the new version start with the addition of batch-style recovery for batch jobs, and a preview of the new Blink-based query engine for Table API and SQL queries.

The new batch-style recovery has significantly reduced the time to recover a batch job from a task failure. This covers DataSet, Table API and SQL jobs. Until this version, if a task failed, the recovery of a batch job involved canceling all tasks and restarting the whole job, voiding all progress. You can now configure Flink to limit the recovery to only those tasks that are in the same failover region, the set of tasks that are connected via pipelined data exchanges.

The Blink-based query engine preview is a development of the donation of Blink to Apache Flink. Blink’s query optimizer and runtime for the Table API and SQL have been integrated into Flink, and the query planner has been extended so that there are now two choices of pluggable query processors to execute Table API and SQL statements: the pre-1.9 Flink processor and the new Blink-based query processor. The Blink-based query processor offers better SQL coverage and improved performance for batch queries because it has more extensive query optimization including cost-based plan selection and more optimization rules. Because the query processor is still not fully integrated, in this release the original processor is still the default choice, though you can enable the Blink processor.

Elsewhere, the State Processor API is now fully available and can be used to read and write savepoints with Flink DataSet jobs. Finally, Flink 1.9 includes a reworked WebUI and previews of Flink’s new Python Table API and its integration with the Apache Hive ecosystem.

flinklogo

More Information

Flink website

Apache Flink 1.5.0 Adds Support For Broadcast State

Flink Gets Event-time Streaming

FLink Reaches Top Level Status

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Express.js 5 Released With Greater Security
16/01/2025

Express.js 5 has been released, ten years after Express.js 4. The new release has dropped support for outdated versions of Node.js, addresses security concerns, and brings simplified maintenance.

+ Full Story

European Robotics Hackathon 2025 Open For Entries
17/01/2025

ENRICH 2025, the European Robotics Hackathon, is open now for team entries. To be held at the Zwentendorf Nuclear Power Plant in Austria, the aim is to develop robots that can carry out tasks in a nuc [ ... ]

+ Full Story

More News

Comments

or email your comment to: comments@i-programmer.info

More Information

Related Articles

Comments