Apache Kylin 2.5 Adds All-in-Spark Cubing Engine
Written by Kay Ewbank   
Tuesday, 02 October 2018

There's a new release of Apache Kylin with improvements including an all-in-Spark cubing engine, and support for using MySQL for the Kylin metastore.

kylin

Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Apache. It was originally developed at eBay before becoming an Apache project.

The Kylin OLAP Engine is made up of a metadata engine, a query engine, a job engine and a storage engine. It also includes a REST Server to service client requests. The query engine is based on Apache Calcite.

The new all-in-Spark cubing engine means that Kylin’s Spark engine will now run all distributed jobs in Spark, including fetch distinct dimension values, converting cuboid files to HBase HFile, merging segments and merging dictionaries. The developers say that the default configurations are tuned so the user can get an out-of-box experience, and that the performance is expected to improve. Job management for Spark has also been improved so that you can get the job link in the web console once Spark starts to run. If you discard the job, Kylin will kill the Spark job to release the resource, and if Kylin is restarted, it can resume from the previous job instead of resubmitting a new job.

In previous versions, the only choice for storing Kylin metadata was HBase, but from this release you can choose to use MySQL instead. This will overcome problems caused by the fact that replicated HBase is read only, so doesn't really work when used for Kylin's High Availability in a clustered structure. MySQL will be able to work correctly in such cases, though the function is currently in beta.

The next improvement is the ability to create Hybrid models in a custom web GUI. Hybrid is an advanced model for creating multiple cubes, and it can be used for the Cube schema change issue. This function had no GUI in the past so was used by only a small portion of Kylin users. This version of Kylin adds a web GUI to make it easier to use. 

Another cube related improvement the enabling by default of the cube planner, a feature added in Kylin 2.3 to optimize the cube structure. The cube planner can not only optimize the cube structure, but by doing that can use less computing and/or storage resources and improve the query performance. The algorithm will automatically optimize the cube by your data statistics on the first build.

This release also offers better segment pruning to reduce the disk and network I/O. Until now, Kylin only pruned segments by the partition column’s value, but this version records the minimum and maximum values for all dimensions at the segment level.

Other improvements include the ability to carry out dictionary merges on YARN rather than in Kylin’s JVM; better estimating of cube size for TOPN and COUNT DISTINCT measures; and support for Hadoop 3 and HBase 2. 

kylin

More Information

Kylin Website

Related Articles

Kylin 2.3.0 Adds SQL Server Support

Apache Kylin Gets Table Level ACL Management

Apache Kylin Adds RDBMS Support 

Spark BI Gets Fine Grain Security

 

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Banner


The Advent of SQL 2024 Has Commenced
11/12/2024

It's Advent - the time of year when we countdown the days to Christmas - and if your are a programmer complete daily coding challenges with the Advent of Code, the Advent of Perl, the Advent of Java,  [ ... ]



The Art Of Computer Programming - A Great Present
15/12/2024

If you are looking for a programmer present this holiday season, there is one book, or set of books, that should be top of any list... Donald Knuth's The Art of Computer Programming.


More News

espbook

 

Comments




or email your comment to: comments@i-programmer.info

Last Updated ( Tuesday, 02 October 2018 )