Apache Kylin 2.5 Adds All-in-Spark Cubing Engine
Written by Kay Ewbank   
Tuesday, 02 October 2018

There's a new release of Apache Kylin with improvements including an all-in-Spark cubing engine, and support for using MySQL for the Kylin metastore.

Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Apache Hadoop. It was originally developed at eBay before becoming an Apache project.

The Kylin OLAP Engine is made up of a metadata engine, a query engine, a job engine and a storage engine. It also includes a REST Server to service client requests. The query engine is based on Apache Calcite.

The new all-in-Spark cubing engine means that Kylin’s Spark engine now runs all distributed jobs in Spark, including fetching distinct dimension values, converting cuboid files to HBase HFiles, merging segments and merging dictionaries. The developers say the default configurations are tuned to give users a good out-of-the-box experience, and that performance is expected to improve. Job management for Spark has also been improved: you can get the Spark job link in the web console as soon as the Spark job starts running. If you discard the job, Kylin kills the Spark job to release its resources, and if Kylin is restarted, it can resume the previous job instead of submitting a new one.

In previous versions, the only choice for storing Kylin metadata was HBase, but from this release you can choose to use MySQL instead. This overcomes a problem with high-availability clustered deployments: a replicated HBase cluster is read-only, so it doesn't work well as the Kylin metastore in that setup. MySQL works correctly in such cases, though the feature is currently in beta.

The next improvement is the ability to create Hybrid models through the web GUI. Hybrid is an advanced model that combines multiple cubes, and it can be used to deal with changes to a cube's schema. Because this function had no GUI in the past, only a small portion of Kylin users made use of it; this version adds a web GUI to make it easier to use.

Another cube-related improvement is that the cube planner, a feature added in Kylin 2.3 to optimize the cube structure, is now enabled by default. By optimizing the cube structure, the cube planner can reduce the computing and/or storage resources needed and improve query performance. The algorithm automatically optimizes the cube based on your data statistics when the cube is first built.
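
To give a sense of why a planner helps: a full cube over n dimensions contains 2^n cuboids, one for every combination of dimensions, so build time and storage grow exponentially as dimensions are added. The Python sketch below (the dimension names are hypothetical) simply enumerates those combinations. It is not the cube planner's algorithm, which uses data statistics to decide which subset of cuboids is worth building.

```python
from itertools import combinations


def all_cuboids(dimensions):
    """Return every cuboid of a full cube: each is a tuple of dimensions,
    from the apex cuboid () up to the base cuboid containing all of them."""
    cuboids = []
    for k in range(len(dimensions) + 1):
        # All groupings that use exactly k of the dimensions
        cuboids.extend(combinations(dimensions, k))
    return cuboids


# Hypothetical dimension names, for illustration only
dims = ["date", "country", "category", "channel"]
cuboids = all_cuboids(dims)
print(len(cuboids))  # 16, i.e. 2 ** 4
```

With 20 dimensions the same function would return 2 ** 20 = 1,048,576 cuboids, which is why building only a selected subset can save so much.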

This release also offers better segment pruning to reduce disk and network I/O. Until now, Kylin only pruned segments by the partition column’s value, but this version records the minimum and maximum values of every dimension at the segment level, so a segment can be skipped when a query's filter falls outside those ranges.
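
The idea is the same as a min/max (zone map) index. The following is a minimal Python sketch, assuming a hypothetical `Segment` type that stores a (min, max) pair per dimension; Kylin's own implementation is written in Java and differs in detail. A segment is skipped only when its recorded range cannot overlap the query's range, so pruning never drops rows that could match.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    # Hypothetical per-segment index: dimension -> (min, max) seen in this segment
    ranges: dict


def may_match(segment, dimension, low, high):
    """Return False only if the segment's [min, max] for `dimension`
    cannot overlap the query range [low, high]."""
    if dimension not in segment.ranges:
        return True  # no statistics recorded: be conservative and scan it
    seg_min, seg_max = segment.ranges[dimension]
    return seg_max >= low and seg_min <= high


def prune(segments, dimension, low, high):
    """Keep only the segments that still need to be scanned."""
    return [s for s in segments if may_match(s, dimension, low, high)]


# Hypothetical segments, partitioned by quarter
segments = [
    Segment("2018_Q1", {"region_id": (1, 40), "price": (5, 900)}),
    Segment("2018_Q2", {"region_id": (35, 80), "price": (10, 300)}),
    Segment("2018_Q3", {"region_id": (81, 120), "price": (1, 50)}),
]

# e.g. WHERE region_id BETWEEN 50 AND 60
print([s.name for s in prune(segments, "region_id", 50, 60)])  # ['2018_Q2']
```

Note that `region_id` is not the partition column here; pruning on it is only possible because ranges are kept for every dimension, which is the change described above.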

Other improvements include the ability to carry out dictionary merges on YARN rather than in Kylin’s JVM; better estimation of cube size for TOPN and COUNT DISTINCT measures; and support for Hadoop 3 and HBase 2.

More Information

Kylin Website



Last Updated ( Tuesday, 02 October 2018 )