Apache Pinot 1.0 Released
Written by Kay Ewbank
Tuesday, 26 September 2023
Apache Pinot 1.0 has been released. The real-time distributed OLAP datastore has been purpose-built for low-latency, high-throughput analytics. Pinot was originally developed at LinkedIn in 2013 to enable the company to run various queries including showing users who had viewed their profile. Highlights of the 1.0 release are an extension to the multi-stage query engine, upsert capabilities (delete, metadata TTL, segment preloading and segment compaction), NULL value support in queries, support for SPI-based pluggable indexes, and improvements to the Spark 3 connector.
Pinot can perform typical analytical operations such as slice and dice, drill down, roll up, and pivot on large-scale multi-dimensional data. Apache describes Pinot as being ideal for ingesting and immediately querying data from streaming or batch data sources, including Apache Kafka, Amazon Kinesis, Hadoop HDFS, Amazon S3, Azure ADLS, and Google Cloud Storage. It is a columnar data store with several smart indexing and pre-aggregation techniques, and it provides ultra low-latency analytics even at extremely high throughput. The developers say it offers consistent performance based on the size of your cluster and an expected query per second (QPS) threshold.

Features of the current version include real-time support for upsert mutations (if exists update, else insert), which are used when it's not clear whether the respective row is already present in the database. It also supports query-time native JOINs through its multi-stage query engine, which efficiently manages complex analytical queries, including JOIN operations. The developers say this engine alleviates computational burdens by offloading tasks from brokers to a dedicated intermediate compute stage. Pinot can also handle semi-structured or unstructured data, and the team says it offers "improving" ANSI SQL compliance.

The team says the original query engine works very well for simpler filter-and-aggregate queries, but the broker could become a bottleneck for more complex queries. The new engine resolves this by introducing intermediary compute stages on the query servers, and brings Apache Pinot closer to full ANSI SQL semantics.

Apache says that for application developers, Pinot works well as an aggregate store that sources events from streaming data sources, such as Kafka, and makes them available for querying using SQL. You can also use Pinot to aggregate data across a microservice architecture into one easily queryable view of the domain.

Apache Pinot is available now.
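For readers unfamiliar with the term, the upsert behavior mentioned above ("if exists update, else insert") can be illustrated with a minimal Python sketch. This is not Pinot code and says nothing about how Pinot implements upserts internally. The `upsert` function, the in-memory `table` dictionary, and the `user_id` and `views` column names are all hypothetical, chosen only to show the semantics.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def upsert(table: Dict[Any, Row], key_column: str, row: Row) -> str:
    """Insert the row if its key is new, otherwise replace the existing row.

    Hypothetical helper for illustration only; `table` is a plain dict
    keyed by the value of `key_column`, standing in for a real datastore.
    """
    key = row[key_column]
    action = "update" if key in table else "insert"
    table[key] = dict(row)  # store a copy so later changes to `row` don't leak in
    return action


# Hypothetical events: the writer does not need to know whether "u1" exists yet.
table: Dict[Any, Row] = {}
first = upsert(table, "user_id", {"user_id": "u1", "views": 1})
second = upsert(table, "user_id", {"user_id": "u1", "views": 5})
third = upsert(table, "user_id", {"user_id": "u2", "views": 2})
```

The point of the pattern is that the caller issues the same operation in both cases, and a query over the table afterwards sees one row per key, holding the most recently written values.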
Last Updated (Tuesday, 26 September 2023)