Apache Lucene Improves Sparse Indexing
Written by Kay Ewbank   
Tuesday, 22 October 2024

Apache Lucene 10 has been released. The updated version adds a new IndexInput prefetch API, support for sparse indexing on doc values, and upgraded Snowball dictionaries resulting in improved tokenization.

Apache Lucene is a high-performance search engine library written entirely in Java. The developers describe it as suitable for nearly any application that requires structured search, full-text search, faceting, nearest-neighbor search on high-dimensionality vectors, spell correction or query suggestions. There's also a PyLucene subproject that provides Python bindings for Lucene Core.

The first improvement is the new IndexInput#prefetch API, which lets query evaluation logic tell the Directory which regions of data it is about to read, so that the I/O for those regions can be performed concurrently.
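
The idea behind prefetching can be illustrated with a small sketch in plain Java. This is not Lucene code and does not use the Lucene API: the class PrefetchSketch, its byte array "storage" and the slowRead helper are all hypothetical stand-ins, and the sketch only assumes that a caller announces a region before reading it so the loads can overlap.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Toy illustration of prefetching; hypothetical names, not Lucene's implementation. */
final class PrefetchSketch {
    private final byte[] storage; // stands in for a file on disk
    // Loads that were announced but not yet consumed, keyed by start offset.
    private final Map<Long, CompletableFuture<byte[]>> inFlight = new ConcurrentHashMap<>();

    PrefetchSketch(byte[] storage) {
        this.storage = storage;
    }

    /** Stand-in for a blocking read from storage. */
    private byte[] slowRead(long offset, int length) {
        byte[] out = new byte[length];
        System.arraycopy(storage, (int) offset, out, 0, length);
        return out;
    }

    /** Hint that a region will be read soon; starts loading it in the background. */
    void prefetch(long offset, int length) {
        inFlight.computeIfAbsent(offset,
            o -> CompletableFuture.supplyAsync(() -> slowRead(o, length)));
    }

    /** Reads a region, reusing a matching prefetch if one was announced. */
    byte[] read(long offset, int length) {
        CompletableFuture<byte[]> pending = inFlight.remove(offset);
        if (pending != null) {
            byte[] data = pending.join();
            if (data.length == length) {
                return data;
            }
        }
        return slowRead(offset, length); // no usable prefetch: read synchronously
    }

    public static void main(String[] args) {
        byte[] storage = new byte[100];
        for (int i = 0; i < storage.length; i++) {
            storage[i] = (byte) i;
        }
        PrefetchSketch input = new PrefetchSketch(storage);
        // Announce both regions first so their loads can run at the same time.
        input.prefetch(10, 5);
        input.prefetch(50, 3);
        System.out.println(input.read(10, 5).length + input.read(50, 3).length);
    }
}
```

The point of the pattern is the ordering: all regions are announced before any of them is read, so the reads no longer have to wait for each other one at a time.
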
Lucene also now has support for sparse indexing on doc values. The sparse index will record the minimum and maximum values per block of doc IDs, and when used in conjunction with index sorting to cluster similar documents, allows for very space-efficient and CPU-efficient filtering.
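
To see why per-block minimum and maximum values make filtering cheap, here is a minimal sketch in plain Java. It is an illustration of the general technique only, not Lucene's implementation: the class SparseIndexSketch, the block size of 100 and the blocksScanned counter are assumptions made for the example.

```java
/** Toy min/max block index over one value per doc ID; hypothetical, not Lucene code. */
final class SparseIndexSketch {
    private final long[] values;   // values[docId] is the doc value
    private final int blockSize;   // number of doc IDs summarized per block
    private final long[] blockMin;
    private final long[] blockMax;
    int blocksScanned;             // blocks the last query had to read value by value

    SparseIndexSketch(long[] values, int blockSize) {
        this.values = values;
        this.blockSize = blockSize;
        int blocks = (values.length + blockSize - 1) / blockSize;
        blockMin = new long[blocks];
        blockMax = new long[blocks];
        for (int b = 0; b < blocks; b++) {
            int from = b * blockSize;
            int to = Math.min(from + blockSize, values.length);
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (int i = from; i < to; i++) {
                min = Math.min(min, values[i]);
                max = Math.max(max, values[i]);
            }
            blockMin[b] = min;
            blockMax[b] = max;
        }
    }

    /** Counts docs with a value in [lo, hi], using block summaries to avoid scans. */
    int countInRange(long lo, long hi) {
        blocksScanned = 0;
        int count = 0;
        for (int b = 0; b < blockMin.length; b++) {
            if (blockMax[b] < lo || blockMin[b] > hi) {
                continue; // no doc in this block can match
            }
            int from = b * blockSize;
            int to = Math.min(from + blockSize, values.length);
            if (blockMin[b] >= lo && blockMax[b] <= hi) {
                count += to - from; // every doc in this block matches
                continue;
            }
            blocksScanned++; // block straddles a range boundary: check each value
            for (int i = from; i < to; i++) {
                if (values[i] >= lo && values[i] <= hi) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] sorted = new long[1000];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = i; // index sorted by value, so similar docs are adjacent
        }
        SparseIndexSketch index = new SparseIndexSketch(sorted, 100);
        int matches = index.countInRange(250, 449);
        System.out.println(matches + " matches, " + index.blocksScanned + " blocks scanned");
    }
}
```

When the values are sorted, only the blocks at the edges of the range need a value-by-value scan. With the same values in scattered order, nearly every block would overlap the range and have to be scanned, which is why the feature is paired with index sorting.
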

Search concurrency has also been improved so that it is now decoupled from the index geometry, meaning an index can be searched using any number of threads, regardless of its number of segments.

Snowball dictionaries have been upgraded, resulting in improved tokenization, and k-means clustering on vectors has been added.

This release also adds initial support for intra-segment concurrency, meaning the index searcher now supports searching across leaf reader partitions concurrently. The developers say this helps make maximum use of available resources, especially with force-merged indices or big segments, but there is still a performance penalty for queries that require segment-level computation ahead of time, such as points/range queries. This is an implementation limitation that the developers expect to improve in future releases, and for now intra-segment slicing is not enabled by default.
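
The general shape of searching one large segment with several threads can be sketched in plain Java. This is an assumption-laden illustration rather than Lucene's API: the class PartitionedSearchSketch, the even split of doc IDs and the IntPredicate standing in for a query are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

/** Toy partitioned search over one segment's doc IDs; hypothetical, not Lucene code. */
final class PartitionedSearchSketch {

    /** Splits [0, maxDoc) into at most `partitions` contiguous [from, to) ranges. */
    static List<int[]> partition(int maxDoc, int partitions) {
        List<int[]> ranges = new ArrayList<>();
        int size = (maxDoc + partitions - 1) / partitions; // ceiling division
        for (int from = 0; from < maxDoc; from += size) {
            ranges.add(new int[] {from, Math.min(from + size, maxDoc)});
        }
        return ranges;
    }

    /** Counts matching docs by scanning each partition on its own thread. */
    static int countMatches(int maxDoc, int threads, IntPredicate matches) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int[] range : partition(maxDoc, threads)) {
                futures.add(pool.submit(() -> {
                    int hits = 0;
                    for (int doc = range[0]; doc < range[1]; doc++) {
                        if (matches.test(doc)) {
                            hits++;
                        }
                    }
                    return hits;
                }));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // merge per-partition results
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partitioned search failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // One "segment" of a million docs searched by four threads.
        System.out.println(countMatches(1_000_000, 4, doc -> doc % 3 == 0));
    }
}
```

In this sketch each partition is fully independent. Any setup work a query has to do once per segment would be repeated by every partition, which is one way to picture the kind of penalty described above for queries that need segment-level computation ahead of time.
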

Lucene 10.0 is available now.

More Information

Lucene Website

Related Articles

Apache Lucene Adds Similarity Vector Searches

Lucene Core and Solr updated to 3.3

Amazon Announces OpenSearch

Elastic 8 Enhances ElasticSearch

