Apache Lucene Improves Sparse Indexing
Written by Kay Ewbank   
Tuesday, 22 October 2024

Apache Lucene 10 has been released. The updated version adds a new IndexInput prefetch API, support for sparse indexing on doc values, and upgraded Snowball dictionaries resulting in improved tokenization.

Apache Lucene is a high-performance search engine library written entirely in Java. The developers describe it as suitable for nearly any application that requires structured search, full-text search, faceting, nearest-neighbor search on high-dimensionality vectors, spell correction or query suggestions. There's also a PyLucene subproject that provides Python bindings for Lucene Core.

The first improvement is the new IndexInput#prefetch API, which lets query evaluation logic tell the Directory which regions of data are about to be read, so that the I/O for those regions can be performed concurrently.

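The idea behind a prefetch hint can be illustrated with a small Python sketch. This is not Lucene's API or implementation: the PrefetchingInput class, its method names and the in-memory byte buffer are all hypothetical stand-ins, chosen only to show how announcing regions up front lets the reads overlap.

```python
from concurrent.futures import ThreadPoolExecutor


class PrefetchingInput:
    """Toy stand-in for an index input (hypothetical, not Lucene's IndexInput)."""

    def __init__(self, data: bytes, workers: int = 4):
        self._data = data
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = {}  # (offset, length) -> Future holding the bytes

    def _load(self, offset: int, length: int) -> bytes:
        # Assumption: a real implementation would touch disk or the page cache here.
        return self._data[offset:offset + length]

    def prefetch(self, offset: int, length: int) -> None:
        """Hint that this region will be read soon; returns immediately."""
        key = (offset, length)
        if key not in self._pending:
            self._pending[key] = self._pool.submit(self._load, offset, length)

    def read(self, offset: int, length: int) -> bytes:
        """Return the region, reusing a prefetched result when one exists."""
        future = self._pending.pop((offset, length), None)
        if future is not None:
            return future.result()
        return self._load(offset, length)

    def close(self) -> None:
        self._pool.shutdown()


def read_regions(inp: PrefetchingInput, regions):
    """Announce every region first, then consume them in order."""
    for offset, length in regions:
        inp.prefetch(offset, length)
    return [inp.read(offset, length) for offset, length in regions]
```

Calling read_regions with several (offset, length) pairs issues all the loads before the first result is needed, which is the access pattern the prefetch hint is meant to enable.
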
Lucene now also supports sparse indexing on doc values. The sparse index records the minimum and maximum values per block of doc IDs. Used together with index sorting, which clusters similar documents, it allows very space-efficient and CPU-efficient filtering.
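To see why per-block minimum and maximum values help, here is a minimal Python sketch of the technique. It is a conceptual illustration only, not Lucene's data structure: the function names, the tuple layout and the block size are assumptions made for the example.

```python
def build_sparse_index(values, block_size):
    """Record (first_doc, last_doc, min, max) for each block of doc IDs."""
    index = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        index.append((start, start + len(block) - 1, min(block), max(block)))
    return index


def range_filter(values, index, low, high):
    """Return matching doc IDs and the number of blocks actually scanned."""
    matches, scanned = [], 0
    for first, last, block_min, block_max in index:
        if block_max < low or block_min > high:
            continue  # no value in this block can match: skip it entirely
        if low <= block_min and block_max <= high:
            matches.extend(range(first, last + 1))  # every doc in the block matches
            continue
        scanned += 1  # only partially overlapping blocks need a per-doc check
        for doc in range(first, last + 1):
            if low <= values[doc] <= high:
                matches.append(doc)
    return matches, scanned
```

With 1,000 values stored in sorted order and blocks of 100 docs, a filter for values 250 to 449 scans only the two blocks that straddle the range boundaries; the rest are either skipped or accepted wholesale. When the same values are scattered across doc IDs, nearly every block overlaps the range and has to be scanned, which is why the source pairs the sparse index with index sorting.
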

Search concurrency has also been improved so that it is now decoupled from the index geometry, meaning an index can be searched using any number of threads, regardless of its number of segments.

Snowball dictionaries have been upgraded, resulting in improved tokenization, and k-means clustering of vectors has been added.

This release also adds initial support for intra-segment concurrency, meaning the index searcher can now search across partitions of a leaf reader concurrently. The developers say this helps make maximum use of available resources, especially with force-merged indices or big segments. There is still a performance penalty for queries that require segment-level computation ahead of time, such as points/range queries. The developers expect to improve on this implementation limitation in future releases; for now, intra-segment slicing is not enabled by default.
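The partitioning idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Lucene's IndexSearcher: segments are modeled as plain lists, and partition_segments, count_matches and the max_docs_per_partition parameter are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_segments(segment_sizes, max_docs_per_partition):
    """Split each segment into (segment, first_doc, end_doc) partitions."""
    partitions = []
    for seg, size in enumerate(segment_sizes):
        for start in range(0, size, max_docs_per_partition):
            end = min(start + max_docs_per_partition, size)
            partitions.append((seg, start, end))  # end_doc is exclusive
    return partitions


def count_matches(segments, predicate, max_docs_per_partition, threads):
    """Count matching docs, searching the partitions on a thread pool."""
    sizes = [len(segment) for segment in segments]
    partitions = partition_segments(sizes, max_docs_per_partition)

    def search(partition):
        seg, start, end = partition
        return sum(1 for doc in range(start, end) if predicate(segments[seg][doc]))

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(search, partitions))
```

A single force-merged segment of 1,000 docs split with max_docs_per_partition=250 yields four partitions, so four threads can work on it even though there is only one segment. The sketch shows the work distribution only; in CPython the threads will not run this loop in parallel, and any per-segment setup a query needs would be repeated for each partition, which mirrors the penalty the developers describe.
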

Lucene 10.0 is available now.

More Information

Lucene Website
