Apache Lucene Adds Similarity Vector Searches
Written by Kay Ewbank   
Tuesday, 27 February 2024

Apache Lucene 9.10 has been released with support for similarity-based vector searches. Other improvements include block join compatible index sorting, and several improvements to ensure the software takes advantage of the now finalized JDK foreign memory API internally when running on Java 22 or later.

Apache Lucene is a high-performance search engine library written entirely in Java. The developers describe it as being suitable for nearly any application that requires structured search, full-text search, faceting, nearest-neighbor search on high-dimensionality vectors, spell correction or query suggestions. There's also a PyLucene sub project that provides Python bindings for Lucene Core.

Until recently, the Solr sub project was part of Lucene, but it has now moved to a separate Apache Top Level Project (TLP). Solr is a popular open source enterprise search platform built on Apache Lucene.

One of the technologies underpinning Lucene is Apache OpenNLP, an open source machine learning library for natural language processing (NLP) for Java.

Commercial uses of Lucene include Elasticsearch, a search and analytics solution that includes an HTTP web interface and schema-free JSON documents. Elasticsearch is built on Apache Lucene, and Amazon's OpenSearch is an open source fork of Elasticsearch.

The main improvement in the latest release is support for similarity-based searches over indexed high-dimensionality numeric vectors, which Lucene searches using the Hierarchical Navigable Small World (HNSW) graph algorithm. The new search finds all the vectors scoring above a 'resultSimilarity' threshold. It keeps traversing the HNSW graph for as long as better-scoring nodes are available, and stops once the best remaining candidate in the lowest level of the graph scores below 'traversalSimilarity'.
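
To make the roles of the two thresholds more concrete, here is a minimal sketch in plain Java. It is not Lucene's implementation and does not use the Lucene API: it assumes a single-level neighbor graph, cosine similarity as the score, and a well-chosen entry point, and the class, method and parameter names are all hypothetical. Candidates are explored best-first, anything scoring at or above resultSimilarity is collected, and exploration stops once the best remaining candidate falls below traversalSimilarity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/** Conceptual sketch only: a single-level graph walk, not Lucene's HNSW code. */
final class ThresholdGraphSearchSketch {

    /** Cosine similarity between two vectors of equal length. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /**
     * Collects every reachable node scoring at least resultSimilarity, in the
     * order visited. Exploration ends when the best unexplored candidate
     * scores below traversalSimilarity (or no candidates are left).
     */
    static List<Integer> search(float[][] vectors, int[][] neighbors, int entryPoint,
                                float[] target, double traversalSimilarity,
                                double resultSimilarity) {
        double[] scores = new double[vectors.length];
        Set<Integer> visited = new HashSet<>();
        // Highest-scoring candidate is polled first.
        PriorityQueue<Integer> candidates =
            new PriorityQueue<>(Comparator.comparingDouble((Integer n) -> -scores[n]));

        scores[entryPoint] = cosine(vectors[entryPoint], target);
        visited.add(entryPoint);
        candidates.add(entryPoint);

        List<Integer> results = new ArrayList<>();
        while (!candidates.isEmpty()) {
            int best = candidates.poll();
            if (scores[best] < traversalSimilarity) {
                break; // nothing left that is worth traversing
            }
            if (scores[best] >= resultSimilarity) {
                results.add(best);
            }
            for (int n : neighbors[best]) {
                if (visited.add(n)) { // score each node once, on first visit
                    scores[n] = cosine(vectors[n], target);
                    candidates.add(n);
                }
            }
        }
        return results;
    }

    /** Tiny made-up data set: five 2-D vectors and a hand-written neighbor graph. */
    static List<Integer> demo() {
        float[][] vectors = {
            {1f, 1f}, {1f, 0.2f}, {1f, 0f}, {0f, 1f}, {0.9f, 0.1f}
        };
        int[][] neighbors = {
            {1, 3}, {0, 2}, {1, 4}, {0, 4}, {2, 3}
        };
        float[] target = {1f, 0f};
        return search(vectors, neighbors, 0, target, 0.5, 0.95);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 2, 4]
    }
}
```

In this toy data set the entry node scores too low to be a result but high enough to keep traversing, which is how the walk reaches the three vectors that pass the result threshold. The orthogonal vector is scored but never returned, and it ends the traversal.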

The second improvement of note is that index sorting is now compatible with block joins: IndexWriter now preserves document blocks when index sorting is configured.
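
As a rough illustration of what preserving blocks during a sort means, the following plain Java sketch reorders a list of documents by a sort key while keeping each block of child documents directly ahead of its parent. It is an assumption-laden model rather than Lucene code: the Doc record, its fields and the rule that a block is ordered by its parent's key are hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Conceptual sketch only: models block-preserving sorting, not IndexWriter. */
final class BlockPreservingSortSketch {

    /** Hypothetical document: the parent is the last document of its block. */
    record Doc(String id, boolean parent, int sortKey) {}

    /** Sorts whole blocks by the parent's sortKey and never splits a block. */
    static List<Doc> sortBlocks(List<Doc> docs) {
        List<List<Doc>> blocks = new ArrayList<>();
        List<Doc> current = new ArrayList<>();
        for (Doc d : docs) {
            current.add(d);
            if (d.parent()) { // a parent closes the current block
                blocks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            throw new IllegalArgumentException("last block has no parent document");
        }

        // Order blocks by the key of their parent (the final document in each block).
        blocks.sort(Comparator.comparingInt((List<Doc> b) -> b.get(b.size() - 1).sortKey()));

        List<Doc> sorted = new ArrayList<>();
        for (List<Doc> block : blocks) {
            sorted.addAll(block); // children stay adjacent to, and ahead of, their parent
        }
        return sorted;
    }

    /** Two made-up blocks, indexed out of sort order. */
    static List<String> demoIds() {
        List<Doc> docs = List.of(
            new Doc("c1", false, 0),
            new Doc("c2", false, 0),
            new Doc("pB", true, 2),
            new Doc("c3", false, 0),
            new Doc("pA", true, 1));
        List<String> ids = new ArrayList<>();
        for (Doc d : sortBlocks(docs)) {
            ids.add(d.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(demoIds()); // prints [c3, pA, c1, c2, pB]
    }
}
```

Sorting every document independently by its own key would scatter the children away from their parents, which is the behavior that breaks block joins. Treating each block as a single unit avoids that.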

The MMapDirectory has been improved to take advantage of the now finalized JDK foreign memory API internally when running on Java 22 (or later), and SIMD vectorization now takes advantage of the JDK vector incubator on Java 22.

A number of optimizations have also been added to speed up queries that match lots of terms that have short postings, and to make range queries on points end faster.

Lucene 9.10 is available now.

More Information

Lucene Website

Related Articles

Lucene Core and Solr updated to 3.3

Amazon Announces OpenSearch

Elastic 8 Enhances ElasticSearch

New Amazon Elasticsearch Service

Last Updated ( Tuesday, 27 February 2024 )