Apache Lucene Adds Similarity Vector Searches
Written by Kay Ewbank   
Tuesday, 27 February 2024

Apache Lucene 9.10 has been released with support for similarity-based vector searches. Other improvements include block join compatible index sorting, and several improvements to ensure the software takes advantage of the now finalized JDK foreign memory API internally when running on Java 22 or later.

Apache Lucene is a high-performance search engine library written entirely in Java. The developers describe it as being suitable for nearly any application that requires structured search, full-text search, faceting, nearest-neighbor search on high-dimensionality vectors, spell correction or query suggestions. There's also a PyLucene subproject that provides Python bindings for Lucene Core.

Until recently, the Solr subproject was part of Lucene, but it has now moved to a separate Apache Top Level Project (TLP). Solr is a popular open source enterprise search platform built on Apache Lucene.

One of the technologies underpinning Lucene is Apache OpenNLP, an open source machine learning library for natural language processing (NLP) for Java.

Commercial uses of Lucene include Elasticsearch, a search and analytics engine that includes an HTTP web interface and schema-free JSON documents. Elasticsearch is built on Apache Lucene, and Amazon's OpenSearch is an open source fork of Elasticsearch.

The main improvement in the latest release is support for similarity-based vector searches. Lucene already indexes high-dimensionality numeric vectors for nearest-neighbor search using the Hierarchical Navigable Small World (HNSW) graph algorithm; the new query type finds all the vectors scoring above a 'resultSimilarity' threshold rather than a fixed number of neighbors. It keeps traversing the HNSW graph as long as better-scoring nodes are available, and stops once the best candidate scores below 'traversalSimilarity' in the lowest level.
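To make the idea of a threshold-based search concrete, here is a minimal sketch in plain Java. It is not Lucene's implementation: it scans every vector by brute force instead of traversing an HNSW graph, it assumes cosine similarity, and the class and method names (SimilarityThresholdSketch, searchAbove) are hypothetical. Only the 'resultSimilarity' name is borrowed from the release notes.

```java
import java.util.ArrayList;
import java.util.List;

/** Brute-force illustration of a threshold-based vector search (not Lucene's HNSW code). */
final class SimilarityThresholdSketch {

    /** Cosine similarity between two non-zero vectors of equal length. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the index of every vector scoring above resultSimilarity against the query. */
    static List<Integer> searchAbove(float[][] vectors, float[] query, double resultSimilarity) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < vectors.length; i++) {
            // Unlike a top-k search, the number of hits is not fixed in advance.
            if (cosine(vectors[i], query) > resultSimilarity) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

The point of the sketch is the shape of the result: the caller supplies a similarity cut-off and gets back however many vectors clear it, which may be none or all of them.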

The second improvement of note means index sorting is now compatible with block joins. This means that IndexWriter preserves document blocks that are indexed when index sorting is configured.
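The following toy sketch in plain Java shows what preserving blocks during a sort means. It does not use Lucene's API, and everything in it is an assumption made for illustration: the Doc record, the convention that a parent document closes its block, and the choice to order blocks by the parent's sort key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of sorting documents while keeping parent/child blocks together. */
final class BlockSortSketch {

    /** A toy document; isParent marks the document that closes its block. */
    record Doc(String id, int sortKey, boolean isParent) {}

    /** Sorts whole blocks by the parent's sortKey and keeps each block's internal order. */
    static List<Doc> sortPreservingBlocks(List<Doc> docs) {
        List<List<Doc>> blocks = new ArrayList<>();
        List<Doc> current = new ArrayList<>();
        for (Doc doc : docs) {
            current.add(doc);
            if (doc.isParent()) {            // a parent closes the current block
                blocks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            blocks.add(current);             // toy fallback: trailing docs with no parent
        }
        // Stable sort on the key of each block's last document (its parent).
        blocks.sort(Comparator.comparingInt(
                (List<Doc> block) -> block.get(block.size() - 1).sortKey()));
        List<Doc> sorted = new ArrayList<>();
        for (List<Doc> block : blocks) {
            sorted.addAll(block);
        }
        return sorted;
    }
}
```

Sorting each document independently by its own key would scatter children away from their parents, which is what breaks block joins; moving blocks as units avoids that.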

MMapDirectory has been improved to use the now-finalized JDK foreign memory API internally when running on Java 22 (or later), and SIMD vectorization now takes advantage of the JDK's incubating Vector API on Java 22.

A number of optimizations have also been added to speed up queries that match lots of terms that have short postings, and to make range queries on points end faster.

Lucene 9.10 is available now.

More Information

Lucene Website


Last Updated ( Tuesday, 27 February 2024 )