Apache Lucene Adds Similarity Vector Searches
Written by Kay Ewbank
Tuesday, 27 February 2024
Apache Lucene 9.10 has been released with support for similarity-based vector searches. Other improvements include index sorting that is compatible with block joins, and changes that let the software take advantage of the now finalized JDK foreign memory API internally when running on Java 22 or later.

Apache Lucene is a high-performance search engine library written entirely in Java. The developers describe it as being suitable for nearly any application that requires structured search, full-text search, faceting, nearest-neighbor search on high-dimensionality vectors, spell correction or query suggestions. There's also a PyLucene sub project that provides Python bindings for Lucene Core.

Until recently, the Solr sub project was part of Lucene, but this has now moved to a separate Apache Top Level Project (TLP). Solr is a popular open source enterprise search platform built on Apache Lucene. One of the technologies Lucene can make use of is Apache OpenNLP, an open source machine learning library for natural language processing (NLP) for Java.

The commercial uses of Lucene include Elasticsearch, a free and open search and analytics solution that includes an HTTP web interface and schema-free JSON documents. Elasticsearch is built on Apache Lucene, and Amazon OpenSearch is an open source fork of Elasticsearch.

The main improvement in the latest release builds on Lucene's existing support for indexing high-dimensionality numeric vectors to perform nearest-neighbor search using the Hierarchical Navigable Small World (HNSW) graph algorithm. A similarity-based search finds all the vectors scoring above a 'resultSimilarity' threshold. It keeps traversing the HNSW graph for as long as better-scoring nodes are available, and stops once the best candidate scores below 'traversalSimilarity' in the lowest level.

The second improvement of note is that index sorting is now compatible with block joins. This means that IndexWriter now preserves indexed document blocks when index sorting is configured.
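To make the two thresholds used by the similarity-based vector search more concrete, here is a minimal sketch in plain Java. It is not Lucene's implementation and uses no Lucene classes: the class name `ThresholdVectorSearch`, the single-layer toy graph, the sample vectors and the use of a dot product as the score are all assumptions made for illustration. A real HNSW graph has several layers and is built by the index. The sketch explores the best-scoring candidate first, stops when the best remaining candidate falls below `traversalSimilarity`, and keeps only nodes that score at least `resultSimilarity`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy illustration only: a hand-built, single-layer neighbor graph.
// This is NOT Lucene's code; all names and data here are hypothetical.
class ThresholdVectorSearch {

    // Five unit-length 2D vectors (sample data, assumed for the example).
    static final float[][] VECTORS = {
        {1.0f, 0.0f}, {0.8f, 0.6f}, {0.6f, 0.8f}, {0.0f, 1.0f}, {0.96f, 0.28f}
    };

    // NEIGHBORS[i] lists the nodes linked to node i in the toy graph.
    static final int[][] NEIGHBORS = {
        {4}, {2, 4}, {1, 3}, {2}, {0, 1}
    };

    // Dot product used as the similarity score.
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Returns ids of visited nodes scoring >= resultSimilarity. Traversal ends
    // once the best remaining candidate scores below traversalSimilarity.
    static List<Integer> search(float[][] vectors, int[][] neighbors, int entryPoint,
                                float[] query, float traversalSimilarity,
                                float resultSimilarity) {
        float[] scores = new float[vectors.length];
        boolean[] visited = new boolean[vectors.length];
        // Max-heap on score: the most similar candidate is explored first.
        Comparator<Integer> bestFirst = (a, b) -> Float.compare(scores[b], scores[a]);
        PriorityQueue<Integer> candidates = new PriorityQueue<>(bestFirst);
        List<Integer> results = new ArrayList<>();

        scores[entryPoint] = dot(query, vectors[entryPoint]);
        visited[entryPoint] = true;
        candidates.add(entryPoint);

        while (!candidates.isEmpty()) {
            int current = candidates.poll();
            if (scores[current] < traversalSimilarity) {
                break; // best remaining candidate is too dissimilar to keep exploring
            }
            if (scores[current] >= resultSimilarity) {
                results.add(current);
            }
            for (int next : neighbors[current]) {
                if (!visited[next]) {
                    visited[next] = true;
                    scores[next] = dot(query, vectors[next]); // score before queueing
                    candidates.add(next);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        float[] query = {1.0f, 0.0f};
        // Enter at node 2, explore while scores stay >= 0.5, keep scores >= 0.9.
        System.out.println(search(VECTORS, NEIGHBORS, 2, query, 0.5f, 0.9f)); // prints [4, 0]
    }
}
```

In this toy run the search passes through nodes 2 and 1, which score too low to be returned but high enough to keep exploring, and so reaches nodes 4 and 0. Raising `traversalSimilarity` to 0.7 with the same entry point would stop the traversal immediately and return nothing, which shows why the traversal threshold is typically set lower than the result threshold.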
MMapDirectory has been improved to take advantage of the now finalized JDK foreign memory API internally when running on Java 22 (or later), and SIMD vectorization now takes advantage of the JDK vector incubator on Java 22. A number of optimizations have also been added to speed up queries that match lots of terms that have short postings, and to make range queries on points end faster.

Lucene 9.10 is available now.
Last Updated ( Tuesday, 27 February 2024 )