Lucene and Elasticsearch

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially one that must run cross-platform.

Things to Remember

Lucene is a free, open source project implemented in Java.

Lucene provides full-text indexing across both database objects and documents.

Elasticsearch is distributed. It fully supports the near real-time search of Apache Lucene.

Handling multitenancy requires no special configuration in Elasticsearch, whereas Solr needs a more advanced setup.

Elasticsearch introduces the concept of the Gateway, which makes full backups easier.

Lucene

Lucene is a free, open-source project implemented in Java.

It is distributed by the Apache Software Foundation under the Apache License.

Lucene itself is a single JAR (Java Archive) file, less than 1 MB in size, with no dependencies. It integrates into the simplest stand-alone Java console program as well as the most sophisticated enterprise application.

It is a rich and powerful full-text search library.

Lucene provides full-text indexing across both database objects and documents in various formats (Microsoft Office documents, PDF, HTML, text, and so on).

Supporting full-text search using Lucene requires two steps:

  1. Creating a Lucene index: building an index over the documents and/or database objects.
  2. Parsing and looking up: parsing the user query and looking up the prebuilt index to answer the query.
fig:- Lucene in a search system
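
The two steps above can be sketched with a toy inverted index in plain Java. This is only an illustration of the idea, not how Lucene is implemented: the class name ToyInvertedIndex, its methods, and the sample hotel texts are all hypothetical, and the query handling is a simple AND over the query terms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only; Lucene's real index format and query parser are far richer.
class ToyInvertedIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Step 1 (indexing): split the text into terms and record the document id under each term.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Step 2 (lookup): parse the query the same way and intersect the posting sets (AND semantics).
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Lowercase the text and split it on runs of non-word characters.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!raw.isEmpty()) {
                terms.add(raw);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Quiet hotel near the lake");
        index.add(2, "Budget hotel in the city centre");
        System.out.println(index.search("hotel"));      // prints [1, 2]
        System.out.println(index.search("lake hotel")); // prints [1]
    }
}
```

The point of the sketch is that the query never scans the documents themselves; it only consults the prebuilt term-to-document map, which is what makes the lookup step fast.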

Creating an index (IndexWriter Class)

  • The first step in implementing full-text searching with Lucene is to build an index.
  • To create an index, the first thing to do is create an IndexWriter object.
  • The IndexWriter object is used to create the index and to add new index entries (i.e., Documents) to it. You can create an IndexWriter as follows:

// Lucene 2.x-era constructor: index path, analyzer, and create=true to build a new index (replacing any existing one).
// Newer Lucene releases take a Directory and an IndexWriterConfig instead.
IndexWriter indexWriter = new IndexWriter("index-directory", new StandardAnalyzer(), true);


Parsing the Documents (Analyzer Class)

  • The job of Analyzer is to "parse" each field of your data into indexable "tokens" or keywords.
  • Several types of analyzers are provided out of the box. Some of the most interesting ones are listed below.
  • StandardAnalyzer : A sophisticated general-purpose analyzer.
  • WhitespaceAnalyzer : A very simple analyzer that just separates tokens using white space.
  • StopAnalyzer : Removes common English words that are not usually useful for indexing.
  • SnowballAnalyzer : An interesting experimental analyzer that works on word roots (a search on rain should also return entries with raining, rained, and so on).
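
The difference between splitting on white space and removing stop words can be sketched in plain Java. The class ToyAnalyzers below is a hypothetical stand-in written for this note, not a Lucene class, and its short stop list is an assumption chosen for illustration; real analyzers use longer lists and more careful tokenization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy stand-ins for two analyzer behaviours; these are not Lucene classes.
class ToyAnalyzers {
    // A small illustrative stop list (an assumption; real stop lists are longer).
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the"));

    // Whitespace-style analysis: split on runs of white space and keep tokens unchanged.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stop-style analysis: lowercase, split on non-letters, then drop stop words.
    static List<String> stopTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The rain in Spain";
        System.out.println(whitespaceTokens(text)); // prints [The, rain, in, Spain]
        System.out.println(stopTokens(text));       // prints [rain, spain]
    }
}
```

Whichever analysis is used at indexing time should also be applied to the query, otherwise a query term such as "Spain" would not match the indexed token "spain".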


Adding a Document/object to Index (Document Class)

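  • To index an object, we use the Lucene Document class, to which we add the fields that we want to be indexed.

// hotel is an application object; Field.Store.YES keeps the original text retrievable from the index.
// Field.Index.TOKENIZED (Lucene 2.x-era constant) runs the value through the analyzer before indexing.
Document doc = new Document();
doc.add(new Field("description", hotel.getDescription(), Field.Store.YES, Field.Index.TOKENIZED));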

Elasticsearch

Elasticsearch is a search server based on Lucene, originally developed by Shay Banon. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and was originally released as open source under the terms of the Apache License.
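
As a rough sketch of what the RESTful interface and schema-free JSON documents mean in practice, the Java snippet below builds (but does not send) an HTTP request that would index one JSON document. The host http://localhost:9200, the index name hotels, the document id, and the JSON fields are assumptions made for illustration, and the /_doc/ path follows the document API of recent Elasticsearch releases; older releases used a type name in that position.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds an index request for a hypothetical "hotels" index without sending it.
class IndexHotelRequest {
    // PUT {baseUrl}/{index}/_doc/{id} with a JSON body stores the document under that id.
    static HttpRequest build(String baseUrl, String index, String id, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_doc/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        // No schema is declared up front; the fields are simply part of the JSON document.
        String json = "{\"name\": \"Lakeview\", \"description\": \"Quiet hotel near the lake\"}";
        HttpRequest request = build("http://localhost:9200", "hotels", "1", json);
        System.out.println(request.method() + " " + request.uri());
        // prints PUT http://localhost:9200/hotels/_doc/1
        // Sending it needs a running node, for example:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Because the interface is plain HTTP and JSON, the same request could be issued from any language or from a command-line HTTP client; Java is used here only to match the Lucene snippets above.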

Features of Elasticsearch [1]

  • open-source search and analytics engine
  • handles structured and unstructured data
  • real-time
  • analytics capabilities (facets)
  • schema-free
  • high availability
  • per-operation persistence
  • RESTful API
  • distributed
  • designed for the cloud and big data

References:

  1. "What's New?", www.elastic.co/products/elasticsearch
  2. Michael McCandless, Erik Hatcher, and Otis Gospodnetić, "Lucene in Action", Second Edition
  3. "A Short Introduction to Lucene", oak.cs.ucla.edu/cs144/projects/lucene/

Lesson

Searching and Indexing Big Data

Subject

Computer Engineering

Grade

Engineering
