Lucene and Elasticsearch
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It suits nearly any application that requires full-text search, especially one that must run across platforms.
Things to Remember
Lucene is a free, open-source project implemented in Java.
Lucene provides full-text indexing across both database objects and documents.
Elasticsearch is distributed. It fully supports the near real-time search of Apache Lucene.
In Elasticsearch, multitenancy needs no special configuration, whereas Solr requires a more advanced setup.
Elasticsearch introduces the concept of the Gateway, which makes full backups easier.

Lucene
Lucene is a free, open-source project implemented in Java.
It is released under the Apache License by the Apache Software Foundation.
The Lucene core is a single JAR (Java Archive) file with no dependencies (less than 1 MB in early releases), so it integrates into the simplest stand-alone Java console program as well as the most sophisticated enterprise application.
It is a rich and powerful full-text search library.
Lucene can provide full-text indexing across both database objects and documents in various formats (Microsoft Office documents, PDF, HTML, plain text, and so on). Lucene itself indexes text, so the text must first be extracted from each format.
Supporting full-text search with Lucene requires two steps:
- Creating a Lucene index: building an index over the documents and/or database objects.
- Parsing and lookup: parsing the user query and looking it up in the prebuilt index to answer the query.
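The two steps above can be illustrated with a toy inverted index in plain Java. This is a conceptual sketch only: the class name `TinyInvertedIndex`, its methods, and the sample hotel texts are made up for illustration, and none of it is Lucene's API or storage format. It only shows why a prebuilt index makes answering a query a matter of looking up terms rather than scanning every document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Conceptual sketch only: a toy inverted index, not Lucene's API or on-disk format.
class TinyInvertedIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    // Step 1: indexing. Split the text into lowercase terms and record each one.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Step 2: searching. Tokenize the query the same way, then intersect the
    // posting lists so that only documents containing every term remain.
    Set<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            TreeSet<Integer> hits = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Lowercase the text and split it on anything that is not an ASCII letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!raw.isEmpty()) {
                terms.add(raw);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("Quiet hotel near the beach");          // document 0
        index.add("Business hotel in the city centre");   // document 1
        index.add("Beach house with a garden");           // document 2
        System.out.println(index.search("hotel"));        // prints [0, 1]
        System.out.println(index.search("beach hotel"));  // prints [0]
    }
}
```

The query is tokenized with the same rules as the documents; using the same analysis on both sides is what lets "Beach" in a document match "beach" in a query.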

Creating an index (IndexWriter Class)
- The first step in implementing full-text search with Lucene is to build an index.
- To create an index, first create an IndexWriter object.
- The IndexWriter object is used to create the index and to add new index entries (i.e., Documents) to it. You can create an IndexWriter as follows:
// Lucene 2.x-style constructor: index path, analyzer, and create=true to build a new index.
// Newer Lucene releases configure the writer through IndexWriterConfig instead.
IndexWriter indexWriter = new IndexWriter("index-directory", new StandardAnalyzer(), true);
Parsing the Documents (Analyzer Class)
- The job of an Analyzer is to "parse" each field of your data into indexable "tokens" or keywords.
- Several types of analyzers are provided out of the box. Some of the most interesting ones are listed below.
- StandardAnalyzer : A sophisticated general-purpose analyzer.
- WhitespaceAnalyzer : A very simple analyzer that just separates tokens using white space.
- StopAnalyzer : Removes common English words that are not usually useful for indexing.
- SnowballAnalyzer : An interesting experimental analyzer that works on word roots (a search on rain should also return entries with raining, rained, and so on).
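To make the differences between these analyzers concrete, the following plain-Java sketch imitates three of the behaviours described above. It is not Lucene code: the class `AnalyzerSketch`, its methods, and the short stop-word list are assumptions made for the example, and the `stem` method is a deliberately naive stand-in for real stemming, which uses far more careful rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Conceptual sketch only: these helpers imitate what the analyzers do; they are not Lucene classes.
class AnalyzerSketch {
    // Assumed stop-word list for the example; Lucene's own list differs.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "in", "is", "of", "the", "to");

    // Like WhitespaceAnalyzer: split on white space and change nothing else.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Like StopAnalyzer: lowercase, split on non-letters (ASCII only here), drop stop words.
    static List<String> stop(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Very naive stand-in for stemming: strip "ing" or "ed" from longer words.
    static String stem(String term) {
        if (term.length() > 5 && term.endsWith("ing")) {
            return term.substring(0, term.length() - 3);
        }
        if (term.length() > 4 && term.endsWith("ed")) {
            return term.substring(0, term.length() - 2);
        }
        return term;
    }

    public static void main(String[] args) {
        String text = "The rain is raining in the garden";
        System.out.println(whitespace(text)); // prints [The, rain, is, raining, in, the, garden]
        System.out.println(stop(text));       // prints [rain, raining, garden]
        // All three forms reduce to the same root, so a search on "rain" can match them.
        System.out.println(stem("raining") + " " + stem("rained") + " " + stem("rain")); // prints rain rain rain
    }
}
```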
Adding a Document/object to Index (Document Class)
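- To index an object, create a Lucene Document, add the fields you want indexed, and pass the document to the IndexWriter.
Document doc = new Document();
// Store the original text and tokenize it with the analyzer so that it is searchable
doc.add(new Field("description", hotel.getDescription(), Field.Store.YES, Field.Index.TOKENIZED));
indexWriter.addDocument(doc);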
Elasticsearch
Elasticsearch is a search server based on Lucene. It was created by Shay Banon. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.
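As a rough illustration of the RESTful interface and schema-free JSON documents, the sketch below builds (but does not send) an HTTP request that would index one document. The host `http://localhost:9200`, the index name `hotels`, the id `1`, and the document fields are all assumptions for the example. The `/<index>/_doc/<id>` path follows the document API of recent Elasticsearch versions; older versions used a type name in place of `_doc`. The code uses only the JDK's `java.net.http` classes (Java 11 or later).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch of an Elasticsearch index request. Host, index name, id, and document are assumptions.
class IndexRequestSketch {
    static HttpRequest buildIndexRequest(String baseUrl, String index, String id, String json) {
        // PUT /<index>/_doc/<id> stores (or replaces) one JSON document under the given id.
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_doc/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // No schema is declared beforehand; the document is just JSON.
        String json = "{\"name\": \"Sea View\", \"description\": \"Quiet hotel near the beach\"}";
        HttpRequest request = buildIndexRequest("http://localhost:9200", "hotels", "1", json);
        // Only prints the request line; nothing is sent over the network here.
        System.out.println(request.method() + " " + request.uri());
        // prints PUT http://localhost:9200/hotels/_doc/1
    }
}
```

To execute the request against a running node, it could be passed to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; that call is left out so the sketch runs without a server.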
Features of Elasticsearch [1]
- open-source search & analytics engine
- for structured & unstructured data
- real time
- analytics capabilities (facets)
- schema-free
- high availability
- per-operation persistence
- RESTful API based
- distributed
- designed for the cloud and big data
References:
- [1] "What's New?" at www.elastic.co/products/elasticsearch
- [2] "Lucene in Action", Second Edition, by Michael McCandless, Erik Hatcher, and Otis Gospodnetić
- [3] "A Short Introduction to Lucene" at oak.cs.ucla.edu/cs144/projects/lucene/
Lesson: Searching and Indexing Big Data
Subject: Computer Engineering
Grade: Engineering