Information retrieval isn't a luxury that your application may or may not provide. The best applications offer cross-site search, which minimizes the effort users spend locating a piece of information.
Typically, you choose a Portal to build your application because of the many advantages it provides. One of the most important is the ability to integrate with the latest search engines and thus provide one central location where users can search all content, including static content such as HTML pages and files on the file system.
This tutorial guides you through a set of well-thought-out steps for integrating Apache Pluto Portal with the latest version of Apache Lucene Search.
Lucene Concept
Lucene is a search engine made up of many components that work together to produce the result you want. It's important to become familiar with these components, as that will help you get the most out of this tutorial.
Lucene provides two key functions: creating an index and executing a user's query. Your application is responsible for setting up each of these, but the two operations are performed separately.
The figure below shows the first step you should go through to ensure that your documents (contents) are indexed.
Querying the index is depicted by the figure below:
The sections below give further details about the components involved in creating and querying the index.
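To make the split between the two functions concrete, here is a minimal sketch in plain Java. It is a toy inverted index, not Lucene's implementation: the ToyIndex class and its add and search methods are hypothetical names introduced only for this illustration, and the tokenization is a naive whitespace split.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: illustrates "index first, query later". Not Lucene's implementation. */
final class ToyIndex {
    // token -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> titles = new ArrayList<>();

    /** Indexing step: split the text into lowercase tokens and record where each one occurs. */
    int add(String title) {
        int docId = titles.size();
        titles.add(title);
        for (String token : title.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Query step: look up a single term and return the matching titles in docId order. */
    List<String> search(String term) {
        List<String> result = new ArrayList<>();
        Set<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids == null) {
            return result; // no document contains this term
        }
        for (int docId : ids) {
            result.add(titles.get(docId));
        }
        return result;
    }
}
```

The point of the sketch is that add does all the text processing up front, so search only needs a map lookup. Lucene's real index is far more elaborate, but the division of labour is the same.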
Documents
A Lucene index consists of documents, and each Lucene document represents one indexed object. This object could be a database record, a web page, a Java object, and so on.
Each document consists of a set of fields, and each field is a name/value pair that represents a piece of content. Examples of such fields are title, summary and content.
To use a Lucene document you need an object of the org.apache.lucene.document.Document class.
Analyzer
The Analyzer is the beating heart of Lucene: you use an Analyzer both when creating the Lucene index and when querying it afterwards. An Analyzer turns free text into tokens that can be searched later on.
Lucene provides many types of Analyzer, so you can use the one that best fits your application. When you add a document to Lucene's index, Lucene uses the analyzer to process the text of the fields in that document.
You can find the different types of Analyzer under the org.apache.lucene.analysis package.
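As a rough picture of what "turning free text into tokens" means, the following plain-Java sketch lowercases the text, splits it on anything that is not a letter or digit, and drops a few stop words. ToyAnalyzer and its stop-word list are assumptions made for this example; real Lucene analyzers are configurable pipelines and behave differently in the details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analyzer: a conceptual stand-in, not a Lucene Analyzer. */
final class ToyAnalyzer {
    // Hypothetical stop-word list, chosen only for this illustration.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    /** Lowercases the text, splits on non-alphanumeric characters and removes stop words. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }
}
```

With this sketch, a title such as "JSP & Servlet Tutorial" becomes the three tokens jsp, servlet and tutorial. The same processing has to be applied to the user's query, which is why the analyzer shows up on both the indexing and the searching side.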
Query
A Query object is what you use to search the Lucene index. There are several ways to build a Query against your index. Refer to the Lucene API to learn more about:
TermQuery
BooleanQuery
WildcardQuery
PhraseQuery
PrefixQuery
MultiPhraseQuery
FuzzyQuery
RegexpQuery
TermRangeQuery
NumericRangeQuery
ConstantScoreQuery
DisjunctionMaxQuery
MatchAllDocsQuery
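To give a feel for how a few of these query types differ, the sketch below models them as plain Java predicates over a document's token list. These are not the Lucene classes: ToyQueries and its term, prefix, phrase and all methods are hypothetical helpers that only mimic the matching idea behind TermQuery, PrefixQuery, PhraseQuery and a BooleanQuery whose clauses are all required.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Toy query predicates over a token list. Conceptual only, not Lucene's Query classes. */
final class ToyQueries {
    /** Like a term query: the exact token must be present. */
    static Predicate<List<String>> term(String token) {
        return tokens -> tokens.contains(token);
    }

    /** Like a prefix query: at least one token must start with the prefix. */
    static Predicate<List<String>> prefix(String prefix) {
        return tokens -> tokens.stream().anyMatch(t -> t.startsWith(prefix));
    }

    /** Like a phrase query: the words must appear next to each other, in order. */
    static Predicate<List<String>> phrase(String... words) {
        List<String> wanted = Arrays.asList(words);
        return tokens -> Collections.indexOfSubList(tokens, wanted) >= 0;
    }

    /** Like a boolean query in which every clause is required. */
    @SafeVarargs
    static Predicate<List<String>> all(Predicate<List<String>>... clauses) {
        return tokens -> Arrays.stream(clauses).allMatch(c -> c.test(tokens));
    }
}
```

For the tokens apache, pluto, tutorial, the term "plut" does not match but the prefix "plut" does, and the phrase "pluto tutorial" matches while "apache tutorial" does not because the two words are not adjacent.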
Field
As stated earlier, a field is a name/value pair that represents one piece of metadata or content of a Lucene document. Each field may be indexed, stored and/or tokenized. Indexed fields are searchable in Lucene, and Lucene processes them when the indexer adds the document to the index.
Breaking a document's fields into sets of individual tokens is the job of the Lucene Analyzer. The Field class lives in the org.apache.lucene.document package.
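The difference between an indexed field and a stored field is easy to mix up, so here is a small plain-Java sketch of the idea. ToyFieldDocument, addField, matches and get are hypothetical names, and the url value is made up for the example. The sketch only models the two flags and is not how Lucene's Field class is implemented.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy document that separates "indexed" (searchable) from "stored" (retrievable) fields. */
final class ToyFieldDocument {
    private final Map<String, String> storedValues = new HashMap<>();
    private final Map<String, Set<String>> indexedTokens = new HashMap<>();

    /** Adds a field; indexed fields are tokenized, stored fields keep their original value. */
    void addField(String name, String value, boolean indexed, boolean stored) {
        if (stored) {
            storedValues.put(name, value);
        }
        if (indexed) {
            Set<String> tokens = indexedTokens.computeIfAbsent(name, k -> new HashSet<>());
            for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
        }
    }

    /** Searching only sees indexed fields. */
    boolean matches(String field, String token) {
        Set<String> tokens = indexedTokens.get(field);
        return tokens != null && tokens.contains(token.toLowerCase(Locale.ROOT));
    }

    /** Retrieval only sees stored fields; returns null for a field that was not stored. */
    String get(String field) {
        return storedValues.get(field);
    }
}
```

In this model a field that is indexed but not stored can be found by a search yet cannot be displayed afterwards, while a field that is stored but not indexed can be displayed but never matches a query.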
TopScoreDocCollector
A collector implementation that collects the top-scoring hits, returning them as a TopDocs. This is used by IndexSearcher to implement TopDocs-based search. Hits are sorted by score descending and then (when the scores are tied) docID ascending.
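The ordering described above (score descending, then docID ascending on ties) can be sketched with a bounded priority queue in plain Java. ToyTopCollector and its Hit class are hypothetical and only illustrate the idea of keeping the best N hits. They are not Lucene's TopScoreDocCollector.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy top-N collector: keeps the best hits by score, ties broken by lower docId. */
final class ToyTopCollector {
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Best first: higher score wins, equal scores are ordered by ascending docId.
    private static final Comparator<Hit> BEST_FIRST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
    };

    private final int limit;
    // Reversed order puts the worst hit at the head, so it can be evicted cheaply.
    private final PriorityQueue<Hit> queue = new PriorityQueue<>(BEST_FIRST.reversed());

    ToyTopCollector(int limit) {
        this.limit = limit;
    }

    /** Offers one hit; once more than 'limit' hits are held, the current worst is dropped. */
    void collect(int docId, float score) {
        queue.add(new Hit(docId, score));
        if (queue.size() > limit) {
            queue.poll();
        }
    }

    /** Returns the retained hits, best first. */
    List<Hit> topHits() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort(BEST_FIRST);
        return hits;
    }
}
```

Keeping only N entries in the queue means memory stays bounded no matter how many documents match, which is the main reason to collect hits this way rather than sorting the full result list.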
IndexSearcher
You may notice in the sample below that we use IndexSearcher, the org.apache.lucene.search.IndexSearcher class, to search our index with the provided Query.
To get an IndexSearcher object you pass an IndexReader as an argument to its constructor. Once you invoke search on the IndexSearcher, the Collector object is populated with the search result, so you can read topDocs().scoreDocs to get the hits, which reference all of the matched documents.
Hits
In older Lucene releases, the search method on the IndexSearcher class returned an org.apache.lucene.search.Hits object, which contains the matched documents so that you can access, process and display them in whatever form you want. The Hits class is no longer part of the Lucene 4.x API used by the listings in this tutorial, which read the hits from the collector as a ScoreDoc array instead.
The Hits object isn't just a simple Collection: the bigger the result set gets, the more useful its methods become. It provides three main methods:
public final Document doc(int n) throws IOException
returns a Document that contains all of the document's fields that were stored at the time the document was indexed.
public final int length()
returns the number of search results that matched the query.
public final float score(int n) throws IOException
returns the calculated score for the n-th hit in the search results.
Index Building – Indexer
The sample below shows how you can use the Lucene API to index a set of JournalDev tutorials. This index lets you search for any tutorial that the JournalDev site provides.
The index assumes that you look for tutorials by their title.
Indexer.java
package com.journaldev.portlet;

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {
    static {
        try {
            System.out.println("Initialize of Indexer ::");
            // Create an analyzer
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
            // Create a Lucene directory (backslashes must be escaped in a Java string literal)
            Directory dir = new SimpleFSDirectory(new File("D:\\LuceneSearch\\store"));
            System.out.println("Clean Index ::");
            // Remove the files of any previous index so documents are not indexed twice
            for (String fileName : dir.listAll()) {
                dir.deleteFile(fileName);
            }
            // Create index writer configuration
            IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
            // Create writer
            IndexWriter writer = new IndexWriter(dir, config);
            // Tutorial topics
            String[] topics = {"Apache Pluto Tutorial", "Hibernate Tutorial", "Spring Tutorial",
                    "JSP & Servlet Tutorial", "Primefaces Tutorial", "LuceneSearch Tutorial"};
            for (String topic : topics) {
                // Create document
                Document doc = new Document();
                // Add field
                doc.add(new TextField("title", topic, Field.Store.YES));
                // Write document
                writer.addDocument(doc);
            }
            // Commit changes
            writer.commit();
            // Close the writer, so that a reader can be opened on the index
            writer.close();
            System.out.println("All Tutorials Are Indexed ::");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Here is some additional clarification of the code above:
- There are multiple types of index storage: instead of a physical location you can use RAMDirectory or another Directory implementation that suits you. The indexer above uses SimpleFSDirectory (a simple file-system directory) as the location for the index's segments.
- The indexer is executed as soon as the Indexer class is loaded by the ClassLoader. Loading the class triggers the static initializer, which indexes the proposed documents.
- To prevent the same documents from being indexed again every time the Indexer is loaded, we provide a simple cleanup mechanism that clears the index directory.
- We use a StandardAnalyzer to generate the needed tokens.
- We define the tutorial topics that the JournalDev site provides in the String [] topics array.
- For every tutorial, a document is created with a single title field and added to the index.
- All changes to the index are committed.
- The index writer is closed so that another writer/reader can consume the created index.
- If you forget to close your writer once its work is finished, it keeps holding the index's write lock, and another writer that tries to open the same index will fail with an exception.
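The static-initializer behaviour mentioned in the list can be checked in isolation with a few lines of plain Java. In this sketch, StaticInitDemo and LazyIndexer are hypothetical stand-ins (no Lucene involved). It shows that loading a class by name with Class.forName runs its static block, and that a second load does not run it again.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Demonstrates that a static initializer runs once, when the class is first initialized. */
final class StaticInitDemo {
    // Counts how many times the static block below has run.
    static final AtomicInteger RUNS = new AtomicInteger();

    /** Hypothetical stand-in for the Indexer class. */
    static final class LazyIndexer {
        static {
            RUNS.incrementAndGet(); // a real indexer would build its index here
        }
    }

    /** Loads LazyIndexer by name twice and reports how often its static block ran. */
    static int loadTwice() {
        try {
            // Taking the class literal's name does not initialize the class by itself.
            String name = LazyIndexer.class.getName();
            Class.forName(name); // first use: the static block runs
            Class.forName(name); // already initialized: the static block is skipped
            return RUNS.get();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This is also why the cleanup step matters: within one JVM the block runs once per class loader, but every redeploy or restart initializes the class again and would otherwise add the same documents to an existing index.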
Simple Lucene Search Portlet
Below is a simple Lucene Search Portlet built on top of the same index.
Remember, you always use doView to render the Portlet's view, whereas processAction handles actions initiated against your Portlet.
LuceneSearch.java
package com.journaldev.portlet;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;

import javax.portlet.ActionRequest;
import javax.portlet.ActionResponse;
import javax.portlet.GenericPortlet;
import javax.portlet.PortletException;
import javax.portlet.PortletURL;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.QueryBuilder;
import org.apache.lucene.util.Version;

public class LuceneSearch extends GenericPortlet {
    static {
        try {
            // Load the Indexer, which builds the index in its static initializer
            Class.forName("com.journaldev.portlet.Indexer");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    private ScoreDoc[] hits = new ScoreDoc[0];
    private IndexSearcher searcher = null;

    public void doView(RenderRequest request, RenderResponse response) throws PortletException, IOException {
        synchronized (hits) {
            // Get the writer
            PrintWriter out = response.getWriter();
            if (request.getParameter("status") == null || request.getParameter("status").equals("initial")) {
                // Print out the form tag
                out.print("<form method=\"GET\" action=\"" + response.createActionURL() + "\">");
                // Print out the search input
                out.print("<p>Search for your favorite tutorial that JournalDev has presented : <input type=\"text\" "
                        + "name=\"query\" id=\"query\"/></p>");
                // Print out the search command
                out.print("<br/> " + "<input type=\"submit\" value=\"Search\"/>");
                // Close form
                out.print("</form>");
            } else {
                // Print out the form tag
                out.print("<form method=\"GET\">");
                // Print out the result
                for (ScoreDoc hit : hits) {
                    int docId = hit.doc;
                    Document d = searcher.doc(docId);
                    out.print("<p>Tutorial Is <span style=\"font-style: oblique;font-weight: bolder;\">"
                            + d.get("title") + "</span> <span style=\"color:red\">With Score :" + hit.score
                            + "</span></p>");
                }
                // Print out the render link; the status parameter is set on the URL itself
                PortletURL searchAgainUrl = response.createRenderURL();
                searchAgainUrl.setParameter("status", "initial");
                out.print("<br/>" + "<a href=\"" + searchAgainUrl + "\">Search Again</a>");
                // Close form
                out.print("</form>");
                // Check whether the searcher is not null to close it
                if (searcher != null) {
                    // Close the reader for future modifications on the index
                    searcher.getIndexReader().close();
                }
            }
        }
    }

    public void processAction(ActionRequest request, ActionResponse response) throws PortletException, IOException {
        // Fetch the hits
        synchronized (hits) {
            // Reset the hits object
            hits = new ScoreDoc[0];
            // Create an analyzer
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
            // Create a Lucene directory (backslashes must be escaped in a Java string literal)
            Directory dir = new SimpleFSDirectory(new File("D:\\LuceneSearch\\store"));
            // Open index reader
            IndexReader reader = IndexReader.open(dir);
            // Create index searcher
            searcher = new IndexSearcher(reader);
            // Build the query using QueryBuilder
            Query query = new QueryBuilder(analyzer).createPhraseQuery("title", request.getParameter("query"));
            // Create collector
            TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
            // Search using the defined query and fill the collector with the matching documents
            searcher.search(query, collector);
            // Acquire the hits
            this.hits = collector.topDocs().scoreDocs;
            response.setRenderParameter("status", "searched");
        }
    }
}
Here is a detailed clarification of the code listed above:
- Following good Portlet design, doView is used to display both the search form and the search result, while processAction handles the user's query and does the actual search work.
- The LuceneSearch Portlet loads the Indexer class, so that the index is created for the upcoming search operations.
- Two instance variables are defined and used: hits and searcher.
- If the request parameter status is null or equal to initial, a search form is shown so that the end user can enter the title of the tutorial he/she is looking for.
- If the request parameter status is neither null nor equal to initial, the user has submitted the search and the results are populated and waiting to be displayed.
- To protect your application from concurrent requests that could make the result inconsistent, a synchronized block is used in both doView and processAction.
- Once the user submits the search, the processAction method is executed and the search operation starts.
- The hits array is populated with the matching documents and the status parameter is set to searched.
- IndexWriter and IndexReader are used for writing to and reading from the index, respectively.
- The doView method starts its work once processAction has finished.
- The search result is displayed and the IndexReader is closed. Closing it releases the index files the reader holds open; otherwise operations that modify those files, such as the cleanup the Indexer performs, may fail with an exception.
- The results are displayed along with their scores.
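One detail of the synchronized blocks is worth a closer look: the listing locks on the hits array, and processAction replaces that array while holding the lock, so two threads can end up synchronizing on different objects. A possible alternative, sketched below in plain Java, is to lock on a dedicated final object. SearchResultsHolder, publish and snapshot are hypothetical names and are not part of the original portlet.

```java
import java.util.Arrays;

/** Holds the latest search results behind a lock object that is never reassigned. */
final class SearchResultsHolder {
    // Final and private: every thread always synchronizes on this same monitor.
    private final Object lock = new Object();
    private String[] titles = new String[0];

    /** Replaces the current results; called from the action phase in this sketch. */
    void publish(String[] newTitles) {
        synchronized (lock) {
            titles = Arrays.copyOf(newTitles, newTitles.length);
        }
    }

    /** Returns a copy of the current results; called from the render phase in this sketch. */
    String[] snapshot() {
        synchronized (lock) {
            return Arrays.copyOf(titles, titles.length);
        }
    }
}
```

The copies on the way in and out mean that neither the publisher nor a reader can change the shared array after the lock is released. Note that a portlet instance is shared between users, so state kept in instance fields is shared too, whichever locking scheme is used.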
Simple Lucene Search Portlet Demo
Below is the normal flow you will see when you deploy the Portlet into your Apache Pluto. This tutorial assumes that you're already familiar with Apache Pluto and know how to create a Portal Page and deploy your Portlet within it.
If you've missed this important step, it's better to go back to Introduction Into Apache Pluto.
Summary
Search functionality is a key feature that most modern sites provide. Most applications these days don't keep their data in a single location; they typically need to search database records, HTML pages, Word documents and many other sources. The best solution is a single search engine that can work against all of these types of data through a uniform interface.
This tutorial helps you get started with the Lucene search engine and enables you to create a Search Portlet. Contribute by commenting below, and find the downloadable source code below for your own experiments.