5.11. Extra features for Lucene indices

5.11.1. Numeric ranges
5.11.2. Sorting
5.11.3. Querying with Lucene Query objects
5.11.4. Compound queries
5.11.5. Default operator
5.11.6. Caching

5.11.1. Numeric ranges

Lucene can index numbers as numeric values, which enables range queries and numeric sorting of the results, and the Lucene backend for Neo4j exposes the same capability. To index a value numerically, wrap it in a ValueContext and call indexNumeric(), like this:

movies.add( theMatrix, "year-numeric", new ValueContext( 1999L ).indexNumeric() );
movies.add( theMatrixReloaded, "year-numeric", new ValueContext( 2003L ).indexNumeric() );

// Query for range
long startYear = 1997;
long endYear = 2001;
hits = movies.query( NumericRangeQuery.newLongRange( "year-numeric", startYear, endYear, true, true ) );

	Note
	Values that are indexed numerically must be queried using NumericRangeQuery.
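
The effect of numeric indexing can be illustrated without Neo4j or Lucene at all. The following is a minimal plain-Java sketch, assuming that values not marked as numeric are compared as text; the class RangeDemo, its methods, and the extra sample values 20 and 300 are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: RangeDemo is a hypothetical class, not part of Neo4j or Lucene.
class RangeDemo {
    // Inclusive range check with values compared as text (lexicographic order).
    static List<String> textRange(List<String> values, String lower, String upper) {
        List<String> result = new ArrayList<>();
        for (String value : values) {
            if (value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0) {
                result.add(value);
            }
        }
        return result;
    }

    // Inclusive range check with values compared as numbers.
    static List<Long> numericRange(List<Long> values, long lower, long upper) {
        List<Long> result = new ArrayList<>();
        for (long value : values) {
            if (value >= lower && value <= upper) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "20" sorts between "1997" and "2001" as text, so it slips into the range.
        System.out.println(textRange(Arrays.asList("1999", "2003", "20", "300"), "1997", "2001"));
        // Compared as numbers, only 1999 lies within 1997..2001.
        System.out.println(numericRange(Arrays.asList(1999L, 2003L, 20L, 300L), 1997L, 2001L));
    }
}
```

The text comparison lets "20" into the range 1997 to 2001, while the numeric comparison returns only 1999, which is the behavior a year range is normally expected to have.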

5.11.2. Sorting

Lucene can sort query results, and the index backend exposes this through the QueryContext class:

hits = movies.query( "title", new QueryContext( "*" ).sort( "title" ) );
for ( Node hit : hits )
{
    // all movies with a title in the index, ordered by title
}
// or
hits = movies.query( new QueryContext( "title:*" ).sort( "year", "title" ) );
for ( Node hit : hits )
{
    // all movies with a title in the index, ordered by year, then title
}
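
The ordering "by year, then title" means that the second key is only consulted when two hits share the same first key. The following plain-Java sketch shows that ordering with a Comparator; it does not use the index at all, and the class SortDemo, its nested Movie type, and the third sample movie are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: SortDemo is a hypothetical class, not part of Neo4j or Lucene.
class SortDemo {
    static final class Movie {
        final String title;
        final int year;

        Movie(String title, int year) {
            this.title = title;
            this.year = year;
        }
    }

    // Returns the titles ordered by year first, then by title within the same year.
    static List<String> titlesByYearThenTitle(List<Movie> movies) {
        List<Movie> sorted = new ArrayList<>(movies);
        sorted.sort(Comparator.comparingInt((Movie m) -> m.year).thenComparing(m -> m.title));
        List<String> titles = new ArrayList<>();
        for (Movie movie : sorted) {
            titles.add(movie.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Movie> movies = Arrays.asList(
                new Movie("The Matrix Revolutions", 2003),
                new Movie("The Matrix Reloaded", 2003),
                new Movie("The Matrix", 1999));
        // prints [The Matrix, The Matrix Reloaded, The Matrix Revolutions]
        System.out.println(titlesByYearThenTitle(movies));
    }
}
```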

To sort the results by relevance (score), use sortByScore():

hits = movies.query( "title", new QueryContext( "The*" ).sortByScore() );
for ( Node movie : hits )
{
    // hits sorted by relevance (score)
}

5.11.3. Querying with Lucene Query objects

Instead of passing in a query string in Lucene query syntax, you can construct Lucene Query objects programmatically and pass them in as the argument, for example:

// a TermQuery will give exact matches
Node actor = actors.query( new TermQuery( new Term( "name", "Keanu Reeves" ) ) ).getSingle();

Note that a TermQuery is essentially equivalent to using the get method on the index.

This is how to perform wildcard searches using Lucene Query Objects:

hits = movies.query( new WildcardQuery( new Term( "title", "The Matrix*" ) ) );
for ( Node movie : hits )
{
    System.out.println( movie.getProperty( "title" ) );
}

Note that this allows for whitespace in the search string.

5.11.4. Compound queries

Lucene supports querying for multiple terms in the same query, like so:

hits = movies.query( "title:*Matrix* AND year:1999" );

	Caution
	Compound queries cannot search across committed index entries and entries that have not yet been committed at the same time.

5.11.5. Default operator

The default operator in a query (that is, whether AND or OR is applied between terms) is OR. To change this behavior, use the QueryContext class:

QueryContext query = new QueryContext( "title:*Matrix* year:1999" ).defaultOperator( Operator.AND );
hits = movies.query( query );
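
The difference between the two operators can be sketched in plain Java: with OR, a hit needs to satisfy at least one term, and with AND it must satisfy all of them. The example below does not call Lucene. The class DefaultOperatorDemo, its helper methods, and the sample data are hypothetical, and the two predicates only approximate the terms title:*Matrix* and year:1999.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: DefaultOperatorDemo is a hypothetical class, not part of Neo4j or Lucene.
class DefaultOperatorDemo {
    static final class Movie {
        final String title;
        final int year;

        Movie(String title, int year) {
            this.title = title;
            this.year = year;
        }
    }

    // Sample data made up for this sketch.
    static List<Movie> sampleMovies() {
        return Arrays.asList(
                new Movie("The Matrix", 1999),
                new Movie("The Matrix Reloaded", 2003),
                new Movie("Fight Club", 1999));
    }

    // Stand-ins for the two query terms title:*Matrix* and year:1999.
    static List<Predicate<Movie>> terms() {
        List<Predicate<Movie>> terms = new ArrayList<>();
        terms.add(m -> m.title.contains("Matrix"));
        terms.add(m -> m.year == 1999);
        return terms;
    }

    // OR semantics: keep a movie if at least one term matches.
    static List<String> matchAny(List<Movie> movies, List<Predicate<Movie>> terms) {
        List<String> titles = new ArrayList<>();
        for (Movie movie : movies) {
            if (terms.stream().anyMatch(t -> t.test(movie))) {
                titles.add(movie.title);
            }
        }
        return titles;
    }

    // AND semantics: keep a movie only if every term matches.
    static List<String> matchAll(List<Movie> movies, List<Predicate<Movie>> terms) {
        List<String> titles = new ArrayList<>();
        for (Movie movie : movies) {
            if (terms.stream().allMatch(t -> t.test(movie))) {
                titles.add(movie.title);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        // prints [The Matrix, The Matrix Reloaded, Fight Club]
        System.out.println(matchAny(sampleMovies(), terms()));
        // prints [The Matrix]
        System.out.println(matchAll(sampleMovies(), terms()));
    }
}
```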

5.11.6. Caching

If your index lookups become a performance bottleneck, caching can be enabled for certain keys in certain indices (key locations) to speed up get requests. The caching is implemented with an LRU cache so that only the most recently accessed results are cached (with "results" meaning the query result of a get request, not a single entity). You can control the size of the cache (the maximum number of results) per index key.

Index<Node> index = graphDb.index().forNodes( "actors" );
( (LuceneIndex<Node>) index ).setCacheCapacity( "name", 300000 );

	Caution
	This setting is not persisted when the database is shut down. Set the value again after each startup of the database if you want to keep it.
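
For readers unfamiliar with LRU eviction, the following is a generic plain-Java sketch built on java.util.LinkedHashMap. It is not the cache implementation used by the index, whose internals are not shown here. The class LruResultCache and the cached values are hypothetical. Each entry maps a looked-up value to the whole result of a get request, and the capacity bounds the number of such results.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a generic LRU cache, not the implementation used by Neo4j.
class LruResultCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruResultCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruResultCache<String, List<String>> cache = new LruResultCache<>(2);
        cache.put("Keanu Reeves", Arrays.asList("node-1"));
        cache.put("Carrie-Anne Moss", Arrays.asList("node-2"));
        cache.get("Keanu Reeves"); // marks this result as recently used
        cache.put("Laurence Fishburne", Arrays.asList("node-3"));
        // The least recently used result (Carrie-Anne Moss) has been evicted.
        System.out.println(cache.keySet()); // prints [Keanu Reeves, Laurence Fishburne]
    }
}
```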