Explicit indexing

Explicit indexing in Neo4j is deprecated, and will be removed in the next major release. Consider configuring indexes to support full-text search instead. See Cypher Manual → Indexes to support full-text search.

The functionality described here has been removed in Neo4j 4.0.

Explicit indexing operations are part of the Neo4j index API.

Each index is tied to a unique, user-specified name (for example "first_name" or "books") and can index either org.neo4j.graphdb.Node or org.neo4j.graphdb.Relationship.

The default index implementation is provided by the neo4j-lucene-index component, which is included in the standard Neo4j download. It can also be downloaded separately from https://repo1.maven.org/maven2/org/neo4j/neo4j-lucene-index/. For Maven users, the neo4j-lucene-index component has the coordinates org.neo4j:neo4j-lucene-index and should be used with the same version of org.neo4j:neo4j-kernel. Different versions of the index and kernel components are not compatible in the general case. Both components are included transitively by the org.neo4j:neo4j:pom artifact, which makes it possible to keep the versions in sync.

Transactions

All modifying index operations must be performed inside a transaction, as with any modifying operation in Neo4j.

Create

An index is created if it does not exist when you ask for it. Unless you give it a custom configuration, it will be created with default configuration and backend.

To set the stage for our examples, we can create some indexes to begin with:

IndexManager index = graphDb.index();
Index<Node> actors = index.forNodes( "actors" );
Index<Node> movies = index.forNodes( "movies" );
RelationshipIndex roles = index.forRelationships( "roles" );

This will create two node indexes and one relationship index with default configuration. See Relationship indexes for more information specific to relationship indexes.

See Configuration and full-text indexes for how to create full-text indexes.

You can also check if an index exists like this:

IndexManager index = graphDb.index();
boolean indexExists = index.existsForNodes( "actors" );

The source code for the examples can be found here, ImdbDocTest.java.

Delete

Indexes can be deleted. When deleting, the entire contents of the index will be removed as well as its associated configuration. An index can be created with the same name at a later point in time.

IndexManager index = graphDb.index();
Index<Node> actors = index.forNodes( "actors" );
actors.delete();

Note that the actual deletion of the index is made during the commit of the surrounding transaction. Calls made to such an index instance after delete() has been called are invalid inside that transaction as well as outside (if the transaction is successful), but will become valid again if the transaction is rolled back.

The source code for the examples can be found here, ImdbDocTest.java.

Add

Each index supports associating any number of key-value pairs with any number of entities (nodes or relationships), where each association between entity and key-value pair is performed individually. To begin with, we can add a few nodes to the indexes:

// Actors
Node reeves = graphDb.createNode();
reeves.setProperty( "name", "Keanu Reeves" );
actors.add( reeves, "name", reeves.getProperty( "name" ) );
Node bellucci = graphDb.createNode();
bellucci.setProperty( "name", "Monica Bellucci" );
actors.add( bellucci, "name", bellucci.getProperty( "name" ) );
// multiple values for a field, in this case for search only
// and not stored as a property.
actors.add( bellucci, "name", "La Bellucci" );
// Movies
Node theMatrix = graphDb.createNode();
theMatrix.setProperty( "title", "The Matrix" );
theMatrix.setProperty( "year", 1999 );
movies.add( theMatrix, "title", theMatrix.getProperty( "title" ) );
movies.add( theMatrix, "year", theMatrix.getProperty( "year" ) );
Node theMatrixReloaded = graphDb.createNode();
theMatrixReloaded.setProperty( "title", "The Matrix Reloaded" );
theMatrixReloaded.setProperty( "year", 2003 );
movies.add( theMatrixReloaded, "title", theMatrixReloaded.getProperty( "title" ) );
movies.add( theMatrixReloaded, "year", 2003 );
Node malena = graphDb.createNode();
malena.setProperty( "title", "Malèna" );
malena.setProperty( "year", 2000 );
movies.add( malena, "title", malena.getProperty( "title" ) );
movies.add( malena, "year", malena.getProperty( "year" ) );

Note that there can be multiple values associated with the same entity and key.

Next up, we will create relationships and index them as well:

// we need a relationship type
RelationshipType ACTS_IN = RelationshipType.withName( "ACTS_IN" );
// create relationships
Relationship role1 = reeves.createRelationshipTo( theMatrix, ACTS_IN );
role1.setProperty( "name", "Neo" );
roles.add( role1, "name", role1.getProperty( "name" ) );
Relationship role2 = reeves.createRelationshipTo( theMatrixReloaded, ACTS_IN );
role2.setProperty( "name", "Neo" );
roles.add( role2, "name", role2.getProperty( "name" ) );
Relationship role3 = bellucci.createRelationshipTo( theMatrixReloaded, ACTS_IN );
role3.setProperty( "name", "Persephone" );
roles.add( role3, "name", role3.getProperty( "name" ) );
Relationship role4 = bellucci.createRelationshipTo( malena, ACTS_IN );
role4.setProperty( "name", "Malèna Scordia" );
roles.add( role4, "name", role4.getProperty( "name" ) );

After these operations, our example graph looks like this:

Movie and Actor graph.

The source code for the examples can be found here, ImdbDocTest.java.

Remove

Removing (remove()) from an index is similar to adding, but can be done by supplying one of the following combinations of arguments:

  • entity

  • entity, key

  • entity, key, value

// completely remove bellucci from the actors index
actors.remove( bellucci );
// remove any "name" entry of bellucci from the actors index
actors.remove( bellucci, "name" );
// remove the "name" -> "La Bellucci" entry of bellucci
actors.remove( bellucci, "name", "La Bellucci" );

The source code for the example can be found here, ImdbDocTest.java.

Update

To update an index entry, the old one must be removed and a new one added. For details on removing index entries, see Remove.

Remember that a node or relationship can be associated with any number of key-value pairs in an index. This means that you can index a node or relationship with many key-value pairs that have the same key. When a property value changes and you want the index to reflect it, indexing the new value is not enough; you must remove the old value as well.

Here is a code example that demonstrates how it is done:

// create a node with a property
// so we have something to update later on
Node fishburn = graphDb.createNode();
fishburn.setProperty( "name", "Fishburn" );
// index it
actors.add( fishburn, "name", fishburn.getProperty( "name" ) );
// update the index entry
// when the property value changes
actors.remove( fishburn, "name", fishburn.getProperty( "name" ) );
fishburn.setProperty( "name", "Laurence Fishburn" );
actors.add( fishburn, "name", fishburn.getProperty( "name" ) );

The source code for the example can be found here, ImdbDocTest.java.
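Why the old entry has to be removed can be seen in a small in-memory model. This is a minimal sketch, not the Neo4j API: the class UpdateSketch and its methods are hypothetical stand-ins that only mimic the "many values per entity and key" behavior described above, with nodes reduced to numeric ids.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for an explicit index; not the Neo4j API.
class UpdateSketch
{
    // key -> value -> ids of the entities indexed under that key-value pair
    private final Map<String, Map<Object, Set<Long>>> entries = new HashMap<>();

    void add( long entityId, String key, Object value )
    {
        entries.computeIfAbsent( key, k -> new HashMap<>() )
                .computeIfAbsent( value, v -> new LinkedHashSet<>() )
                .add( entityId );
    }

    void remove( long entityId, String key, Object value )
    {
        Map<Object, Set<Long>> byValue = entries.get( key );
        if ( byValue != null && byValue.containsKey( value ) )
        {
            byValue.get( value ).remove( entityId );
        }
    }

    // Exact lookup; returns an empty set when nothing is indexed under the pair.
    Set<Long> get( String key, Object value )
    {
        return entries.getOrDefault( key, Collections.emptyMap() )
                .getOrDefault( value, Collections.emptySet() );
    }

    public static void main( String[] args )
    {
        UpdateSketch actors = new UpdateSketch();
        long fishburn = 1L;
        actors.add( fishburn, "name", "Fishburn" );
        // Adding the new value alone leaves the old entry in place.
        actors.add( fishburn, "name", "Laurence Fishburn" );
        System.out.println( "old value still matches: " + !actors.get( "name", "Fishburn" ).isEmpty() );
        // Removing the old value completes the update.
        actors.remove( fishburn, "name", "Fishburn" );
        System.out.println( "old value still matches: " + !actors.get( "name", "Fishburn" ).isEmpty() );
    }
}
```

In the model, as in the example above, the entity stays reachable under the stale value until that value is removed explicitly.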

Search

An index can be searched in two ways, get and query. The get method returns exact matches for the given key-value pair, whereas query exposes the querying capabilities of the backend used by the index. For example, the Lucene query syntax can be used directly with the default indexing backend.

Get

This is how to search for a single exact match:

IndexHits<Node> hits = actors.get( "name", "Keanu Reeves" );
Node reeves = hits.getSingle();

org.neo4j.graphdb.index.IndexHits is an Iterable with some additional useful methods. For example, getSingle() returns the first and only item from the result iterator, or null if there is no hit. If the result contains more than one item, it throws a NoSuchElementException.

Here is how to get a single relationship by exact matching and retrieve its start and end nodes:

Relationship persephone = roles.get( "name", "Persephone" ).getSingle();
Node actor = persephone.getStartNode();
Node movie = persephone.getEndNode();

Finally, you can iterate over all exact matches from a relationship index:

for ( Relationship role : roles.get( "name", "Neo" ) )
{
    // this will give us Reeves twice
    Node reeves = role.getStartNode();
}

If you do not iterate through all the hits, you must call IndexHits.close() explicitly.

The source code for the examples can be found here, ImdbDocTest.java.
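One way to make sure close() is called when only part of the result is read is a try-with-resources block. The sketch below assumes that the hits object is AutoCloseable, which is the case for IndexHits in the 3.x API as far as the interface hierarchy goes (it extends ResourceIterator). To stay runnable without Neo4j on the classpath, it uses a hypothetical Hits class in place of IndexHits.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for IndexHits; not the Neo4j API.
class ClosingHitsSketch
{
    static final class Hits implements Iterator<String>, AutoCloseable
    {
        private final Iterator<String> delegate;
        boolean closed;

        Hits( List<String> items )
        {
            this.delegate = items.iterator();
        }

        @Override
        public boolean hasNext()
        {
            return delegate.hasNext();
        }

        @Override
        public String next()
        {
            return delegate.next();
        }

        @Override
        public void close()
        {
            // A real index would release its underlying resources here.
            closed = true;
        }
    }

    // Reads only the first hit; try-with-resources closes the hits regardless.
    static String firstOrNull( Hits hits )
    {
        try ( Hits h = hits )
        {
            return h.hasNext() ? h.next() : null;
        }
    }

    public static void main( String[] args )
    {
        Hits hits = new Hits( Arrays.asList( "role1", "role2" ) );
        System.out.println( "first hit: " + firstOrNull( hits ) );
        System.out.println( "closed: " + hits.closed );
    }
}
```

The same shape applies to an early break or return inside a for loop over the hits: the block exit triggers close() even though the iterator was not exhausted.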

Query

There are two query methods. One takes a key and a query, where the query is matched against values of that key only. The other is more generic and supports querying more than one key-value pair in the same query.

Here is an example using the key-query option:

for ( Node actor : actors.query( "name", "*e*" ) )
{
    // This will return Reeves and Bellucci
}

In the following example the query uses multiple keys:

for ( Node movie : movies.query( "title:*Matrix* AND year:1999" ) )
{
    // This will return "The Matrix" from 1999 only.
}

Beginning a wildcard search with "*" or "?" is discouraged by Lucene, but will nevertheless work.

You cannot have any whitespace in the search term with this syntax. See Querying with Lucene query objects for how to do that.

The source code for the examples can be found here, ImdbDocTest.java.

Relationship indexes

An index for relationships is just like an index for nodes, extended by providing support to constrain a search to relationships with a specific start and/or end node. These extra methods reside in the org.neo4j.graphdb.index.RelationshipIndex interface which extends org.neo4j.graphdb.index.Index<Relationship>.

Example of querying a relationship index:

// find relationships filtering on start node
// using exact matches
IndexHits<Relationship> reevesAsNeoHits;
reevesAsNeoHits = roles.get( "name", "Neo", reeves, null );
Relationship reevesAsNeo = reevesAsNeoHits.iterator().next();
reevesAsNeoHits.close();
// find relationships filtering on end node
// using a query
IndexHits<Relationship> matrixNeoHits;
matrixNeoHits = roles.query( "name", "*eo", null, theMatrix );
Relationship matrixNeo = matrixNeoHits.iterator().next();
matrixNeoHits.close();

And here is an example for the special case of searching for a specific relationship type:

// find relationships filtering on end node
// using a relationship type.
// this is how to add it to the index:
roles.add( reevesAsNeo, "type", reevesAsNeo.getType().name() );
// Note that to use a compound query, we can't combine committed
// and uncommitted index entries, so we'll commit before querying:
tx.success();
tx.close();

// and now we can search for it:
try ( Transaction tx = graphDb.beginTx() )
{
    IndexHits<Relationship> typeHits = roles.query( "type:ACTS_IN AND name:Neo", null, theMatrix );
    Relationship typeNeo = typeHits.iterator().next();
    typeHits.close();
}

Such an index can be useful if your domain has nodes with a very large number of relationships between them, since it reduces the search time for a relationship between two nodes. A good example where this approach pays dividends is in time series data, where we have readings represented as a relationship per occurrence.

The source code for the examples can be found here, ImdbDocTest.java.

Scores

The IndexHits interface exposes scoring (org.neo4j.graphdb.index.IndexHits.currentScore()) so that the index can communicate scores for the hits.

The result is not sorted by the score unless you explicitly specify that. See Sorting for how to sort by score.

IndexHits<Node> hits = movies.query( "title", "The*" );
for ( Node movie : hits )
{
    System.out.println( movie.getProperty( "title" ) + " " + hits.currentScore() );
}

The source code for the example can be found here, ImdbDocTest.java.

Configuration and full-text indexes

Extra configuration can be specified when an index is created, to control its behavior and which backend it uses. For example, to create a Lucene full-text index:

IndexManager index = graphDb.index();
Index<Node> fulltextMovies = index.forNodes( "movies-fulltext",
        MapUtil.stringMap( IndexManager.PROVIDER, "lucene", "type", "fulltext" ) );
fulltextMovies.add( theMatrix, "title", "The Matrix" );
fulltextMovies.add( theMatrixReloaded, "title", "The Matrix Reloaded" );
// search in the full-text index
Node found = fulltextMovies.query( "title", "reloAdEd" ).getSingle();

The source code for the example can be found here, ImdbDocTest.java.

Here is an example of how to create an exact index which is case insensitive:

Index<Node> index = graphDb.index().forNodes( "exact-case-insensitive",
        MapUtil.stringMap( "type", "exact", "to_lower_case", "true" ) );
Node node = graphDb.createNode();
index.add( node, "name", "Thomas Anderson" );
assertContains( index.query( "name", "\"Thomas Anderson\"" ), node );
assertContains( index.query( "name", "\"thoMas ANDerson\"" ), node );

In order to search for tokenized words, the query method has to be used. The get method will only match the full string value, not the tokens.

The source code for the example can be found here, TestLuceneIndex.java.
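The difference between get and query on a full-text index can be illustrated with a simplified model. This is only a sketch under stated assumptions: the class FulltextSketch is hypothetical, it splits values on whitespace and lowercases tokens as the full-text type is described as doing, and keeping separate maps for whole values and tokens is a choice of this sketch, not a description of how the Lucene backend stores its data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a full-text index; not the Neo4j or Lucene API.
class FulltextSketch
{
    // "key=whole value" -> entity ids, consulted by get
    private final Map<String, Set<Long>> exact = new HashMap<>();
    // "key=lowercased token" -> entity ids, consulted by query
    private final Map<String, Set<Long>> tokens = new HashMap<>();

    void add( long entityId, String key, String value )
    {
        exact.computeIfAbsent( key + "=" + value, k -> new HashSet<>() ).add( entityId );
        for ( String token : value.toLowerCase( Locale.ROOT ).split( "\\s+" ) )
        {
            tokens.computeIfAbsent( key + "=" + token, k -> new HashSet<>() ).add( entityId );
        }
    }

    // Matches the full string value only.
    Set<Long> get( String key, String value )
    {
        return exact.getOrDefault( key + "=" + value, Collections.emptySet() );
    }

    // Matches a single token, ignoring case.
    Set<Long> query( String key, String term )
    {
        return tokens.getOrDefault( key + "=" + term.toLowerCase( Locale.ROOT ), Collections.emptySet() );
    }

    public static void main( String[] args )
    {
        FulltextSketch movies = new FulltextSketch();
        movies.add( 1L, "title", "The Matrix" );
        movies.add( 2L, "title", "The Matrix Reloaded" );
        System.out.println( "query for token: " + movies.query( "title", "reloAdEd" ) );
        System.out.println( "get for token: " + movies.get( "title", "Reloaded" ) );
        System.out.println( "get for full value: " + movies.get( "title", "The Matrix Reloaded" ) );
    }
}
```

In this model a single word is found by query but not by get, while the complete original string is found by get.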

The configuration of the index is persisted once the index has been created. The provider configuration key is interpreted by Neo4j, but any other configuration is passed onto the backend index (e.g. Lucene) to interpret.

Table 1. Lucene indexing configuration parameters

Parameter | Possible values | Effect
type | exact, fulltext | exact is the default and uses a Lucene keyword analyzer. fulltext uses a white-space tokenizer in its analyzer.
to_lower_case | true, false | This parameter goes together with type: fulltext and converts values to lower case during both additions and querying, making the index case insensitive. Defaults to true.
analyzer | The full class name of an Analyzer | Overrides the type so that a custom analyzer can be used. to_lower_case still affects lowercasing of string queries. If the custom analyzer uppercases the indexed tokens, string queries will not match as expected.

Extra features for Lucene indexes

Numeric ranges

Lucene supports smart indexing of numbers, querying for ranges and sorting such results, and so does its backend for Neo4j. To mark a value so that it is indexed as a numeric value, we can make use of the org.neo4j.index.lucene.ValueContext class, like this:

movies.add( theMatrix, "year-numeric", new ValueContext( 1999 ).indexNumeric() );
movies.add( theMatrixReloaded, "year-numeric", new ValueContext( 2003 ).indexNumeric() );
movies.add( malena, "year-numeric", new ValueContext( 2000 ).indexNumeric() );

int from = 1997;
int to = 1999;
hits = movies.query( QueryContext.numericRange( "year-numeric", from, to ) );

The same type must be used for indexing and querying. That is, you cannot index a value as a Long and then query the index using an Integer.

Passing null as the from or to argument creates an open-ended query. The following example does that and also adds sorting to the query:

hits = movies.query(
        QueryContext.numericRange( "year-numeric", from, null )
                .sortNumeric( "year-numeric", false ) );

The from and to bounds of a range are inclusive by default, but you can change this behavior with two extra parameters:

movies.add( theMatrix, "score", new ValueContext( 8.7 ).indexNumeric() );
movies.add( theMatrixReloaded, "score", new ValueContext( 7.1 ).indexNumeric() );
movies.add( malena, "score", new ValueContext( 7.4 ).indexNumeric() );

// include 8.0, exclude 9.0
hits = movies.query( QueryContext.numericRange( "score", 8.0, 9.0, true, false ) );

The source code for the examples can be found here, ImdbDocTest.java.
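The bound handling described above (inclusive by default, optionally exclusive, null for an open end) can be written out as a plain filter. This is a minimal sketch with hypothetical names (NumericRangeSketch, range, sampleScores); it models the semantics only and does not use the Lucene backend.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of numeric range semantics; not the Neo4j API.
class NumericRangeSketch
{
    // The scores indexed in the example above, keyed by movie title.
    static Map<String, Double> sampleScores()
    {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put( "The Matrix", 8.7 );
        scores.put( "The Matrix Reloaded", 7.1 );
        scores.put( "Malèna", 7.4 );
        return scores;
    }

    // Returns the names whose value lies in the range; a null bound is open-ended.
    static List<String> range( Map<String, Double> values, Double from, Double to,
            boolean includeFrom, boolean includeTo )
    {
        List<String> result = new ArrayList<>();
        for ( Map.Entry<String, Double> entry : values.entrySet() )
        {
            double v = entry.getValue();
            boolean aboveFrom = from == null || ( includeFrom ? v >= from : v > from );
            boolean belowTo = to == null || ( includeTo ? v <= to : v < to );
            if ( aboveFrom && belowTo )
            {
                result.add( entry.getKey() );
            }
        }
        return result;
    }

    public static void main( String[] args )
    {
        Map<String, Double> scores = sampleScores();
        // include 8.0, exclude 9.0
        System.out.println( range( scores, 8.0, 9.0, true, false ) );
        // open-ended upper bound
        System.out.println( range( scores, 7.4, null, true, true ) );
    }
}
```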

Sorting

Lucene performs sorting very well, and that is also exposed in the index backend, through the org.neo4j.index.lucene.QueryContext class:

hits = movies.query( "title", new QueryContext( "*" ).sort( "title" ) );
for ( Node hit : hits )
{
    // all movies with a title in the index, ordered by title
}
// or
hits = movies.query( new QueryContext( "title:*" ).sort( "year", "title" ) );
for ( Node hit : hits )
{
    // all movies with a title in the index, ordered by year, then title
}
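Sorting on several keys, as in sort( "year", "title" ), orders by the first key and uses the next key to break ties. The sketch below expresses that ordering as a plain Java comparator chain. The class SortSketch and its Movie type are hypothetical, and comparing years as integers is an assumption of the sketch; how the Lucene backend compares values depends on how they were indexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of multi-key sorting; not the Neo4j API.
class SortSketch
{
    static final class Movie
    {
        final String title;
        final int year;

        Movie( String title, int year )
        {
            this.title = title;
            this.year = year;
        }
    }

    // Orders by year first, then by title when the years are equal.
    static List<String> titlesByYearThenTitle( List<Movie> movies )
    {
        List<Movie> sorted = new ArrayList<>( movies );
        sorted.sort( Comparator.comparingInt( ( Movie m ) -> m.year )
                .thenComparing( m -> m.title ) );
        List<String> titles = new ArrayList<>();
        for ( Movie movie : sorted )
        {
            titles.add( movie.title );
        }
        return titles;
    }

    public static void main( String[] args )
    {
        List<Movie> movies = Arrays.asList(
                new Movie( "The Matrix Reloaded", 2003 ),
                new Movie( "Malèna", 2000 ),
                new Movie( "The Matrix", 1999 ) );
        System.out.println( titlesByYearThenTitle( movies ) );
    }
}
```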

You can sort the results by relevance (score) like this:

hits = movies.query( "title", new QueryContext( "The*" ).sortByScore() );
for ( Node movie : hits )
{
    // hits sorted by relevance (score)
}

The source code for the examples can be found here, ImdbDocTest.java.

Querying with Lucene query objects

Instead of passing in queries written in Lucene query syntax, you can instantiate query objects programmatically and pass them in as arguments, for example:

Node actor = actors.query( new TermQuery( new Term( "name", "Keanu Reeves" ) ) ).getSingle();

The TermQuery is basically the same thing as using the get method on the index.

This is how to perform wildcard searches using Lucene query objects:

hits = movies.query( new WildcardQuery( new Term( "title", "The Matrix*" ) ) );
for ( Node movie : hits )
{
    System.out.println( movie.getProperty( "title" ) );
}

Note that this allows for whitespace in the search string.

The source code for the examples can be found here, ImdbDocTest.java.

Compound queries

Lucene supports querying for multiple terms in the same query, like so:

hits = movies.query( "title:*Matrix* AND year:1999" );

A compound query cannot search committed index entries and entries that have not yet been committed at the same time.

The source code for the example can be found here, ImdbDocTest.java.

Default operator

The default operator in a query (that is, whether AND or OR is applied between terms that have no explicit operator) is OR. Changing that behavior is also done via the org.neo4j.index.lucene.QueryContext class:

QueryContext query = new QueryContext( "title:*Matrix* year:1999" )
        .defaultOperator( Operator.AND );
hits = movies.query( query );

The source code for the example can be found here, ImdbDocTest.java.
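The effect of the default operator can be modelled by treating each term as a predicate: with OR a hit needs to satisfy at least one term, with AND it must satisfy all of them. The sketch below is a hypothetical model (DefaultOperatorSketch, its Operator enum and helper methods are illustrative names), not the Lucene query parser. The two predicates stand in for the terms title:*Matrix* and year:1999.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of a default operator; not the Lucene query parser.
class DefaultOperatorSketch
{
    enum Operator
    {
        AND, OR
    }

    // The two terms of "title:*Matrix* year:1999", written as predicates.
    static List<Predicate<Map<String, String>>> sampleTerms()
    {
        Predicate<Map<String, String>> titleTerm = m -> m.get( "title" ).contains( "Matrix" );
        Predicate<Map<String, String>> yearTerm = m -> m.get( "year" ).equals( "1999" );
        return Arrays.asList( titleTerm, yearTerm );
    }

    static Map<String, String> movie( String title, String year )
    {
        Map<String, String> m = new HashMap<>();
        m.put( "title", title );
        m.put( "year", year );
        return m;
    }

    // OR: at least one term must match. AND: every term must match.
    static boolean matches( Map<String, String> doc, List<Predicate<Map<String, String>>> terms,
            Operator defaultOperator )
    {
        return defaultOperator == Operator.AND
                ? terms.stream().allMatch( t -> t.test( doc ) )
                : terms.stream().anyMatch( t -> t.test( doc ) );
    }

    public static void main( String[] args )
    {
        Map<String, String> reloaded = movie( "The Matrix Reloaded", "2003" );
        System.out.println( "OR matches: " + matches( reloaded, sampleTerms(), Operator.OR ) );
        System.out.println( "AND matches: " + matches( reloaded, sampleTerms(), Operator.AND ) );
    }
}
```

In the model, "The Matrix Reloaded" from 2003 satisfies the title term only, so it is a hit under OR but not under AND.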