Explicit indexing
Explicit indexing in Neo4j is deprecated and will be removed in the next major release. Consider configuring indexes to support full-text search instead. See Cypher Manual → Indexes to support full-text search.
The functionality described here has been removed in Neo4j 4.0.
Explicit indexing operations are part of the Neo4j index API.
Each index is tied to a unique, user-specified name (for example "first_name" or "books") and can index either org.neo4j.graphdb.Node or org.neo4j.graphdb.Relationship.
The default index implementation is provided by the neo4j-lucene-index component, which is included in the standard Neo4j download.
It can also be downloaded separately from https://repo1.maven.org/maven2/org/neo4j/neo4j-lucene-index/.
For Maven users, the neo4j-lucene-index component has the coordinates org.neo4j:neo4j-lucene-index and should be used with the same version of org.neo4j:neo4j-kernel.
Different versions of the index and kernel components are not compatible in the general case.
Both components are included transitively by the org.neo4j:neo4j:pom artifact, which makes it possible to keep the versions in sync.
Transactions
All modifying index operations must be performed inside a transaction, as with any modifying operation in Neo4j.
Create
An index is created if it does not exist when you ask for it. Unless you give it a custom configuration, it will be created with default configuration and backend.
To set the stage for our examples, we can create some indexes to begin with:
IndexManager index = graphDb.index();
Index<Node> actors = index.forNodes( "actors" );
Index<Node> movies = index.forNodes( "movies" );
RelationshipIndex roles = index.forRelationships( "roles" );
This will create two node indexes and one relationship index with default configuration. See Relationship indexes for more information specific to relationship indexes.
See Configuration and full-text indexes for how to create full-text indexes.
You can also check if an index exists like this:
IndexManager index = graphDb.index();
boolean indexExists = index.existsForNodes( "actors" );
Delete
Indexes can be deleted. When deleting, the entire contents of the index will be removed as well as its associated configuration. An index can be created with the same name at a later point in time.
IndexManager index = graphDb.index();
Index<Node> actors = index.forNodes( "actors" );
actors.delete();
Note that the actual deletion of the index is made during the commit of the surrounding transaction.
Calls made to such an index instance after delete() has been called are invalid inside that transaction as well as outside (if the transaction is successful), but will become valid again if the transaction is rolled back.
Add
Each index supports associating any number of key-value pairs with any number of entities (nodes or relationships), where each association between entity and key-value pair is performed individually. To begin with, we can add a few nodes to the indexes:
// Actors
Node reeves = graphDb.createNode();
reeves.setProperty( "name", "Keanu Reeves" );
actors.add( reeves, "name", reeves.getProperty( "name" ) );
Node bellucci = graphDb.createNode();
bellucci.setProperty( "name", "Monica Bellucci" );
actors.add( bellucci, "name", bellucci.getProperty( "name" ) );
// multiple values for a field, in this case for search only
// and not stored as a property.
actors.add( bellucci, "name", "La Bellucci" );
// Movies
Node theMatrix = graphDb.createNode();
theMatrix.setProperty( "title", "The Matrix" );
theMatrix.setProperty( "year", 1999 );
movies.add( theMatrix, "title", theMatrix.getProperty( "title" ) );
movies.add( theMatrix, "year", theMatrix.getProperty( "year" ) );
Node theMatrixReloaded = graphDb.createNode();
theMatrixReloaded.setProperty( "title", "The Matrix Reloaded" );
theMatrixReloaded.setProperty( "year", 2003 );
movies.add( theMatrixReloaded, "title", theMatrixReloaded.getProperty( "title" ) );
movies.add( theMatrixReloaded, "year", 2003 );
Node malena = graphDb.createNode();
malena.setProperty( "title", "Malèna" );
malena.setProperty( "year", 2000 );
movies.add( malena, "title", malena.getProperty( "title" ) );
movies.add( malena, "year", malena.getProperty( "year" ) );
Note that there can be multiple values associated with the same entity and key.
Next up, we will create relationships and index them as well:
// we need a relationship type
RelationshipType ACTS_IN = RelationshipType.withName( "ACTS_IN" );
// create relationships
Relationship role1 = reeves.createRelationshipTo( theMatrix, ACTS_IN );
role1.setProperty( "name", "Neo" );
roles.add( role1, "name", role1.getProperty( "name" ) );
Relationship role2 = reeves.createRelationshipTo( theMatrixReloaded, ACTS_IN );
role2.setProperty( "name", "Neo" );
roles.add( role2, "name", role2.getProperty( "name" ) );
Relationship role3 = bellucci.createRelationshipTo( theMatrixReloaded, ACTS_IN );
role3.setProperty( "name", "Persephone" );
roles.add( role3, "name", role3.getProperty( "name" ) );
Relationship role4 = bellucci.createRelationshipTo( malena, ACTS_IN );
role4.setProperty( "name", "Malèna Scordia" );
roles.add( role4, "name", role4.getProperty( "name" ) );
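After these operations, our example graph contains two actor nodes and three movie nodes connected by four ACTS_IN relationships: Keanu Reeves plays Neo in both The Matrix and The Matrix Reloaded, and Monica Bellucci plays Persephone in The Matrix Reloaded and Malèna Scordia in Malèna.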
Remove
Removing (remove()) from an index is similar to adding, but can be done by supplying one of the following combinations of arguments:
- entity
- entity, key
- entity, key, value
// completely remove bellucci from the actors index
actors.remove( bellucci );
// remove any "name" entry of bellucci from the actors index
actors.remove( bellucci, "name" );
// remove the "name" -> "La Bellucci" entry of bellucci
actors.remove( bellucci, "name", "La Bellucci" );
Update
To update an index entry, the old one must be removed and a new one added. For details on removing index entries, see Remove.
Remember that a node or relationship can be associated with any number of key-value pairs in an index. This means that you can index a node or relationship with many key-value pairs that have the same key. When a property value changes and you would like to update the index, it is not enough to index the new value; you must remove the old value as well.
Here is a code example that demonstrates how it is done:
// create a node with a property
// so we have something to update later on
Node fishburn = graphDb.createNode();
fishburn.setProperty( "name", "Fishburn" );
// index it
actors.add( fishburn, "name", fishburn.getProperty( "name" ) );
// update the index entry
// when the property value changes
actors.remove( fishburn, "name", fishburn.getProperty( "name" ) );
fishburn.setProperty( "name", "Laurence Fishburn" );
actors.add( fishburn, "name", fishburn.getProperty( "name" ) );
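The reason the old entry has to be removed can be seen in a small model of the index. The following sketch is not Neo4j code: ToyIndex is a hypothetical in-memory stand-in that maps each key-value pair to a set of entity ids, and it only assumes that an index behaves like such a multimap from the caller's point of view.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy model of an explicit index: (key, value) -> entity ids. */
final class ToyIndex
{
    private final Map<String, Set<Long>> entries = new HashMap<>();

    // Combine key and value into one map slot; the separator is arbitrary.
    private static String slot( String key, Object value )
    {
        return key + "\u0000" + value;
    }

    void add( long entityId, String key, Object value )
    {
        entries.computeIfAbsent( slot( key, value ), k -> new HashSet<>() ).add( entityId );
    }

    void remove( long entityId, String key, Object value )
    {
        Set<Long> ids = entries.get( slot( key, value ) );
        if ( ids != null )
        {
            ids.remove( entityId );
        }
    }

    Set<Long> get( String key, Object value )
    {
        return entries.getOrDefault( slot( key, value ), new HashSet<>() );
    }

    public static void main( String[] args )
    {
        long fishburn = 1L;

        // Adding the new value without removing the old one leaves a stale entry.
        ToyIndex stale = new ToyIndex();
        stale.add( fishburn, "name", "Fishburn" );
        stale.add( fishburn, "name", "Laurence Fishburn" );
        System.out.println( stale.get( "name", "Fishburn" ) ); // prints [1]

        // Remove-then-add keeps the index in step with the property value.
        ToyIndex updated = new ToyIndex();
        updated.add( fishburn, "name", "Fishburn" );
        updated.remove( fishburn, "name", "Fishburn" );
        updated.add( fishburn, "name", "Laurence Fishburn" );
        System.out.println( updated.get( "name", "Fishburn" ) ); // prints []
    }
}
```

In the first case a lookup for the old name still returns the entity, which is the situation the remove-then-add sequence avoids.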
Search
An index can be searched in two ways: get and query.
The get method will return exact matches to the given key-value pair, whereas query exposes querying capabilities directly from the backend used by the index.
For example, the Lucene query syntax can be used directly with the default indexing backend.
Get
This is how to search for a single exact match:
IndexHits<Node> hits = actors.get( "name", "Keanu Reeves" );
Node reeves = hits.getSingle();
org.neo4j.graphdb.index.IndexHits is an Iterable with some additional useful methods.
For example, getSingle() returns the first and only item from the result iterator, or null if there is no hit.
Here is how to get a single relationship by exact matching and retrieve its start and end nodes:
Relationship persephone = roles.get( "name", "Persephone" ).getSingle();
Node actor = persephone.getStartNode();
Node movie = persephone.getEndNode();
Finally, you can iterate over all exact matches from a relationship index:
for ( Relationship role : roles.get( "name", "Neo" ) )
{
    // this will give us Reeves twice
    Node reeves = role.getStartNode();
}
In case you do not iterate through all the hits, IndexHits.close() must be called explicitly.
Query
There are two query methods. One uses a key-value signature, where the value represents a query for values with the given key only. The other is more generic and supports querying for more than one key-value pair in the same query.
Here is an example using the key-query option:
for ( Node actor : actors.query( "name", "*e*" ) )
{
    // This will return Reeves and Bellucci
}
In the following example the query uses multiple keys:
for ( Node movie : movies.query( "title:*Matrix* AND year:1999" ) )
{
    // This will return "The Matrix" from 1999 only.
}
Beginning a wildcard search with * or ? is discouraged by Lucene, but will nevertheless work.
You cannot have any whitespace in the search term with this syntax. See Querying with Lucene query objects for how to search for terms that contain whitespace.
Relationship indexes
An index for relationships is just like an index for nodes, extended by providing support to constrain a search to relationships with a specific start and/or end node.
These extra methods reside in the org.neo4j.graphdb.index.RelationshipIndex interface, which extends org.neo4j.graphdb.index.Index<Relationship>.
Example of querying a relationship index:
// find relationships filtering on start node
// using exact matches
IndexHits<Relationship> reevesAsNeoHits;
reevesAsNeoHits = roles.get( "name", "Neo", reeves, null );
Relationship reevesAsNeo = reevesAsNeoHits.iterator().next();
reevesAsNeoHits.close();
// find relationships filtering on end node
// using a query
IndexHits<Relationship> matrixNeoHits;
matrixNeoHits = roles.query( "name", "*eo", null, theMatrix );
Relationship matrixNeo = matrixNeoHits.iterator().next();
matrixNeoHits.close();
And here is an example for the special case of searching for a specific relationship type:
// find relationships filtering on end node
// using a relationship type.
// this is how to add it to the index:
roles.add( reevesAsNeo, "type", reevesAsNeo.getType().name() );
// Note that to use a compound query, we can't combine committed
// and uncommitted index entries, so we'll commit before querying:
tx.success();
tx.close();
// and now we can search for it:
try ( Transaction tx = graphDb.beginTx() )
{
    IndexHits<Relationship> typeHits = roles.query( "type:ACTS_IN AND name:Neo", null, theMatrix );
    Relationship typeNeo = typeHits.iterator().next();
    typeHits.close();
}
Such an index can be useful if your domain has nodes with a very large number of relationships between them, since it reduces the search time for a relationship between two nodes. A good example where this approach pays dividends is in time series data, where we have readings represented as a relationship per occurrence.
Scores
The IndexHits interface exposes scoring (org.neo4j.graphdb.index.IndexHits.currentScore()) so that the index can communicate scores for the hits.
The result is not sorted by the score unless you explicitly specify that. See Sorting for how to sort by score.
IndexHits<Node> hits = movies.query( "title", "The*" );
for ( Node movie : hits )
{
    System.out.println( movie.getProperty( "title" ) + " " + hits.currentScore() );
}
Configuration and full-text indexes
At the time of creation, extra configuration can be specified to control the behavior of the index and which backend to use. For example, to create a Lucene full-text index:
IndexManager index = graphDb.index();
Index<Node> fulltextMovies = index.forNodes( "movies-fulltext",
        MapUtil.stringMap( IndexManager.PROVIDER, "lucene", "type", "fulltext" ) );
fulltextMovies.add( theMatrix, "title", "The Matrix" );
fulltextMovies.add( theMatrixReloaded, "title", "The Matrix Reloaded" );
// search in the full-text index
Node found = fulltextMovies.query( "title", "reloAdEd" ).getSingle();
Here is an example of how to create an exact index which is case insensitive:
Index<Node> index = graphDb.index().forNodes( "exact-case-insensitive",
MapUtil.stringMap( "type", "exact", "to_lower_case", "true" ) );
Node node = graphDb.createNode();
index.add( node, "name", "Thomas Anderson" );
assertContains( index.query( "name", "\"Thomas Anderson\"" ), node );
assertContains( index.query( "name", "\"thoMas ANDerson\"" ), node );
In order to search for tokenized words, the query method has to be used. The get method will only match the full string value, not the tokens.
The configuration of the index is persisted once the index has been created.
The provider configuration key is interpreted by Neo4j, but any other configuration is passed on to the backend index (e.g. Lucene) to interpret.
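Parameter | Possible values | Effect
---|---|---
type | exact, fulltext | exact is the default and uses a Lucene keyword analyzer. fulltext uses a white-space tokenizer in its analyzer.
to_lower_case | true, false | This parameter goes together with type: fulltext and converts values to lower case during both additions and querying, making the index case insensitive. Defaults to true.
analyzer | The full class name of an Analyzer. | Overrides the type so that a custom analyzer can be used. Note that to_lower_case still affects lowercasing of string queries. If the custom analyzer uppercases the indexed tokens, string queries will not match as expected.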
Extra features for Lucene indexes
Numeric ranges
Lucene supports smart indexing of numbers, querying for ranges and sorting such results, and so does its backend for Neo4j.
To mark a value so that it is indexed as a numeric value, we can make use of the org.neo4j.index.lucene.ValueContext class, like this:
movies.add( theMatrix, "year-numeric", new ValueContext( 1999 ).indexNumeric() );
movies.add( theMatrixReloaded, "year-numeric", new ValueContext( 2003 ).indexNumeric() );
movies.add( malena, "year-numeric", new ValueContext( 2000 ).indexNumeric() );
int from = 1997;
int to = 1999;
hits = movies.query( QueryContext.numericRange( "year-numeric", from, to ) );
The same type must be used for indexing and querying. That is, you cannot index a value as a Long and then query the index using an Integer.
Passing null as the from or to argument creates an open-ended query.
The following example does that and also adds sorting to the query:
hits = movies.query(
        QueryContext.numericRange( "year-numeric", from, null )
                .sortNumeric( "year-numeric", false ) );
The from and to bounds of a range are inclusive by default, but you can change this behavior by using two extra parameters:
movies.add( theMatrix, "score", new ValueContext( 8.7 ).indexNumeric() );
movies.add( theMatrixReloaded, "score", new ValueContext( 7.1 ).indexNumeric() );
movies.add( malena, "score", new ValueContext( 7.4 ).indexNumeric() );
// include 8.0, exclude 9.0
hits = movies.query( QueryContext.numericRange( "score", 8.0, 9.0, true, false ) );
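The effect of the two inclusion flags can be tried out without an index. The sketch below is only an analogy and does not use Lucene or Neo4j: it relies on java.util.TreeMap, whose subMap method happens to take the same (from, includeFrom, to, includeTo) shape of arguments. The class RangeBoundsSketch and its methods are hypothetical names, and keying the map by score assumes that no two movies share a score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical stand-in that mimics inclusive/exclusive range bounds. */
final class RangeBoundsSketch
{
    // The same scores as in the example above, keyed by score.
    static NavigableMap<Double, String> exampleScores()
    {
        NavigableMap<Double, String> byScore = new TreeMap<>();
        byScore.put( 8.7, "The Matrix" );
        byScore.put( 7.1, "The Matrix Reloaded" );
        byScore.put( 7.4, "Mal\u00e8na" );
        return byScore;
    }

    // Titles whose score lies between from and to, in ascending score order.
    static List<String> titlesInRange( NavigableMap<Double, String> byScore,
            double from, double to, boolean includeFrom, boolean includeTo )
    {
        return new ArrayList<>( byScore.subMap( from, includeFrom, to, includeTo ).values() );
    }

    public static void main( String[] args )
    {
        NavigableMap<Double, String> scores = exampleScores();
        // include 8.0, exclude 9.0
        System.out.println( titlesInRange( scores, 8.0, 9.0, true, false ) );
        // excluding the upper bound drops the movie scored exactly 7.4
        System.out.println( titlesInRange( scores, 7.1, 7.4, true, false ) );
        // including it brings that movie back
        System.out.println( titlesInRange( scores, 7.1, 7.4, true, true ) );
    }
}
```

With an exclusive upper bound, a value that equals the bound is left out; that is the only difference between the last two calls.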
Sorting
Lucene performs sorting very well, and that is also exposed in the index backend, through the org.neo4j.index.lucene.QueryContext class:
hits = movies.query( "title", new QueryContext( "*" ).sort( "title" ) );
for ( Node hit : hits )
{
    // all movies with a title in the index, ordered by title
}
// or
hits = movies.query( new QueryContext( "title:*" ).sort( "year", "title" ) );
for ( Node hit : hits )
{
    // all movies with a title in the index, ordered by year, then title
}
You can sort the results by relevance (score) like this:
hits = movies.query( "title", new QueryContext( "The*" ).sortByScore() );
for ( Node movie : hits )
{
    // hits sorted by relevance (score)
}
Querying with Lucene query objects
Instead of passing in queries in Lucene query syntax, you can instantiate such queries programmatically and pass them in as arguments, for example:
Node actor = actors.query( new TermQuery( new Term( "name", "Keanu Reeves" ) ) ).getSingle();
The TermQuery is essentially the same as using the get method on the index.
This is how to perform wildcard searches using Lucene query objects:
hits = movies.query( new WildcardQuery( new Term( "title", "The Matrix*" ) ) );
for ( Node movie : hits )
{
    System.out.println( movie.getProperty( "title" ) );
}
Note that this allows for whitespace in the search string.
Compound queries
Lucene supports querying for multiple terms in the same query, like so:
hits = movies.query( "title:*Matrix* AND year:1999" );
Compound queries cannot search across committed index entries and entries that have not yet been committed at the same time.
Default operator
The default operator (that is, whether AND or OR is used between different terms) in a query is OR.
Changing that behavior is also done via the org.neo4j.index.lucene.QueryContext class:
QueryContext query = new QueryContext( "title:*Matrix* year:1999" )
        .defaultOperator( Operator.AND );
hits = movies.query( query );
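To see what the default operator changes, it can help to evaluate the two clauses of the query by hand. The following sketch does not use Lucene or Neo4j: DefaultOperatorSketch, Movie and Op are hypothetical names, the wildcard clause is approximated with String.contains, and the three movies are the ones indexed earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of joining query clauses with AND or OR. */
final class DefaultOperatorSketch
{
    enum Op { AND, OR }

    static final class Movie
    {
        final String title;
        final int year;

        Movie( String title, int year )
        {
            this.title = title;
            this.year = year;
        }
    }

    static List<Movie> exampleMovies()
    {
        List<Movie> movies = new ArrayList<>();
        movies.add( new Movie( "The Matrix", 1999 ) );
        movies.add( new Movie( "The Matrix Reloaded", 2003 ) );
        movies.add( new Movie( "Mal\u00e8na", 2000 ) );
        return movies;
    }

    // Stand-ins for the clauses title:*Matrix* and year:1999.
    static List<Predicate<Movie>> exampleClauses()
    {
        List<Predicate<Movie>> clauses = new ArrayList<>();
        clauses.add( m -> m.title.contains( "Matrix" ) );
        clauses.add( m -> m.year == 1999 );
        return clauses;
    }

    // Titles of the movies that match all clauses (AND) or at least one clause (OR).
    static List<String> search( List<Movie> movies, List<Predicate<Movie>> clauses, Op op )
    {
        List<String> titles = new ArrayList<>();
        for ( Movie movie : movies )
        {
            boolean match = op == Op.AND
                    ? clauses.stream().allMatch( c -> c.test( movie ) )
                    : clauses.stream().anyMatch( c -> c.test( movie ) );
            if ( match )
            {
                titles.add( movie.title );
            }
        }
        return titles;
    }

    public static void main( String[] args )
    {
        System.out.println( search( exampleMovies(), exampleClauses(), Op.OR ) );
        System.out.println( search( exampleMovies(), exampleClauses(), Op.AND ) );
    }
}
```

With OR, both Matrix movies match because the title clause alone is enough; with AND, only the 1999 movie remains.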