Neo4j has a batch insertion facility intended for initial imports, which bypasses transactions and other checks in favor of performance. This is useful when you have a big dataset that needs to be loaded once.
Batch insertion is included in the neo4j-kernel component, which is part of all Neo4j distributions and editions.
Be aware of the following points when using batch insertion:

- Unless shutdown is successfully invoked at the end of the import, the database files will be corrupt.

Warning: Always perform batch insertion in a single thread (or use synchronization so that only one thread at a time accesses the batch inserter) and invoke shutdown when the import is finished.
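One common way to honor the single-thread requirement while still generating data from multiple threads is to funnel all work through one writer thread via a queue. The sketch below is illustrative only and uses no Neo4j types: the list standing in for the inserter, the poison-pill convention, and the runPipeline method are all assumptions, not part of the batch inserter API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleWriterSketch
{
    // Funnels all "inserts" (plain strings here) through one writer thread.
    public static List<String> runPipeline( List<String> names ) throws InterruptedException
    {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> created = new ArrayList<>(); // stands in for the batch inserter
        final String poison = "\u0000stop";

        // The single writer thread: the only place the "inserter" is touched.
        Thread writer = new Thread( () -> {
            try
            {
                for ( String name = queue.take(); !name.equals( poison ); name = queue.take() )
                {
                    created.add( name ); // here you would call inserter.createNode( ... )
                }
            }
            catch ( InterruptedException e )
            {
                Thread.currentThread().interrupt();
            }
        } );
        writer.start();

        // Producers (possibly many threads) enqueue work instead of touching the inserter.
        for ( String name : names )
        {
            queue.put( name );
        }
        queue.put( poison ); // stop the writer; afterwards you would invoke shutdown()
        writer.join();
        return created;
    }

    public static void main( String[] args ) throws InterruptedException
    {
        System.out.println( runPipeline( List.of( "Mattias", "Chris" ) ) );
    }
}
```

Because only the writer thread ever touches the shared list, the same pattern would keep a real BatchInserter confined to a single thread without external locking in the producers.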
Creating a batch inserter is similar to how you normally create data in the database, but in this case the low-level BatchInserter interface is used.
As we have already pointed out, you can’t have multiple threads using the batch inserter concurrently without external synchronization.
Tip: The source code of the examples is found here: BatchInsertDocTest.java
To get hold of a BatchInserter, use BatchInserters and then go from there:
```java
BatchInserter inserter = BatchInserters.inserter( "target/batchinserter-example", fileSystem );
Map<String, Object> properties = new HashMap<String, Object>();
properties.put( "name", "Mattias" );
long mattiasNode = inserter.createNode( properties );
properties.put( "name", "Chris" );
long chrisNode = inserter.createNode( properties );
RelationshipType knows = DynamicRelationshipType.withName( "KNOWS" );
// To set properties on the relationship, use a properties map
// instead of null as the last parameter.
inserter.createRelationship( mattiasNode, chrisNode, knows, null );
inserter.shutdown();
```
To gain good performance, you will probably want to configure the batch inserter. Read Section 25.9.2, “Batch insert example” for information on configuring a batch inserter. This is how to start a batch inserter with configuration options:
```java
Map<String, String> config = new HashMap<String, String>();
config.put( "neostore.nodestore.db.mapped_memory", "90M" );
BatchInserter inserter = BatchInserters.inserter(
        "target/batchinserter-example-config", fileSystem, config );
// Insert data here ... and then shut down:
inserter.shutdown();
```
In case you have stored the configuration in a file, you can load it like this:
```java
InputStream input = fileSystem.openAsInputStream(
        new File( "target/batchinsert-config" ) );
Map<String, String> config = MapUtil.load( input );
BatchInserter inserter = BatchInserters.inserter(
        "target/batchinserter-example-config", fileSystem, config );
// Insert data here ... and then shut down:
inserter.shutdown();
```
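The configuration file read above is a plain key=value file. If you prefer not to depend on MapUtil, java.util.Properties can produce the same Map<String, String> shape the inserter expects. This is a stdlib-only sketch; the loadConfig helper is an illustration, not part of the Neo4j API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ConfigLoadSketch
{
    // Reads key=value pairs into the Map<String, String> shape the inserter expects.
    public static Map<String, String> loadConfig( Reader source ) throws IOException
    {
        Properties props = new Properties();
        props.load( source );
        Map<String, String> config = new HashMap<>();
        for ( String key : props.stringPropertyNames() )
        {
            config.put( key, props.getProperty( key ) );
        }
        return config;
    }

    public static void main( String[] args ) throws IOException
    {
        // In real code the Reader would come from the configuration file on disk.
        Reader source = new StringReader( "neostore.nodestore.db.mapped_memory=90M" );
        Map<String, String> config = loadConfig( source );
        System.out.println( config.get( "neostore.nodestore.db.mapped_memory" ) ); // prints 90M
    }
}
```

The resulting map can be passed to BatchInserters.inserter just like the one loaded with MapUtil.load.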
In case you already have code for data import written against the normal Neo4j API, you could consider using a batch inserter that exposes that API.
Note: This will not perform as well as using the BatchInserter API directly.
Also be aware of the following:

- Transaction.finish() or Transaction.success() will do nothing.
- The Transaction.failure() method will generate a NotInTransaction exception.
- Node.delete() and Node.traverse() are not supported.
- Relationship.delete() is not supported.
- GraphDatabaseService.getRelationshipTypes(), getAllNodes() and getAllRelationships() are not supported.
With these precautions in mind, this is how to do it:
```java
GraphDatabaseService batchDb =
        BatchInserters.batchDatabase( "target/batchdb-example", fileSystem );
Node mattiasNode = batchDb.createNode();
mattiasNode.setProperty( "name", "Mattias" );
Node chrisNode = batchDb.createNode();
chrisNode.setProperty( "name", "Chris" );
RelationshipType knows = DynamicRelationshipType.withName( "KNOWS" );
mattiasNode.createRelationshipTo( chrisNode, knows );
batchDb.shutdown();
```
Tip: The source code of the example is found here: BatchInsertDocTest.java
For general notes on batch insertion, see Section 18.1, “Batch Insertion”.
Indexing during batch insertion is done using BatchInserterIndex instances, which are provided via a BatchInserterIndexProvider. An example:
```java
BatchInserter inserter = BatchInserters.inserter( "target/neo4jdb-batchinsert" );
BatchInserterIndexProvider indexProvider =
        new LuceneBatchInserterIndexProvider( inserter );
BatchInserterIndex actors =
        indexProvider.nodeIndex( "actors", MapUtil.stringMap( "type", "exact" ) );
actors.setCacheCapacity( "name", 100000 );

Map<String, Object> properties = MapUtil.map( "name", "Keanu Reeves" );
long node = inserter.createNode( properties );
actors.add( node, properties );

// Make the changes visible for reading; use this sparsely, it requires IO!
actors.flush();

// Make sure to shut down the index provider as well
indexProvider.shutdown();
inserter.shutdown();
```
The configuration parameters are the same as mentioned in Section 19.10, “Configuration and fulltext indexes”.
Here are some pointers to get the most performance out of BatchInserterIndex:
Note: Changes to the index are only available for reading after they are flushed to disk. Thus, for optimal performance, keep read and lookup operations to a minimum during batch insertion, since they involve IO and impact speed negatively.
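One way to keep flushes rare is to flush lazily: buffer writes cheaply and only pay the IO cost when a read actually needs the pending changes. The FlushCountingIndex below is a made-up stand-in used purely to illustrate the pattern; it is not part of the Neo4j API.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyFlushSketch
{
    // Hypothetical stand-in for BatchInserterIndex that counts flushes.
    static class FlushCountingIndex
    {
        private final Map<String, Long> pending = new HashMap<>();
        private final Map<String, Long> visible = new HashMap<>();
        int flushes = 0;

        void add( long node, String name )
        {
            pending.put( name, node ); // cheap: no IO yet
        }

        void flush()
        {
            visible.putAll( pending ); // in the real index this is the expensive IO step
            pending.clear();
            flushes++;
        }

        // Flush lazily: only when a read actually needs the pending changes.
        Long get( String name )
        {
            if ( !pending.isEmpty() )
            {
                flush();
            }
            return visible.get( name );
        }
    }

    public static void main( String[] args )
    {
        FlushCountingIndex index = new FlushCountingIndex();
        for ( long node = 0; node < 1000; node++ )
        {
            index.add( node, "name-" + node ); // many writes, zero flushes so far
        }
        System.out.println( index.get( "name-42" ) + " after " + index.flushes + " flush(es)" );
    }
}
```

A thousand writes here trigger a single flush, performed only when the first lookup arrives; batching reads the same way keeps the IO cost proportional to the number of read phases, not the number of writes.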
Copyright © 2013 Neo Technology