5.12. Batch insertion

5.12.1. Best practices

Neo4j has a batch insertion mode intended for initial imports, which must run in a single thread and bypasses transactions and other checks in favor of performance. Indexing during batch insertion is done using BatchInserterIndex which are provided via BatchInserterIndexProvider. An example:

BatchInserter inserter = new BatchInserterImpl( "target/neo4jdb-batchinsert" );
BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider( inserter );
BatchInserterIndex actors = indexProvider.nodeIndex( "actors", MapUtil.stringMap( "type", "exact" ) );
actors.setCacheCapacity( "name", 100000 );

Map<String, Object> properties = MapUtil.map( "name", "Keanu Reeves" );
long node = inserter.createNode( properties );
actors.add( node, properties );

// Make sure to shut down the index provider
indexProvider.shutdown();
inserter.shutdown();

The configuration parameters are the same as mentioned in Section 5.10, “Configuration and fulltext indices”.

5.12.1. Best practices

Here are some pointers to get the most performance out of BatchInserterIndex:

  • Try to avoid flushing too often because each flush will result in all additions (since last flush) to be visible to the querying methods, and publishing those changes can be a performance penalty.
  • Have (as big as possible) phases where one phase is either only writes or only reads, and don’t forget to flush after a write phase so that those changes becomes visible to the querying methods.
  • Enable caching for keys you know you’re going to do lookups for later on to increase performance significantly (though insertion performance may degrade slightly).