This describes neo4j-embedded, a Python library that lets you use the embedded Neo4j database in Python.

Note
The version on PyPI may be out of date. To get the latest version, you can build it yourself.

Tutorials

This describes how to get started with Neo4j embedded in Python. See Reference Documentation for the full reference.

You need to have the neo4j-embedded Python library installed to try these examples; see Installation.

Hello, world!

Here is a simple example to get you started.

from neo4j import GraphDatabase

# Create a database
db = GraphDatabase(folder_to_put_db_in)

# All write operations happen in a transaction
with db.transaction:
    firstNode = db.node(name='Hello')
    secondNode = db.node(name='world!')

    # Create a relationship with type 'knows'
    relationship = firstNode.knows(secondNode, name='graphy')

# Read operations can happen anywhere
message = ' '.join([firstNode['name'], relationship['name'], secondNode['name']])

print(message)

# Delete the data
with db.transaction:
    firstNode.knows.single.delete()
    firstNode.delete()
    secondNode.delete()

# Always shut down your database when your application exits
db.shutdown()

A sample app using Cypher and indexes

For detailed documentation on the concepts used here, see indexes and Cypher.

This example shows you how to get started building something like a simple invoice tracking application with Neo4j.

We start out by importing Neo4j and creating some metadata that we will use to organize our actual data.

from neo4j import GraphDatabase, INCOMING, Evaluation

# Create a database
db = GraphDatabase(folder_to_put_db_in)

# All write operations happen in a transaction
with db.transaction:

    # A node to connect customers to
    customers = db.node()

    # A node to connect invoices to
    invoices = db.node()

    # Connected to the reference node, so
    # that we can always find them.
    db.reference_node.CUSTOMERS(customers)
    db.reference_node.INVOICES(invoices)

    # An index, helps us rapidly look up customers
    customer_idx = db.node.indexes.create('customers')

Domain logic

Then we define some domain logic that we want our application to be able to perform. Our application has two domain objects, Customers and Invoices. Let’s create methods to add new customers and invoices.

def create_customer(name):
    with db.transaction:
        customer = db.node(name=name)
        customer.INSTANCE_OF(customers)

        # Index the customer by name
        customer_idx['name'][name] = customer
    return customer

def create_invoice(customer, amount):
    with db.transaction:
        invoice = db.node(amount=amount)
        invoice.INSTANCE_OF(invoices)

        invoice.SENT_TO(customer)
    return invoice

In the customer case, we create a new node to represent the customer and connect it to the customers node. This helps us find customers later on, as well as determine if a given node is a customer.

We also index the name of the customer, to allow for quickly finding customers by name.

In the invoice case, we do the same, except no indexing. We also connect each new invoice to the customer it was sent to, using a relationship of type SENT_TO.

Next, we want to be able to retrieve customers and invoices that we have added. Because we are indexing customer names, finding them is quite simple.

def get_customer(name):
    return customer_idx['name'][name].single

Let's say we would also like to find all invoices for a given customer that are above some given amount. This could be done by writing a Cypher query, like this:

def get_invoices_with_amount_over(customer, min_sum):
    # Find all invoices over a given sum for a given customer.
    # Note that we return an iterator over the "invoice" column
    # in the result (['invoice']).
    return db.query('''START customer=node({customer_id})
                       MATCH invoice-[:SENT_TO]->customer
                       WHERE has(invoice.amount) and invoice.amount >= {min_sum}
                       RETURN invoice''',
                       customer_id = customer.id, min_sum = min_sum)['invoice']

Creating data and getting it back

Putting it all together, we can create customers and invoices, and use the search methods we wrote to find them.

for name in ['Acme Inc.', 'Example Ltd.']:
    create_customer(name)

# Loop through customers
for relationship in customers.INSTANCE_OF:
    customer = relationship.start
    for i in range(1, 12):
        create_invoice(customer, 100 * i)

# Finding large invoices
large_invoices = get_invoices_with_amount_over(get_customer('Acme Inc.'), 500)

# Getting all invoices per customer:
for relationship in get_customer('Acme Inc.').SENT_TO.incoming:
    invoice = relationship.start

Reference Documentation

The source code for this project lives on GitHub: https://github.com/neo4j-contrib/python-embedded

Installation

Note
The Neo4j database itself (from the Community Edition) is included in the neo4j-embedded distribution.

Installation on OSX/Linux

Prerequisites
Caution
Make sure that the entire stack is either 64-bit or 32-bit; do not mix the two. This applies to the JVM, Python and JPype.
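One quick way to check the Python side of this requirement is to ask the interpreter for its pointer size. The sketch below uses only the standard library, and python_bitness is a hypothetical helper name. The JVM and JPype have to be checked separately.

```python
import platform
import struct


def python_bitness():
    """Return 32 or 64, based on the size of a C pointer in this interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    # platform.machine() describes the hardware, not the interpreter,
    # so it is printed only for comparison.
    print("Python is %d-bit on a %s machine" % (python_bitness(), platform.machine()))
```

A 32-bit Python can run on 64-bit hardware, which is why the interpreter itself is inspected rather than the machine.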

First, install JPype:

  1. Download the latest version of JPype from http://sourceforge.net/projects/jpype/files/JPype/.

  2. Unzip the file.

  3. Open a console and navigate into the unzipped folder.

  4. Run sudo python setup.py install

JPype is also available in the Debian repos:

sudo apt-get install python-jpype

Then, make sure the JAVA_HOME environment variable is set to your JRE or JDK folder, so that JPype can find the JVM.
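If you want to verify this from Python before importing anything JVM-related, a small check such as the following may help. It is only a sketch: find_java_home is a hypothetical helper, and it checks only that the variable is set and points to a directory, not that the directory contains a usable JVM.

```python
import os


def find_java_home(environ=None):
    """Return the JAVA_HOME directory, or raise ValueError with a hint."""
    if environ is None:
        environ = os.environ
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        raise ValueError("JAVA_HOME is not set")
    if not os.path.isdir(java_home):
        raise ValueError("JAVA_HOME does not point to a directory: %s" % java_home)
    return java_home
```

Calling find_java_home() with no arguments inspects the real environment; passing a dictionary makes the function easy to try out without changing any settings.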

Note
Installation can be problematic on OSX. For help, see the Stack Overflow discussion at http://stackoverflow.com/questions/8525193/cannot-install-jpype-on-os-x-lion-to-use-with-neo4j and the blog post at http://blog.y3xz.com/blog/2011/04/29/installing-jpype-on-mac-os-x/

Installing neo4j-embedded

You can install neo4j-embedded with your Python package manager of choice, for example pip or easy_install:

sudo pip install neo4j-embedded
sudo easy_install neo4j-embedded

Or install manually:

  1. Download the latest version of neo4j-embedded from http://pypi.python.org/pypi/neo4j-embedded/.

  2. Unzip the file.

  3. Open a console and navigate into the unzipped folder.

  4. Run sudo python setup.py install

Installation on Windows

Prerequisites
Warning
It is imperative that the entire stack is either 64-bit or 32-bit; do not mix the two. This applies to the JVM, Python, JPype and all extra DLLs (see below).

First, install JPype:

Note
Note that JPype only works with Python 2.6 and 2.7, and that there are different downloads depending on which version you use.

  1. Download the latest appropriate version of JPype from http://sourceforge.net/projects/jpype/files/JPype/ for 32-bit or from http://www.lfd.uci.edu/~gohlke/pythonlibs/ for 64-bit.

  2. Run the installer.

Then, make sure the JAVA_HOME environment variable is set to your JRE or JDK folder. How to set environment variables is described under Solving problems with missing DLL files.

Note
There may be DLL files missing from your system that are required by JPype. See Solving problems with missing DLL files for instructions on how to fix this.

Installing neo4j-embedded
  1. Download the latest version from http://pypi.python.org/pypi/neo4j-embedded/.

  2. Run the installer.

Solving problems with missing DLL files

Certain versions of Windows ship without DLL files needed to programmatically launch a JVM. You will need to make IEShims.dll and certain debugging DLLs available on Windows.

IEShims.dll is normally included with Internet Explorer installs. To make Windows find this file globally, you need to add the IE install folder to your PATH.

  1. Right click on "My Computer" or "Computer".

  2. Select "Properties".

  3. Click on "Advanced" or "Advanced system settings".

  4. Click the "Environment variables" button.

  5. Find the Path variable, and add C:\Program Files\Internet Explorer to it (or the install location of IE, if you have installed it somewhere else).

Required debugging DLLs are bundled with the Microsoft Visual C++ Redistributable libraries.

If you are still getting errors about missing DLL files, you can use http://www.dependencywalker.com/ to open your jvm.dll (located in JAVA_HOME/bin/client/ or JAVA_HOME/bin/server/), and it will tell you if there are other missing DLLs.

Core API

This section describes how to get up and running, and how to do basic operations.

Getting started

Creating a database
from neo4j import GraphDatabase

# Create db
db = GraphDatabase(folder_to_put_db_in)

# Always shut down your database
db.shutdown()
Creating a database, with configuration

Please see Neo4j Configuration for what options you can use here.

from neo4j import GraphDatabase

# Example configuration parameters
db = GraphDatabase(folder_to_put_db_in, string_block_size=200, array_block_size=240)

db.shutdown()
JPype JVM configuration

You can set extra arguments to be passed to the JVM using the NEO4J_PYTHON_JVMARGS environment variable. This can be used to, for instance, increase the max memory for the database.

Note that you must set this before you import the neo4j package, either by setting it before you start Python, or by setting it programmatically in your app.

import os
os.environ['NEO4J_PYTHON_JVMARGS'] = '-Xms128M -Xmx512M'
import neo4j

You can also override the classpath used by neo4j-embedded, by setting the NEO4J_PYTHON_CLASSPATH environment variable.
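When the JVM arguments are assembled from several settings, it can be convenient to build the string in one place. The following is a minimal sketch: set_jvm_args is a hypothetical helper, not part of neo4j-embedded, and it assumes the arguments are separated by spaces as in the example above.

```python
import os


def set_jvm_args(args, environ=None):
    """Join JVM arguments with spaces and store them for neo4j-embedded."""
    if environ is None:
        environ = os.environ
    environ["NEO4J_PYTHON_JVMARGS"] = " ".join(args)
    return environ["NEO4J_PYTHON_JVMARGS"]


# Must run before the first "import neo4j" anywhere in the process.
set_jvm_args(["-Xms128M", "-Xmx512M"])
```

Keeping this call at the very top of your application's entry point makes it harder to import neo4j by accident before the variable is set.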

Transactions

All write operations to the database need to be performed from within transactions. This ensures that your database never ends up in an inconsistent state.

See Neo4j Transactions for details on how Neo4j handles transactions.

We use the Python with statement to define a transaction context. If you are using an older version of Python (2.5), you have to import the with statement:

from __future__ import with_statement

Either way, this is how you get into a transaction:

# Start a transaction
with db.transaction:
    # This is inside the transactional
    # context. All work done here
    # will either entirely succeed,
    # or no changes will be applied at all.

    # Create a node
    node = db.node()

    # Give it a name
    node['name'] = 'Cat Stevens'

# The transaction is automatically
# committed when you exit the with
# block.
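To make the all-or-nothing behaviour concrete, here is a toy model of a transactional with block, written in plain Python so that it runs without a database. It is not how neo4j-embedded implements db.transaction. ToyTransaction is a made-up class, and the sketch assumes that an exception raised inside the block counts as a failure.

```python
class ToyTransaction(object):
    """Toy model: buffer writes, apply them only if the block succeeds."""

    def __init__(self, store):
        self.store = store
        self.pending = {}

    def __enter__(self):
        self.pending = {}
        return self

    def __setitem__(self, key, value):
        self.pending[key] = value

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            # Normal exit: "commit" all buffered writes at once.
            self.store.update(self.pending)
        # On an exception the buffer is dropped; returning False re-raises.
        self.pending = {}
        return False


store = {}
with ToyTransaction(store) as tx:
    tx["name"] = "Cat Stevens"

try:
    with ToyTransaction(store) as tx:
        tx["name"] = "Yusuf"
        raise RuntimeError("something went wrong")
except RuntimeError:
    pass
```

After both blocks have run, store still holds the value written by the first block, because the second block never reached its normal exit.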

Nodes

This describes operations that are specific to node objects. For documentation on how to handle properties on both relationships and nodes, see properties.

Creating a node
with db.transaction:
    # Create a node
    thomas = db.node(name='Thomas Anderson', age=42)
Fetching a node by id
# You don't have to be in a transaction
# to do read operations.
a_node = db.node[some_node_id]

# Ids on nodes and relationships are available via the "id"
# property, e.g.:
node_id = a_node.id
Fetching the reference node
reference = db.reference_node
Removing a node
with db.transaction:
    node = db.node()
    node.delete()
Tip
See also Neo4j Delete Semantics.
Removing a node by id
with db.transaction:
    del db.node[some_node_id]
Accessing relationships from a node

For details on what you can do with the relationship objects, see relationships.

# All relationships on a node
for rel in a_node.relationships:
    pass

# Incoming relationships
for rel in a_node.relationships.incoming:
    pass

# Outgoing relationships
for rel in a_node.relationships.outgoing:
    pass

# Relationships of a specific type
for rel in a_node.mayor_of:
    pass

# Incoming relationships of a specific type
for rel in a_node.mayor_of.incoming:
    pass

# Outgoing relationships of a specific type
for rel in a_node.mayor_of.outgoing:
    pass
Getting and/or counting all nodes

Use this with care; it becomes extremely slow on large datasets.

for node in db.nodes:
    pass

# Shorthand for iterating through
# and counting all nodes
number_of_nodes = len(db.nodes)

Relationships

This describes operations that are specific to relationship objects. For documentation on how to handle properties on both relationships and nodes, see properties.

Creating a relationship
with db.transaction:
    # Nodes to create a relationship between
    steven = db.node(name='Steve Brook')
    poplar_bluff = db.node(name='Poplar Bluff')

    # Create a relationship of type "mayor_of"
    relationship = steven.mayor_of(poplar_bluff, since="12th of July 2012")

    # Or, to create relationship types with names
    # that cannot be written as a Python attribute,
    # pass the type name as a string.
    steven.relationships.create('mayor_of', poplar_bluff, since="12th of July 2012")
Fetching a relationship by id
the_relationship = db.relationship[a_relationship_id]
Removing a relationship
with db.transaction:
    # Create a relationship
    source = db.node()
    target = db.node()
    rel = source.Knows(target)

    # Delete it
    rel.delete()
Tip
See also Neo4j Delete Semantics.
Removing a relationship by id
with db.transaction:
    del db.relationship[some_relationship_id]
Relationship start node, end node and type
relationship_type = relationship.type

start_node = relationship.start
end_node = relationship.end
Getting and/or counting all relationships

Use this with care; it becomes extremely slow on large datasets.

for rel in db.relationships:
    pass

# Shorthand for iterating through
# and counting all relationships
number_of_rels = len(db.relationships)

Properties

Both nodes and relationships can have properties, so this section applies equally to both node and relationship objects. Allowed property values include strings, numbers, booleans, as well as arrays of those primitives. Within each array, all values must be of the same type.
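The rule above can be expressed as a small check in plain Python. This is only a sketch of the rule as stated here, not the validation that neo4j-embedded or the JVM actually performs. is_valid_property_value is a hypothetical helper, and accepting empty arrays is a simplification.

```python
try:
    string_types, integer_types = (basestring,), (int, long)  # Python 2
except NameError:
    string_types, integer_types = (str,), (int,)  # Python 3

PRIMITIVES = string_types + integer_types + (float, bool)


def is_valid_property_value(value):
    """Check a value against the rule stated above."""
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, (list, tuple)):
        # Every element must be a primitive, and all elements
        # must be of exactly the same type.
        kinds = set(type(item) for item in value)
        return len(kinds) <= 1 and all(isinstance(item, PRIMITIVES) for item in value)
    return False
```

For example, [1, 2, 3] and ['banana', 'blue'] pass the check, while [1, 'blue'] does not because it mixes types.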

Setting properties
with db.transaction:
    node_or_rel['name'] = 'Thomas Anderson'
    node_or_rel['age'] = 42
    node_or_rel['favourite_numbers'] = [1,2,3]
    node_or_rel['favourite_words'] = ['banana','blue']
Getting properties
numbers = node_or_rel['favourite_numbers']
Removing properties
with db.transaction:
    del node_or_rel['favourite_numbers']
Looping through properties
# Loop key and value at the same time
for key, value in node_or_rel.items():
    pass

# Loop property keys
for key in node_or_rel.keys():
    pass

# Loop property values
for value in node_or_rel.values():
    pass

Paths

A path object represents a path between two nodes in the graph. Paths thus contain at least two nodes and one relationship, but can reach arbitrary length. Paths are used in various parts of the API, most notably in traversals.

Accessing the start and end nodes
start_node = path.start
end_node = path.end
Accessing the last relationship
last_relationship = path.last_relationship
Looping through the entire path

You can loop through all elements of a path directly, or you can choose to only loop through nodes or relationships. When you loop through all elements, the first item will be the start node, the second will be the first relationship, the third the node that the relationship led to and so on.

for item in path:
    # Item is either a Relationship,
    # or a Node
    pass

for node in path.nodes:
    # All nodes in a path
    pass

for relationship in path.relationships:
    # All relationships in a path
    pass
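The alternating order described above (node, relationship, node, and so on) can be modelled with two plain lists. The sketch below does not use the neo4j path object; path_elements is a hypothetical helper and the strings merely stand in for nodes and relationships.

```python
def path_elements(nodes, relationships):
    """Yield node, relationship, node, ... in path order."""
    if len(nodes) != len(relationships) + 1:
        raise ValueError("a path has exactly one more node than relationships")
    for node, relationship in zip(nodes, relationships):
        yield node
        yield relationship
    # zip stops at the shorter list, so the end node is yielded separately.
    yield nodes[-1]


elements = list(path_elements(['a', 'b', 'c'], ['a->b', 'b->c']))
```

Here elements starts with the start node 'a' and ends with the end node 'c', with each relationship sitting between the two nodes it connects.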

Indexes

In order to rapidly find nodes or relationships based on properties, Neo4j supports indexing. This is commonly used to find start nodes for traversals.

By default, the underlying index is powered by Apache Lucene, but it is also possible to use Neo4j with other index implementations.

You can create an arbitrary number of named indexes. Each index handles either nodes or relationships, and each index works by indexing key/value/object triplets, object being either a node or a relationship, depending on the index type.

Index management

Just like the rest of the API, all write operations to the index must be performed from within a transaction.

Creating an index

Create a new index, with optional configuration.

with db.transaction:
    # Create a relationship index
    rel_idx = db.relationship.indexes.create('my_rels')

    # Create a node index, passing optional
    # arguments to the index provider.
    # In this case, enable full-text indexing.
    node_idx = db.node.indexes.create('my_nodes', type='fulltext')
Retrieving a pre-existing index
with db.transaction:
    node_idx = db.node.indexes.get('my_nodes')

    rel_idx = db.relationship.indexes.get('my_rels')
Deleting indexes
with db.transaction:
    node_idx = db.node.indexes.get('my_nodes')
    node_idx.delete()

    rel_idx = db.relationship.indexes.get('my_rels')
    rel_idx.delete()
Checking if an index exists
exists = db.node.indexes.exists('my_nodes')

Indexing things

Adding nodes or relationships to an index
with db.transaction:
    # Indexing nodes
    a_node = db.node()
    node_idx = db.node.indexes.create('my_nodes')

    # Add the node to the index
    node_idx['akey']['avalue'] = a_node

    # Indexing relationships
    a_relationship = a_node.knows(db.node())
    rel_idx = db.relationship.indexes.create('my_rels')

    # Add the relationship to the index
    rel_idx['akey']['avalue'] = a_relationship
Removing indexed items

Removing items from an index can be done at several levels of granularity. See the example below.

# Removing items is a write operation, so it
# needs a transaction.
with db.transaction:
    # Remove a specific key/value/item triplet
    del idx['akey']['avalue'][item]

    # Remove all instances under a certain key
    del idx['akey'][item]

    # Remove all instances altogether
    del idx[item]
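The three levels of granularity can be illustrated with a toy in-memory index that stores key/value/item triplets. ToyIndex is a made-up class for illustration only. It uses ordinary method calls rather than the subscript syntax of the real index objects, and it says nothing about how Lucene stores entries.

```python
class ToyIndex(object):
    """In-memory stand-in for a key/value/item index."""

    def __init__(self):
        self.entries = set()  # holds (key, value, item) triplets

    def add(self, key, value, item):
        self.entries.add((key, value, item))

    def get(self, key, value):
        return sorted(i for (k, v, i) in self.entries if (k, v) == (key, value))

    def remove(self, item, key=None, value=None):
        """Remove item everywhere, under one key, or under one key/value pair."""
        self.entries = set(
            (k, v, i) for (k, v, i) in self.entries
            if not (i == item
                    and (key is None or k == key)
                    and (value is None or v == value)))


idx = ToyIndex()
idx.add('name', 'Acme', 'node-1')
idx.add('name', 'ACME Inc.', 'node-1')
idx.add('city', 'Malmo', 'node-1')

idx.remove('node-1', key='name', value='Acme')  # one triplet
idx.remove('node-1', key='name')                # everything under "name"
idx.remove('node-1')                            # every remaining entry
```

Each call removes a wider set of triplets than the one before it, mirroring the three del statements above.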

Searching the index

You can retrieve indexed items in two ways. Either you do a direct lookup, or you perform a query. The direct lookup is the same across different index providers, while the query syntax depends on what index provider you use. As mentioned previously, Lucene is the default and by far the most common index provider.

There is a Python library for programmatically generating Lucene queries, available on GitHub.
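If you build simple query strings by hand instead, values that contain Lucene's special characters need to be escaped with a backslash. The following is a minimal sketch, assuming the classic Lucene query syntax. escape_term and field_query are hypothetical helpers, and values containing whitespace are rejected here because they would need a phrase query.

```python
# Characters with special meaning in Lucene's classic query syntax.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_term(text):
    """Backslash-escape Lucene special characters in a single term."""
    return ''.join('\\' + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def field_query(key, value):
    """Build a 'key:value' query for one whitespace-free value."""
    if any(ch.isspace() for ch in value):
        raise ValueError("values containing whitespace need a phrase query")
    return '%s:%s' % (escape_term(key), escape_term(value))
```

The resulting string can be passed to the query method of an index. For anything beyond single terms, a dedicated query-building library is likely the safer choice.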

Important
Unless you loop through the entire index result, you have to close the result when you are done with it. If you do not, the database does not know when it can release the resources the result is taking up.
Direct lookups
hits = idx['akey']['avalue']
for item in hits:
    pass

# Always close index results when you are
# done, to free up resources.
hits.close()
Querying
hits = idx.query('akey:avalue')
for item in hits:
    pass

# Always close index results when you are
# done, to free up resources.
hits.close()
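When you leave the loop early (with return or break, or because of an exception), it is easy to forget the call to close(). One way to guard against that is contextlib.closing from the standard library. The sketch below uses a stand-in class, FakeHits, so that it runs without a database. It assumes only that the real result object offers iteration and a close() method, as shown in the examples above.

```python
from contextlib import closing


class FakeHits(object):
    """Stand-in for an index result; only iteration and close() are modelled."""

    def __init__(self, items):
        self.items = list(items)
        self.closed = False

    def __iter__(self):
        return iter(self.items)

    def close(self):
        self.closed = True


def first_hit(hits):
    """Return the first item (or None), closing the result even on early exit."""
    with closing(hits):
        for item in hits:
            return item
    return None


hits = FakeHits(['node-1', 'node-2'])
first = first_hit(hits)
```

The with block calls hits.close() on the way out, no matter how the block is left.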

Cypher Queries

You can use the Cypher query language from neo4j-embedded. Read more about Cypher syntax and what you can do with it in the Cypher Reference.

Querying and reading the result

Basic query

To execute a plain text Cypher query, do this:

result = db.query("START n=node(0) RETURN n")
Retrieve query result

Cypher returns a tabular result. You can either loop through the table row-by-row, or you can loop through the values in a given column. Here is how to loop row-by-row:

root_node = "START n=node(0) RETURN n"

# Iterate through all result rows
for row in db.query(root_node):
    node = row['n']

# We know it's a single result,
# so we could have done this as well
node = db.query(root_node).single['n']

Here is how to loop through the values of a given column:

root_node = "START n=node(0) RETURN n"

# Fetch an iterator for the "n" column
column = db.query(root_node)['n']

for cell in column:
    node = cell

# Columns support "single":
column = db.query(root_node)['n']
node = column.single
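The difference between row-wise and column-wise access can be pictured with a plain list of dictionaries, one dictionary per row. This is only a mental model, not the result type that db.query returns. The column and single functions below are hypothetical helpers, and the error raised by single is a choice made for this sketch.

```python
rows = [
    {'n': 'node-0', 'count(n)': 1},
]


def column(rows, name):
    """Yield the value of one column for every row."""
    for row in rows:
        yield row[name]


def single(values):
    """Return the only value, or raise if there is not exactly one."""
    values = list(values)
    if len(values) != 1:
        raise ValueError("expected exactly one value, got %d" % len(values))
    return values[0]


node_from_row = single(rows)['n']             # row first, then column
node_from_column = single(column(rows, 'n'))  # column first, then value
```

Both routes lead to the same cell of the table; which one reads better depends on whether you need several columns per row or just one.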
List the result columns

You can get a list of the column names in the result like this:

result = db.query("START n=node(0) RETURN n,count(n)")

# Get a list of the column names
columns = result.keys()

Parameterized and prepared queries

Parameterized queries

Cypher supports parameterized queries; see Cypher Parameters. This is how you use them in neo4j-embedded.

result = db.query("START n=node({id}) RETURN n",id=0)

node = result.single['n']
Prepared queries

Prepared queries, where you could retrieve a pre-parsed version of a Cypher query to be used later, are deprecated. Cypher recognizes when it has previously parsed a given query, and will not parse the same string twice.

So, in effect, every Cypher query that you use more than once is a prepared query. Use parameterized queries to get the full benefit: a generic query is then parsed once, and only the parameters change each time it is executed.
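The effect can be illustrated with a toy cache keyed on the query string. This is a model of the idea only, not Cypher's actual implementation, and ToyQueryCache is a made-up class.

```python
class ToyQueryCache(object):
    """Counts how many distinct query strings had to be 'parsed'."""

    def __init__(self):
        self.parsed = {}

    def run(self, query, **params):
        if query not in self.parsed:
            # Stand-in for the expensive parsing step.
            self.parsed[query] = len(self.parsed) + 1
        return self.parsed[query], params


# One query string, many parameter values: parsed once.
parameterized = ToyQueryCache()
for node_id in range(100):
    parameterized.run("START n=node({id}) RETURN n", id=node_id)

# The id is baked into the string: every string is new to the cache.
interpolated = ToyQueryCache()
for node_id in range(100):
    interpolated.run("START n=node(%d) RETURN n" % node_id)
```

In this model the parameterized variant ends up with a single cache entry, while the interpolated variant ends up with one entry per id.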

Traversals

Warning
Traversal support in neo4j-embedded for Python is deprecated as of Neo4j 1.7 GA. Please use Cypher or the core API instead. The reason is that the traversal framework requires a very tight coupling between the JVM and Python. To keep improving performance, we need to break that coupling.

The below documentation will be removed in neo4j-embedded 1.8, and support for traversals will be dropped in neo4j-embedded 1.9.

The traversal API used here is essentially the same as the one used in the Java API, with a few modifications.

Traversals start at a given node and use a set of rules to move through the graph and to decide what parts of the graph to return.

Basic traversals

Following a relationship

The most basic traversals simply follow certain relationship types, and return everything they encounter. By default, each node is visited only once, so there is no risk of infinite loops.

traverser = db.traversal()\
    .relationships('related_to')\
    .traverse(start_node)

# The graph is traversed as
# you loop through the result.
for node in traverser.nodes:
    pass
Following a relationship in a specific direction

You can tell the traverser to only follow relationships in some specific direction.

from neo4j import OUTGOING, INCOMING, ANY

traverser = db.traversal()\
    .relationships('related_to', OUTGOING)\
    .traverse(start_node)
Following multiple relationship types

You can specify an arbitrary number of relationship types and directions to follow.

from neo4j import OUTGOING, INCOMING, ANY

traverser = db.traversal()\
    .relationships('related_to', INCOMING)\
    .relationships('likes')\
    .traverse(start_node)

Traversal results

A traversal can give you one of three different result types: nodes, relationships or paths.

Traversals are performed lazily, which means that the graph is traversed as you loop through the result.

traverser = db.traversal()\
    .relationships('related_to')\
    .traverse(start_node)

# Get each possible path
for path in traverser:
    pass

# Get each node
for node in traverser.nodes:
    pass

# Get each relationship
for relationship in traverser.relationships:
    pass

Uniqueness

To avoid infinite loops, it’s important to define what parts of the graph can be re-visited during a traversal. By default, uniqueness is set to NODE_GLOBAL, which means that each node is only visited once.

Here are the other options that are available.

from neo4j import Uniqueness

# Available options are:

Uniqueness.NONE
# Any position in the graph may be revisited.

Uniqueness.NODE_GLOBAL
# Default option
# No node in the entire graph may be visited
# more than once. This could potentially
# consume a lot of memory since it requires
# keeping an in-memory data structure
# remembering all the visited nodes.

Uniqueness.RELATIONSHIP_GLOBAL
# No relationship in the entire graph may be
# visited more than once. For the same
# reasons as NODE_GLOBAL uniqueness, this
# could use up a lot of memory. But since
# graphs typically have a larger number of
# relationships than nodes, the memory
# overhead of this uniqueness level could
# grow even quicker.

Uniqueness.NODE_PATH
# A node may not occur previously in the
# path reaching up to it.

Uniqueness.RELATIONSHIP_PATH
# A relationship may not occur previously in
# the path reaching up to it.

Uniqueness.NODE_RECENT
# Similar to NODE_GLOBAL uniqueness in that
# there is a global collection of visited
# nodes each position is checked against.
# This uniqueness level does however have a
# cap on how much memory it may consume in
# the form of a collection that only
# contains the most recently visited nodes.
# The size of this collection can be
# specified by providing a number as the
# second argument to the
# uniqueness()-method along with the
# uniqueness level.

Uniqueness.RELATIONSHIP_RECENT
# Works like NODE_RECENT uniqueness, but
# with relationships instead of nodes.


traverser = db.traversal()\
    .uniqueness(Uniqueness.NODE_PATH)\
    .traverse(start_node)
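To see what difference the uniqueness level makes, consider a toy path enumeration over a plain adjacency dictionary. This is a sketch in ordinary Python and does not use the neo4j traversal framework. The paths function is hypothetical, and only two of the levels above are modelled, selected by a string.

```python
def paths(graph, start, uniqueness):
    """Depth-first path enumeration over an adjacency dict.

    uniqueness is 'NODE_GLOBAL' (each node at most once overall) or
    'NODE_PATH' (each node at most once per path).
    """
    seen = set([start])
    result = []

    def visit(path):
        result.append(path)
        for neighbour in graph.get(path[-1], []):
            if uniqueness == 'NODE_GLOBAL':
                if neighbour in seen:
                    continue
                seen.add(neighbour)
            elif neighbour in path:
                continue
            visit(path + [neighbour])

    visit([start])
    return result


# "d" can be reached both through "b" and through "c".
diamond = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d']}
global_paths = paths(diamond, 'a', 'NODE_GLOBAL')
per_path = paths(diamond, 'a', 'NODE_PATH')
```

With the global setting, "d" is reached once, through "b". With the per-path setting, the second route through "c" is reported as well, because "d" does not occur earlier in that particular path.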

Ordering

You can traverse either depth first, or breadth first. Depth first is the default, because it has lower memory overhead.

# Depth first traversal, this
# is the default.
traverser = db.traversal()\
    .depthFirst()\
    .traverse(start_node)

# Breadth first traversal
traverser = db.traversal()\
    .breadthFirst()\
    .traverse(start_node)
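The two orderings can be compared on a small tree using nothing but the standard library. The sketch below is independent of neo4j: visit_order is a hypothetical helper, and the graph is a plain adjacency dictionary.

```python
from collections import deque


def visit_order(graph, start, breadth_first=False):
    """Return nodes in the order a depth- or breadth-first walk reaches them."""
    frontier = deque([start])
    seen = set()
    order = []
    while frontier:
        # A queue (popleft) gives breadth first, a stack (pop) depth first.
        node = frontier.popleft() if breadth_first else frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        neighbours = list(graph.get(node, []))
        if not breadth_first:
            # Reverse so the first neighbour ends up on top of the stack.
            neighbours.reverse()
        for neighbour in neighbours:
            if neighbour not in seen:
                frontier.append(neighbour)
    return order


tree = {'root': ['a', 'b'], 'a': ['a1', 'a2'], 'b': ['b1']}
depth_first_order = visit_order(tree, 'root')
breadth_first_order = visit_order(tree, 'root', breadth_first=True)
```

Depth first follows one branch to its end before backing up, while breadth first finishes each level before descending to the next.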

Evaluators - advanced filtering

In order to traverse based on other criteria, such as node properties, or more complex things like neighboring nodes or patterns, we use evaluators. An evaluator is a normal Python function that takes a path as an argument, and returns a description of what to do next.

The path argument is the current position the traverser is at, and the description of what to do can be one of four things, as seen in the example below.

from neo4j import Evaluation

# Evaluation contains the four
# options that an evaluator can
# return. They are:

Evaluation.INCLUDE_AND_CONTINUE
# Include this node in the result and
# continue the traversal

Evaluation.INCLUDE_AND_PRUNE
# Include this node in the result, but don't
# continue the traversal

Evaluation.EXCLUDE_AND_CONTINUE
# Exclude this node from the result, but
# continue the traversal

Evaluation.EXCLUDE_AND_PRUNE
# Exclude this node from the result and
# don't continue the traversal

# An evaluator
def my_evaluator(path):
    # Filter on end node property
    if path.end['message'] == 'world':
        return Evaluation.INCLUDE_AND_CONTINUE

    # Filter on last relationship type
    if path.last_relationship.type.name() == 'related_to':
        return Evaluation.INCLUDE_AND_PRUNE

    # You can do even more complex things here, like subtraversals.

    return Evaluation.EXCLUDE_AND_CONTINUE

# Use the evaluator
traverser = db.traversal()\
    .evaluator(my_evaluator)\
    .traverse(start_node)
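How the four return values steer a traversal can be tried out without a database. In the toy model below, a path is simply a list of node names, the four constants are plain tuples rather than the real Evaluation values, and traverse is a hypothetical function, not the neo4j traversal API.

```python
# (include this position in the result?, keep expanding from here?)
INCLUDE_AND_CONTINUE = (True, True)
INCLUDE_AND_PRUNE = (True, False)
EXCLUDE_AND_CONTINUE = (False, True)
EXCLUDE_AND_PRUNE = (False, False)


def traverse(graph, start, evaluator):
    """Walk an adjacency dict depth first, asking evaluator(path) at each step."""
    included = []
    stack = [[start]]
    seen = set([start])
    while stack:
        path = stack.pop()
        include, keep_going = evaluator(path)
        if include:
            included.append(path[-1])
        if not keep_going:
            continue
        for neighbour in reversed(graph.get(path[-1], [])):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(path + [neighbour])
    return included


def at_most_two_hops(path):
    """Include every node except the start; stop expanding after two hops."""
    hops = len(path) - 1
    if hops == 0:
        return EXCLUDE_AND_CONTINUE
    if hops >= 2:
        return INCLUDE_AND_PRUNE
    return INCLUDE_AND_CONTINUE


chain = {'a': ['b'], 'b': ['c'], 'c': ['d']}
reached = traverse(chain, 'a', at_most_two_hops)
```

The start node is excluded, "b" and "c" are included, and "d" is never visited because the evaluator prunes the traversal at "c".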