5.2. Setup and configuration

Neo4j HA can be set up to accommodate differing requirements for load, fault tolerance and available hardware.

Within a cluster, Neo4j HA uses Apache ZooKeeper [2] for master election and for propagating general cluster and machine status information. ZooKeeper can be seen as a distributed coordination service. Neo4j HA requires a ZooKeeper service for the initial master election, for electing a new master if the current master fails, and for publishing general status information about the current Neo4j HA cluster (for example when a machine joins or leaves the cluster). Read operations through the GraphDatabaseService API will always work, and even writes can survive ZooKeeper failures as long as a master is present.

ZooKeeper requires a majority of the ZooKeeper instances to be available in order to operate properly. This means that the number of ZooKeeper instances should always be odd, since that makes the best use of the available hardware: an ensemble of four instances can tolerate only one failure, the same as an ensemble of three, so the extra even-numbered machine adds cost without adding fault tolerance.

To further clarify the fault tolerance characteristics of Neo4j HA, here are a few example setups:

5.2.1. Small

  • 3 physical (or virtual) machines
  • 1 ZooKeeper instance running on each machine
  • 1 Neo4j HA instance running on each machine

This setup is conservative in its use of hardware while still being able to handle a moderate read load. It can fully operate when at least 2 of the ZooKeeper instances are running. Since the ZooKeeper service and Neo4j HA run together on each machine, in most scenarios this means that only one server can be allowed to go down.
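
On the ZooKeeper side of this setup, each of the three machines runs a ZooKeeper instance pointed at the same ensemble definition. A minimal zoo.cfg sketch along the following lines could be used; the host names, ports and data directory are placeholders, and each machine also needs a myid file matching its own server.<id> entry:

  # zoo.cfg sketch for a three-instance ensemble (values are placeholders)
  tickTime=2000
  dataDir=/var/lib/zookeeper
  clientPort=2181
  initLimit=5
  syncLimit=2
  # one entry per ensemble member: server.<myid>=<host>:<peer port>:<election port>
  server.1=neo4j-01:2888:3888
  server.2=neo4j-02:2888:3888
  server.3=neo4j-03:2888:3888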

5.2.2. Medium

  • 5-7+ machines
  • ZooKeeper running on 3, 5 or 7 machines
  • Neo4j HA can run on 5+ machines

This setup may mean that two different machine configurations have to be managed (some machines run both ZooKeeper and Neo4j HA). The fault tolerance depends on how many machines are running ZooKeeper: with 3 ZooKeeper instances the cluster can survive one of them going down, with 5 it can survive 2, and with 7 it can survive 3. Theoretically, all but one of the Neo4j HA instances can fail and the cluster will still operate normally, but the ZooKeeper service must be available whenever a master election is required.

5.2.3. Large

  • 8+ total machines
  • 3+ Neo4j HA machines
  • 5+ ZooKeeper instances, on separate dedicated machines

In this setup all ZooKeeper instances run on separate machines as a dedicated ZooKeeper service. The dedicated ZooKeeper ensemble keeps operating as long as a majority of its instances are up, so an ensemble of 5 tolerates 2 instances going down and an ensemble of 7 tolerates 3. The Neo4j HA cluster will be able to operate even with only a single live machine. Adding more Neo4j HA instances is very easy in this setup, since ZooKeeper runs as a separate service.

5.2.4. Installation Notes

For installation instructions for a High Availability cluster, please visit the Neo4j Wiki [3].

Note that while the HighlyAvailableGraphDatabase supports the same API as the EmbeddedGraphDatabase, it does have additional configuration parameters.

Table 5.1. HighlyAvailableGraphDatabase configuration parameters

ha.machine_id
    Value: integer > 0
    Example value: 1
    Required: yes

ha.server
    Value: host & port to bind to when acting as master
    Example value: localhost:6001
    Required: yes

ha.zoo_keeper_servers
    Value: comma delimited list of ZooKeeper connections
    Example value: localhost:2181,localhost:2182,localhost:2183
    Required: yes

ha.pull_interval
    Value: interval at which a slave polls the master for updates, in seconds
    Example value: 30
    Required: no
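
To show how these parameters fit together, here is a minimal sketch of starting a Neo4j HA instance programmatically. It assumes a HighlyAvailableGraphDatabase constructor that takes a store directory and a configuration map (the exact package and constructor signature may differ between Neo4j versions); the store path and all host names and ports are placeholders.

  import java.util.HashMap;
  import java.util.Map;

  import org.neo4j.graphdb.GraphDatabaseService;
  import org.neo4j.kernel.HighlyAvailableGraphDatabase;

  public class StartHaInstance
  {
      public static void main( String[] args )
      {
          // Configuration for this cluster member, using the parameters
          // from Table 5.1. Values below are placeholders.
          Map<String, String> config = new HashMap<String, String>();
          config.put( "ha.machine_id", "1" );
          config.put( "ha.server", "localhost:6001" );
          config.put( "ha.zoo_keeper_servers",
                  "localhost:2181,localhost:2182,localhost:2183" );
          config.put( "ha.pull_interval", "30" ); // optional, in seconds

          // Used in place of EmbeddedGraphDatabase; exposes the same
          // GraphDatabaseService API for reads and writes.
          GraphDatabaseService graphDb =
                  new HighlyAvailableGraphDatabase( "path/to/db", config );

          // ... use graphDb through the GraphDatabaseService API ...

          graphDb.shutdown();
      }
  }

Each machine in the cluster uses the same ha.zoo_keeper_servers value, but its own unique ha.machine_id and its own ha.server address.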