-------------------------------------------------------------------------------
The Neo4j Manual v1.8-SNAPSHOT
-------------------------------------------------------------------------------
The Neo4j Team neo4j.org neotechnology.com
Copyright © 2012 Neo Technology
License: Creative Commons 3.0
2012-05-16 14:44:41
Quickstart
* Section 2.1, “What is a Graph Database?”
* Chapter 4, Using Neo4j embedded in Java applications
* Chapter 11, Using Neo4j embedded in Python applications
* Chapter 18, Neo4j Server
* Chapter 19, REST API
-------------------------------------------------------------------------------
Table of Contents
Preface
I. Introduction
1. Neo4j Highlights
2. Graph Database Concepts
2.1. What is a Graph Database?
2.1.1. A Graph contains Nodes and Relationships
2.1.2. Relationships organize the Graph
2.1.3. Query a Graph with a Traversal
2.1.4. Indexes look up Nodes or Relationships
2.1.5. Neo4j is a Graph Database
2.2. Comparing Database Models
2.2.1. A Graph Database transforms an RDBMS
2.2.2. A Graph Database elaborates a Key-Value Store
2.2.3. A Graph Database relates Column-Family
2.2.4. A Graph Database navigates a Document Store
3. The Neo4j Graph Database
3.1. Nodes
3.2. Relationships
3.3. Properties
3.4. Paths
3.5. Traversal
II. Tutorials
4. Using Neo4j embedded in Java applications
4.1. Include Neo4j in your project
4.1.1. Add Neo4j to the build path
4.1.2. Add Neo4j as a dependency
4.1.3. Starting and stopping
4.2. Hello World
4.2.1. Prepare the database
4.2.2. Wrap mutating operations in a transaction
4.2.3. Create a small graph
4.2.4. Print the result
4.2.5. Remove the data
4.2.6. Shut down the database server
4.3. User database with index
4.4. Basic unit testing
4.5. Traversal
4.5.1. The Matrix
4.5.2. New traversal framework
4.5.3. Uniqueness of Paths in traversals
4.5.4. Social network
4.6. Domain entities
4.7. Graph Algorithm examples
4.8. Reading a management attribute
4.9. OSGi setup
4.9.1. Simple OSGi Activator scenario
4.10. Execute Cypher Queries from Java
5. Cypher Cookbook
5.1. Hyperedges and Cypher
5.1.1. Find Groups
5.1.2. Find all groups and roles for a user
5.1.3. Find common groups based on shared roles
5.2. Basic Friend finding based on social neighborhood
5.2.1. Simple Friend Finder
5.3. Co-favorited places
5.3.1. Co-Favorited Places - Users Who Like x Also Like y
5.3.2. Co-Tagged Places - Places Related through Tags
5.4. Find people based on similar favorites
5.4.1. Find people based on similar favorites
5.5. Find people based on mutual friends and groups
5.5.1. Find mutual friends and groups
5.6. Find friends based on similar tagging
5.6.1. Find people based on similar tagged favorites
5.7. Multirelational (social) graphs
5.7.1. Who FOLLOWS or LOVES me back
5.8. A multilevel indexing structure (path tree)
5.8.1. Return zero range
5.8.2. Return the full range
5.8.3. Return partly shared path ranges
6. Using the Neo4j REST API
6.1. How to use the REST API from Java
6.1.1. Creating a graph through the REST API from Java
6.1.2. Start the server
6.1.3. Creating a node
6.1.4. Adding properties
6.1.5. Adding relationships
6.1.6. Add properties to a relationship
6.1.7. Querying graphs
6.1.8. Phew, is that it?
6.1.9. What’s next?
6.1.10. Appendix: the code
7. Extending the Neo4j Server
7.1. Server Plugins
7.2. Unmanaged Extensions
8. The Traversal Framework
8.1. Main concepts
8.2. Traversal Framework Java API
8.2.1. TraversalDescription
8.2.2. Evaluator
8.2.3. Traverser
8.2.4. Uniqueness
8.2.5. Order - How to move through branches?
8.2.6. BranchSelector
8.2.7. Path
8.2.8. RelationshipExpander
8.2.9. Expander
8.2.10. How to use the Traversal framework
9. Domain Modeling Gallery
9.1. User roles in graphs
9.1.1. Get the admins
9.1.2. Get the group memberships of a user
9.1.3. Get all groups
9.1.4. Get all members of all groups
9.2. ACL structures in graphs
9.2.1. Generic approach
9.2.2. Read-permission example
10. Languages
11. Using Neo4j embedded in Python applications
11.1. Hello, world!
11.2. A sample app using traversals and indexes
11.2.1. Domain logic
11.2.2. Creating data and getting it back
III. Reference
12. Capabilities
12.1. Data Security
12.2. Data Integrity
12.2.1. Core Graph Engine
12.2.2. Different Data Sources
12.3. Data Integration
12.3.1. Event-based Synchronization
12.3.2. Periodic Synchronization
12.3.3. Periodic Full Export/Import of Data
12.4. Availability and Reliability
12.4.1. Operational Availability
12.4.2. Disaster Recovery/Resiliency
12.5. Capacity
12.5.1. File Sizes
12.5.2. Read speed
12.5.3. Write speed
12.5.4. Data size
13. Transaction Management
13.1. Interaction cycle
13.2. Isolation levels
13.3. Default locking behavior
13.4. Deadlocks
13.5. Delete semantics
13.6. Creating unique nodes
13.6.1. Single thread
13.6.2. Get or create
13.6.3. Pessimistic locking
13.7. Transaction events
14. Data Import
14.1. Batch Insertion
14.1.1. Batch Inserter Examples
14.1.2. Batch Graph Database
14.1.3. Index Batch Insertion
15. Indexing
15.1. Introduction
15.2. Create
15.3. Delete
15.4. Add
15.5. Remove
15.6. Update
15.7. Search
15.7.1. Get
15.7.2. Query
15.8. Relationship indexes
15.9. Scores
15.10. Configuration and fulltext indexes
15.11. Extra features for Lucene indexes
15.11.1. Numeric ranges
15.11.2. Sorting
15.11.3. Querying with Lucene Query objects
15.11.4. Compound queries
15.11.5. Default operator
15.11.6. Caching
15.12. Automatic Indexing
15.12.1. Configuration
15.12.2. Search
15.12.3. Runtime Configuration
15.12.4. Updating the Automatic Index
16. Cypher Query Language
16.1. Operators
16.2. Expressions
16.3. Parameters
16.4. Identifiers
16.5. Comments
16.6. Updating the graph with Cypher
16.6.1. Updating query structure
16.6.2. Query Parts & Structure
16.6.3. Returning data
16.7. Transactions and Cypher
16.8. Start
16.8.1. Node by id
16.8.2. Relationship by id
16.8.3. Multiple nodes by id
16.8.4. All nodes
16.8.5. Node by index lookup
16.8.6. Relationship by index lookup
16.8.7. Node by index query
16.8.8. Multiple start points
16.9. Match
16.9.1. Introduction
16.9.2. Related nodes
16.9.3. Outgoing relationships
16.9.4. Directed relationships and identifier
16.9.5. Match by relationship type
16.9.6. Match by multiple relationship types
16.9.7. Match by relationship type and use an identifier
16.9.8. Relationship types with uncommon characters
16.9.9. Multiple relationships
16.9.10. Variable length relationships
16.9.11. Relationship identifier in variable length relationships
16.9.12. Zero length paths
16.9.13. Optional relationship
16.9.14. Optional typed and named relationship
16.9.15. Properties on optional elements
16.9.16. Complex matching
16.9.17. Shortest path
16.9.18. All shortest paths
16.9.19. Named path
16.9.20. Matching on a bound relationship
16.10. Where
16.10.1. Boolean operations
16.10.2. Filter on node property
16.10.3. Regular expressions
16.10.4. Escaping in regular expressions
16.10.5. Case insensitive regular expressions
16.10.6. Filtering on relationship type
16.10.7. Property exists
16.10.8. Default true if property is missing
16.10.9. Default false if property is missing
16.10.10. Filter on null values
16.10.11. Filter on relationships
16.10.12. IN operator
16.11. Return
16.11.1. Return nodes
16.11.2. Return relationships
16.11.3. Return property
16.11.4. Return all elements
16.11.5. Identifier with uncommon characters
16.11.6. Column alias
16.11.7. Optional properties
16.11.8. Unique results
16.12. Aggregation
16.12.1. Introduction
16.12.2. COUNT
16.12.3. Count nodes
16.12.4. Group Count Relationship Types
16.12.5. Count entities
16.12.6. Count non-null values
16.12.7. SUM
16.12.8. AVG
16.12.9. MAX
16.12.10. MIN
16.12.11. COLLECT
16.12.12. DISTINCT
16.13. Order by
16.13.1. Order nodes by property
16.13.2. Order nodes by multiple properties
16.13.3. Order nodes in descending order
16.13.4. Ordering null
16.14. Skip
16.14.1. Skip first three
16.14.2. Return middle two
16.15. Limit
16.15.1. Return first part
16.16. With
16.16.1. Filter on aggregate function results
16.16.2. Alternative syntax of with
16.17. Create
16.17.1. Create single node
16.17.2. Create single node and set properties
16.17.3. Return created node
16.17.4. Create a relationship between two nodes
16.17.5. Create a relationship and set properties
16.17.6. Create single node from map
16.17.7. Create multiple nodes from maps
16.18. Delete
16.18.1. Delete single node
16.18.2. Remove a node and connected relationships
16.18.3. Remove a property
16.19. Set
16.19.1. Set a property
16.20. Relate
16.20.1. Create relationship if it is missing
16.20.2. Create node if missing
16.20.3. Create nodes with values
16.20.4. Create relationship with values
16.21. Foreach
16.21.1. Mark all nodes along a path
16.22. Functions
16.22.1. Predicates
16.22.2. Scalar functions
16.22.3. Iterable functions
16.22.4. Mathematical functions
16.23. Compatibility
17. Graph Algorithms
17.1. Introduction
18. Neo4j Server
18.1. Server Installation
18.1.1. As a Windows service
18.1.2. Linux Service
18.1.3. Mac OSX
18.1.4. Multiple Server instances on one machine
18.2. Server Configuration
18.2.1. Important server configuration parameters
18.2.2. Neo4j Database performance configuration
18.2.3. Server logging configuration
18.2.4. HTTP logging configuration
18.2.5. Other configuration options
18.3. Setup for remote debugging
18.4. Using the server (including web administration) with an embedded
database
18.4.1. Getting the libraries
18.4.2. Starting the Server from Java
18.4.3. Providing custom configuration
18.5. Server Performance Tuning
18.5.1. Specifying Neo4j tuning properties
18.5.2. Specifying JVM tuning properties
18.6. Server Installation in the Cloud
18.6.1. Heroku
19. REST API
19.1. Service root
19.1.1. Get service root
19.2. Nodes
19.2.1. Create Node
19.2.2. Create Node with properties
19.2.3. Get node
19.2.4. Get non-existent node
19.2.5. Delete node
19.2.6. Nodes with relationships can not be deleted
19.3. Relationships
19.3.1. Get Relationship by ID
19.3.2. Create relationship
19.3.3. Create a relationship with properties
19.3.4. Delete relationship
19.3.5. Get all properties on a relationship
19.3.6. Set all properties on a relationship
19.3.7. Get single property on a relationship
19.3.8. Set single property on a relationship
19.3.9. Get all relationships
19.3.10. Get incoming relationships
19.3.11. Get outgoing relationships
19.3.12. Get typed relationships
19.3.13. Get relationships on a node without relationships
19.4. Relationship types
19.4.1. Get relationship types
19.5. Node properties
19.5.1. Set property on node
19.5.2. Update node properties
19.5.3. Get properties for node
19.5.4. Property values can not be null
19.5.5. Property values can not be nested
19.5.6. Delete all properties from node
19.5.7. Delete a named property from a node
19.6. Relationship properties
19.6.1. Update relationship properties
19.6.2. Remove property from a relationship
19.6.3. Remove non-existent property from a relationship
19.6.4. Remove properties from a non-existing relationship
19.6.5. Remove property from a non-existing relationship
19.7. Indexes
19.7.1. Create node index
19.7.2. Create node index with configuration
19.7.3. Delete node index
19.7.4. List node indexes
19.7.5. Add node to index
19.7.6. Remove all entries with a given node from an index
19.7.7. Remove all entries with a given node and key from an index
19.7.8. Remove all entries with a given node, key and value from an
index
19.7.9. Find node by exact match
19.7.10. Find node by query
19.8. Unique Indexes
19.8.1. Create a unique node in an index
19.8.2. Create a unique node in an index (the case where it exists)
19.8.3. Add a node to an index unless a node already exists for the
given mapping
19.8.4. Create a unique relationship in an index
19.8.5. Add a relationship to an index unless a relationship
already exists for the given mapping
19.9. Automatic Indexes
19.9.1. Find node by exact match from an automatic index
19.9.2. Find node by query from an automatic index
19.10. Configurable Automatic Indexing
19.10.1. Create an auto index for nodes with specific configuration
19.10.2. Create an auto index for relationships with specific
configuration
19.10.3. Get current status for autoindexing on nodes
19.10.4. Enable node autoindexing
19.10.5. Lookup list of properties being autoindexed
19.10.6. Add a property for autoindexing on nodes
19.10.7. Remove a property for autoindexing on nodes
19.11. Traversals
19.11.1. Traversal using a return filter
19.11.2. Return relationships from a traversal
19.11.3. Return paths from a traversal
19.11.4. Traversal returning nodes below a certain depth
19.11.5. Creating a paged traverser
19.11.6. Paging through the results of a paged traverser
19.11.7. Paged traverser page size
19.11.8. Paged traverser timeout
19.12. Cypher queries
19.12.1. Send a Query
19.12.2. Return paths
19.12.3. Send queries with parameters
19.12.4. Nested results
19.12.5. Server errors
19.13. Built-in Graph Algorithms
19.13.1. Find all shortest paths
19.13.2. Find one of the shortest paths between nodes
19.13.3. Execute a Dijkstra algorithm with similar weights on
relationships
19.13.4. Execute a Dijkstra algorithm with weights on relationships
19.14. Batch operations
19.14.1. Execute multiple operations in batch
19.14.2. Refer to items created earlier in the same batch job
19.14.3. Execute multiple operations in batch streaming
19.15. Cypher Plugin
19.15.1. Send a Query
19.15.2. Return paths
19.15.3. Send queries with parameters
19.15.4. Server errors
19.16. Gremlin Plugin
19.16.1. Send a Gremlin Script - URL encoded
19.16.2. Load a sample graph
19.16.3. Sort a result using raw Groovy operations
19.16.4. Send a Gremlin Script - JSON encoded with table results
19.16.5. Returning nested pipes
19.16.6. Set script variables
19.16.7. Send a Gremlin Script with variables in a JSON Map
19.16.8. Return paths from a Gremlin script
19.16.9. Send an arbitrary Groovy script - Lucene sorting
19.16.10. Emit a sample graph
19.16.11. HyperEdges - find user roles in groups
19.16.12. Group count
19.16.13. Collect multiple traversal results
19.16.14. Collaborative filtering
19.16.15. Chunking and offsetting in Gremlin
19.16.16. Modify the graph while traversing
19.16.17. Flow algorithms with Gremlin
19.16.18. Script execution errors
20. Python embedded bindings
20.1. Installation
20.1.1. Installation on OSX/Linux
20.1.2. Installation on Windows
20.2. Core API
20.2.1. Getting started
20.2.2. Transactions
20.2.3. Nodes
20.2.4. Relationships
20.2.5. Properties
20.2.6. Paths
20.3. Indexes
20.3.1. Index management
20.3.2. Indexing things
20.3.3. Searching the index
20.4. Cypher Queries
20.4.1. Querying and reading the result
20.4.2. Parameterized and prepared queries
20.5. Traversals
20.5.1. Basic traversals
20.5.2. Traversal results
20.5.3. Uniqueness
20.5.4. Ordering
20.5.5. Evaluators - advanced filtering
IV. Operations
21. Installation & Deployment
21.1. Deployment Scenarios
21.1.1. Server
21.1.2. Embedded
21.2. System Requirements
21.2.1. CPU
21.2.2. Memory
21.2.3. Disk
21.2.4. Filesystem
21.2.5. Software
21.2.6. JDK Version
21.3. Installation
21.3.1. Embedded Installation
21.3.2. Server Installation
21.4. Upgrading
21.4.1. Automatic Upgrade
21.4.2. Explicit Upgrade
21.4.3. Upgrade 1.6 → 1.7
21.4.4. Upgrade 1.5 → 1.6
21.4.5. Upgrade 1.4 → 1.5
21.5. Usage Data Collector
21.5.1. Technical Information
21.5.2. How to disable UDC
22. Configuration & Performance
22.1. Introduction
22.1.1. How to add configuration settings
22.2. Performance Guide
22.2.1. Try this first
22.2.2. Neo4j primitives' lifecycle
22.2.3. Configuring Neo4j
22.3. Caches in Neo4j
22.3.1. File buffer cache
22.3.2. Object cache
22.4. JVM Settings
22.4.1. Configuring heap size and GC
22.5. Compressed storage of short strings
22.6. Compressed storage of short arrays
22.7. Memory mapped IO settings
22.7.1. Optimizing for traversal speed example
22.7.2. Batch insert example
22.8. Linux Performance Guide
22.8.1. Setup
22.8.2. Running the benchmark
22.8.3. Fixing the problem
22.9. Linux specific notes
22.9.1. File system tuning for high IO
22.9.2. Setting the number of open files
23. High Availability
23.1. Architecture
23.2. Setup and configuration
23.2.1. Small
23.2.2. Medium
23.2.3. Large
23.2.4. Installation Notes
23.3. How Neo4j HA operates
23.4. High Availability setup tutorial
23.4.1. Background
23.4.2. Setup and start the Coordinator cluster
23.4.3. Start the Neo4j Servers in HA mode
23.4.4. Start Neo4j Embedded in HA mode
23.5. Setting up HAProxy as a load balancer
23.5.1. Installing HAProxy
23.5.2. Configuring HAProxy
23.5.3. Configuring separate sets for master and slaves
23.5.4. Cache-based sharding with HAProxy
24. Backup
24.1. Embedded and Server
24.2. Online Backup from Java
24.3. High Availability
24.4. Restoring Your Data
25. Security
25.1. Securing access to the Neo4j Server
25.1.1. Secure the port and remote client connection accepts
25.1.2. Arbitrary code execution
25.1.3. HTTPS support
25.1.4. Server Authorization Rules
25.1.5. Hosted Scripting
25.1.6. Security in Depth
25.1.7. Rewriting URLs with a Proxy installation
26. Monitoring
26.1. JMX
26.1.1. Adjusting remote JMX access to the Neo4j Server
26.1.2. How to connect to a Neo4j instance using JMX and JConsole
26.1.3. How to connect to the JMX monitoring programmatically
26.1.4. Reference of supported JMX MBeans
V. Tools
27. Web Administration
27.1. Dashboard tab
27.1.1. Entity chart
27.1.2. Status monitoring
27.2. Data tab
27.3. Console tab
27.4. The Server Info tab
28. Neo4j Shell
28.1. Starting the shell
28.1.1. Enabling the shell server
28.1.2. Connecting to a shell server
28.1.3. Pointing the shell to a path
28.1.4. Read-only mode
28.1.5. Run a command and then exit
28.2. Passing options and arguments
28.3. Enum options
28.4. Filters
28.5. Node titles
28.6. How to use (individual commands)
28.6.1. Current node/relationship and path
28.6.2. Listing the contents of a node/relationship
28.6.3. Creating nodes and relationships
28.6.4. Setting, renaming and removing properties
28.6.5. Deleting nodes and relationships
28.6.6. Environment variables
28.6.7. Executing groovy/python scripts
28.6.8. Traverse
28.6.9. Query with Cypher
28.6.10. Indexing
28.7. Extending the shell: Adding your own commands
28.8. An example shell session
28.9. A Matrix example
VI. Community
29. Community Support
30. Contributing to Neo4j
30.1. Contributor License Agreement
30.1.1. Summary
30.1.2. Common questions
30.1.3. How to sign
30.2. Writing Neo4j Documentation
30.2.1. Overall Flow
30.2.2. File Structure in docs.jar
30.2.3. Headings and document structure
30.2.4. Writing
30.2.5. Gotchas
30.2.6. Links
30.2.7. Text Formatting
30.2.8. Admonitions
30.2.9. Images
30.2.10. Attributes
30.2.11. Comments
30.2.12. Code Snippets
30.2.13. A sample Java based documentation test
30.2.14. Hello world Sample Chapter
30.2.15. Integrated Live Console
30.2.16. Toolchain
30.3. Areas for contribution
30.3.1. Neo4j Distribution
30.3.2. Maintaining Neo4j Documentation
30.3.3. Drivers and bindings to Neo4j
30.4. Contributors
A. Manpages
neo4j — Neo4j Server control and management
neo4j-shell — a command-line tool for exploring and manipulating a graph
database
neo4j-backup — Neo4j Backup Tool
neo4j-coordinator — Neo4j Coordinator for High-Availability clusters
neo4j-coordinator-shell — Neo4j Coordinator Shell interactive interface
B. Questions & Answers
List of Figures
2.1. RDBMS
2.2. Graph Database as RDBMS
2.3. Key-Value Store
2.4. Graph Database as Key-Value Store
2.5. Document Store
2.6. Graph Database as Document Store
4.1. Hello World Graph
4.2. Node space view of users
4.3. Matrix node space view
4.4. Descendants Example Graph
4.5. Social network data model
8.1. Traversal Example Graph
15.1. Movie and Actor Graph
16.1. Example Graph
19.1. Final Graph
19.2. Final Graph
19.3. Final Graph
19.4. Final Graph
19.5. Final Graph
19.6. Final Graph
19.7. Final Graph
19.8. Final Graph
19.9. Final Graph
19.10. Starting Graph
19.11. Final Graph
19.12. Starting Graph
19.13. Final Graph
19.14. Final Graph
19.15. Starting Graph
19.16. Final Graph
19.17. Final Graph
19.18. Starting Graph
19.19. Final Graph
19.20. Final Graph
19.21. Final Graph
19.22. Final Graph
19.23. Final Graph
19.24. Final Graph
19.25. Final Graph
19.26. Final Graph
19.27. Final Graph
19.28. Final Graph
19.29. Final Graph
19.30. Final Graph
19.31. Final Graph
19.32. Starting Graph
19.33. Final Graph
19.34. Final Graph
19.35. Starting Graph
19.36. Final Graph
19.37. Final Graph
19.38. Final Graph
19.39. Final Graph
19.40. Final Graph
19.41. Final Graph
19.42. Final Graph
19.43. Final Graph
19.44. Final Graph
19.45. Final Graph
19.46. Final Graph
19.47. Final Graph
19.48. Final Graph
19.49. Final Graph
19.50. Final Graph
19.51. Final Graph
19.52. Final Graph
19.53. Final Graph
19.54. Final Graph
19.55. Final Graph
19.56. Final Graph
19.57. Final Graph
19.58. Final Graph
19.59. Final Graph
19.60. Final Graph
19.61. Final Graph
19.62. Final Graph
19.63. Final Graph
19.64. Final Graph
19.65. Final Graph
19.66. Final Graph
19.67. Final Graph
19.68. Final Graph
19.69. Final Graph
19.70. Final Graph
19.71. Final Graph
19.72. Final Graph
19.73. Final Graph
19.74. Final Graph
19.75. Final Graph
19.76. Final Graph
19.77. Final Graph
19.78. Final Graph
19.79. Final Graph
19.80. Final Graph
19.81. Final Graph
19.82. Final Graph
19.83. Final Graph
19.84. Final Graph
19.85. Final Graph
19.86. Final Graph
19.87. Final Graph
19.88. Final Graph
19.89. Final Graph
19.90. Final Graph
19.91. Final Graph
19.92. Final Graph
19.93. Final Graph
19.94. Final Graph
19.95. Final Graph
19.96. Final Graph
19.97. Starting Graph
19.98. Final Graph
19.99. Starting Graph
19.100. Final Graph
19.101. Final Graph
23.1. Typical setup when running multiple Neo4j instances in HA mode
26.1. Connecting JConsole to the Neo4j Java process
26.2. Neo4j MBeans View
27.1. Web Administration Dashboard
27.2. Entity charting
27.3. Status indicator panels
27.4. Browsing and manipulating data
27.5. Editing properties
27.6. Traverse data with Gremlin
27.7. Query data with Cypher
27.8. Interact over HTTP
27.9. JMX Attributes
28.1. Shell Matrix Example
30.1. Hello World Graph
List of Tables
3.1. Using relationship direction and type
3.2. Property value types
5.1. Result
5.2. Result
5.3. Result
5.4. Result
5.5. Result
5.6. Result
5.7. Result
5.8. Result
5.9. Result
5.10. Result
5.11. Result
5.12. Result
5.13. Result
6.1. Neo4j REST clients contributed by the community.
10.1. Neo4j embedded drivers contributed by the community.
15.1. Lucene indexing configuration parameters
16.1. Result
16.2. Result
16.3. Result
16.4. Result
16.5. Result
16.6. Result
16.7. Result
16.8. Result
16.9. Result
16.10. Result
16.11. Result
16.12. Result
16.13. Result
16.14. Result
16.15. Result
16.16. Result
16.17. Result
16.18. Result
16.19. Result
16.20. Result
16.21. Result
16.22. Result
16.23. Result
16.24. Result
16.25. Result
16.26. Result
16.27. Result
16.28. Result
16.29. Result
16.30. Result
16.31. Result
16.32. Result
16.33. Result
16.34. Result
16.35. Result
16.36. Result
16.37. Result
16.38. Result
16.39. Result
16.40. Result
16.41. Result
16.42. Result
16.43. Result
16.44. Result
16.45. Result
16.46. Result
16.47. Result
16.48. Result
16.49. Result
16.50. Result
16.51. Result
16.52. Result
16.53. Result
16.54. Result
16.55. Result
16.56. Result
16.57. Result
16.58. Result
16.59. Result
16.60. Result
16.61. Result
16.62. Result
16.63. Result
16.64. Result
16.65. Result
16.66. Result
16.67. Result
16.68. Result
16.69. Result
16.70. Result
16.71. Result
16.72. Result
16.73. Result
16.74. Result
16.75. Result
16.76. Result
16.77. Result
16.78. Result
16.79. Result
16.80. Result
16.81. Result
16.82. Result
16.83. Result
16.84. Result
16.85. Result
16.86. Result
16.87. Result
16.88. Result
16.89. Result
16.90. Result
16.91. Result
16.92. Result
16.93. Result
16.94. Result
16.95. Result
16.96. Result
16.97. Result
16.98. Result
16.99. Result
16.100. Result
16.101. Result
16.102. Result
18.1. neo4j-wrapper.conf JVM tuning properties
21.1. Neo4j deployment options
21.2. Neo4j editions
21.3. Upgrade process for Neo4j version
22.1. Guidelines for heap size
23.1. HighlyAvailableGraphDatabase configuration parameters
26.1. MBeans exposed by the Neo4j Kernel
26.2. MBean Memory Mapping
26.3. MBean Locking
26.4. MBean Transactions
26.5. MBean Cache
26.6. MBean Configuration
26.7. MBean Primitive count
26.8. MBean XA Resources
26.9. MBean Store file sizes
26.10. MBean Kernel
26.11. MBean High Availability
30.1. Result
-------------------------------------------------------------------------------
Preface
-------------------------------------------------------------------------------
This is the reference manual for Neo4j version 1.8-SNAPSHOT, written by the
Neo4j Team.
The main parts of the manual are:
* Part I, “Introduction” — introducing graph database concepts and Neo4j.
* Part II, “Tutorials” — learn how to use Neo4j.
* Part III, “Reference” — detailed information on Neo4j.
* Part IV, “Operations” — how to install and maintain Neo4j.
* Part V, “Tools” — guides on tools.
* Part VI, “Community” — getting help from, contributing to.
* Appendix A, Manpages — command line documentation.
* Appendix B, Questions & Answers — common questions.
The material is practical, technical, and focused on answering specific
questions. It addresses how things work, what to do and what to avoid to
successfully run Neo4j in a production environment.
The goal is to be thumb-through and rule-of-thumb friendly.
Each section should stand on its own, so you can hop right to whatever
interests you. When possible, the sections distill "rules of thumb" which you
can keep in mind whenever you wander out of the house without this manual in
your back pocket.
The included code examples are executed when Neo4j is built and tested. Also,
the REST API request and response examples are captured from real interaction
with a Neo4j server. Thus, the examples are always in sync with Neo4j.
Who should read this?
The topics should be relevant to architects, administrators, developers and
operations personnel.
Part I. Introduction
This part gives a bird’s eye view of what a graph database is, and then
outlines some specifics of Neo4j.
Chapter 1. Neo4j Highlights
As a robust, scalable and high-performance database, Neo4j is suitable for full
enterprise deployment. A subset of the full server can also be used in
lightweight projects.
It features:
* true ACID transactions
* high availability
* scalability to billions of nodes and relationships
* high-speed querying through traversals
Proper ACID behavior is the foundation of data reliability. Neo4j enforces that
all mutating operations occur within a transaction, guaranteeing consistent
data. This robustness extends from single instance embedded graphs to
multi-server high availability installations. For details, see Chapter 13,
Transaction Management.
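The rule that every mutating operation runs inside a transaction can be
illustrated with a small in-memory sketch. This is not the Neo4j API: the
class `TinyGraph` and its methods are hypothetical names invented for this
example, and the sketch only shows rejection of writes outside a transaction
and rollback on failure, not isolation or durability.

```python
import copy
from contextlib import contextmanager


class TinyGraph:
    """Hypothetical in-memory store that only accepts writes inside a transaction."""

    def __init__(self):
        self.nodes = {}
        self._in_transaction = False

    @contextmanager
    def transaction(self):
        # Keep a snapshot so that a failed transaction leaves no partial changes.
        snapshot = copy.deepcopy(self.nodes)
        self._in_transaction = True
        try:
            yield self
        except Exception:
            self.nodes = snapshot  # roll back to the state before the transaction
            raise
        finally:
            self._in_transaction = False

    def create_node(self, node_id, **properties):
        if not self._in_transaction:
            raise RuntimeError("mutating operations must run inside a transaction")
        self.nodes[node_id] = properties


db = TinyGraph()

# A successful transaction: the node is kept.
with db.transaction():
    db.create_node(1, name="Alice")

# A failing transaction: the node created inside it is rolled back.
try:
    with db.transaction():
        db.create_node(2, name="Bob")
        raise ValueError("simulated failure")
except ValueError:
    pass

# A write outside any transaction is rejected.
rejected = False
try:
    db.create_node(3, name="Carol")
except RuntimeError:
    rejected = True

print(db.nodes, rejected)
```

Taking a full snapshot is the simplest way to show rollback in a few lines; it
says nothing about how a real database implements it.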
Reliable graph storage can easily be added to any application. A property graph
can scale in size and complexity as the application evolves, with little impact
on performance. Whether starting new development, or augmenting existing
functionality, Neo4j is only limited by physical hardware.
A single server instance can handle a graph of billions of nodes and
relationships. When data throughput is insufficient, the graph database can be
distributed among multiple servers in a high availability configuration. See
Chapter 23, High Availability to learn more.
The graph database storage shines when storing richly connected data. Querying
is performed through traversals, which can perform millions of traversal steps
per second. A traversal step resembles a join in an RDBMS.
Chapter 2. Graph Database Concepts
This chapter contains an introduction to the graph data model and also compares
it to other data models used when persisting data.
2.1. What is a Graph Database?
A graph database stores data in a graph, the most generic of data structures,
capable of elegantly representing any kind of data in a highly accessible way.
Let’s walk through some graphs, using them to express graph concepts. We’ll
“read” a graph by following arrows around the diagram to form sentences.
2.1.1. A Graph contains Nodes and Relationships
“A Graph —records data in→ Nodes —which have→ Properties”
The simplest possible graph is a single Node, a record that has named values
referred to as Properties. A Node could start with a single Property and grow
to a few million, though that can get a little awkward. At some point it makes
sense to distribute the data into multiple nodes, organized with explicit
Relationships.
graphdb-GVE.svg
2.1.2. Relationships organize the Graph
“Nodes —are organized by→ Relationships —which also have→ Properties”
Relationships organize Nodes into arbitrary structures, allowing a Graph to
resemble a List, a Tree, a Map, or a compound Entity – any of which can be
combined into yet more complex, richly inter-connected structures.
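As a rough illustration of the two sentences above, the following Python
sketch models nodes and relationships as plain in-memory objects. It is not
the Neo4j API; the class names `Node` and `Relationship`, the `KNOWS` type and
the sample property values are assumptions made for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A record with named values (properties)."""
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection between two nodes, with its own properties."""
    start: Node
    end: Node
    type: str
    properties: dict = field(default_factory=dict)


# Two nodes, each starting with a single property.
alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})

# One relationship organizes them; it carries a property as well.
knows = Relationship(alice, bob, "KNOWS", {"since": 2010})

# "Read" the graph as a sentence by following the arrow.
sentence = f"{knows.start.properties['name']} {knows.type} {knows.end.properties['name']}"
print(sentence)
```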
2.1.3. Query a Graph with a Traversal
“A Traversal —navigates→ a Graph; it —identifies→ Paths —which order→
Nodes”
A Traversal is how you query a Graph, navigating from starting Nodes to related
Nodes according to an algorithm, finding answers to questions like “what music
do my friends like that I don’t yet own,” or “if this power supply goes down,
what web services are affected?”
graphdb-traversal.svg
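The music question above can be sketched as a traversal over a tiny in-memory
graph. The data, the relationship type names (`FRIEND`, `LIKES`, `OWNS`) and
the `traverse` helper are all hypothetical; the sketch only shows the idea of
navigating from a start node and collecting paths, not the Neo4j traversal
framework.

```python
# Hypothetical data: each node maps to its outgoing (type, target) relationships.
graph = {
    "me": [("FRIEND", "ann"), ("FRIEND", "ben"), ("OWNS", "album_a")],
    "ann": [("LIKES", "album_a"), ("LIKES", "album_b")],
    "ben": [("LIKES", "album_c")],
    "album_a": [],
    "album_b": [],
    "album_c": [],
}


def traverse(graph, start, rel_types):
    """Follow one relationship type per step; return every complete path."""
    paths = [[start]]
    for rel_type in rel_types:
        next_paths = []
        for path in paths:
            for found_type, target in graph[path[-1]]:
                if found_type == rel_type:
                    next_paths.append(path + [target])
        paths = next_paths
    return paths


# Each path orders the nodes visited: me -> friend -> album.
liked_by_friends = traverse(graph, "me", ["FRIEND", "LIKES"])
owned = {path[-1] for path in traverse(graph, "me", ["OWNS"])}

# "What music do my friends like that I don't yet own?"
suggestions = sorted({path[-1] for path in liked_by_friends} - owned)
print(suggestions)
```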
2.1.4. Indexes look up Nodes or Relationships
“An Index —maps from→ Properties —to either→ Nodes or Relationships”
Often, you want to find a specific Node or Relationship according to a Property
it has. Rather than traversing the entire graph, use an Index to perform a
look-up, for questions like “find the Account for username master-of-graphs.”
graphdb-indexes.svg
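A minimal sketch of the look-up described above, assuming an index is simply a
map from a (property key, property value) pair to the entities that carry it.
The `PropertyIndex` class and the sample accounts are hypothetical and are not
part of Neo4j.

```python
from collections import defaultdict


class PropertyIndex:
    """Hypothetical index: maps (key, value) pairs to the entities that have them."""

    def __init__(self):
        self._entries = defaultdict(list)

    def add(self, entity, key, value):
        self._entries[(key, value)].append(entity)

    def get(self, key, value):
        # Exact-match look-up; no scan over the indexed entities is needed.
        return list(self._entries.get((key, value), []))


accounts = [
    {"username": "master-of-graphs", "plan": "pro"},
    {"username": "newbie", "plan": "free"},
]

index = PropertyIndex()
for account in accounts:
    index.add(account, "username", account["username"])

# "Find the Account for username master-of-graphs."
hits = index.get("username", "master-of-graphs")
print(hits)
```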
2.1.5. Neo4j is a Graph Database
“A Graph Database —manages a→ Graph and —also manages related→ Indexes”
Neo4j is a commercially supported open-source graph database. It was designed
and built from the ground up to be a reliable database, optimized for graph
structures instead of tables. Working with Neo4j, your application gets all the
expressiveness of a graph, with all the dependability you expect out of a
database.
graphdb-overview.svg
2.2. Comparing Database Models
A Graph Database stores data structured in the Nodes and Relationships of a
graph. How does this compare to other persistence models? Because a graph is a
generic structure, let’s compare how a few models would look in a graph.
2.2.1. A Graph Database transforms an RDBMS
Topple the stacks of records in a relational database while keeping all the
relationships, and you’ll see a graph. Where an RDBMS is optimized for
aggregated data, Neo4j is optimized for highly connected data.
Figure 2.1. RDBMS
Figure 2.2. Graph Database as RDBMS
2.2.2. A Graph Database elaborates a Key-Value Store
A Key-Value model is great for lookups of simple values or lists. When the
values are themselves interconnected, you’ve got a graph. Neo4j lets you
elaborate the simple data structures into more complex, interconnected data.
Figure 2.3. Key-Value Store
K* represents a key, V* a value. Note that some keys point to other keys as
well as plain values.
Figure 2.4. Graph Database as Key-Value Store
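To make the step from key-value store to graph concrete, here is a small
sketch under the assumption that a value naming another key should become a
relationship, while every other value stays a plain value on the node. The
sample store and the variable names are invented for this illustration.

```python
# Hypothetical key-value store: some values are plain, some name other keys.
store = {
    "K1": ["V1", "K2"],
    "K2": ["V2"],
    "K3": ["V3", "K1", "K2"],
}

nodes = {key: [] for key in store}  # key -> plain values kept on the node
edges = []                          # (from_key, to_key) pairs

for key, values in store.items():
    for value in values:
        if value in store:
            # The value is itself a key: make the connection explicit.
            edges.append((key, value))
        else:
            nodes[key].append(value)

print(nodes)
print(edges)
```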
2.2.3. A Graph Database relates Column-Family
Column Family (BigTable-style) databases are an evolution of key-value, using
"families" to allow grouping of rows. Stored in a graph, the families could
become hierarchical, and the relationships among data become explicit.
2.2.4. A Graph Database navigates a Document Store
The container hierarchy of a document database accommodates nice, schema-free
data that can easily be represented as a tree, which is of course a graph.
Refer to other documents (or document elements) within that tree and you have a
more expressive representation of the same data. In Neo4j, those
relationships are easily navigable.
Figure 2.5. Document Store
graphdb-compare-docdb.svg
D=Document, S=Subdocument, V=Value, D2/S2 = reference to subdocument in (other)
document.
Figure 2.6. Graph Database as Document Store
graphdb-compare-docdb-g.svg
Chapter 3. The Neo4j Graph Database
This chapter goes into more detail on the data model and behavior of Neo4j.
3.1. Nodes
The fundamental units that form a graph are nodes and relationships. In Neo4j,
both nodes and relationships can contain properties.
Nodes are often used to represent entities, but depending on the domain,
relationships may be used for that purpose as well.
graphdb-nodes-overview.svg
Let’s start out with a really simple graph, containing only a single node with
one property:
graphdb-nodes.svg
3.2. Relationships
Relationships between nodes are a key part of a graph database. They allow for
finding related data. Just like nodes, relationships can have properties.
graphdb-rels-overview.svg
A relationship connects two nodes, and is guaranteed to have valid start and
end nodes.
graphdb-rels.svg
As relationships are always directed, they can be viewed as outgoing or
incoming relative to a node, which is useful when traversing the graph:
graphdb-rels-dir.svg
Relationships are equally well traversed in either direction. This means that
there is no need to add duplicate relationships in the opposite direction (with
regard to traversal or performance).
While relationships always have a direction, you can ignore the direction where
it is not useful in your application.
Note that a node can have relationships to itself as well:
graphdb-rels-loop.svg
To further enhance graph traversal, all relationships have a relationship type.
Note that the word type might be misleading here; you could rather think of it
as a label. The following example shows a simple social network with two
relationship types.
graphdb-rels-twitter.svg
Table 3.1. Using relationship direction and type
+------------------------------------------------------------------------+
|What |How |
|------------------------------+-----------------------------------------|
|get who a person follows |outgoing follows relationships, depth one|
|------------------------------+-----------------------------------------|
|get the followers of a person |incoming follows relationships, depth one|
|------------------------------+-----------------------------------------|
|get who a person blocks |outgoing blocks relationships, depth one |
|------------------------------+-----------------------------------------|
|get who a person is blocked by|incoming blocks relationships, depth one |
+------------------------------------------------------------------------+
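The rows of the table can be mimicked with a few lines of plain Java. This is a sketch of the idea only, not the Neo4j API: DirectionSketch, Rel, neighbours() and the person names used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of selecting depth-one neighbours by relationship
// type and direction. Plain Java only; this is not the Neo4j API.
final class DirectionSketch
{
    enum Direction { OUTGOING, INCOMING }

    // A directed, typed relationship between two named nodes.
    static final class Rel
    {
        final String start;
        final String type;
        final String end;

        Rel( String start, String type, String end )
        {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    private final List<Rel> rels = new ArrayList<>();

    void relate( String start, String type, String end )
    {
        rels.add( new Rel( start, type, end ) );
    }

    // Nodes one step away from 'person' over relationships of the given
    // type, seen in the given direction relative to 'person'.
    List<String> neighbours( String person, String type, Direction direction )
    {
        List<String> result = new ArrayList<>();
        for ( Rel rel : rels )
        {
            if ( !rel.type.equals( type ) )
            {
                continue;
            }
            if ( direction == Direction.OUTGOING && rel.start.equals( person ) )
            {
                result.add( rel.end );
            }
            else if ( direction == Direction.INCOMING && rel.end.equals( person ) )
            {
                result.add( rel.start );
            }
        }
        return result;
    }
}
```

With this sketch, "get the followers of a person" is neighbours( person, "follows", Direction.INCOMING ). The same stored relationships answer both the outgoing and the incoming question, so no duplicate relationships are needed.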
This example is a simple model of a file system, which includes symbolic links:
graphdb-rels-filesys.svg
Depending on what you are looking for, you will use the direction and type of
relationships during traversal.
+-----------------------------------------------------------------------------+
|What |How |
|-------------------------------------+---------------------------------------|
|get the full path of a file |incoming file relationships |
|-------------------------------------+---------------------------------------|
|get all paths for a file |incoming file and symbolic link |
| |relationships |
|-------------------------------------+---------------------------------------|
|get all files in a directory |outgoing file and symbolic link |
| |relationships, depth one |
|-------------------------------------+---------------------------------------|
|get all files in a directory, |outgoing file relationships, depth one |
|excluding symbolic links | |
|-------------------------------------+---------------------------------------|
|get all files in a directory, |outgoing file and symbolic link |
|recursively |relationships |
+-----------------------------------------------------------------------------+
3.3. Properties
Both nodes and relationships can have properties.
Properties are key-value pairs where the key is a string. Property values can
be either a primitive or an array of one primitive type. For example String,
int and int[] values are valid for properties.
Note
null is not a valid property value. Nulls can instead be modeled by the absence
of a key.
graphdb-properties.svg
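Modeling "no value" by the absence of a key can be sketched with an ordinary map. The class below is a hypothetical stand-in for a property container, written in plain Java for illustration; it is not the Neo4j implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical property container: null values are rejected, and
// "no value" is expressed by the key simply not being present.
final class PropertySketch
{
    private final Map<String, Object> properties = new HashMap<>();

    void setProperty( String key, Object value )
    {
        if ( value == null )
        {
            throw new IllegalArgumentException( "null is not a valid property value" );
        }
        properties.put( key, value );
    }

    // Removing the key is how a value is "set to nothing".
    void removeProperty( String key )
    {
        properties.remove( key );
    }

    boolean hasProperty( String key )
    {
        return properties.containsKey( key );
    }

    // Returns the stored value, or the given default when the key is absent.
    Object getProperty( String key, Object defaultValue )
    {
        return properties.containsKey( key ) ? properties.get( key ) : defaultValue;
    }
}
```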
Table 3.2. Property value types
+-----------------------------------------------------------------------------+
|Type |Description |Value range |
|-------+-----------------------------------+---------------------------------|
|boolean| |true/false |
|-------+-----------------------------------+---------------------------------|
|byte |8-bit integer |-128 to 127, inclusive |
|-------+-----------------------------------+---------------------------------|
|short |16-bit integer |-32768 to 32767, inclusive |
|-------+-----------------------------------+---------------------------------|
|int |32-bit integer |-2147483648 to 2147483647, |
| | |inclusive |
|-------+-----------------------------------+---------------------------------|
|long |64-bit integer |-9223372036854775808 to |
| | |9223372036854775807, inclusive |
|-------+-----------------------------------+---------------------------------|
|float |32-bit IEEE 754 floating-point | |
| |number | |
|-------+-----------------------------------+---------------------------------|
|double |64-bit IEEE 754 floating-point | |
| |number | |
|-------+-----------------------------------+---------------------------------|
|char |16-bit unsigned integers |\u0000 to \uffff (0 to 65535) |
| |representing Unicode characters | |
|-------+-----------------------------------+---------------------------------|
|String |sequence of Unicode characters | |
+-----------------------------------------------------------------------------+
For further details on float/double values, see the Java Language Specification.
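The integer ranges in the table correspond to the constants on the Java wrapper classes. As a small illustration, the hypothetical helper below (not part of Neo4j) picks the narrowest integer type from the table that can hold a given value.

```java
// Illustrative helper, not part of Neo4j: picks the narrowest Java integer
// type from the property value table that can hold a given value.
final class IntegerRangeSketch
{
    static String narrowestType( long value )
    {
        if ( value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE )
        {
            return "byte";  // -128 to 127
        }
        if ( value >= Short.MIN_VALUE && value <= Short.MAX_VALUE )
        {
            return "short"; // -32768 to 32767
        }
        if ( value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE )
        {
            return "int";   // -2147483648 to 2147483647
        }
        return "long";      // every other long value needs the full 64 bits
    }
}
```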
3.4. Paths
A path is one or more nodes with connecting relationships, typically retrieved
as a query or traversal result.
graphdb-path.svg
The shortest possible path has length zero and looks like this:
graphdb-path-example1.svg
A path of length one:
graphdb-path-example2.svg
3.5. Traversal
Traversing a graph means visiting its nodes, following relationships according
to some rules. In most cases only a subgraph is visited, as you already know
where in the graph the interesting nodes and relationships are found.
Neo4j comes with a callback based traversal API which lets you specify the
traversal rules. At a basic level there’s a choice between traversing breadth-
or depth-first.
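The difference between the two orders can be shown on a toy graph in plain Java. The sketch below is not the Neo4j traversal API; TraversalOrderSketch and its methods are hypothetical, and the only thing that changes between the two orders is which end of the frontier the next node is taken from.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch contrasting breadth-first and depth-first visiting
// order on outgoing relationships. Plain Java only.
final class TraversalOrderSketch
{
    private final Map<String, List<String>> outgoing = new LinkedHashMap<>();

    void relate( String from, String to )
    {
        outgoing.computeIfAbsent( from, key -> new ArrayList<>() ).add( to );
    }

    List<String> traverse( String start, boolean breadthFirst )
    {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.addLast( start );
        while ( !frontier.isEmpty() )
        {
            // Breadth-first takes the oldest entry (queue),
            // depth-first takes the newest entry (stack).
            String node = breadthFirst ? frontier.pollFirst() : frontier.pollLast();
            if ( !seen.add( node ) )
            {
                continue; // already visited through another route
            }
            visited.add( node );
            for ( String next : outgoing.getOrDefault( node, Collections.<String>emptyList() ) )
            {
                if ( !seen.contains( next ) )
                {
                    frontier.addLast( next );
                }
            }
        }
        return visited;
    }
}
```

Because the depth-first variant pops the most recently added node, it explores the last-added relationship of each node first.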
For an in-depth introduction to the traversal framework, see Chapter 8, The
Traversal Framework. For Java code examples see Section 4.5, “Traversal”.
Other options to traverse or query graphs in Neo4j are Cypher and Gremlin.
Part II. Tutorials
The tutorial part describes how to set up your environment, and write programs
using Neo4j. It takes you from Hello World to advanced usage of graphs.
Chapter 4. Using Neo4j embedded in Java applications
It’s easy to use Neo4j embedded in Java applications. In this chapter you will
find everything needed — from setting up the environment to doing something
useful with your data.
4.1. Include Neo4j in your project
After selecting the appropriate edition for your platform, embed Neo4j in your
Java application by including the Neo4j library jars in your build. The
following sections will show how to do this by either altering the build path
directly or by using dependency management.
4.1.1. Add Neo4j to the build path
Get the Neo4j libraries from one of these sources:
* Extract a Neo4j download zip/tarball, and use
the jar files found in the lib/ directory.
* Use the jar files available from Maven Central Repository
Add the jar files to your project:
JDK tools
Append to -classpath
Eclipse
* Right-click on the project and then go to Build Path → Configure Build
Path. In the dialog, choose Add External JARs, browse to the Neo4j lib/
directory and select all of the jar files.
* Another option is to use User Libraries.
IntelliJ IDEA
See Libraries, Global Libraries, and the Configure Library dialog.
NetBeans
* Right-click on the Libraries node of the project, choose Add JAR/Folder,
browse to the Neo4j lib/ directory and select all of the jar files.
* You can also handle libraries from the project node, see Managing a
Project’s Classpath.
4.1.2. Add Neo4j as a dependency
For an overview of the main Neo4j artifacts, see Table 21.2, “Neo4j editions”.
The artifacts listed there are top-level artifacts that will transitively
include the actual Neo4j implementation. You can either go with the top-level
artifact or include the individual components directly. The examples included
here use the top-level artifact approach.
4.1.2.1. Maven
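Maven dependency.
<project>
...
 <dependencies>
  <dependency>
   <groupId>org.neo4j</groupId>
   <artifactId>neo4j</artifactId>
   <version>${neo4j-version}</version>
  </dependency>
  ...
 </dependencies>
...
</project>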
Where ${neo4j-version} is the desired version and the artifactId is found in
Table 21.2, “Neo4j editions”.
4.1.2.2. Eclipse and Maven
For development in Eclipse, it is recommended to install the M2Eclipse plugin
and let Maven manage the project build classpath instead (see above). This
lets you build your project from the command line with Maven while also having
a working Eclipse setup for development.
4.1.2.3. Ivy
Make sure to resolve dependencies from Maven Central, for example using this
configuration in your ivysettings.xml file:
With that in place you can add Neo4j to the mix by adding something along these
lines to your ivy.xml file:
..
<dependencies>
..
<dependency org="org.neo4j" name="neo4j" rev="${neo4j-version}"/>
..
</dependencies>
..
Where ${neo4j-version} is the desired version and the name is found in
Table 21.2, “Neo4j editions”.
4.1.2.4. Gradle
The example below shows a Gradle build script for including the Neo4j
libraries.
def neo4jVersion = "[set-version-here]"
apply plugin: 'java'
repositories {
mavenCentral()
}
dependencies {
compile "org.neo4j:neo4j:${neo4jVersion}"
}
Where neo4jVersion is the desired version and the name ("neo4j" in the example)
is found in Table 21.2, “Neo4j editions”.
4.1.3. Starting and stopping
To create a new database or open an existing one, you instantiate an
EmbeddedGraphDatabase.
graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
registerShutdownHook( graphDb );
Note
The EmbeddedGraphDatabase instance can be shared among multiple threads. Note
however that you can’t create multiple instances pointing to the same database.
To stop the database, call the shutdown() method:
graphDb.shutdown();
To make sure Neo4j is shut down properly you can add a shutdown hook:
private static void registerShutdownHook( final GraphDatabaseService graphDb )
{
// Registers a shutdown hook for the Neo4j instance so that it
// shuts down nicely when the VM exits (even if you "Ctrl-C" the
// running example before it's completed)
Runtime.getRuntime().addShutdownHook( new Thread()
{
@Override
public void run()
{
graphDb.shutdown();
}
} );
}
If you want a read-only view of the database, use EmbeddedReadOnlyGraphDatabase.
To start Neo4j with configuration settings, a Neo4j properties file can be
loaded like this:
GraphDatabaseService graphDb = new GraphDatabaseFactory().
newEmbeddedDatabaseBuilder( "target/database/location" ).
loadPropertiesFromFile( pathToConfig + "neo4j.properties" ).
newGraphDatabase();
Or you could of course create your own Map<String, String> programmatically and
use that instead.
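As a rough sketch of the programmatic route, the plain-Java helper below turns properties-style text into a Map<String, String>. The class ConfigMapSketch and its toMap() method are hypothetical; handing the resulting map to the database builder is deliberately left out, since that call is not shown in this section.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: reads key=value settings and returns them as a
// Map<String, String>. Only the map is built here; passing it to the
// database builder is not shown.
final class ConfigMapSketch
{
    static Map<String, String> toMap( Reader source )
    {
        Properties properties = new Properties();
        try
        {
            properties.load( source );
        }
        catch ( IOException e )
        {
            throw new UncheckedIOException( e );
        }
        Map<String, String> config = new HashMap<>();
        for ( String key : properties.stringPropertyNames() )
        {
            config.put( key, properties.getProperty( key ) );
        }
        return config;
    }
}
```

The map can equally be filled by hand with config.put( key, value ) calls when no properties file is involved.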
For configuration settings, see Chapter 22, Configuration & Performance.
4.2. Hello World
Learn how to create and access nodes and relationships. For information on
project setup, see Section 4.1, “Include Neo4j in your project”.
Remember, from Section 2.1, “What is a Graph Database?”, that a Neo4j graph
consists of:
* Nodes that are connected by
* Relationships, with
* Properties on both nodes and relationships.
All relationships have a type. For example, if the graph represents a social
network, a relationship type could be KNOWS. If a relationship of the type
KNOWS connects two nodes, that probably represents two people that know each
other. A lot of the semantics (that is the meaning) of a graph is encoded in
the relationship types of the application. And although relationships are
directed, they are traversed equally well in either direction.
Tip
The source code of this example is found here: EmbeddedNeo4j.java
4.2.1. Prepare the database
Relationship types can be created by using an enum. In this example we only
need a single relationship type. This is how to define it:
private static enum RelTypes implements RelationshipType
{
KNOWS
}
We also prepare some variables to use:
GraphDatabaseService graphDb;
Node firstNode;
Node secondNode;
Relationship relationship;
The next step is to start the database server. Note that if the directory given
for the database doesn’t already exist, it will be created.
graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
registerShutdownHook( graphDb );
Note that starting a database server is an expensive operation, so don’t start
up a new instance every time you need to interact with the database! The
instance can be shared by multiple threads. Transactions are thread confined.
As seen, we register a shutdown hook that will make sure the database shuts
down when the JVM exits. Now it’s time to interact with the database.
4.2.2. Wrap mutating operations in a transaction
All mutating operations have to be performed in a transaction. This is a
conscious design decision, since we believe transaction demarcation to be an
important part of working with a real enterprise database. Now, transaction
handling in Neo4j is very easy:
Transaction tx = graphDb.beginTx();
try
{
// Mutating operations go here
tx.success();
}
finally
{
tx.finish();
}
For more information on transactions, see Chapter 13, Transaction Management
and the Java API for Transaction.
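The contract behind the try/finally pattern is that finish() commits only when success() has been called first, and rolls back otherwise. The toy model below illustrates that contract in plain Java; TxSketch and its members are hypothetical and are not the Neo4j Transaction implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the success()/finish() contract. Plain Java only;
// this is not the Neo4j Transaction implementation.
final class TxSketch
{
    private final List<String> committed = new ArrayList<>();

    final class Tx
    {
        private final List<String> pending = new ArrayList<>();
        private boolean markedSuccessful;

        void write( String change )
        {
            pending.add( change );
        }

        // Only marks the transaction; nothing is committed yet.
        void success()
        {
            markedSuccessful = true;
        }

        // Commits if success() was called, otherwise discards the changes.
        void finish()
        {
            if ( markedSuccessful )
            {
                committed.addAll( pending );
            }
            pending.clear();
        }
    }

    Tx beginTx()
    {
        return new Tx();
    }

    List<String> committed()
    {
        return committed;
    }
}
```

If an exception is thrown before success() is reached, the finally block still calls finish(), and the pending changes are dropped rather than committed.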
4.2.3. Create a small graph
Now, let’s create a few nodes. The API is very intuitive. Feel free to have a
look at the JavaDocs at http://components.neo4j.org/neo4j/1.8-SNAPSHOT/apidocs/.
They’re included in the distribution, as well. Here’s how to create a small
graph consisting of two nodes, connected with one relationship and some
properties:
firstNode = graphDb.createNode();
firstNode.setProperty( "message", "Hello, " );
secondNode = graphDb.createNode();
secondNode.setProperty( "message", "World!" );
relationship = firstNode.createRelationshipTo( secondNode, RelTypes.KNOWS );
relationship.setProperty( "message", "brave Neo4j " );
We now have a graph that looks like this:
Figure 4.1. Hello World Graph
Hello-World-Graph-java.svg
4.2.4. Print the result
After we’ve created our graph, let’s read from it and print the result.
System.out.print( firstNode.getProperty( "message" ) );
System.out.print( relationship.getProperty( "message" ) );
System.out.print( secondNode.getProperty( "message" ) );
Which will output:
Hello, brave Neo4j World!
4.2.5. Remove the data
In this case we’ll remove the data before committing:
// let's remove the data
firstNode.getSingleRelationship( RelTypes.KNOWS, Direction.OUTGOING ).delete();
firstNode.delete();
secondNode.delete();
Note that deleting a node which still has relationships when the transaction
commits will fail. This is to make sure relationships always have a start node
and an end node.
4.2.6. Shut down the database server
Finally, shut down the database server when the application finishes:
graphDb.shutdown();
4.3. User database with index
You have a user database, and want to retrieve users by name. To begin with,
this is the structure of the database we want to create:
Figure 4.2. Node space view of users
users.png
That is, the reference node is connected to a users-reference node to which all
users are connected.
Tip
The source code used in this example is found here:
EmbeddedNeo4jWithIndexing.java
To begin with, we define the relationship types we want to use:
private static enum RelTypes implements RelationshipType
{
USERS_REFERENCE,
USER
}
Then we have created two helper methods to handle user names and adding users
to the database:
private static String idToUserName( final int id )
{
return "user" + id + "@neo4j.org";
}
private static Node createAndIndexUser( final String username )
{
Node node = graphDb.createNode();
node.setProperty( USERNAME_KEY, username );
nodeIndex.add( node, USERNAME_KEY, username );
return node;
}
The next step is to start the database server:
graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
nodeIndex = graphDb.index().forNodes( "nodes" );
registerShutdownHook();
It’s time to add the users:
Transaction tx = graphDb.beginTx();
try
{
// Create users sub reference node
Node usersReferenceNode = graphDb.createNode();
graphDb.getReferenceNode().createRelationshipTo(
usersReferenceNode, RelTypes.USERS_REFERENCE );
// Create some users and index their names with the IndexService
for ( int id = 0; id < 100; id++ )
{
Node userNode = createAndIndexUser( idToUserName( id ) );
usersReferenceNode.createRelationshipTo( userNode,
RelTypes.USER );
}
tx.success();
}
finally
{
tx.finish();
}
And here’s how to find a user by Id:
int idToFind = 45;
Node foundUser = nodeIndex.get( USERNAME_KEY,
idToUserName( idToFind ) ).getSingle();
System.out.println( "The username of user " + idToFind + " is "
+ foundUser.getProperty( USERNAME_KEY ) );
4.4. Basic unit testing
The basic pattern of unit testing with Neo4j is illustrated by the following
example.
To access the Neo4j testing facilities you should have the neo4j-kernel
tests.jar on the classpath during tests. You can download it from Maven
Central: org.neo4j:neo4j-kernel.
Using Maven as a dependency manager you would typically add this dependency as:
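Maven dependency.
<project>
...
 <dependencies>
  <dependency>
   <groupId>org.neo4j</groupId>
   <artifactId>neo4j-kernel</artifactId>
   <version>${neo4j-version}</version>
   <type>test-jar</type>
   <scope>test</scope>
  </dependency>
  ...
 </dependencies>
...
</project>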
Where ${neo4j-version} is the desired version of Neo4j.
With that in place, we’re ready to code our tests.
Tip
For the full source code of this example see: Neo4jBasicTest.java
Before each test, create a fresh database:
@Before
public void prepareTestDatabase()
{
graphDb = new TestGraphDatabaseFactory().newImpermanentDatabaseBuilder().newGraphDatabase();
}
After the test has executed, the database should be shut down:
@After
public void destroyTestDatabase()
{
graphDb.shutdown();
}
During a test, create nodes and check to see that they are there, while
enclosing write operations in a transaction.
Transaction tx = graphDb.beginTx();
Node n = null;
try
{
n = graphDb.createNode();
n.setProperty( "name", "Nancy" );
tx.success();
}
catch ( Exception e )
{
tx.failure();
}
finally
{
tx.finish();
}
// The node should have an id greater than 0, which is the id of the
// reference node.
assertThat( n.getId(), is( greaterThan( 0L ) ) );
// Retrieve a node by using the id of the created node. The id's and
// property should match.
Node foundNode = graphDb.getNodeById( n.getId() );
assertThat( foundNode.getId(), is( n.getId() ) );
assertThat( (String) foundNode.getProperty( "name" ), is( "Nancy" ) );
If you want to set configuration parameters at database creation, it’s done
like this:
Map<String, String> config = new HashMap<String, String>();
config.put( "neostore.nodestore.db.mapped_memory", "10M" );
config.put( "string_block_size", "60" );
config.put( "array_block_size", "300" );
GraphDatabaseService db = new ImpermanentGraphDatabase( config );
4.5. Traversal
For reading about traversals, see Chapter 8, The Traversal Framework.
For more examples of traversals, see Chapter 9, Domain Modeling Gallery.
4.5.1. The Matrix
This is the first node space we want to traverse into:
Figure 4.3. Matrix node space view
examples-matrix.png
Tip
The source code of the examples is found here: Matrix.java
Friends and friends of friends.
private static Traverser getFriends( final Node person )
{
return person.traverse( Order.BREADTH_FIRST,
StopEvaluator.END_OF_GRAPH,
ReturnableEvaluator.ALL_BUT_START_NODE, RelTypes.KNOWS,
Direction.OUTGOING );
}
Let’s perform the actual traversal and print the results:
int numberOfFriends = 0;
String output = neoNode.getProperty( "name" ) + "'s friends:\n";
Traverser friendsTraverser = getFriends( neoNode );
for ( Node friendNode : friendsTraverser )
{
output += "At depth " +
friendsTraverser.currentPosition().depth() +
" => " +
friendNode.getProperty( "name" ) + "\n";
numberOfFriends++;
}
output += "Number of friends found: " + numberOfFriends + "\n";
Which will give us the following output:
Thomas Anderson's friends:
At depth 1 => Trinity
At depth 1 => Morpheus
At depth 2 => Cypher
At depth 3 => Agent Smith
Number of friends found: 4
Who coded the Matrix?
private static Traverser findHackers( final Node startNode )
{
return startNode.traverse( Order.BREADTH_FIRST,
StopEvaluator.END_OF_GRAPH, new ReturnableEvaluator()
{
@Override
public boolean isReturnableNode(
final TraversalPosition currentPos )
{
return !currentPos.isStartNode()
&& currentPos.lastRelationshipTraversed()
.isType( RelTypes.CODED_BY );
}
}, RelTypes.CODED_BY, Direction.OUTGOING, RelTypes.KNOWS,
Direction.OUTGOING );
}
Print out the result:
String output = "Hackers:\n";
int numberOfHackers = 0;
Traverser traverser = findHackers( getNeoNode() );
for ( Node hackerNode : traverser )
{
output += "At depth " +
traverser.currentPosition().depth() +
" => " +
hackerNode.getProperty( "name" ) + "\n";
numberOfHackers++;
}
output += "Number of hackers found: " + numberOfHackers + "\n";
Now we know who coded the Matrix:
Hackers:
At depth 4 => The Architect
Number of hackers found: 1
4.5.2. New traversal framework
Note
The following examples use a new experimental traversal API. It shares the
underlying implementation with the old traversal API, so performance-wise they
should be equal. However, expect the new API to evolve and thus undergo
changes.
4.5.2.1. The Matrix
The traversals from the Matrix example above, this time using the new traversal
API:
Tip
The source code of the examples is found here: NewMatrix.java
Friends and friends of friends.
private static Traverser getFriends(
final Node person )
{
TraversalDescription td = Traversal.description()
.breadthFirst()
.relationships( RelTypes.KNOWS, Direction.OUTGOING )
.evaluator( Evaluators.excludeStartPosition() );
return td.traverse( person );
}
Let’s perform the actual traversal and print the results:
int numberOfFriends = 0;
String output = neoNode.getProperty( "name" ) + "'s friends:\n";
Traverser friendsTraverser = getFriends( neoNode );
for ( Path friendPath : friendsTraverser )
{
output += "At depth " + friendPath.length() + " => "
+ friendPath.endNode()
.getProperty( "name" ) + "\n";
numberOfFriends++;
}
output += "Number of friends found: " + numberOfFriends + "\n";
Which will give us the following output:
Thomas Anderson's friends:
At depth 1 => Trinity
At depth 1 => Morpheus
At depth 2 => Cypher
At depth 3 => Agent Smith
Number of friends found: 4
Who coded the Matrix?
private static Traverser findHackers( final Node startNode )
{
TraversalDescription td = Traversal.description()
.breadthFirst()
.relationships( RelTypes.CODED_BY, Direction.OUTGOING )
.relationships( RelTypes.KNOWS, Direction.OUTGOING )
.evaluator(
Evaluators.returnWhereLastRelationshipTypeIs( RelTypes.CODED_BY ) );
return td.traverse( startNode );
}
Print out the result:
String output = "Hackers:\n";
int numberOfHackers = 0;
Traverser traverser = findHackers( getNeoNode() );
for ( Path hackerPath : traverser )
{
output += "At depth " + hackerPath.length() + " => "
+ hackerPath.endNode()
.getProperty( "name" ) + "\n";
numberOfHackers++;
}
output += "Number of hackers found: " + numberOfHackers + "\n";
Now we know who coded the Matrix:
Hackers:
At depth 4 => The Architect
Number of hackers found: 1
4.5.2.2. Walking an ordered path
This example shows how to use a path context holding a representation of a
path.
Tip
The source code of this example is found here: OrderedPath.java
Create a toy graph.
Node A = db.createNode();
Node B = db.createNode();
Node C = db.createNode();
Node D = db.createNode();
A.createRelationshipTo( B, REL1 );
B.createRelationshipTo( C, REL2 );
C.createRelationshipTo( D, REL3 );
A.createRelationshipTo( C, REL2 );
example-ordered-path.svg
Now, the order of relationships (REL1 → REL2 → REL3) is stored in an ArrayList.
Upon traversal, the Evaluator can check against it to ensure that only paths
with the predefined order of relationships are included and returned:
Define how to walk the path.
final ArrayList<RelationshipType> orderedPathContext = new ArrayList<RelationshipType>();
orderedPathContext.add( REL1 );
orderedPathContext.add( withName( "REL2" ) );
orderedPathContext.add( withName( "REL3" ) );
TraversalDescription td = Traversal.description()
.evaluator( new Evaluator()
{
@Override
public Evaluation evaluate( final Path path )
{
if ( path.length() == 0 )
{
return Evaluation.EXCLUDE_AND_CONTINUE;
}
RelationshipType expectedType = orderedPathContext.get( path.length() - 1 );
boolean isExpectedType = path.lastRelationship()
.isType( expectedType );
boolean included = path.length() == orderedPathContext.size()
&& isExpectedType;
boolean continued = path.length() < orderedPathContext.size()
&& isExpectedType;
return Evaluation.of( included, continued );
}
} );
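The decision the evaluator makes can be restated without the traversal framework. The plain-Java sketch below uses strings for relationship types and returns the two flags as a boolean pair; OrderedPathSketch and its evaluate() method are hypothetical. It also stops when a path is longer than the expected sequence, which is an added guard for the stand-alone version.

```java
import java.util.List;

// Hypothetical restatement of the ordered-path check: given the expected
// order of relationship types and the types along a path so far, decide
// whether to include the path and whether to continue beyond it.
final class OrderedPathSketch
{
    // Returns { included, continued }.
    static boolean[] evaluate( List<String> expectedOrder, List<String> pathTypes )
    {
        int length = pathTypes.size();
        if ( length == 0 )
        {
            return new boolean[] { false, true }; // start node: exclude and continue
        }
        if ( length > expectedOrder.size() )
        {
            return new boolean[] { false, false }; // longer than the expected sequence
        }
        // Only the last relationship needs checking: earlier ones were
        // checked when the shorter prefixes of this path were evaluated.
        boolean isExpectedType = expectedOrder.get( length - 1 ).equals( pathTypes.get( length - 1 ) );
        boolean included = length == expectedOrder.size() && isExpectedType;
        boolean continued = length < expectedOrder.size() && isExpectedType;
        return new boolean[] { included, continued };
    }
}
```

A path is therefore included only when it has exactly as many relationships as the expected sequence and its last relationship has the expected type. A path that deviates from the sequence is neither included nor continued.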
Perform the traversal and print the result.
Traverser traverser = td.traverse( A );
PathPrinter pathPrinter = new PathPrinter( "name" );
for ( Path path : traverser )
{
output += Traversal.pathToString( path, pathPrinter );
}
Which will output:
(A)--[REL1]-->(B)--[REL2]-->(C)--[REL3]-->(D)
In this case we use a custom class to format the path output. This is how it’s
done:
static class PathPrinter implements Traversal.PathDescriptor
{
private final String nodePropertyKey;
public PathPrinter( String nodePropertyKey )
{
this.nodePropertyKey = nodePropertyKey;
}
@Override
public String nodeRepresentation( Path path, Node node )
{
return "(" + node.getProperty( nodePropertyKey, "" ) + ")";
}
@Override
public String relationshipRepresentation( Path path, Node from,
Relationship relationship )
{
String prefix = "--", suffix = "--";
if ( from.equals( relationship.getEndNode() ) )
{
prefix = "<--";
}
else
{
suffix = "-->";
}
return prefix + "[" + relationship.getType().name() + "]" + suffix;
}
}
For options regarding output of a Path, see the Traversal class.
4.5.3. Uniqueness of Paths in traversals
This example demonstrates the use of node uniqueness. Below is an imaginary
domain graph with Principals that own pets that are descendants of other pets.
Figure 4.4. Descendants Example Graph
Descendants-Example-Graph-Uniqueness-of-Paths-in-traversals.svg
In order to return all descendants of Pet0 which have the relation owns to
Principal1 (Pet1 and Pet3), the Uniqueness of the traversal needs to be set to
NODE_PATH rather than the default NODE_GLOBAL, so that nodes can be traversed
more than once and paths that differ in some nodes but share others (like the
start and end node) can be returned.
final Node target = data.get().get( "Principal1" );
TraversalDescription td = Traversal.description()
.uniqueness( Uniqueness.NODE_PATH )
.evaluator( new Evaluator()
{
@Override
public Evaluation evaluate( Path path )
{
if ( path.endNode().equals( target ) )
{
return Evaluation.INCLUDE_AND_PRUNE;
}
return Evaluation.EXCLUDE_AND_CONTINUE;
}
} );
Traverser results = td.traverse( start );
This will return the following paths:
(3)--[descendant,0]-->(1)<--[owns,3]--(5)
(3)--[descendant,2]-->(4)<--[owns,5]--(5)
In the default path.toString() implementation, (1)--[knows,2]-->(4) denotes a
node with ID=1 having a relationship with ID=2 of type knows to a node with
ID=4.
Let’s create a new TraversalDescription from the old one, having NODE_GLOBAL
uniqueness to see the difference.
Tip
The TraversalDescription object is immutable, so we have to use the new
instance returned with the new uniqueness setting.
TraversalDescription nodeGlobalTd = td.uniqueness( Uniqueness.NODE_GLOBAL );
results = nodeGlobalTd.traverse( start );
Now only one path is returned:
(3)--[descendant,0]-->(1)<--[owns,3]--(5)
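The effect of the two uniqueness settings can be mimicked in plain Java. The sketch below is not the Neo4j traversal framework: UniquenessSketch and its methods are hypothetical, and relationships are followed in both directions. With "node path" uniqueness a node may not repeat within one path, while with "node global" uniqueness a node is visited at most once in the whole traversal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of NODE_PATH versus NODE_GLOBAL uniqueness on an
// undirected toy graph. Plain Java only.
final class UniquenessSketch
{
    private final Map<String, List<String>> neighbours = new LinkedHashMap<>();

    void relate( String a, String b )
    {
        neighbours.computeIfAbsent( a, key -> new ArrayList<>() ).add( b );
        neighbours.computeIfAbsent( b, key -> new ArrayList<>() ).add( a );
    }

    List<List<String>> pathsTo( String start, String target, boolean nodeGlobal )
    {
        List<List<String>> found = new ArrayList<>();
        Set<String> visitedGlobally = new HashSet<>();
        visitedGlobally.add( start );
        List<String> path = new ArrayList<>();
        path.add( start );
        walk( path, target, nodeGlobal, visitedGlobally, found );
        return found;
    }

    private void walk( List<String> path, String target, boolean nodeGlobal,
            Set<String> visitedGlobally, List<List<String>> found )
    {
        String end = path.get( path.size() - 1 );
        if ( end.equals( target ) )
        {
            found.add( new ArrayList<>( path ) ); // include and prune
            return;
        }
        for ( String next : neighbours.getOrDefault( end, Collections.<String>emptyList() ) )
        {
            // Node global: each node at most once overall.
            // Node path: each node at most once within the current path.
            boolean allowed = nodeGlobal ? visitedGlobally.add( next ) : !path.contains( next );
            if ( !allowed )
            {
                continue;
            }
            path.add( next );
            walk( path, target, nodeGlobal, visitedGlobally, found );
            path.remove( path.size() - 1 ); // backtrack
        }
    }
}
```

On a toy graph where Pet1 and Pet3 both descend from Pet0 and are both owned by Principal1, the node-path variant finds two paths from Pet0 to Principal1. The node-global variant finds only one, because Principal1 has already been visited when the second route reaches it.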
4.5.4. Social network
Note
The following example uses the new experimental traversal API.
Social networks (known as social graphs out on the web) are natural to model
with a graph. This example shows a very simple social model that connects
friends and keeps track of status updates.
Tip
The source code of the example is found here: socnet
4.5.4.1. Simple social model
Figure 4.5. Social network data model
socnet-model.png
The data model for a social network is pretty simple: Persons with names and
StatusUpdates with timestamped text. These entities are then connected by
specific relationships.
* Person
o friend: relates two distinct Person instances (no self-reference)
o status: connects to the most recent StatusUpdate
* StatusUpdate
o next: points to the next StatusUpdate in the chain, which was posted
before the current one
4.5.4.2. Status graph instance
The StatusUpdate list for a Person is a linked list. The head of the list (the
most recent status) is found by following status. Each subsequent StatusUpdate
is connected by next.
Here’s an example where Andreas Kollegger micro-blogged his way to work in the
morning:
andreas-status-updates.svg
To read the status updates, we can create a traversal, like so:
TraversalDescription traversal = Traversal.description().
depthFirst().
relationships( NEXT ).
filter( Traversal.returnAll() );
This gives us a traverser that will start at one StatusUpdate, and will follow
the chain of updates until they run out. Traversers are lazy loading, so it’s
performant even when dealing with thousands of statuses - they are not loaded
until we actually consume them.
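To illustrate what lazy loading means for such a chain, here is a minimal sketch in plain Java. It does not use the Neo4j traversal API; StatusNode, StatusChain and the visited counter are hypothetical names introduced only for this illustration. The iterator follows the next reference only when the caller asks for another element.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical in-memory stand-in for a StatusUpdate node and its NEXT relationship.
class StatusNode
{
    final String text;
    final StatusNode next; // the update posted before this one, or null

    StatusNode( String text, StatusNode next )
    {
        this.text = text;
        this.next = next;
    }
}

class StatusChain implements Iterable<String>
{
    private final StatusNode head;
    int visited = 0; // counts how many updates have actually been read

    StatusChain( StatusNode head )
    {
        this.head = head;
    }

    @Override
    public Iterator<String> iterator()
    {
        return new Iterator<String>()
        {
            private StatusNode current = head;

            @Override
            public boolean hasNext()
            {
                return current != null;
            }

            @Override
            public String next()
            {
                if ( current == null )
                {
                    throw new NoSuchElementException();
                }
                String text = current.text;
                current = current.next; // follow NEXT only when asked to
                visited++;
                return text;
            }

            @Override
            public void remove()
            {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

Reading only the first two statuses of a long chain touches only two elements, no matter how many older updates exist.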
4.5.4.3. Activity stream
Once we have friends, and they have status messages, we might want to read our
friends' status messages in reverse time order, latest first. To do this, we
go through these steps:
1. Gather all friends' status update iterators in a list - latest date first.
2. Sort the list.
3. Return the first item in the list.
4. If the first iterator is exhausted, remove it from the list. Otherwise, get
the next item in that iterator.
5. Go to step 2 until there are no iterators left in the list.
Animated, the sequence looks like this.
The code looks like:
PositionedIterator<StatusUpdate> first = statuses.get(0);
StatusUpdate returnVal = first.current();
if ( !first.hasNext() )
{
statuses.remove( 0 );
}
else
{
first.next();
sort();
}
return returnVal;
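The five steps can be tried out without a database. The sketch below is a simplified, self-contained variant that works on plain timestamps; the classes Positioned and ActivityStreamSketch are assumptions made for this illustration and are not the classes used in the socnet example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical iterator wrapper that remembers its current element.
class Positioned
{
    private final Iterator<Long> inner;
    private Long current;

    Positioned( Iterator<Long> inner )
    {
        this.inner = inner;
        this.current = inner.next();
    }

    Long current()
    {
        return current;
    }

    boolean hasNext()
    {
        return inner.hasNext();
    }

    void next()
    {
        current = inner.next();
    }
}

class ActivityStreamSketch
{
    // Merges per-friend timestamp lists (each latest first) into one latest-first stream.
    static List<Long> merge( List<List<Long>> perFriend )
    {
        // Step 1: one positioned iterator per friend that has any updates.
        List<Positioned> statuses = new ArrayList<Positioned>();
        for ( List<Long> updates : perFriend )
        {
            if ( !updates.isEmpty() )
            {
                statuses.add( new Positioned( updates.iterator() ) );
            }
        }
        Comparator<Positioned> latestFirst = new Comparator<Positioned>()
        {
            @Override
            public int compare( Positioned a, Positioned b )
            {
                return b.current().compareTo( a.current() );
            }
        };
        List<Long> stream = new ArrayList<Long>();
        Collections.sort( statuses, latestFirst ); // step 2
        while ( !statuses.isEmpty() ) // step 5
        {
            Positioned first = statuses.get( 0 );
            stream.add( first.current() ); // step 3
            if ( !first.hasNext() )
            {
                statuses.remove( 0 ); // step 4: iterator exhausted
            }
            else
            {
                first.next(); // step 4: advance, then sort again
                Collections.sort( statuses, latestFirst );
            }
        }
        return stream;
    }
}
```

Sorting the whole list after every step keeps the sketch close to the description; with many friends, a priority queue would be a common alternative.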
4.6. Domain entities
This page demonstrates one way to handle domain entities when using Neo4j. The
principle at use is to wrap the entities around a node (the same approach can
be used with relationships as well).
Tip
The source code of the examples is found here: Person.java
First off, store the node and make it accessible inside the package:
private final Node underlyingNode;
Person( Node personNode )
{
this.underlyingNode = personNode;
}
protected Node getUnderlyingNode()
{
return underlyingNode;
}
Delegate attributes to the node:
public String getName()
{
return (String)underlyingNode.getProperty( NAME );
}
Make sure to override these methods:
@Override
public int hashCode()
{
return underlyingNode.hashCode();
}
@Override
public boolean equals( Object o )
{
return o instanceof Person &&
underlyingNode.equals( ( (Person)o ).getUnderlyingNode() );
}
@Override
public String toString()
{
return "Person[" + getName() + "]";
}
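Why the overrides matter can be shown with a small self-contained sketch. FakeNode is a hypothetical stand-in for a Neo4j Node (identified by its id), and PersonSketch mirrors the wrapper above; neither class is part of the Neo4j API.

```java
// Hypothetical stand-in for org.neo4j.graphdb.Node: identity is the node id.
class FakeNode
{
    final long id;
    final String name;

    FakeNode( long id, String name )
    {
        this.id = id;
        this.name = name;
    }

    @Override
    public int hashCode()
    {
        return (int) ( id ^ ( id >>> 32 ) );
    }

    @Override
    public boolean equals( Object o )
    {
        return o instanceof FakeNode && ( (FakeNode) o ).id == id;
    }
}

// Wrapper in the style of the Person class above.
class PersonSketch
{
    private final FakeNode underlyingNode;

    PersonSketch( FakeNode node )
    {
        this.underlyingNode = node;
    }

    String getName()
    {
        return underlyingNode.name;
    }

    // Two wrappers are equal exactly when they wrap the same node.
    @Override
    public int hashCode()
    {
        return underlyingNode.hashCode();
    }

    @Override
    public boolean equals( Object o )
    {
        return o instanceof PersonSketch
                && underlyingNode.equals( ( (PersonSketch) o ).underlyingNode );
    }
}
```

Because equality is delegated to the node, two wrapper instances created for the same node behave as one entry in a HashSet or as one key in a HashMap.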
4.7. Graph Algorithm examples
Tip
The source code used in the example is found here: PathFindingExamplesTest.java
Calculating the shortest path (least number of relationships) between two
nodes:
Node startNode = graphDb.createNode();
Node middleNode1 = graphDb.createNode();
Node middleNode2 = graphDb.createNode();
Node middleNode3 = graphDb.createNode();
Node endNode = graphDb.createNode();
createRelationshipsBetween( startNode, middleNode1, endNode );
createRelationshipsBetween( startNode, middleNode2, middleNode3, endNode );
// Will find the shortest path between startNode and endNode via
// "MY_TYPE" relationships (in OUTGOING direction), for example:
//
// (startNode)-->(middleNode1)-->(endNode)
//
PathFinder<Path> finder = GraphAlgoFactory.shortestPath(
    Traversal.expanderForTypes( ExampleTypes.MY_TYPE, Direction.OUTGOING ), 15 );
Iterable<Path> paths = finder.findAllPaths( startNode, endNode );
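For readers who want to see the idea behind a shortest-path search, here is a minimal breadth-first sketch in plain Java. It is an illustration under assumptions, not the implementation behind GraphAlgoFactory: ShortestPathSketch and its methods are hypothetical names, nodes are plain strings, and only a single shortest path is returned rather than all of them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class ShortestPathSketch
{
    private final Map<String, List<String>> outgoing = new HashMap<String, List<String>>();

    // Adds one directed relationship from -> to.
    void relate( String from, String to )
    {
        List<String> targets = outgoing.get( from );
        if ( targets == null )
        {
            targets = new ArrayList<String>();
            outgoing.put( from, targets );
        }
        targets.add( to );
    }

    // Breadth-first search: the first time end is reached, the path is shortest.
    // Returns an empty list if end is not reachable within maxDepth relationships.
    List<String> shortestPath( String start, String end, int maxDepth )
    {
        Map<String, String> parent = new HashMap<String, String>();
        Map<String, Integer> depth = new HashMap<String, Integer>();
        Deque<String> queue = new ArrayDeque<String>();
        depth.put( start, 0 );
        queue.add( start );
        while ( !queue.isEmpty() )
        {
            String node = queue.remove();
            if ( node.equals( end ) )
            {
                LinkedList<String> path = new LinkedList<String>();
                for ( String n = end; n != null; n = parent.get( n ) )
                {
                    path.addFirst( n );
                }
                return path;
            }
            if ( depth.get( node ) == maxDepth )
            {
                continue; // do not expand beyond the depth limit
            }
            List<String> targets = outgoing.get( node );
            if ( targets == null )
            {
                continue;
            }
            for ( String next : targets )
            {
                if ( !depth.containsKey( next ) )
                {
                    depth.put( next, depth.get( node ) + 1 );
                    parent.put( next, node );
                    queue.add( next );
                }
            }
        }
        return Collections.emptyList();
    }
}
```

For the graph built above (start to end via middleNode1, and via middleNode2 and middleNode3), the sketch returns the two-relationship path through middleNode1.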
Using Dijkstra’s algorithm to calculate the cheapest path between node A and B,
where each relationship can have a weight (i.e. cost) and the path(s) with the
least cost are found:
PathFinder<WeightedPath> finder = GraphAlgoFactory.dijkstra(
Traversal.expanderForTypes( ExampleTypes.MY_TYPE, Direction.BOTH ), "cost" );
WeightedPath path = finder.findSinglePath( nodeA, nodeB );
// Get the weight for the found path
path.weight();
Using A* to calculate the cheapest path between node A and B, where cheapest is,
for example, the path in a network of roads which has the shortest length
between node A and B. Here’s our example graph:
A* algorithm example graph
Node nodeA = createNode( "name", "A", "x", 0d, "y", 0d );
Node nodeB = createNode( "name", "B", "x", 7d, "y", 0d );
Node nodeC = createNode( "name", "C", "x", 2d, "y", 1d );
Relationship relAC = createRelationship( nodeA, nodeC, "length", 2d );
Relationship relCB = createRelationship( nodeC, nodeB, "length", 3d );
Relationship relAB = createRelationship( nodeA, nodeB, "length", 10d );
EstimateEvaluator<Double> estimateEvaluator = new EstimateEvaluator<Double>()
{
public Double getCost( final Node node, final Node goal )
{
double dx = (Double) node.getProperty( "x" ) - (Double) goal.getProperty( "x" );
double dy = (Double) node.getProperty( "y" ) - (Double) goal.getProperty( "y" );
double result = Math.sqrt( Math.pow( dx, 2 ) + Math.pow( dy, 2 ) );
return result;
}
};
PathFinder<WeightedPath> astar = GraphAlgoFactory.aStar(
Traversal.expanderForAllTypes(),
CommonEvaluators.doubleCostEvaluator( "length" ), estimateEvaluator );
WeightedPath path = astar.findSinglePath( nodeA, nodeB );
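The same example can be worked through by hand with a compact A* sketch in plain Java. This is only an illustration under assumptions: AStarSketch and its methods are hypothetical, roads are followed in both directions, and only the total length is returned instead of a WeightedPath. With an estimate of zero the search degenerates to Dijkstra’s algorithm.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AStarSketch
{
    private final Map<String, double[]> positions = new HashMap<String, double[]>();
    private final Map<String, Map<String, Double>> roads = new HashMap<String, Map<String, Double>>();

    void addNode( String name, double x, double y )
    {
        positions.put( name, new double[] { x, y } );
        roads.put( name, new HashMap<String, Double>() );
    }

    // Both nodes must have been added before; roads can be used in both directions.
    void addRoad( String a, String b, double length )
    {
        roads.get( a ).put( b, length );
        roads.get( b ).put( a, length );
    }

    // Straight-line distance, the same estimate as in the EstimateEvaluator above.
    private double estimate( String node, String goal )
    {
        double dx = positions.get( node )[0] - positions.get( goal )[0];
        double dy = positions.get( node )[1] - positions.get( goal )[1];
        return Math.sqrt( dx * dx + dy * dy );
    }

    // Returns the length of the cheapest path found, or infinity if there is none.
    double cheapestCost( String start, String goal )
    {
        Map<String, Double> costSoFar = new HashMap<String, Double>();
        Set<String> open = new HashSet<String>();
        Set<String> closed = new HashSet<String>();
        costSoFar.put( start, 0d );
        open.add( start );
        while ( !open.isEmpty() )
        {
            // Pick the open node with the lowest cost-so-far plus estimate.
            String current = null;
            double best = Double.POSITIVE_INFINITY;
            for ( String candidate : open )
            {
                double f = costSoFar.get( candidate ) + estimate( candidate, goal );
                if ( f < best )
                {
                    best = f;
                    current = candidate;
                }
            }
            if ( current.equals( goal ) )
            {
                return costSoFar.get( current );
            }
            open.remove( current );
            closed.add( current );
            for ( Map.Entry<String, Double> road : roads.get( current ).entrySet() )
            {
                String next = road.getKey();
                if ( closed.contains( next ) )
                {
                    continue;
                }
                double cost = costSoFar.get( current ) + road.getValue();
                Double known = costSoFar.get( next );
                if ( known == null || cost < known )
                {
                    costSoFar.put( next, cost );
                    open.add( next );
                }
            }
        }
        return Double.POSITIVE_INFINITY;
    }
}
```

For the example graph, the route from A over C to B has length 2 + 3 = 5 and beats the direct road of length 10. Note that in this particular data set the straight-line estimate from A to B (7) is larger than the true cheapest length, so the estimate is not a guaranteed lower bound; in general A* only promises the cheapest path when it is.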
4.8. Reading a management attribute
The EmbeddedGraphDatabase class includes a convenience method to get instances
of Neo4j management beans. The common JMX service can be used as well, but from
your own code you will probably prefer the approach outlined here.
Tip
The source code of the example is found here: JmxTest.java
This example shows how to get the start time of a database:
private static Date getStartTimeFromManagementBean(
GraphDatabaseService graphDbService )
{
GraphDatabaseAPI graphDb = (GraphDatabaseAPI) graphDbService;
Kernel kernel = graphDb.getSingleManagementBean( Kernel.class );
Date startTime = kernel.getKernelStartTime();
return startTime;
}
Depending on which Neo4j edition you are using, different sets of management
beans are available.
* For all editions, see the org.neo4j.jmx package.
* For the Advanced and Enterprise editions, see the org.neo4j.management
package as well.
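For comparison, this is roughly what reading an attribute through the common JMX service looks like. The sketch below deliberately reads a standard JVM bean (java.lang:type=Runtime, attribute StartTime) rather than a Neo4j bean, since the object names of the Neo4j beans are not covered here; the class name JvmStartTimeSketch is an assumption made for the illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.Date;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JvmStartTimeSketch
{
    // Reads the StartTime attribute of the JVM's Runtime bean via the generic JMX API.
    static Date jvmStartTime()
    {
        try
        {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName runtime = new ObjectName( "java.lang:type=Runtime" );
            // The attribute is looked up by name and returned as an untyped Object.
            Long millis = (Long) server.getAttribute( runtime, "StartTime" );
            return new Date( millis );
        }
        catch ( Exception e )
        {
            throw new IllegalStateException( "Could not read the JMX attribute", e );
        }
    }
}
```

The untyped lookup by object name and attribute name, followed by a cast, is the main reason the typed convenience method shown above is usually more pleasant to use from code.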
4.9. OSGi setup
In OSGi-related contexts, such as a number of application servers (e.g.
Glassfish) and Eclipse-based systems, Neo4j can be set up explicitly instead of
discovering services via the Java Service Loader mechanism.
4.9.1. Simple OSGi Activator scenario
As seen in the following example, instead of relying on the class loading of
the Neo4j kernel, the Neo4j bundles are treated as library bundles, and
services like the IndexProviders and CacheProviders are explicitly
instantiated, configured and registered. Just make the necessary jars available
as wrapped library bundles, so all needed classes are exported and seen by the
bundle containing the Activator.
public class Neo4jActivator implements BundleActivator
{
private static GraphDatabaseService db;
private ServiceRegistration serviceRegistration;
private ServiceRegistration indexServiceRegistration;
@Override
public void start( BundleContext context ) throws Exception
{
//the cache providers
ArrayList cacheList = new ArrayList();
cacheList.add( new SoftCacheProvider() );
//the index providers
IndexProvider lucene = new LuceneIndexProvider();
ArrayList provs = new ArrayList();
provs.add( lucene );
ListIndexIterable providers = new ListIndexIterable();
providers.setIndexProviders( provs );
//the database setup
GraphDatabaseFactory gdbf = new GraphDatabaseFactory();
gdbf.setIndexProviders( providers );
gdbf.setCacheProviders( cacheList );
db = gdbf.newEmbeddedDatabase( "target/db" );
//the OSGi registration
serviceRegistration = context.registerService(
GraphDatabaseService.class.getName(), db, new Hashtable() );
System.out.println( "registered " + serviceRegistration.getReference() );
indexServiceRegistration = context.registerService(
Index.class.getName(), db.index().forNodes( "nodes" ),
new Hashtable() );
Transaction tx = db.beginTx();
try
{
Node firstNode = db.createNode();
Node secondNode = db.createNode();
Relationship relationship = firstNode.createRelationshipTo(
secondNode, DynamicRelationshipType.withName( "KNOWS" ) );
firstNode.setProperty( "message", "Hello, " );
secondNode.setProperty( "message", "world!" );
relationship.setProperty( "message", "brave Neo4j " );
db.index().forNodes( "nodes" ).add( firstNode, "message", "Hello" );
tx.success();
}
catch ( Exception e )
{
e.printStackTrace();
throw new RuntimeException( e );
}
finally
{
tx.finish();
}
}
@Override
public void stop( BundleContext context ) throws Exception
{
serviceRegistration.unregister();
indexServiceRegistration.unregister();
db.shutdown();
}
}
Tip
The source code of the example above is found here.
4.10. Execute Cypher Queries from Java
Tip
The full source code of the example: JavaQuery.java
In Java, you can use the Cypher query language like this:
GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
// add some data first
Transaction tx = db.beginTx();
try
{
Node refNode = db.getReferenceNode();
refNode.setProperty( "name", "reference node" );
tx.success();
}
finally
{
tx.finish();
}
// let's execute a query now
ExecutionEngine engine = new ExecutionEngine( db );
ExecutionResult result = engine.execute( "start n=node(0) return n, n.name" );
System.out.println( result );
Which will output:
+----------------------------------------------------+
| n | n.name |
+----------------------------------------------------+
| Node[0]{name->"reference node"} | "reference node" |
+----------------------------------------------------+
1 row
0 ms
Caution
The classes used here are from the org.neo4j.cypher.javacompat package, not
org.neo4j.cypher, see link to the Java API below.
You can get a list of the columns in the result:
List columns = result.columns();
System.out.println( columns );
This outputs:
[n, n.name]
To fetch the result items from a single column, do this:
Iterator<Node> n_column = result.columnAs( "n" );
for ( Node node : IteratorUtil.asIterable( n_column ) )
{
// note: we're grabbing the name property from the node,
// not from the n.name in this case.
nodeResult = node + ": " + node.getProperty( "name" );
System.out.println( nodeResult );
}
In this case there’s only one node in the result:
Node[0]: reference node
To get all columns, do this instead:
for ( Map<String, Object> row : result )
{
for ( Entry<String, Object> column : row.entrySet() )
{
rows += column.getKey() + ": " + column.getValue() + "; ";
}
rows += "\n";
}
System.out.println( rows );
This outputs:
n.name: reference node; n: Node[0];
For more information on the Java interface to Cypher, see the Java API.
For more information and examples for Cypher, see Chapter 16, Cypher Query
Language and Chapter 5, Cypher Cookbook.
Chapter 5. Cypher Cookbook
The following cookbook aims to provide a few snippets, examples and use-cases
and their query-solutions in Cypher. For the Cypher reference documentation,
see Chapter 16, Cypher Query Language.
5.1. Hyperedges and Cypher
Imagine a user being part of different groups. A group can have different
roles, and a user can be part of different groups. Apart from the membership,
the user can also have different roles in different groups. The association of
a User, a Group and a Role can be referred to as a HyperEdge. However, it can be
easily modeled in a property graph as a node that captures this n-ary
relationship, as depicted below in the U1G2R1 node.
Graph
cypher-hyperedge-graph.svg
5.1.1. Find Groups
To find out which roles a user has in a particular group (here Group2), the
following Cypher query can traverse this HyperEdge node and provide answers.
Query
START n=node:node_auto_index(name = "User1")
MATCH n-[:hasRoleInGroup]->hyperEdge-[:hasGroup]->group, hyperEdge-[:hasRole]->role
WHERE group.name = "Group2"
RETURN role.name
The role of User1:
Table 5.1. Result
+-----------+
| role.name |
+-----------+
| "Role1"   |
+-----------+
1 row
0 ms
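The idea of turning the n-ary association into a node of its own is not specific to Cypher. As a rough analogy in plain Java, each hyperedge node becomes an object that points to one user, one group and one role. The class HyperEdgeSketch and its methods are hypothetical and only meant to illustrate the modeling choice.

```java
import java.util.ArrayList;
import java.util.List;

class HyperEdgeSketch
{
    // One instance plays the part of a hyperedge node such as U1G2R1.
    private static final class HyperEdge
    {
        final String user;
        final String group;
        final String role;

        HyperEdge( String user, String group, String role )
        {
            this.user = user;
            this.group = group;
            this.role = role;
        }
    }

    private final List<HyperEdge> hyperEdges = new ArrayList<HyperEdge>();

    // Records that user has role in group.
    void connect( String user, String group, String role )
    {
        hyperEdges.add( new HyperEdge( user, group, role ) );
    }

    // Counterpart of the query above: which roles does user have in group?
    List<String> rolesOf( String user, String group )
    {
        List<String> roles = new ArrayList<String>();
        for ( HyperEdge edge : hyperEdges )
        {
            if ( edge.user.equals( user ) && edge.group.equals( group ) )
            {
                roles.add( edge.role );
            }
        }
        return roles;
    }
}
```

A plain user-to-group link could not carry the role, and a plain user-to-role link could not say in which group it applies; the intermediate object, like the intermediate node, keeps all three together.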
5.1.2. Find all groups and roles for a user
Here, find all groups and the roles a user has, sorted by role name.
Query
START n=node:node_auto_index(name = "User1")
MATCH n-[:hasRoleInGroup]->hyperEdge-[:hasGroup]->group, hyperEdge-[:hasRole]->role
RETURN role.name, group.name
ORDER BY role.name asc
The groups and roles of User1:
Table 5.2. Result
+------------------------+
| role.name | group.name |
+------------------------+
| "Role1"   | "Group2"   |
| "Role2"   | "Group1"   |
+------------------------+
2 rows
0 ms
5.1.3. Find common groups based on shared roles
Assume you have a more complicated graph:
1. Two user nodes: User1 and User2.
2. User1 is in Group1, Group2, Group3.
3. User1 has Role1, Role2 in Group1; Role2, Role3 in Group2; Role3, Role4 in
Group3 (hyper edges)
4. User2 is in Group1, Group2, Group3
5. User2 has Role2, Role5 in Group1; Role3, Role4 in Group2; Role5, Role6 in
Group3 (hyper edges)
The graph for this looks like the following (nodes like U1G2R23 representing
the HyperEdges):
Graph
cypher-hyperedgecommongroups-graph.svg
To return Group1 and Group2, the two groups in which User1 and User2 share at
least one common role, the Cypher query looks like:
Query
START u1=node:node_auto_index(name = "User1"),u2=node:node_auto_index(name = "User2")
MATCH u1-[:hasRoleInGroup]->hyperEdge1-[:hasGroup]->group,
hyperEdge1-[:hasRole]->role,
u2-[:hasRoleInGroup]->hyperEdge2-[:hasGroup]->group,
hyperEdge2-[:hasRole]->role
RETURN group.name, count(role)
ORDER BY group.name asc
The groups where User1 and User2 share at least one common role:
Table 5.3. Result
+--------------------------+
| group.name | count(role) |
+--------------------------+
| "Group1"   | 1           |
| "Group2"   | 1           |
+--------------------------+
2 rows
0 ms
5.2. Basic Friend finding based on social neighborhood
Imagine an example graph like the following:
Graph
cypher-collabfiltering-graph.svg
5.2.1. Simple Friend Finder
To find the friends of Joe’s friends that are not already his friends, the
Cypher query looks like:
Query
START joe=node:node_auto_index(name = "Joe")
MATCH joe-[:knows]->friend-[:knows]->friend_of_friend, joe-[r?:knows]->friend_of_friend
WHERE r IS NULL
RETURN friend_of_friend.name, COUNT(*)
ORDER BY COUNT(*) DESC, friend_of_friend.name
The list of friends-of-friends, ordered by the number of connections to them
and secondly by their name.
Table 5.4. Result
+----------------------------------+
| friend_of_friend.name | COUNT(*) |
+----------------------------------+
| "Ian"                 | 2        |
| "Derrick"             | 1        |
| "Jill"                | 1        |
+----------------------------------+
3 rows
0 ms
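What the query computes can be spelled out procedurally. The following plain-Java sketch is an illustration under assumptions: FriendFinderSketch and its methods are hypothetical, the graph is kept in a simple map, and, unlike the pattern in the query, the sketch explicitly leaves out the person asking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FriendFinderSketch
{
    private final Map<String, Set<String>> knows = new HashMap<String, Set<String>>();

    // Adds one directed "knows" relationship.
    void addKnows( String from, String to )
    {
        Set<String> friends = knows.get( from );
        if ( friends == null )
        {
            friends = new HashSet<String>();
            knows.put( from, friends );
        }
        friends.add( to );
    }

    private Set<String> friendsOf( String person )
    {
        Set<String> friends = knows.get( person );
        return friends == null ? Collections.<String>emptySet() : friends;
    }

    // Friends of friends that are not yet friends, most connections first, then by name.
    LinkedHashMap<String, Integer> suggest( String person )
    {
        final Map<String, Integer> counts = new HashMap<String, Integer>();
        for ( String friend : friendsOf( person ) )
        {
            for ( String candidate : friendsOf( friend ) )
            {
                if ( candidate.equals( person ) || friendsOf( person ).contains( candidate ) )
                {
                    continue; // skip the person and existing friends
                }
                Integer old = counts.get( candidate );
                counts.put( candidate, old == null ? 1 : old + 1 );
            }
        }
        List<String> names = new ArrayList<String>( counts.keySet() );
        Collections.sort( names, new Comparator<String>()
        {
            @Override
            public int compare( String a, String b )
            {
                int byCount = counts.get( b ).compareTo( counts.get( a ) );
                return byCount != 0 ? byCount : a.compareTo( b );
            }
        } );
        LinkedHashMap<String, Integer> ranked = new LinkedHashMap<String, Integer>();
        for ( String name : names )
        {
            ranked.put( name, counts.get( name ) );
        }
        return ranked;
    }
}
```

As an assumed data set (the actual example graph is only shown as a figure), let Joe know Bill and Sara, Bill know Ian and Derrick, and Sara know Ian, Jill and Bill. The sketch then ranks Ian first with two connections, followed by Derrick and Jill with one each.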
5.3. Co-favorited places
Graph
cypher-cofavoritedplaces-graph.svg
5.3.1. Co-Favorited Places - Users Who Like x Also Like y
Find places that are also liked by the people who favorited this place:
* Determine who has favorited place x.
* What else have they favorited that is not place x?
Query
START place=node:node_auto_index(name = "CoffeeShop1")
MATCH place<-[:favorite]-person-[:favorite]->stuff
RETURN stuff.name, count(*)
ORDER BY count(*) DESC, stuff.name
The list of places that are favorited by people that favorited the start place.
Table 5.5. Result
+-------------------------+
| stuff.name   | count(*) |
+-------------------------+
| "MelsPlace"  | 2        |
| "CoffeShop2" | 1        |
| "SaunaX"     | 1        |
+-------------------------+
3 rows
0 ms
5.3.2. Co-Tagged Places - Places Related through Tags
Find places that are tagged with the same tags:
* Determine the tags for place x.
* What else is tagged the same as x that is not x.
Query
START place=node:node_auto_index(name = "CoffeeShop1")
MATCH place-[:tagged]->tag<-[:tagged]-otherPlace
RETURN otherPlace.name, collect(tag.name)
ORDER BY otherPlace.name desc
The list of other places that share at least one tag with the start place,
together with the shared tags.
Table 5.6. Result
+-------------------------------------+
| otherPlace.name | collect(tag.name) |
+-------------------------------------+
| "MelsPlace"     | ["Cool","Cosy"]   |
| "CoffeeShop3"   | ["Cosy"]          |
| "CoffeeShop2"   | ["Cool"]          |
+-------------------------------------+
3 rows
0 ms
5.4. Find people based on similar favorites
Graph
cypher-peoplesimilarityfavorites-graph.svg
5.4.1. Find people based on similar favorites
To find possible new friends based on them liking similar things to the asking
person:
Query
START me=node:node_auto_index(name = "Joe")
MATCH me-[:favorite]->stuff<-[:favorite]-person
WHERE NOT(me-[:friend]-person)
RETURN person.name, count(stuff)
ORDER BY count(stuff) DESC
The list of people who are not yet friends, ranked by how many favorites they
share with the asking person.
Table 5.7. Result
+----------------------------+
| person.name | count(stuff) |
+----------------------------+
| "Derrick"   | 2            |
| "Jill"      | 1            |
+----------------------------+
2 rows
0 ms
5.5. Find people based on mutual friends and groups
Graph
cypher-mutualfriendsandgroups-graph.svg
5.5.1. Find mutual friends and groups
In this scenario, the problem is to determine mutual friends and groups, if
any, between persons. If no mutual groups or friends are found, there should be
a 0 returned.
Query
START me=node(5), other=node(4, 3)
MATCH pGroups=me-[?:member_of_group]->mg<-[?:member_of_group]-other, pMutualFriends=me-[?:knows]->mf<-[?:knows]-other
RETURN other.name as name,
count(distinct pGroups) AS mutualGroups,
count(distinct pMutualFriends) AS mutualFriends ORDER By mutualFriends DESC
The question we are asking is: how many unique paths are there between me and
each other person (here Jill and Bob), the paths being common group memberships
and common friends? If the paths were mandatory, no results would be returned
for Bob, since me and Bob lack any common friends, and we don’t want that. To
make a path optional, you have to make at least one of its relationships
optional. That makes the whole path optional.
Table 5.8. Result
+---------------------------------------+
| name   | mutualGroups | mutualFriends |
+---------------------------------------+
| "Jill" | 1            | 1             |
| "Bob"  | 1            | 0             |
+---------------------------------------+
2 rows
0 ms
5.6. Find friends based on similar tagging
Graph
cypher-peoplesimilaritytags-graph.svg
5.6.1. Find people based on similar tagged favorites
To find out people similar to me based on taggings of their favorited items, an
approach could be:
* Determine the tags associated with what I favorite.
* What else is tagged with those tags?
* Who favorites items tagged with the same tags?
* Sort the result by how many of the same things these people like.
Query
START me=node(9)
MATCH me-[:favorite]->myFavorites-[:tagged]->tag<-[:tagged]-theirFavorites<-[:favorite]-people
WHERE NOT(me=people)
RETURN people.name as name, count(*) as similar_favs
ORDER BY similar_favs DESC
The list of people, ranked by how many similarly tagged favorites they have.
Table 5.9. Result
+--------------------------+
| name      | similar_favs |
+--------------------------+
| "Sara"    | 2            |
| "Derrick" | 1            |
+--------------------------+
2 rows
0 ms
5.7. Multirelational (social) graphs
Graph
cypher-multirelationalsocialnetwork-graph.svg
5.7.1. Who FOLLOWS or LOVES me back
This example shows a multi-relational network between persons and things they
like. A multi-relational graph is a graph with more than one kind of
relationship between nodes.
Query
START me=node:node_auto_index(name = 'Joe')
MATCH me-[r1]->other-[r2]->me
WHERE type(r1)=type(r2) AND type(r1) =~ /FOLLOWS|LOVES/
RETURN other.name, type(r1)
People who FOLLOW or LOVE Joe back.
Table 5.10. Result
+------------------------+
| other.name | type(r1)  |
+------------------------+
| "Sara"     | "FOLLOWS" |
| "Maria"    | "FOLLOWS" |
| "Maria"    | "LOVES"   |
+------------------------+
3 rows
0 ms
5.8. A multilevel indexing structure (path tree)
In this example, a multi-level tree structure is used to index event nodes
(here Event1, Event2 and Event3), in this case with a YEAR-MONTH-DAY
granularity, making this a timeline indexing structure. However, this approach
should work for a wide range of multi-level ranges.
The structure follows a couple of rules:
* Events can be indexed multiple times by connecting the indexing structure
leafs with the events via a VALUE relationship.
* The querying is done in a path-range fashion. That is, the start and end
paths from the indexing root to the start and end leafs in the tree are
calculated.
* Using Cypher, the queries following different strategies can be expressed
as path sections and put together using one single query.
The graph below depicts a structure with 3 Events being attached to an index
structure at different leafs.
Graph
cypher-pathtree-layout-path.svg
5.8.1. Return zero range
Here, only the events indexed under one leaf (2010-12-31) are returned. The
query only needs one path segment rootPath (color Green) through the index.
Graph
cypher-pathtree-layout-zero-range.svg
Query
START root=node:node_auto_index(name = 'Root')
MATCH rootPath=root-[:`2010`]->()-[:`12`]->()-[:`31`]->leaf, leaf-[:VALUE]->event
RETURN event.name
ORDER BY event.name ASC
Returning all events on the date 2010-12-31, in this case Event1 and Event2
Table 5.11. Result
+------------+
| event.name |
+------------+
| "Event1"   |
| "Event2"   |
+------------+
2 rows
0 ms
5.8.2. Return the full range
In this case, the range goes from the first to the last leaf of the index tree.
Here, startPath (color Greenyellow) and endPath (color Green) span up the
range, valuePath (color Blue) is then connecting the leafs, and the values can
be read from the middle node, hanging off the values (color Red) path.
Graph
cypher-pathtree-layout-full-range-path.svg
Query
START root=node:node_auto_index(name = 'Root')
MATCH startPath=root-[:`2010`]->()-[:`12`]->()-[:`31`]->startLeaf,
endPath=root-[:`2011`]->()-[:`01`]->()-[:`03`]->endLeaf,
valuePath=startLeaf-[:NEXT*0..]->middle-[:NEXT*0..]->endLeaf,
values=middle-[:VALUE]->event
RETURN event.name
ORDER BY event.name ASC
Returning all events between 2010-12-31 and 2011-01-03, in this case all
events.
Table 5.12. Result
+------------+
| event.name |
+------------+
| "Event1"   |
| "Event2"   |
| "Event2"   |
| "Event3"   |
+------------+
4 rows
0 ms
5.8.3. Return partly shared path ranges
Here, the query range results in partly shared paths when querying the index,
making the introduction of a common path segment commonPath (color Black)
necessary, before spanning up startPath (color Greenyellow) and endPath (color
Darkgreen). After that, valuePath (color Blue) connects the leafs and the
indexed values are returned off the values (color Red) path.
Graph
cypher-pathtree-layout-shared-root-path.svg
Query
START root=node:node_auto_index(name = 'Root')
MATCH commonPath=root-[:`2011`]->()-[:`01`]->commonRootEnd,
startPath=commonRootEnd-[:`01`]->startLeaf,
endPath=commonRootEnd-[:`03`]->endLeaf,
valuePath=startLeaf-[:NEXT*0..]->middle-[:NEXT*0..]->endLeaf,
values=middle-[:VALUE]->event
RETURN event.name
ORDER BY event.name ASC
Returning all events between 2011-01-01 and 2011-01-03, in this case Event2 and
Event3.
Table 5.13. Result
+------------+
| event.name |
+------------+
| "Event2"   |
| "Event3"   |
+------------+
2 rows
0 ms
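The multi-level structure used in this section has a close relative in ordinary Java collections: nested sorted maps. The sketch below is only an analogy under assumptions, not how Neo4j stores the index. TimelineIndexSketch is a hypothetical class, dates are passed as yyyymmdd integers, and only the year level is pruned with a sub-map while months and days are filtered one by one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TimelineIndexSketch
{
    // year -> month -> day -> event names; TreeMap keeps every level sorted.
    private final TreeMap<Integer, TreeMap<Integer, TreeMap<Integer, List<String>>>> root =
            new TreeMap<Integer, TreeMap<Integer, TreeMap<Integer, List<String>>>>();

    // Attaches an event to the leaf for the given date, creating levels as needed.
    void index( int year, int month, int day, String event )
    {
        TreeMap<Integer, TreeMap<Integer, List<String>>> months = root.get( year );
        if ( months == null )
        {
            months = new TreeMap<Integer, TreeMap<Integer, List<String>>>();
            root.put( year, months );
        }
        TreeMap<Integer, List<String>> days = months.get( month );
        if ( days == null )
        {
            days = new TreeMap<Integer, List<String>>();
            months.put( month, days );
        }
        List<String> events = days.get( day );
        if ( events == null )
        {
            events = new ArrayList<String>();
            days.put( day, events );
        }
        events.add( event );
    }

    // Returns all events between from and to (both inclusive, as yyyymmdd), in date order.
    List<String> range( int from, int to )
    {
        List<String> result = new ArrayList<String>();
        for ( Map.Entry<Integer, TreeMap<Integer, TreeMap<Integer, List<String>>>> year
                : root.subMap( from / 10000, true, to / 10000, true ).entrySet() )
        {
            for ( Map.Entry<Integer, TreeMap<Integer, List<String>>> month
                    : year.getValue().entrySet() )
            {
                for ( Map.Entry<Integer, List<String>> day : month.getValue().entrySet() )
                {
                    int date = year.getKey() * 10000 + month.getKey() * 100 + day.getKey();
                    if ( date >= from && date <= to )
                    {
                        result.addAll( day.getValue() );
                    }
                }
            }
        }
        return result;
    }
}
```

Assuming Event1 and Event2 are indexed under 2010-12-31, Event2 again under 2011-01-01 and Event3 under 2011-01-03 (the exact days of the last two are an assumption consistent with the results above), range( 20101231, 20110103 ) lists Event1, Event2, Event2 and Event3, and range( 20110101, 20110103 ) lists Event2 and Event3.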
Chapter 6. Using the Neo4j REST API
The included Java example shows a “low-level” approach to using the Neo4j REST
API from Java. For other options, see below.
Table 6.1. Neo4j REST clients contributed by the community.
+-----------------------------------------------------------------------------+
|name |language / |URL |
| |framework | |
|-----------------+-----------+-----------------------------------------------|
|Java-Rest-Binding|Java |https://github.com/neo4j/java-rest-binding/ |
| | | |
|-----------------+-----------+-----------------------------------------------|
|Neo4jClient |.NET |http://hg.readify.net/neo4jclient/ |
|-----------------+-----------+-----------------------------------------------|
|Neo4jRestNet |.NET |https://github.com/SepiaGroup/Neo4jRestNet |
| | | |
|-----------------+-----------+-----------------------------------------------|
|py2neo |Python |http://py2neo.org/ |
|-----------------+-----------+-----------------------------------------------|
|Bulbflow |Python |http://bulbflow.com/ |
|-----------------+-----------+-----------------------------------------------|
|neo4jrestclient |Python |https://github.com/versae/neo4j-rest-client |
| | | |
|-----------------+-----------+-----------------------------------------------|
|neo4django |Django |https://github.com/scholrly/neo4django |
|-----------------+-----------+-----------------------------------------------|
|Neo4jPHP |PHP |https://github.com/jadell/Neo4jPHP |
|-----------------+-----------+-----------------------------------------------|
|neography |Ruby |https://github.com/maxdemarzi/neography |
|-----------------+-----------+-----------------------------------------------|
|neoid |Ruby |https://github.com/elado/neoid |
|-----------------+-----------+-----------------------------------------------|
|node.js |JavaScript |https://github.com/thingdom/node-neo4j |
|-----------------+-----------+-----------------------------------------------|
|Neocons |Clojure |https://github.com/michaelklishin/neocons |
| | | |
+-----------------------------------------------------------------------------+
6.1. How to use the REST API from Java
6.1.1. Creating a graph through the REST API from Java
The REST API uses HTTP and JSON, so that it can be used from many languages and
platforms. Still, when getting started it’s useful to see some patterns that can
be re-used. In this brief overview, we’ll show you how to create and manipulate
a simple graph through the REST API and also how to query it.
For these examples, we’ve chosen the Jersey client components, which are easily
downloaded via Maven.
6.1.2. Start the server
Before we can perform any actions on the server, we need to start it as per
Section 18.1, “Server Installation”. Once it is running, a GET request against
the server root URI verifies that we can reach it:
WebResource resource = Client.create()
.resource( SERVER_ROOT_URI );
ClientResponse response = resource.get( ClientResponse.class );
System.out.println( String.format( "GET on [%s], status code [%d]",
SERVER_ROOT_URI, response.getStatus() ) );
response.close();
If the status of the response is 200 OK, then we know the server is running
fine and we can continue. If the code fails to connect to the server, then
please have a look at Chapter 18, Neo4j Server.
Note
If you get any other response than 200 OK (particularly 4xx or 5xx responses)
then please check your configuration and look in the log files in the data/log
directory.
6.1.3. Creating a node
The REST API uses POST to create nodes. Encapsulating that in Java is
straightforward using the Jersey client:
final String nodeEntryPointUri = SERVER_ROOT_URI + "node";
// http://localhost:7474/db/data/node
WebResource resource = Client.create()
.resource( nodeEntryPointUri );
// POST {} to the node entry point URI
ClientResponse response = resource.accept( MediaType.APPLICATION_JSON )
.type( MediaType.APPLICATION_JSON )
.entity( "{}" )
.post( ClientResponse.class );
final URI location = response.getLocation();
System.out.println( String.format(
"POST to [%s], status code [%d], location header [%s]",
nodeEntryPointUri, response.getStatus(), location.toString() ) );
response.close();
return location;
If the call completes successfully, under the covers it will have sent an HTTP
request containing a JSON payload to the server. The server will then have
created a new node in the database and responded with a 201 Created response
and a Location header with the URI of the newly created node.
In our example, we call this functionality twice to create two nodes in our
database.
6.1.4. Adding properties
Once we have nodes in our database, we can use them to store useful data. In
this case, we’re going to store information about music in our database. Let’s
start by looking at the code that we use to create nodes and add properties.
Here we’ve added nodes to represent "Joe Strummer" and a band called "The
Clash".
URI firstNode = createNode();
addProperty( firstNode, "name", "Joe Strummer" );
URI secondNode = createNode();
addProperty( secondNode, "band", "The Clash" );
Inside the addProperty method we determine the resource that represents
properties for the node and decide on a name for that property. We then proceed
to PUT the value of that property to the server.
String propertyUri = nodeUri.toString() + "/properties/" + propertyName;
// http://localhost:7474/db/data/node/{node_id}/properties/{property_name}
WebResource resource = Client.create()
.resource( propertyUri );
ClientResponse response = resource.accept( MediaType.APPLICATION_JSON )
.type( MediaType.APPLICATION_JSON )
.entity( "\"" + propertyValue + "\"" )
.put( ClientResponse.class );
System.out.println( String.format( "PUT to [%s], status code [%d]",
propertyUri, response.getStatus() ) );
response.close();
If everything goes well, we’ll get a 204 No Content back indicating that the
server processed the request but didn’t echo back the property value.
6.1.5. Adding relationships
Now that we have nodes to represent Joe Strummer and The Clash, we can relate
them. The REST API supports this through a POST of a relationship
representation to the start node of the relationship. Correspondingly in Java
we POST some JSON to the URI of our node that represents Joe Strummer, to
establish a relationship between that node and the node representing The Clash.
URI relationshipUri = addRelationship( firstNode, secondNode, "singer",
"{ \"from\" : \"1976\", \"until\" : \"1986\" }" );
Inside the addRelationship method, we determine the URI of the Joe Strummer
node’s relationships, and then POST a JSON description of our intended
relationship. This description contains the destination node, a label for the
relationship type, and any attributes for the relationship as a JSON collection.
private static URI addRelationship( URI startNode, URI endNode,
String relationshipType, String jsonAttributes )
throws URISyntaxException
{
URI fromUri = new URI( startNode.toString() + "/relationships" );
String relationshipJson = generateJsonRelationship( endNode,
relationshipType, jsonAttributes );
WebResource resource = Client.create()
.resource( fromUri );
// POST JSON to the relationships URI
ClientResponse response = resource.accept( MediaType.APPLICATION_JSON )
.type( MediaType.APPLICATION_JSON )
.entity( relationshipJson )
.post( ClientResponse.class );
final URI location = response.getLocation();
System.out.println( String.format(
"POST to [%s], status code [%d], location header [%s]",
fromUri, response.getStatus(), location.toString() ) );
response.close();
return location;
}
If all goes well, we receive a 201 Created status code and a Location header
which contains a URI of the newly created relationship.
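The helper generateJsonRelationship is called above but its body is not shown. The following is one possible sketch of it, not necessarily the code used in the example project. It assumes that the server expects the keys "to", "type" and, optionally, "data" in the relationship description, and the wrapper class RelationshipJsonSketch is a name made up for this illustration.

```java
import java.net.URI;

class RelationshipJsonSketch
{
    // Builds the JSON description of a relationship: destination node,
    // relationship type and, if given, the attributes as a nested JSON object.
    static String generateJsonRelationship( URI endNode, String relationshipType,
            String jsonAttributes )
    {
        StringBuilder sb = new StringBuilder();
        sb.append( "{ \"to\" : \"" ).append( endNode ).append( "\", " );
        sb.append( "\"type\" : \"" ).append( relationshipType ).append( "\"" );
        if ( jsonAttributes != null && !jsonAttributes.trim().isEmpty() )
        {
            // The attributes are expected to be valid JSON already.
            sb.append( ", \"data\" : " ).append( jsonAttributes );
        }
        sb.append( " }" );
        return sb.toString();
    }
}
```

The sketch concatenates strings and does no escaping, which is acceptable for fixed example values; for arbitrary input, a JSON library would be the safer choice.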
6.1.6. Add properties to a relationship
Like nodes, relationships can have properties. Since we’re big fans of both Joe
Strummer and the Clash, we’ll add a rating to the relationship so that others
can see he’s a 5-star singer with the band.
addMetadataToProperty( relationshipUri, "stars", "5" );
Inside the addMetadataToProperty method, we determine the URI of the properties
of the relationship and PUT our new values (since it’s PUT it will always
overwrite existing values, so be careful).
private static void addMetadataToProperty( URI relationshipUri,
String name, String value ) throws URISyntaxException
{
URI propertyUri = new URI( relationshipUri.toString() + "/properties" );
String entity = toJsonNameValuePairCollection( name, value );
WebResource resource = Client.create()
.resource( propertyUri );
ClientResponse response = resource.accept( MediaType.APPLICATION_JSON )
.type( MediaType.APPLICATION_JSON )
.entity( entity )
.put( ClientResponse.class );
System.out.println( String.format(
"PUT [%s] to [%s], status code [%d]", entity, propertyUri,
response.getStatus() ) );
response.close();
}
Assuming all goes well, we’ll get a 200 OK response back from the server (which
we can check by calling ClientResponse.getStatus()) and we’ve now established a
very small graph that we can query.
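The toJsonNameValuePairCollection helper is likewise not shown here. A minimal sketch, assuming it only needs to wrap one name/value pair in a JSON object (the class name PropertyJsonSketch is hypothetical):

```java
class PropertyJsonSketch
{
    // Wraps a single name/value pair in a JSON object, e.g. { "stars" : "5" }.
    // No escaping is done, so names and values must not contain quotes
    // or backslashes.
    static String toJsonNameValuePairCollection( String name, String value )
    {
        return String.format( "{ \"%s\" : \"%s\" }", name, value );
    }

    public static void main( String[] args )
    {
        System.out.println( toJsonNameValuePairCollection( "stars", "5" ) );
    }
}
```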
6.1.7. Querying graphs
As with the embedded version of the database, the Neo4j server uses graph
traversals to look for data in graphs. Currently the Neo4j server expects a
JSON payload describing the traversal to be POST-ed at the starting node for
the traversal (though this is likely to change in time to a GET-based
approach).
To start this process, we use a simple class that can turn itself into the
equivalent JSON, ready for POST-ing to the server, and in this case we’ve
hardcoded the traverser to look for all nodes with outgoing relationships with
the type "singer".
// TraversalDescription turns into JSON to send to the Server
TraversalDescription t = new TraversalDescription();
t.setOrder( TraversalDescription.DEPTH_FIRST );
t.setUniqueness( TraversalDescription.NODE );
t.setMaxDepth( 10 );
t.setReturnFilter( TraversalDescription.ALL );
t.setRelationships( new Relationship( "singer", Relationship.OUT ) );
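The TraversalDescription class used here belongs to the example code, not to the server, and its toJson() method is not listed in this section. The following is only a rough sketch of the kind of payload it might produce. The key names (order, uniqueness, relationships, return_filter, max_depth), the "builtin" filter language and the sample values are all assumptions made for this illustration and should be checked against the REST API chapter for your server version:

```java
class TraversalJsonSketch
{
    // Assembles a traversal description as a JSON string. Every key name
    // below is an assumption of this sketch, not a confirmed server contract.
    static String toJson( String order, String uniqueness, int maxDepth,
            String returnFilter, String relationshipType, String direction )
    {
        return "{ \"order\" : \"" + order + "\", "
               + "\"uniqueness\" : \"" + uniqueness + "\", "
               + "\"relationships\" : [ { \"type\" : \"" + relationshipType
               + "\", \"direction\" : \"" + direction + "\" } ], "
               + "\"return_filter\" : { \"language\" : \"builtin\", \"name\" : \""
               + returnFilter + "\" }, "
               + "\"max_depth\" : " + maxDepth + " }";
    }

    public static void main( String[] args )
    {
        // Sample values are illustrative only.
        System.out.println( toJson( "depth_first", "node", 10, "all", "singer", "out" ) );
    }
}
```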
Once we have defined the parameters of our traversal, we just need to transfer
it. We do this by determining the URI of the traversers for the start node, and
then POST-ing the JSON representation of the traverser to it.
URI traverserUri = new URI( startNode.toString() + "/traverse/node" );
WebResource resource = Client.create()
.resource( traverserUri );
String jsonTraverserPayload = t.toJson();
ClientResponse response = resource.accept( MediaType.APPLICATION_JSON )
.type( MediaType.APPLICATION_JSON )
.entity( jsonTraverserPayload )
.post( ClientResponse.class );
System.out.println( String.format(
"POST [%s] to [%s], status code [%d], returned data: "
+ System.getProperty( "line.separator" ) + "%s",
jsonTraverserPayload, traverserUri, response.getStatus(),
response.getEntity( String.class ) ) );
response.close();
Once that request has completed, we get back our dataset of singers and the
bands they belong to:
[ {
"outgoing_relationships" : "http://localhost:7474/db/data/node/82/relationships/out",
"data" : {
"band" : "The Clash",
"name" : "Joe Strummer"
},
"traverse" : "http://localhost:7474/db/data/node/82/traverse/{returnType}",
"all_typed_relationships" : "http://localhost:7474/db/data/node/82/relationships/all/{-list|&|types}",
"property" : "http://localhost:7474/db/data/node/82/properties/{key}",
"all_relationships" : "http://localhost:7474/db/data/node/82/relationships/all",
"self" : "http://localhost:7474/db/data/node/82",
"properties" : "http://localhost:7474/db/data/node/82/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/82/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/82/relationships/in",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/82/relationships/in/{-list|&|types}",
"create_relationship" : "http://localhost:7474/db/data/node/82/relationships"
}, {
"outgoing_relationships" : "http://localhost:7474/db/data/node/83/relationships/out",
"data" : {
},
"traverse" : "http://localhost:7474/db/data/node/83/traverse/{returnType}",
"all_typed_relationships" : "http://localhost:7474/db/data/node/83/relationships/all/{-list|&|types}",
"property" : "http://localhost:7474/db/data/node/83/properties/{key}",
"all_relationships" : "http://localhost:7474/db/data/node/83/relationships/all",
"self" : "http://localhost:7474/db/data/node/83",
"properties" : "http://localhost:7474/db/data/node/83/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/83/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/83/relationships/in",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/83/relationships/in/{-list|&|types}",
"create_relationship" : "http://localhost:7474/db/data/node/83/relationships"
} ]
6.1.8. Phew, is that it?
That’s a flavor of what we can do with the REST API. Naturally any of the HTTP
idioms we provide on the server can be easily wrapped, including removing nodes
and relationships through DELETE. If you’ve gotten this far, then
switching .post() for .delete() in the Jersey client code should be
straightforward.
6.1.9. What’s next?
The HTTP API provides a good basis for implementers of client libraries, and
it’s also great for HTTP and REST folks. In the future, though, we expect that
idiomatic language bindings will appear to take advantage of the REST API while
providing comfortable language-level constructs for developers to use, much as
there are similar bindings for the embedded database. For a list of current
Neo4j REST clients and embedded wrappers, see
http://www.delicious.com/neo4j/drivers.
6.1.10. Appendix: the code
* CreateSimpleGraph.java
* Relationship.java
* TraversalDescription.java
Chapter 7. Extending the Neo4j Server
The Neo4j Server can be extended by either plugins or unmanaged extensions. For
more information on the server, see Chapter 18, Neo4j Server.
7.1. Server Plugins
Quick info
* The server’s functionality can be extended by adding plugins.
* Plugins are user-specified code which extend the capabilities of the
database, nodes, or relationships.
* The Neo4j server will then advertise the plugin functionality within
representations as clients interact via HTTP.
Plugins provide an easy way to extend the Neo4j REST API with new
functionality, without the need to invent your own API. Think of plugins as
server-side scripts that can add functions for retrieving and manipulating
nodes, relationships, paths, properties or indices.
Tip
If you want to have full control over your API, and are willing to put in the
effort, and understand the risks, then Neo4j server also provides hooks for
unmanaged extensions based on JAX-RS.
The needed classes reside in the org.neo4j:server-api jar file. See
the linked page for downloads and instructions on how to include it using
dependency management. For Maven projects, add the Server API dependencies in
your pom.xml like this:
<dependency>
  <groupId>org.neo4j</groupId>
  <artifactId>server-api</artifactId>
  <version>${neo4j-version}</version>
</dependency>
Where ${neo4j-version} is the intended version.
To create a plugin, your code must inherit from the ServerPlugin class. Your
plugin should also:
* ensure that it can produce an (Iterable of) Node, Relationship or Path, or
any Java primitive or String,
* specify parameters,
* specify a point of extension,
* contain the application logic, and
* make sure that the discovery point type in the @PluginTarget and the
@Source parameter are of the same type.
An example of a plugin which augments the database (as opposed to nodes or
relationships) follows:
Get all nodes or relationships plugin.
@Description( "An extension to the Neo4j Server for getting all nodes or relationships" )
public class GetAll extends ServerPlugin
{
@Name( "get_all_nodes" )
@Description( "Get all nodes from the Neo4j graph database" )
@PluginTarget( GraphDatabaseService.class )
public Iterable<Node> getAllNodes( @Source GraphDatabaseService graphDb )
{
return GlobalGraphOperations.at( graphDb ).getAllNodes();
}
@Description( "Get all relationships from the Neo4j graph database" )
@PluginTarget( GraphDatabaseService.class )
public Iterable<Relationship> getAllRelationships( @Source GraphDatabaseService graphDb )
{
return GlobalGraphOperations.at( graphDb ).getAllRelationships();
}
}
The full source code is found here: GetAll.java
Find the shortest path between two nodes plugin.
public class ShortestPath extends ServerPlugin
{
@Description( "Find the shortest path between two nodes." )
@PluginTarget( Node.class )
public Iterable<Path> shortestPath(
@Source Node source,
@Description( "The node to find the shortest path to." )
@Parameter( name = "target" ) Node target,
@Description( "The relationship types to follow when searching for the shortest path(s). " +
"Order is insignificant, if omitted all types are followed." )
@Parameter( name = "types", optional = true ) String[] types,
@Description( "The maximum path length to search for, default value (if omitted) is 4." )
@Parameter( name = "depth", optional = true ) Integer depth )
{
Expander expander;
if ( types == null )
{
expander = Traversal.expanderForAllTypes();
}
else
{
expander = Traversal.emptyExpander();
for ( int i = 0; i < types.length; i++ )
{
expander = expander.add( DynamicRelationshipType.withName( types[i] ) );
}
}
PathFinder<Path> shortestPath = GraphAlgoFactory.shortestPath(
expander, depth == null ? 4 : depth.intValue() );
return shortestPath.findAllPaths( source, target );
}
}
The full source code is found here: ShortestPath.java
To deploy the code, simply compile it into a .jar file and place it onto the
server classpath (which by convention is the plugins directory under the Neo4j
server home directory).
Tip
Make sure the directory listings are retained in the jar file by either
building with default Maven, or with jar -cvf myext.jar *, making sure to jar
directories instead of specifying single files.
The .jar file must include the file
META-INF/services/org.neo4j.server.plugins.ServerPlugin with the fully
qualified name of the implementation class. This is an example with multiple
entries, each on a separate line:
org.neo4j.examples.server.plugins.GetAll
org.neo4j.examples.server.plugins.DepthTwo
org.neo4j.examples.server.plugins.ShortestPath
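Before deploying, it can help to check the packaged jar for these requirements. The following is a minimal sketch using only the JDK, assuming that the "directory listings" mentioned in the tip are the directory entries stored in the jar. The PluginJarCheck class and its method names are hypothetical and not part of Neo4j:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class PluginJarCheck
{
    static final String SERVICES_ENTRY =
            "META-INF/services/org.neo4j.server.plugins.ServerPlugin";

    // Given the entry names of a jar and the lines of its services file,
    // reports what is missing. An empty list means no problem was found.
    static List<String> findProblems( Set<String> entryNames, List<String> serviceLines )
    {
        List<String> problems = new ArrayList<>();
        if ( !entryNames.contains( SERVICES_ENTRY ) )
        {
            problems.add( "missing " + SERVICES_ENTRY );
            return problems;
        }
        for ( String line : serviceLines )
        {
            String className = line.trim();
            if ( className.isEmpty() || className.startsWith( "#" ) )
            {
                continue; // blank lines and comments carry no class name
            }
            String classEntry = className.replace( '.', '/' ) + ".class";
            if ( !entryNames.contains( classEntry ) )
            {
                problems.add( "missing class " + classEntry );
            }
            // Directory entries end with '/', for example "org/example/".
            String dirEntry = classEntry.substring( 0, classEntry.lastIndexOf( '/' ) + 1 );
            if ( !dirEntry.isEmpty() && !entryNames.contains( dirEntry ) )
            {
                problems.add( "missing directory entry " + dirEntry );
            }
        }
        return problems;
    }

    // Reads the entry names and the services file from a jar on disk.
    static List<String> check( String jarPath ) throws IOException
    {
        Set<String> names = new HashSet<>();
        List<String> lines = new ArrayList<>();
        try ( JarFile jar = new JarFile( jarPath ) )
        {
            for ( JarEntry entry : Collections.list( jar.entries() ) )
            {
                names.add( entry.getName() );
            }
            JarEntry services = jar.getJarEntry( SERVICES_ENTRY );
            if ( services != null )
            {
                try ( BufferedReader reader = new BufferedReader( new InputStreamReader(
                        jar.getInputStream( services ), StandardCharsets.UTF_8 ) ) )
                {
                    String line;
                    while ( ( line = reader.readLine() ) != null )
                    {
                        lines.add( line );
                    }
                }
            }
        }
        return findProblems( names, lines );
    }

    public static void main( String[] args ) throws IOException
    {
        if ( args.length == 0 )
        {
            System.out.println( "usage: PluginJarCheck <path-to-jar>" );
            return;
        }
        for ( String problem : check( args[0] ) )
        {
            System.out.println( problem );
        }
    }
}
```

The check is split in two so that the rules in findProblems can be exercised without a jar on disk.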
The GetAll plugin shown earlier makes its extensions visible in the database
representation (via the @PluginTarget annotation) whenever it is served from
the Neo4j Server.
Simply changing the @PluginTarget parameter to Node.class or Relationship.class
allows us to target those parts of the data model should we wish. The
functionality extensions provided by the plugin are automatically advertised in
representations on the wire. For example, clients can discover the extension
implemented by the above plugin easily by examining the representations they
receive as responses from the server, e.g. by performing a GET on the default
database URI:
curl -v http://localhost:7474/db/data/
The response to the GET request will contain (by default) a JSON container that
itself contains a container called "extensions" where the available plugins are
listed. In the following case, we only have the GetAll plugin registered with
the server, so only its extension functionality is available. Extension names
will be automatically assigned, based on method names, if not specifically
specified using the @Name annotation.
{
"extensions-info" : "http://localhost:7474/db/data/ext",
"node" : "http://localhost:7474/db/data/node",
"node_index" : "http://localhost:7474/db/data/index/node",
"relationship_index" : "http://localhost:7474/db/data/index/relationship",
"reference_node" : "http://localhost:7474/db/data/node/0",
"extensions_info" : "http://localhost:7474/db/data/ext",
"extensions" : {
"GetAll" : {
"get_all_nodes" : "http://localhost:7474/db/data/ext/GetAll/graphdb/get_all_nodes",
"get_all_relationships" : "http://localhost:7474/db/data/ext/GetAll/graphdb/getAllRelationships"
}
}
}
Performing a GET on one of the two extension URIs gives back the meta
information about the service:
curl http://localhost:7474/db/data/ext/GetAll/graphdb/get_all_nodes
{
"extends" : "graphdb",
"description" : "Get all nodes from the Neo4j graph database",
"name" : "get_all_nodes",
"parameters" : [ ]
}
To use it, just POST to this URL, with parameters as specified in the
description and encoded as JSON data content. For example, to call the shortest
path extension (URI obtained from a GET to
http://localhost:7474/db/data/node/123):
curl -X POST http://localhost:7474/db/data/ext/ShortestPath/node/123/shortestPath \
-H "Content-Type: application/json" \
-d '{"target":"http://localhost:7474/db/data/node/456", "depth":5}'
If everything is OK, a response code 200 and a list of zero or more items will
be returned. If nothing is returned (null returned from extension), an empty
result and response code 204 will be returned. If the extension throws an
exception, response code 500 and a detailed error message are returned.
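On the client side, the request target and the handling of these status codes can be sketched as follows. This is only an illustration: the URI pattern is inferred from the example URIs in this section, and the ExtensionCallSketch class and its method names are hypothetical.

```java
class ExtensionCallSketch
{
    // URI of a node-targeted extension method, following the pattern seen in
    // the examples: <data URI>/ext/<plugin>/node/<node id>/<method>
    static String nodeExtensionUri( String dataUri, String plugin,
            long nodeId, String method )
    {
        String base = dataUri.endsWith( "/" ) ? dataUri : dataUri + "/";
        return base + "ext/" + plugin + "/node/" + nodeId + "/" + method;
    }

    // Maps the status codes described above to a short explanation.
    static String describe( int status )
    {
        switch ( status )
        {
        case 200:
            return "OK, zero or more items returned";
        case 204:
            return "extension returned null, empty result";
        case 500:
            return "extension threw an exception";
        default:
            return "unexpected status " + status;
        }
    }

    public static void main( String[] args )
    {
        System.out.println( nodeExtensionUri(
                "http://localhost:7474/db/data/", "ShortestPath", 123, "shortestPath" ) );
        System.out.println( describe( 204 ) );
    }
}
```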
Extensions that do any kind of write operation will have to manage their own
transactions, i.e. transactions aren’t managed automatically.
Through this model, any plugin can naturally fit into the general hypermedia
scheme that Neo4j espouses - meaning that clients can still take advantage of
abstractions like Nodes, Relationships and Paths with a straightforward upgrade
path as servers are enriched with plugins (old clients don’t break).
7.2. Unmanaged Extensions
Quick info
* Danger: Men at Work! The unmanaged extensions are a way of deploying
arbitrary JAX-RS code into the Neo4j server.
* The unmanaged extensions are exactly that: unmanaged. If you drop poorly
tested code into the server, it’s highly likely you’ll degrade its
performance, so be careful.
Some projects want extremely fine control over their server-side code. For this
we’ve introduced an unmanaged extension API.
Warning
This is a sharp tool: it allows users to deploy arbitrary JAX-RS classes to the
server, so you should be careful when thinking about using it. In particular
you should understand that it’s easy to consume lots of heap space on the
server and hinder performance if you’re not careful.
Still, if you understand the disclaimer, then you load your JAX-RS classes into
the Neo4j server simply by adding a @Context annotation to your code and
compiling against the JAX-RS jar and any Neo4j jars you’re making use of. Then
add your classes to the runtime classpath (just drop them in the lib directory
of the Neo4j server). In return you get access to the hosted environment of the
Neo4j server, such as logging through the org.neo4j.server.logging.Logger.
In your code, you get access to the underlying GraphDatabaseService through the
@Context annotation like so:
public MyCoolService( @Context GraphDatabaseService database )
{
// Have fun here, but be safe!
}
Remember, the unmanaged API is a very sharp tool. It’s all too easy to
compromise the server by deploying code this way, so think first and see if you
can’t use the managed extensions in preference. However, a number of context
parameters can be automatically provided for you, like the reference to the
database.
In order to specify the mount point of your extension, a full class looks like
this:
Unmanaged extension example.
@Path( "/helloworld" )
public class HelloWorldResource
{
private final GraphDatabaseService database;
public HelloWorldResource( @Context GraphDatabaseService database )
{
this.database = database;
}
@GET
@Produces( MediaType.TEXT_PLAIN )
@Path( "/{nodeId}" )
public Response hello( @PathParam( "nodeId" ) long nodeId )
{
// Do stuff with the database
return Response.status( Status.OK ).entity(
( "Hello World, nodeId=" + nodeId ).getBytes() ).build();
}
}
The full source code is found here: HelloWorldResource.java
Build this code, and place the resulting jar file (and any custom dependencies)
into the $NEO4J_SERVER_HOME/plugins directory.
Tip
Make sure the directory listings are retained in the jar file by either
building with default Maven, or with jar -cvf myext.jar *, making sure to jar
directories instead of specifying single files.
Then register the package of this class in the neo4j-server.properties file,
like so:
#Comma separated list of JAXRS packages containing JAXRS Resource, one package name for each mountpoint.
org.neo4j.server.thirdparty_jaxrs_classes=org.neo4j.examples.server.unmanaged=/examples/unmanaged
This binds the hello method to respond to GET requests at the URI:
http://{neo4j_server}:{neo4j_port}/examples/unmanaged/helloworld/{nodeId}
curl http://localhost:7474/examples/unmanaged/helloworld/123
which results in
Hello World, nodeId=123
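How the pieces of that URI fit together can be sketched in a few lines of plain Java. This is an illustration of the naming scheme only, not of how the server reads its configuration; the MountPointSketch class and its methods are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MountPointSketch
{
    // Splits "pkg1=/mount1,pkg2=/mount2" into an ordered map from package
    // name to mount point. Empty or malformed pairs are skipped.
    static Map<String, String> parse( String propertyValue )
    {
        Map<String, String> mounts = new LinkedHashMap<>();
        for ( String pair : propertyValue.split( "," ) )
        {
            String trimmed = pair.trim();
            int eq = trimmed.indexOf( '=' );
            if ( eq <= 0 )
            {
                continue;
            }
            mounts.put( trimmed.substring( 0, eq ).trim(),
                    trimmed.substring( eq + 1 ).trim() );
        }
        return mounts;
    }

    // Joins server root, mount point, class-level @Path and method-level path.
    static String resourceUri( String serverRoot, String mountPoint,
            String classPath, String methodPath )
    {
        return serverRoot + mountPoint + classPath + methodPath;
    }

    public static void main( String[] args )
    {
        Map<String, String> mounts =
                parse( "org.neo4j.examples.server.unmanaged=/examples/unmanaged" );
        String mount = mounts.get( "org.neo4j.examples.server.unmanaged" );
        System.out.println( resourceUri( "http://localhost:7474", mount,
                "/helloworld", "/123" ) );
    }
}
```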
Chapter 8. The Traversal Framework
The Neo4j Traversal API is a callback based, lazily
executed way of specifying desired movements through a graph in Java. Some
traversal examples are collected under Section 4.5, “Traversal”.
There is also a more restricted way to perform traversals; Node.traverse()
is a good starting point to read more about that.
Other options to traverse or query graphs in Neo4j are Cypher and Gremlin.
8.1. Main concepts
Here follows a short explanation of all different methods that can modify or
add to a traversal description.
* Expanders — define what to traverse, typically in terms of relationships
direction and type.
* Order — for example depth-first or breadth-first.
* Uniqueness — visit nodes (relationships, paths) only once.
* Evaluator — decide what to return and whether to stop or continue traversal
beyond the current position.
* A starting node where the traversal will begin.
graphdb-traversal-description.svg
See Section 8.2, “Traversal Framework Java API” for more details.
8.2. Traversal Framework Java API
The traversal framework consists of a few main interfaces in addition to Node
and Relationship: TraversalDescription, Evaluator, Traverser and Uniqueness are
the main ones. The Path interface also has a special purpose in traversals,
since it is used to represent a position in the graph when evaluating that
position. Furthermore the RelationshipExpander and Expander interfaces are
central to traversals, but users of the API rarely need to implement them.
There are also a set of interfaces for advanced use, when explicit control over
the traversal order is required: BranchSelector, BranchOrderingPolicy and
TraversalBranch.
8.2.1. TraversalDescription
The TraversalDescription is the main
interface used for defining and initializing traversals. It is not meant to be
implemented by users of the traversal framework, but rather to be provided by
the implementation of the traversal framework as a way for the user to describe
traversals. TraversalDescription instances are immutable: each method returns
a new TraversalDescription that differs from the object the method was invoked
on according to the arguments of the method.
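The immutability contract can be illustrated with a toy class in plain Java. ToyDescription is a stand-in written for this sketch and is not the Neo4j interface; it only shows the pattern of returning a modified copy instead of changing the receiver:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy stand-in for an immutable description; not the Neo4j interface.
final class ToyDescription
{
    private final List<String> relationshipTypes;
    private final boolean depthFirst;

    ToyDescription()
    {
        this( Collections.<String>emptyList(), true );
    }

    private ToyDescription( List<String> relationshipTypes, boolean depthFirst )
    {
        this.relationshipTypes =
                Collections.unmodifiableList( new ArrayList<>( relationshipTypes ) );
        this.depthFirst = depthFirst;
    }

    // Returns a modified copy; the receiver is left untouched.
    ToyDescription relationships( String type )
    {
        List<String> copy = new ArrayList<>( relationshipTypes );
        copy.add( type );
        return new ToyDescription( copy, depthFirst );
    }

    // Returns a copy that differs only in its ordering flag.
    ToyDescription breadthFirst()
    {
        return new ToyDescription( relationshipTypes, false );
    }

    List<String> relationshipTypes()
    {
        return relationshipTypes;
    }

    boolean isDepthFirst()
    {
        return depthFirst;
    }

    public static void main( String[] args )
    {
        ToyDescription base = new ToyDescription();
        ToyDescription knows = base.relationships( "KNOWS" );
        System.out.println( base.relationshipTypes() + " " + knows.relationshipTypes() );
    }
}
```

Because no call changes an existing instance, a description can be kept as a shared template and specialised per traversal.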
8.2.1.1. Relationships
Adds a relationship type to the list of relationship types to traverse. By
default that list is empty, which means that all relationships are traversed,
regardless of type. If one or more relationship types are added to this list,
only the added types will be traversed. There are two methods, one including
direction and another one excluding direction, where the latter traverses
relationships in both directions.
8.2.2. Evaluator
Evaluators are used for deciding, at each position (represented as a Path),
whether the traversal should continue and/or whether the node should be
included in the result. Given a Path, an Evaluator asks for one of four actions
for that branch of the traversal:
* Evaluation.INCLUDE_AND_CONTINUE: Include this node in the result and
continue the traversal
* Evaluation.INCLUDE_AND_PRUNE: Include this node in the result, but don’t
continue the traversal
* Evaluation.EXCLUDE_AND_CONTINUE: Exclude this node from the result, but
continue the traversal
* Evaluation.EXCLUDE_AND_PRUNE: Exclude this node from the result and don’t
continue the traversal
More than one evaluator can be added. Note that evaluators will be called for
all positions the traverser encounters, even for the start node.
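To make the four actions concrete, here is a toy model in plain Java. It is not the Neo4j API: ToyEvaluation, ToyEvaluator and the adjacency-map graph are stand-ins invented for this sketch, and the graph is assumed to be acyclic. The evaluator is called for the start node as well, excludes it, includes everything at depth one and two, and prunes at depth two:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyEvaluatorDemo
{
    // Mirrors the four actions listed above; a stand-in, not the Neo4j enum.
    enum ToyEvaluation
    {
        INCLUDE_AND_CONTINUE( true, true ),
        INCLUDE_AND_PRUNE( true, false ),
        EXCLUDE_AND_CONTINUE( false, true ),
        EXCLUDE_AND_PRUNE( false, false );

        final boolean includes;
        final boolean continues;

        ToyEvaluation( boolean includes, boolean continues )
        {
            this.includes = includes;
            this.continues = continues;
        }
    }

    interface ToyEvaluator
    {
        ToyEvaluation evaluate( List<String> path );
    }

    // Depth-first walk that asks the evaluator at every position.
    static void traverse( Map<String, List<String>> graph, List<String> path,
            ToyEvaluator evaluator, List<List<String>> result )
    {
        ToyEvaluation evaluation = evaluator.evaluate( path );
        if ( evaluation.includes )
        {
            result.add( new ArrayList<>( path ) );
        }
        if ( !evaluation.continues )
        {
            return; // pruned: nothing below this position is visited
        }
        List<String> neighbours = graph.get( path.get( path.size() - 1 ) );
        if ( neighbours == null )
        {
            return;
        }
        for ( String neighbour : neighbours )
        {
            path.add( neighbour );
            traverse( graph, path, evaluator, result );
            path.remove( path.size() - 1 );
        }
    }

    static List<List<String>> run()
    {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put( "a", Arrays.asList( "b", "c" ) );
        graph.put( "b", Arrays.asList( "d" ) );
        graph.put( "d", Arrays.asList( "e" ) );
        ToyEvaluator evaluator = path -> {
            int depth = path.size() - 1;
            if ( depth == 0 )
            {
                return ToyEvaluation.EXCLUDE_AND_CONTINUE;
            }
            if ( depth < 2 )
            {
                return ToyEvaluation.INCLUDE_AND_CONTINUE;
            }
            return ToyEvaluation.INCLUDE_AND_PRUNE;
        };
        List<List<String>> result = new ArrayList<>();
        List<String> start = new ArrayList<>();
        start.add( "a" );
        traverse( graph, start, evaluator, result );
        return result;
    }

    public static void main( String[] args )
    {
        System.out.println( run() ); // prints [[a, b], [a, b, d], [a, c]]
    }
}
```

Node e never shows up because the position at d is pruned, while a is visited but left out of the result.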
8.2.3. Traverser
The Traverser object is the result of invoking traverse()
of a TraversalDescription object. It represents a traversal positioned in the
graph, and a specification of the format of the result. The actual traversal is
performed lazily each time the next()-method of the iterator of the Traverser
is invoked.
8.2.4. Uniqueness
Sets the rules for how positions can be revisited during a traversal as stated
in Uniqueness. Default if not set is NODE_GLOBAL.
A Uniqueness can be supplied to the TraversalDescription to dictate under what
circumstances a traversal may revisit the same position in the graph. The
various uniqueness levels that can be used in Neo4j are:
* NONE - any position in the graph may be revisited.
* NODE_GLOBAL uniqueness - no node in the entire graph may be visited more
than once. This could potentially consume a lot of memory since it requires
keeping an in-memory data structure remembering all the visited nodes.
* RELATIONSHIP_GLOBAL uniqueness - no relationship in the entire graph may be
visited more than once. For the same reasons as NODE_GLOBAL uniqueness,
this could use up a lot of memory. But since graphs typically have a larger
number of relationships than nodes, the memory overhead of this uniqueness
level could grow even quicker.
* NODE_PATH uniqueness - a node may not occur previously in the path reaching
up to it.
* RELATIONSHIP_PATH uniqueness - a relationship may not occur previously in
the path reaching up to it.
* NODE_RECENT uniqueness - Similar to NODE_GLOBAL uniqueness in that there is
a global collection of visited nodes each position is checked against. This
uniqueness level does however have a cap on how much memory it may consume
in the form of a collection that only contains the most recently visited
nodes. The size of this collection can be specified by providing a number
as the second argument to the TraversalDescription.uniqueness()-method
along with the uniqueness level.
* RELATIONSHIP_RECENT uniqueness - works like NODE_RECENT uniqueness, but
with relationships instead of nodes.
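The difference between the levels is easiest to see on a small graph where one node can be reached along two paths. The following toy model in plain Java is not the Neo4j implementation; ToyUniqueness and the adjacency map are invented for this sketch and cover only three of the levels above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ToyUniquenessDemo
{
    enum ToyUniqueness { NONE, NODE_GLOBAL, NODE_PATH }

    // a -> b -> d and a -> c -> d, plus d -> a to close a cycle.
    static Map<String, List<String>> sampleGraph()
    {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put( "a", Arrays.asList( "b", "c" ) );
        graph.put( "b", Arrays.asList( "d" ) );
        graph.put( "c", Arrays.asList( "d" ) );
        graph.put( "d", Arrays.asList( "a" ) );
        return graph;
    }

    // Lists every visited position as a path string, depth first.
    static List<String> paths( Map<String, List<String>> graph, String start,
            ToyUniqueness uniqueness, int maxDepth )
    {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        visited.add( start );
        List<String> path = new ArrayList<>();
        path.add( start );
        walk( graph, path, uniqueness, maxDepth, visited, result );
        return result;
    }

    private static void walk( Map<String, List<String>> graph, List<String> path,
            ToyUniqueness uniqueness, int maxDepth, Set<String> visited,
            List<String> result )
    {
        result.add( String.join( "->", path ) );
        if ( path.size() - 1 >= maxDepth )
        {
            return;
        }
        String current = path.get( path.size() - 1 );
        for ( String next : graph.getOrDefault( current, Collections.<String>emptyList() ) )
        {
            boolean allowed;
            switch ( uniqueness )
            {
            case NODE_GLOBAL:
                allowed = visited.add( next ); // false if seen anywhere before
                break;
            case NODE_PATH:
                allowed = !path.contains( next ); // only the current path counts
                break;
            default:
                allowed = true;
            }
            if ( !allowed )
            {
                continue;
            }
            path.add( next );
            walk( graph, path, uniqueness, maxDepth, visited, result );
            path.remove( path.size() - 1 );
        }
    }

    public static void main( String[] args )
    {
        for ( ToyUniqueness uniqueness : ToyUniqueness.values() )
        {
            System.out.println( uniqueness + ": "
                    + paths( sampleGraph(), "a", uniqueness, 3 ) );
        }
    }
}
```

With the global rule, d is reported once, via b. With the path rule it is reported a second time via c. With no rule at all, only the depth limit stops the walk from going around the cycle.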
8.2.4.1. Depth First / Breadth First
These are convenience methods for setting preorder depth-first / breadth-first
BranchSelector ordering policies. The same result can be achieved by calling the
order method with ordering policies from the Traversal factory, or by writing
your own BranchSelector/BranchOrderingPolicy and passing it in.
8.2.5. Order - How to move through branches?
A more generic version of depthFirst/breadthFirst methods in that it allows an
arbitrary BranchOrderingPolicy to be injected
into the description.
8.2.6. BranchSelector
A BranchSelector is used for selecting which branch of the traversal to attempt
next. This is used for implementing traversal orderings. The traversal
framework provides a few basic ordering implementations:
* Traversal.preorderDepthFirst() - Traversing depth first, visiting each node
before visiting its child nodes.
* Traversal.postorderDepthFirst() - Traversing depth first, visiting each
node after visiting its child nodes.
* Traversal.preorderBreadthFirst() - Traversing breadth first, visiting each
node before visiting its child nodes.
* Traversal.postorderBreadthFirst() - Traversing breadth first, visiting each
node after visiting its child nodes.
Note
Please note that breadth first traversals have a higher memory overhead than
depth first traversals.
A BranchSelector carries state and hence needs to be uniquely instantiated for
each traversal. Therefore it is supplied to the TraversalDescription through a
BranchOrderingPolicy interface, which is a factory of BranchSelector instances.
A user of the Traversal framework rarely needs to implement their own
BranchSelector or BranchOrderingPolicy; they are provided to let graph algorithm
implementors supply their own traversal orders. The Neo4j Graph Algorithms
package contains, for example, a BestFirst order
BranchSelector/BranchOrderingPolicy that is used in BestFirst search algorithms
such as A* and Dijkstra.
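The idea of selecting the next branch can be sketched with a toy in plain Java, where a double-ended queue of pending nodes stands in for the selector's state. This is not the BranchSelector interface itself; ToyOrderingDemo is invented for this illustration and covers only the two preorder variants:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyOrderingDemo
{
    static Map<String, List<String>> sampleTree()
    {
        Map<String, List<String>> tree = new HashMap<>();
        tree.put( "a", Arrays.asList( "b", "c" ) );
        tree.put( "b", Arrays.asList( "d", "e" ) );
        tree.put( "c", Arrays.asList( "f" ) );
        return tree;
    }

    // Visits each node before its children. The only difference between the
    // two orders is where newly discovered children are put in the queue.
    static List<String> visitOrder( Map<String, List<String>> tree, String root,
            boolean depthFirst )
    {
        List<String> order = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add( root );
        while ( !pending.isEmpty() )
        {
            String current = pending.removeFirst();
            order.add( current );
            List<String> children =
                    tree.getOrDefault( current, Collections.<String>emptyList() );
            if ( depthFirst )
            {
                // Push in reverse so the first child is expanded next.
                for ( int i = children.size() - 1; i >= 0; i-- )
                {
                    pending.addFirst( children.get( i ) );
                }
            }
            else
            {
                // Append, so all nodes of one level come before the next level.
                pending.addAll( children );
            }
        }
        return order;
    }

    public static void main( String[] args )
    {
        System.out.println( visitOrder( sampleTree(), "a", true ) );  // [a, b, d, e, c, f]
        System.out.println( visitOrder( sampleTree(), "a", false ) ); // [a, b, c, d, e, f]
    }
}
```

Since the pending queue is mutable state tied to one run, the sketch also hints at why a fresh selector is needed for each traversal.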
8.2.6.1. BranchOrderingPolicy
A factory for creating BranchSelectors to decide in what order branches are
returned (where a branch’s position is represented as a Path from the start
node to the current node). Common policies are depth-first and breadth-first,
and that’s why there are convenience methods for those. For example, calling
TraversalDescription#depthFirst() is equivalent to:
description.order( Traversal.preorderDepthFirst() );
8.2.6.2. TraversalBranch
An object used by the BranchSelector to get more branches from a certain
branch. In essence these are a composite of a Path and a RelationshipExpander
that can be used to get new TraversalBranches from the current one.
8.2.7. Path
A Path is a general interface that is part of the Neo4j API. In the
traversal API of Neo4j the use of Paths is twofold. Traversers can return
their results in the form of the Paths of the visited positions in the graph
that are marked for being returned. Path objects are also used in the
evaluation of positions in the graph, for determining if the traversal should
continue from a certain point or not, and whether a certain position should be
included in the result set or not.
8.2.8. RelationshipExpander
The traversal framework uses RelationshipExpanders to discover the relationships
that should be followed from a particular node to further branches in the
traversal.
8.2.9. Expander
A more generic version of relationships where a RelationshipExpander is
injected, defining all relationships to be traversed for any given node. By
default (and when using relationships) a default expander is used, where no
particular order of relationships is guaranteed. There’s another implementation
which guarantees that relationships are traversed in order of relationship
type, where types are iterated in the order they were added.
The Expander interface is an extension of the RelationshipExpander interface
that makes it possible to build customized versions of an Expander. The
implementation of TraversalDescription uses this to provide methods for
defining which relationship types to traverse. This is the usual way a user of
the API would define a RelationshipExpander: by building it internally in the
TraversalDescription.
All the RelationshipExpanders provided by the Neo4j traversal framework also
implement the Expander interface. For a user of the traversal API it is easier
to implement the RelationshipExpander interface, since it only contains one
method, the one for getting the relationships from a node. The methods that
the Expander interface adds are just for building new Expanders.
8.2.10. How to use the Traversal framework
In contrast to Node#traverse, a traversal description is built (using a fluent
interface), and such a description can then spawn traversers.
Figure 8.1. Traversal Example Graph
Traversal-Example-Graph-how-to-use-the-Traversal-framework.svg
With the definition of the RelationshipTypes as
private enum Rels implements RelationshipType
{
LIKES, KNOWS
}
The graph can be traversed with for example the following traverser, starting
at the “Joe” node:
for ( Path position : Traversal.description()
.depthFirst()
.relationships( Rels.KNOWS )
.relationships( Rels.LIKES, Direction.INCOMING )
.evaluator( Evaluators.toDepth( 5 ) )
.traverse( node ) )
{
output += position + "\n";
}
The traversal will output:
(7)
(7)<--[LIKES,1]--(4)
(7)<--[LIKES,1]--(4)--[KNOWS,6]-->(1)
(7)<--[LIKES,1]--(4)--[KNOWS,6]-->(1)--[KNOWS,4]-->(6)
(7)<--[LIKES,1]--(4)--[KNOWS,6]-->(1)--[KNOWS,4]-->(6)--[KNOWS,3]-->(5)
(7)<--[LIKES,1]--(4)--[KNOWS,6]-->(1)--[KNOWS,4]-->(6)--[KNOWS,3]-->(5)--[KNOWS,2]-->(2)
(7)<--[LIKES,1]--(4)--[KNOWS,6]-->(1)<--[KNOWS,5]--(3)
Since TraversalDescriptions are immutable, it is also useful to create template
descriptions which hold common settings shared by different traversals. For
example, let’s start with this traverser:
final TraversalDescription FRIENDS_TRAVERSAL = Traversal.description()
.depthFirst()
.relationships( Rels.KNOWS )
.uniqueness( Uniqueness.RELATIONSHIP_GLOBAL );
This traverser would yield the following output (we will keep starting from the
“Joe” node):
(7)
(7)--[KNOWS,0]-->(2)
(7)--[KNOWS,0]-->(2)<--[KNOWS,2]--(5)
(7)--[KNOWS,0]-->(2)<--[KNOWS,2]--(5)<--[KNOWS,3]--(6)
(7)--[KNOWS,0]-->(2)<--[KNOWS,2]--(5)<--[KNOWS,3]--(6)<--[KNOWS,4]--(1)
(7)--[KNOWS,0]-->(2)<--[KNOWS,2]--(5)<--[KNOWS,3]--(6)<--[KNOWS,4]--(1)<--[KNOWS,5]--(3)
(7)--[KNOWS,0]-->(2)<--[KNOWS,2]--(5)<--[KNOWS,3]--(6)<--[KNOWS,4]--(1)<--[KNOWS,6]--(4)
Now let’s create a new traverser from it, restricting depth to three:
for ( Path path : FRIENDS_TRAVERSAL
.evaluator( Evaluators.toDepth( 3 ) )
.traverse( node ) )
{
output += path + "\n";
}
This will give us the following result:
(7)
(7)--[KNOWS,0]-->(2)
(7)--[KNOWS,0]-->(2)<--[KNOWS,2]--(5)
(7)--[KNOWS,0]-->(2)<--[KNOWS,2]--(5)<--[KNOWS,3]--(6)
Or how about from depth two to four? That’s done like this:
for ( Path path : FRIENDS_TRAVERSAL
.evaluator( Evaluators.fromDepth( 2 ) )
.evaluator( Evaluators.toDepth( 4 ) )
.traverse( node ) )
{
output += path + "\n";
}
This traversal gives us:
(7)--[KNOWS,0]-->(2)<--[KNOWS,2]--(5)
(7)--[KNOWS,0]-->(2)<--[KNOWS,2]--(5)<--[KNOWS,3]--(6)
(7)--[KNOWS,0]-->(2)<--[KNOWS,2]--(5)<--[KNOWS,3]--(6)<--[KNOWS,4]--(1)
For various useful evaluators, see the Evaluators Java API, or simply implement
the Evaluator interface yourself.
If you’re not interested in the Paths, but the Nodes, you can transform the
traverser into an iterable of nodes like this:
for ( Node currentNode : FRIENDS_TRAVERSAL
.traverse( node )
.nodes() )
{
output += currentNode.getProperty( "name" ) + "\n";
}
In this case we use it to retrieve the names:
Joe
Sara
Peter
Dirk
Lars
Ed
Lisa
Relationships are fine as well. Here’s how to get them:
for ( Relationship relationship : FRIENDS_TRAVERSAL
.traverse( node )
.relationships() )
{
output += relationship.getType() + "\n";
}
Here the relationship types are written, and we get:
KNOWS
KNOWS
KNOWS
KNOWS
KNOWS
KNOWS
The source code for the traversers in this example is available at:
TraversalExample.java
Chapter 9. Domain Modeling Gallery
The following chapters contain simplified examples of how different domains can
be modeled using Neo4j. The aim is not to give full examples, but to suggest
possible ways to think using the graph patterns and data locality in
traversals.
9.1. User roles in graphs
This is an example showing a hierarchy of roles. What’s interesting is that a
tree is not sufficient for storing this structure, as elaborated below.
roles.png
This is an implementation of an example found in the article A Model to
Represent Directed Acyclic Graphs (DAG) on SQL Databases by Kemal Erdogan.
The article discusses how to store directed acyclic graphs (DAGs) in SQL based DBs. DAGs are
almost trees, but with a twist: it may be possible to reach the same node
through different paths. Trees are restricted from this possibility, which
makes them much easier to handle. In our case it is "Ali" and "Engin", as they
are both admins and users and thus reachable through these group nodes. Reality
often looks this way and can’t be captured by tree structures.
In the article an SQL Stored Procedure solution is provided. The main idea,
which also has some support from scientists, is to pre-calculate all possible
(transitive) paths. Pros and cons of this approach:
* decent performance on read
* low performance on insert
* wastes lots of space
* relies on stored procedures
In Neo4j storing the roles is trivial. In this case we use PART_OF (green
edges) relationships to model the group hierarchy and MEMBER_OF (blue edges) to
model membership in groups. We also connect the top level groups to the
reference node by ROOT relationships. This gives us a useful partitioning of
the graph. Neo4j has no predefined relationship types; you are free to create
any relationship types and give them any semantics you want.
Let’s now have a look at how to retrieve information from the graph. The Java
code uses the Neo4j Traversal API (see Section 8.2, “Traversal Framework
Java API”), while the queries are written in Cypher.
9.1.1. Get the admins
Node admins = getNodeByName( "Admins" );
Traverser traverser = admins.traverse(
Traverser.Order.BREADTH_FIRST,
StopEvaluator.END_OF_GRAPH,
ReturnableEvaluator.ALL_BUT_START_NODE,
RoleRels.PART_OF, Direction.INCOMING,
RoleRels.MEMBER_OF, Direction.INCOMING );
resulting in the output
Found: Ali at depth: 0
Found: HelpDesk at depth: 0
Found: Engin at depth: 1
Found: Demet at depth: 1
The result is collected from the traverser using this code:
String output = "";
for ( Node node : traverser )
{
    output += "Found: " + node.getProperty( NAME ) + " at depth: "
              + ( traverser.currentPosition().depth() - 1 ) + "\n";
}
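For readers who want to see the traversal logic without the Neo4j API, here is a minimal Python sketch of the same breadth-first walk over an in-memory dictionary. The INCOMING mapping and the find_members function are assumptions made for illustration and only cover the Admins subtree shown in the output above; the depth is reported minus one to match that output.

```python
from collections import deque

# Hypothetical in-memory stand-in for the Admins subtree: for each node, the
# nodes that point at it through incoming PART_OF or MEMBER_OF relationships.
INCOMING = {
    "Admins": ["Ali", "HelpDesk"],
    "HelpDesk": ["Engin", "Demet"],
}


def find_members(start):
    """Breadth-first walk returning (name, depth - 1) for all but the start node."""
    found = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        if name != start:
            found.append((name, depth - 1))
        for neighbour in INCOMING.get(name, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return found


for name, depth in find_members("Admins"):
    print("Found: %s at depth: %d" % (name, depth))
```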
In Cypher, a similar query would be:
START admins=node(14)
MATCH admins<-[:PART_OF*0..]-group<-[:MEMBER_OF]-user
RETURN user.name, group.name
resulting in:
+---------------------+
|user.name|group.name |
|---------------------|
|3 rows |
|---------------------|
|4 ms |
|---------------------|
|"Ali" |"Admins" |
|---------+-----------|
|"Engin" |"HelpDesk" |
|---------+-----------|
|"Demet" |"HelpDesk" |
+---------------------+
9.1.2. Get the group memberships of a user
Using the Neo4j Java Traversal API, this query looks like:
Node jale = getNodeByName( "Jale" );
traverser = jale.traverse(
        Traverser.Order.DEPTH_FIRST,
        StopEvaluator.END_OF_GRAPH,
        ReturnableEvaluator.ALL_BUT_START_NODE,
        RoleRels.MEMBER_OF, Direction.OUTGOING,
        RoleRels.PART_OF, Direction.OUTGOING );
resulting in:
Found: ABCTechnicians at depth: 0
Found: Technicians at depth: 1
Found: Users at depth: 2
In Cypher:
START jale=node(10)
MATCH jale-[:MEMBER_OF]->()-[:PART_OF*0..]->group
RETURN group.name
+----------------+
|group.name |
|----------------|
|3 rows |
|----------------|
|1 ms |
|----------------|
|"ABCTechnicians"|
|----------------|
|"Technicians" |
|----------------|
|"Users" |
+----------------+
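The pattern in this query, one MEMBER_OF step followed by zero or more PART_OF steps, can be mimicked in a few lines of Python. This is only a sketch: the two dictionaries and the groups_of function are assumed stand-ins that cover just the path shown in the result above.

```python
# Hypothetical stand-ins for the relationships on Jale's path.
MEMBER_OF = {"Jale": ["ABCTechnicians"]}
PART_OF = {"ABCTechnicians": ["Technicians"], "Technicians": ["Users"]}


def groups_of(user):
    """Follow one MEMBER_OF step, then PART_OF steps until no parent is left."""
    result = []
    stack = list(reversed(MEMBER_OF.get(user, [])))
    while stack:
        group = stack.pop()
        if group in result:
            continue  # a DAG may reach the same group along several paths
        result.append(group)
        stack.extend(reversed(PART_OF.get(group, [])))
    return result


print(groups_of("Jale"))
```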
9.1.3. Get all groups
In Java:
Node referenceNode = getNodeByName( "Reference_Node" );
traverser = referenceNode.traverse(
        Traverser.Order.BREADTH_FIRST,
        StopEvaluator.END_OF_GRAPH,
        ReturnableEvaluator.ALL_BUT_START_NODE,
        RoleRels.ROOT, Direction.INCOMING,
        RoleRels.PART_OF, Direction.INCOMING );
resulting in:
Found: Admins at depth: 0
Found: Users at depth: 0
Found: HelpDesk at depth: 1
Found: Managers at depth: 1
Found: Technicians at depth: 1
Found: ABCTechnicians at depth: 2
In Cypher:
START refNode=node(16)
MATCH refNode<-[:ROOT]->()<-[:PART_OF*0..]-group
RETURN group.name
+----------------+
|group.name |
|----------------|
|6 rows |
|----------------|
|3 ms |
|----------------|
|"Admins" |
|----------------|
|"HelpDesk" |
|----------------|
|"Users" |
|----------------|
|"Managers" |
|----------------|
|"Technicians" |
|----------------|
|"ABCTechnicians"|
+----------------+
9.1.4. Get all members of all groups
Now, let’s try to find all users in the system that are part of any group.
In Java:
traverser = referenceNode.traverse(
        Traverser.Order.BREADTH_FIRST,
        StopEvaluator.END_OF_GRAPH,
        new ReturnableEvaluator()
        {
            @Override
            public boolean isReturnableNode(
                    TraversalPosition currentPos )
            {
                if ( currentPos.isStartNode() )
                {
                    return false;
                }
                Relationship rel = currentPos.lastRelationshipTraversed();
                return rel.isType( RoleRels.MEMBER_OF );
            }
        },
        RoleRels.ROOT, Direction.INCOMING,
        RoleRels.PART_OF, Direction.INCOMING,
        RoleRels.MEMBER_OF, Direction.INCOMING );
Found: Ali at depth: 1
Found: Engin at depth: 1
Found: Burcu at depth: 1
Found: Can at depth: 1
Found: Demet at depth: 2
Found: Gul at depth: 2
Found: Fuat at depth: 2
Found: Hakan at depth: 2
Found: Irmak at depth: 2
Found: Jale at depth: 3
In Cypher, this looks like:
START refNode=node(16)
MATCH refNode<-[:ROOT]->root, p=root<-[:PART_OF*0..]-()<-[:MEMBER_OF]-user
RETURN user.name, min(length(p))
ORDER BY min(length(p)), user.name
and results in the following output:
+-------------------------+
|user.name|min(length(p)) |
|-------------------------|
|10 rows |
|-------------------------|
|38 ms |
|-------------------------|
|"Ali" |1 |
|---------+---------------|
|"Burcu" |1 |
|---------+---------------|
|"Can" |1 |
|---------+---------------|
|"Engin" |1 |
|---------+---------------|
|"Demet" |2 |
|---------+---------------|
|"Fuat" |2 |
|---------+---------------|
|"Gul" |2 |
|---------+---------------|
|"Hakan" |2 |
|---------+---------------|
|"Irmak" |2 |
|---------+---------------|
|"Jale" |3 |
+-------------------------+
As seen above, even more complex scenarios can be queried using comparatively
short constructs in Java and other query mechanisms.
9.2. ACL structures in graphs
This example gives a generic overview of an approach to handling Access Control
Lists (ACLs) in graphs, and a simplified example with concrete queries.
9.2.1. Generic approach
In many scenarios, an application needs to handle security on some form of
managed objects. This example describes one pattern to handle this through the
use of a graph structure and traversers that build a full permissions-structure
for any managed object with exclude and include overriding possibilities. This
results in a dynamic construction of ACLs based on the position and context of
the managed object.
The result is a complex security scheme that can easily be implemented in a
graph structure, supporting permissions overriding, principal and content
composition, without duplicating data anywhere.
ACL.png
9.2.1.1. Technique
As seen in the example graph layout, there are some key concepts in this domain
model:
* The managed content (folders and files), connected by
HAS_CHILD_CONTENT relationships.
* The Principal subtree, which identifies the principals that can act as ACL
members; these are connected by PRINCIPAL relationships.
* The aggregation of principals into groups, connected by the IS_MEMBER_OF
relationship. One principal (user or group) can be part of many groups at
the same time.
* The SECURITY relationships, connecting the content composite structure to
the principal composite structure and carrying an addition/removal modifier
property ("+RW").
9.2.1.2. Constructing the ACL
The calculation of the effective permissions (e.g. Read, Write, Execute) for a
principal for any given ACL-managed node (content) follows a number of rules
that will be encoded into the permissions-traversal:
9.2.1.3. Top-down-Traversal
This approach will let you define a generic permission pattern on the root
content, and then refine that for specific sub-content nodes and specific
principals.
1. Start at the content node in question and traverse upwards to the content
root node to determine the path to it.
2. Start with an effective optimistic permissions list of "all permitted" (111
in a bit encoded ReadWriteExecute case) or 000 if you like pessimistic
security handling (everything is forbidden unless explicitly allowed).
3. Beginning from the topmost content node, look for any SECURITY
relationships on it.
4. If found, look if the principal in question is part of the end-principal of
the SECURITY relationship.
5. If yes, add the "+" permission modifiers to the existing permission
pattern, revoke the "-" permission modifiers from the pattern.
6. If two principal nodes link to the same content node, first apply the more
generic principal’s modifiers.
7. Repeat the security modifier search all the way down to the target content
node, thus overriding more generic permissions with the set on nodes closer
to the target node.
The same algorithm is applicable for the bottom-up approach, basically just
traversing from the target content node upwards and applying the security
modifiers dynamically as the traverser goes up.
9.2.1.4. Example
Now, to get the resulting access rights for, e.g., "user 1" on "My File.pdf"
with the top-down approach on the model in the graph above, we would proceed
as follows:
1. Having traveled upward to find the path, we start with "Root folder", and
set the permissions to 11 initially (only considering Read, Write).
2. There are two SECURITY relationships to that folder. User 1 is contained in
both of them, but "root" is more generic, so apply it first, then "All
principals" (+W +R) → 11.
3. "Home" has no SECURITY instructions, continue.
4. "user1 Home" has SECURITY. First apply "Regular Users" (-R -W) → 00, Then
"user 1" (+R +W) → 11.
5. The target node "My File.pdf" has no SECURITY modifiers on it, so the
effective permissions for "User 1" on "My File.pdf" are ReadWrite → 11.
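The top-down calculation can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a Neo4j traversal: the dictionaries are a hypothetical encoding of the worked example, the membership chain from "user 1" up to "All principals" is assumed, the SECURITY entries are assumed to be listed most generic principal first, and the "root" principal on "Root folder" is left out because its modifier is not stated in the text.

```python
# Sketch of the top-down permission calculation, restricted to Read and Write.
# All data below is a hypothetical encoding of the worked example.

# Group membership: principal -> groups it directly belongs to (assumed chain).
IS_MEMBER_OF = {
    "user 1": ["Regular Users"],
    "Regular Users": ["All principals"],
}

# Content path from the content root down to the target node.
PATH = ["Root folder", "Home", "user1 Home", "My File.pdf"]

# SECURITY relationships per content node, listed most generic principal first.
SECURITY = {
    "Root folder": [("All principals", "+R+W")],
    "user1 Home": [("Regular Users", "-R-W"), ("user 1", "+R+W")],
}


def principals_of(principal):
    """Return the principal together with every group it is part of."""
    found, todo = set(), [principal]
    while todo:
        current = todo.pop()
        if current not in found:
            found.add(current)
            todo.extend(IS_MEMBER_OF.get(current, []))
    return found


def effective_permissions(principal, path, initial="RW"):
    """Apply the modifiers found along the path, from the root to the target."""
    permissions = set(initial)  # "RW" is the optimistic start, "" the pessimistic one
    applicable = principals_of(principal)
    for content in path:
        for member, modifiers in SECURITY.get(content, []):
            if member not in applicable:
                continue
            # Modifiers come in sign/letter pairs such as "+R" or "-W".
            for sign, flag in zip(modifiers[0::2], modifiers[1::2]):
                if sign == "+":
                    permissions.add(flag)
                else:
                    permissions.discard(flag)
    return permissions


print(sorted(effective_permissions("user 1", PATH)))
```

Because modifiers on nodes closer to the target are applied last, they override the more generic ones set higher up, which is the behavior described in step 7 of the algorithm.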
9.2.2. Read-permission example
In this example, we are going to examine a tree structure of directories and
files. Also, there are users that own files and roles that can be assigned to
users. Roles can have permissions on directory or file structures (here we
model only canRead, as opposed to full rwx Unix permissions) and be nested. A
more thorough example of modeling ACL structures can be found at How to Build
Role-Based Access Control in SQL.
The-Domain-Structure-ACL-structures-in-graphs.svg
9.2.2.1. Find all files in the directory structure
In order to find all files contained in this structure, we need a variable
length query that follows all contains relationships and retrieves the nodes at
the other end of the leaf relationships.
START root=node:node_auto_index(name = 'FileRoot')
MATCH root-[:contains*0..]->(parentDir)-[:leaf]->file
RETURN file
resulting in:
+-----------------------+
|file |
|-----------------------|
|2 rows |
|-----------------------|
|191 ms |
|-----------------------|
|Node[11]{name->"File1"}|
|-----------------------|
|Node[10]{name->"File2"}|
+-----------------------+
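The same variable length walk can be written as a short recursive function. The Python sketch below is an assumption-laden illustration: the CONTAINS and LEAF dictionaries are a hand-made copy of the directory tree, reconstructed from the query results shown in this section, and find_files is a name introduced here.

```python
# Hypothetical in-memory copy of the directory tree used in this example.
CONTAINS = {
    "FileRoot": ["Home"],
    "Home": ["HomeU1", "HomeU2"],
    "HomeU2": ["Desktop"],
}
LEAF = {"HomeU1": ["File1"], "Desktop": ["File2"]}


def find_files(directory):
    """Collect the leaf files of a directory and of everything it contains."""
    files = list(LEAF.get(directory, []))
    for child in CONTAINS.get(directory, []):
        files.extend(find_files(child))
    return files


print(find_files("FileRoot"))
```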
9.2.2.2. What files are owned by whom?
If we introduce the concept of ownership on files, we then can ask for the
owners of the files we find — connected via owns relationships to file nodes.
START root=node:node_auto_index(name = 'FileRoot')
MATCH root-[:contains*0..]->()-[:leaf]->file<-[:owns]-user
RETURN file, user
Returning the owners of all files below the FileRoot node.
+----------------------------------------------+
|file |user |
|----------------------------------------------|
|2 rows |
|----------------------------------------------|
|3 ms |
|----------------------------------------------|
|Node[11]{name->"File1"}|Node[8]{name->"User1"}|
|-----------------------+----------------------|
|Node[10]{name->"File2"}|Node[7]{name->"User2"}|
+----------------------------------------------+
9.2.2.3. Who has access to a File?
If we now want to check which users have read access to the files, we can
define our ACL as follows:
* The root directory has no access granted.
* Any user having a role that has been granted canRead access to one of the
parent folders of a File has read access.
In order to find users that can read any part of the parent folder hierarchy
above the files, Cypher provides optional variable length paths.
START file=node:node_auto_index('name:File*')
MATCH file<-[:leaf]-()<-[:contains*0..]-dir<-[?:canRead]-role-[:member]->readUser
RETURN file.name, dir.name, role.name, readUser.name
This will return the file, and the directory where the user has the canRead
permission along with the user and their role.
+--------------------------------------------+
|file.name|dir.name |role.name|readUser.name|
|--------------------------------------------|
|9 rows |
|--------------------------------------------|
|67 ms |
|--------------------------------------------|
|"File2" |"Desktop" | | |
|---------+----------+---------+-------------|
|"File2" |"HomeU2" | | |
|---------+----------+---------+-------------|
|"File2" |"Home" | | |
|---------+----------+---------+-------------|
|"File2" |"FileRoot"|"SUDOers"|"Admin1" |
|---------+----------+---------+-------------|
|"File2" |"FileRoot"|"SUDOers"|"Admin2" |
|---------+----------+---------+-------------|
|"File1" |"HomeU1" | | |
|---------+----------+---------+-------------|
|"File1" |"Home" | | |
|---------+----------+---------+-------------|
|"File1" |"FileRoot"|"SUDOers"|"Admin1" |
|---------+----------+---------+-------------|
|"File1" |"FileRoot"|"SUDOers"|"Admin2" |
+--------------------------------------------+
The results listed above contain null values for optional path segments, which
can be avoided by either running several queries or returning only the values
that are actually needed.
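As an illustration of returning only the needed values, the following Python sketch walks from a file up through its parent directories and emits a row only where a role actually grants canRead. The dictionaries are a hypothetical in-memory copy of the example data, reconstructed from the result table above, and the readers function is an assumption introduced here.

```python
# Hypothetical in-memory copy of the example: parent links point upwards.
PARENT = {"Desktop": "HomeU2", "HomeU2": "Home", "HomeU1": "Home", "Home": "FileRoot"}
FILE_DIR = {"File1": "HomeU1", "File2": "Desktop"}
CAN_READ = {"FileRoot": ["SUDOers"]}         # directory -> roles with canRead
MEMBERS = {"SUDOers": ["Admin1", "Admin2"]}  # role -> users


def readers(file_name):
    """Return (file, dir, role, user) rows, skipping directories without grants."""
    rows = []
    directory = FILE_DIR[file_name]
    while directory is not None:
        for role in CAN_READ.get(directory, []):
            for user in MEMBERS.get(role, []):
                rows.append((file_name, directory, role, user))
        directory = PARENT.get(directory)
    return rows


for row in readers("File2") + readers("File1"):
    print(row)
```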
Chapter 10. Languages
The table below lists community-contributed language and framework bindings
for using Neo4j in embedded mode.
Table 10.1. Neo4j embedded drivers contributed by the community.
+-----------------------------------------------------------------------------+
|name |language /|URL |
| |framework | |
|-----------+----------+------------------------------------------------------|
|Neo4j.rb |JRuby |https://github.com/andreasronge/neo4j |
|-----------+----------+------------------------------------------------------|
|Neo4django |Python, |https://github.com/scholrly/neo4django |
| |Django | |
|-----------+----------+------------------------------------------------------|
|Neo4js |JavaScript|https://github.com/neo4j/neo4js |
|-----------+----------+------------------------------------------------------|
|Gremlin |Java, |Section 19.16, “Gremlin Plugin”, https://github.com/ |
| |Groovy |tinkerpop/gremlin/wiki |
|-----------+----------+------------------------------------------------------|
|Neo4j-Scala|Scala |https://github.com/FaKod/neo4j-scala |
|-----------+----------+------------------------------------------------------|
|Borneo |Clojure |https://github.com/wagjo/borneo |
+-----------------------------------------------------------------------------+
For information on REST clients for different languages, see Chapter 6, Using
the Neo4j REST API.
Chapter 11. Using Neo4j embedded in Python applications
For instructions on how to install the Python Neo4j driver, see Section 20.1,
“Installation”.
For general information on the Python language binding, see Chapter 20, Python
embedded bindings.
11.1. Hello, world!
Here is a simple example to get you started.
from neo4j import GraphDatabase
# Create a database
db = GraphDatabase(folder_to_put_db_in)
# All write operations happen in a transaction
with db.transaction:
    firstNode = db.node(name='Hello')
    secondNode = db.node(name='world!')
    # Create a relationship with type 'knows'
    relationship = firstNode.knows(secondNode, name='graphy')
# Read operations can happen anywhere
message = ' '.join([firstNode['name'], relationship['name'], secondNode['name']])
print message
# Delete the data
with db.transaction:
    firstNode.knows.single.delete()
    firstNode.delete()
    secondNode.delete()
# Always shut down your database when your application exits
db.shutdown()
11.2. A sample app using traversals and indexes
For detailed documentation on the concepts used here, see Section 20.3,
“Indexes” and Section 20.5, “Traversals”.
This example shows you how to get started building something like a simple
invoice tracking application with Neo4j.
We start out by importing Neo4j and creating some metadata that we will use
to organize our actual data.
from neo4j import GraphDatabase, INCOMING, Evaluation
# Create a database
db = GraphDatabase(folder_to_put_db_in)
# All write operations happen in a transaction
with db.transaction:
    # A node to connect customers to
    customers = db.node()
    # A node to connect invoices to
    invoices = db.node()
    # Connected to the reference node, so
    # that we can always find them.
    db.reference_node.CUSTOMERS(customers)
    db.reference_node.INVOICES(invoices)
    # An index, helps us rapidly look up customers
    customer_idx = db.node.indexes.create('customers')
11.2.1. Domain logic
Then we define some domain logic that we want our application to be able to
perform. Our application has two domain objects, Customers and Invoices. Let’s
create methods to add new customers and invoices.
def create_customer(name):
    with db.transaction:
        customer = db.node(name=name)
        customer.INSTANCE_OF(customers)
        # Index the customer by name
        customer_idx['name'][name] = customer
    return customer

def create_invoice(customer, amount):
    with db.transaction:
        invoice = db.node(amount=amount)
        invoice.INSTANCE_OF(invoices)
        invoice.RECIPIENT(customer)
    # Return the newly created invoice node
    return invoice
In the customer case, we create a new node to represent the customer and
connect it to the customers node. This helps us find customers later on, as
well as determine if a given node is a customer.
We also index the name of the customer, to allow for quickly finding customers
by name.
In the invoice case, we do the same, except no indexing. We also connect each
new invoice to the customer it was sent to, using a relationship of type
RECIPIENT.
Next, we want to be able to retrieve customers and invoices that we have added.
Because we are indexing customer names, finding them is quite simple.
def get_customer(name):
return customer_idx['name'][name].single
Let’s say we would also like to find all invoices for a given customer that
are above some given amount. This could be done by writing a traversal, like
this:
def get_invoices_with_amount_over(customer, min_sum):
    def evaluator(path):
        node = path.end
        if node.has_key('amount') and node['amount'] > min_sum:
            return Evaluation.INCLUDE_AND_CONTINUE
        return Evaluation.EXCLUDE_AND_CONTINUE
    return db.traversal()\
             .relationships('RECIPIENT', INCOMING)\
             .evaluator(evaluator)\
             .traverse(customer)\
             .nodes
11.2.2. Creating data and getting it back
Putting it all together, we can create customers and invoices, and use the
search methods we wrote to find them.
for name in ['Acme Inc.', 'Example Ltd.']:
    create_customer(name)
# Loop through customers
for relationship in customers.INSTANCE_OF:
    customer = relationship.start
    for i in range(1,12):
        create_invoice(customer, 100 * i)
# Finding large invoices
large_invoices = get_invoices_with_amount_over(get_customer('Acme Inc.'), 500)
# Getting all invoices per customer:
for relationship in get_customer('Acme Inc.').RECIPIENT.incoming:
    invoice = relationship.start
Part III. Reference
The reference part is the authoritative source for details on Neo4j usage. It
covers details on capabilities, transactions, indexing and queries among other
topics.
Table of Contents
12. Capabilities
12.1. Data Security
12.2. Data Integrity
12.2.1. Core Graph Engine
12.2.2. Different Data Sources
12.3. Data Integration
12.3.1. Event-based Synchronization
12.3.2. Periodic Synchronization
12.3.3. Periodic Full Export/Import of Data
12.4. Availability and Reliability
12.4.1. Operational Availability
12.4.2. Disaster Recovery/ Resiliency
12.5. Capacity
12.5.1. File Sizes
12.5.2. Read speed
12.5.3. Write speed
12.5.4. Data size
13. Transaction Management
13.1. Interaction cycle
13.2. Isolation levels
13.3. Default locking behavior
13.4. Deadlocks
13.5. Delete semantics
13.6. Creating unique nodes
13.6.1. Single thread
13.6.2. Get or create
13.6.3. Pessimistic locking
13.7. Transaction events
14. Data Import
14.1. Batch Insertion
14.1.1. Batch Inserter Examples
14.1.2. Batch Graph Database
14.1.3. Index Batch Insertion
15. Indexing
15.1. Introduction
15.2. Create
15.3. Delete
15.4. Add
15.5. Remove
15.6. Update
15.7. Search
15.7.1. Get
15.7.2. Query
15.8. Relationship indexes
15.9. Scores
15.10. Configuration and fulltext indexes
15.11. Extra features for Lucene indexes
15.11.1. Numeric ranges
15.11.2. Sorting
15.11.3. Querying with Lucene Query objects
15.11.4. Compound queries
15.11.5. Default operator
15.11.6. Caching
15.12. Automatic Indexing
15.12.1. Configuration
15.12.2. Search
15.12.3. Runtime Configuration
15.12.4. Updating the Automatic Index
16. Cypher Query Language
16.1. Operators
16.2. Expressions
16.3. Parameters
16.4. Identifiers
16.5. Comments
16.6. Updating the graph with Cypher
16.6.1. Updating query structure
16.6.2. Query Parts & Structure
16.6.3. Returning data
16.7. Transactions and Cypher
16.8. Start
16.8.1. Node by id
16.8.2. Relationship by id
16.8.3. Multiple nodes by id
16.8.4. All nodes
16.8.5. Node by index lookup
16.8.6. Relationship by index lookup
16.8.7. Node by index query
16.8.8. Multiple start points
16.9. Match
16.9.1. introduction
16.9.2. Related nodes
16.9.3. Outgoing relationships
16.9.4. Directed relationships and identifier
16.9.5. Match by relationship type
16.9.6. Match by multiple relationship types
16.9.7. Match by relationship type and use an identifier
16.9.8. Relationship types with uncommon characters
16.9.9. Multiple relationships
16.9.10. Variable length relationships
16.9.11. Relationship identifier in variable length relationships
16.9.12. Zero length paths
16.9.13. Optional relationship
16.9.14. Optional typed and named relationship
16.9.15. Properties on optional elements
16.9.16. Complex matching
16.9.17. Shortest path
16.9.18. All shortest paths
16.9.19. Named path
16.9.20. Matching on a bound relationship
16.10. Where
16.10.1. Boolean operations
16.10.2. Filter on node property
16.10.3. Regular expressions
16.10.4. Escaping in regular expressions
16.10.5. Case insensitive regular expressions
16.10.6. Filtering on relationship type
16.10.7. Property exists
16.10.8. Default true if property is missing
16.10.9. Default false if property is missing
16.10.10. Filter on null values
16.10.11. Filter on relationships
16.10.12. IN operator
16.11. Return
16.11.1. Return nodes
16.11.2. Return relationships
16.11.3. Return property
16.11.4. Return all elements
16.11.5. Identifier with uncommon characters
16.11.6. Column alias
16.11.7. Optional properties
16.11.8. Unique results
16.12. Aggregation
16.12.1. Introduction
16.12.2. COUNT
16.12.3. Count nodes
16.12.4. Group Count Relationship Types
16.12.5. Count entities
16.12.6. Count non-null values
16.12.7. SUM
16.12.8. AVG
16.12.9. MAX
16.12.10. MIN
16.12.11. COLLECT
16.12.12. DISTINCT
16.13. Order by
16.13.1. Order nodes by property
16.13.2. Order nodes by multiple properties
16.13.3. Order nodes in descending order
16.13.4. Ordering null
16.14. Skip
16.14.1. Skip first three
16.14.2. Return middle two
16.15. Limit
16.15.1. Return first part
16.16. With
16.16.1. Filter on aggregate function results
16.16.2. Alternative syntax of with
16.17. Create
16.17.1. Create single node
16.17.2. Create single node and set properties
16.17.3. Return created node
16.17.4. Create a relationship between two nodes
16.17.5. Create a relationship and set properties
16.17.6. Create single node from map
16.17.7. Create multiple nodes from maps
16.18. Delete
16.18.1. Delete single node
16.18.2. Remove a node and connected relationships
16.18.3. Remove a property
16.19. Set
16.19.1. Set a property
16.20. Relate
16.20.1. Create relationship if it is missing
16.20.2. Create node if missing
16.20.3. Create nodes with values
16.20.4. Create relationship with values
16.21. Foreach
16.21.1. Mark all nodes along a path
16.22. Functions
16.22.1. Predicates
16.22.2. Scalar functions
16.22.3. Iterable functions
16.22.4. Mathematical functions
16.23. Compatibility
17. Graph Algorithms
17.1. Introduction
18. Neo4j Server
18.1. Server Installation
18.1.1. As a Windows service
18.1.2. Linux Service
18.1.3. Mac OSX
18.1.4. Multiple Server instances on one machine
18.2. Server Configuration
18.2.1. Important server configurations parameters
18.2.2. Neo4j Database performance configuration
18.2.3. Server logging configuration
18.2.4. HTTP logging configuration
18.2.5. Other configuration options
18.3. Setup for remote debugging
18.4. Using the server (including web administration) with an embedded
database
18.4.1. Getting the libraries
18.4.2. Starting the Server from Java
18.4.3. Providing custom configuration
18.5. Server Performance Tuning
18.5.1. Specifying Neo4j tuning properties
18.5.2. Specifying JVM tuning properties
18.6. Server Installation in the Cloud
18.6.1. Heroku
19. REST API
19.1. Service root
19.1.1. Get service root
19.2. Nodes
19.2.1. Create Node
19.2.2. Create Node with properties
19.2.3. Get node
19.2.4. Get non-existent node
19.2.5. Delete node
19.2.6. Nodes with relationships can not be deleted
19.3. Relationships
19.3.1. Get Relationship by ID
19.3.2. Create relationship
19.3.3. Create a relationship with properties
19.3.4. Delete relationship
19.3.5. Get all properties on a relationship
19.3.6. Set all properties on a relationship
19.3.7. Get single property on a relationship
19.3.8. Set single property on a relationship
19.3.9. Get all relationships
19.3.10. Get incoming relationships
19.3.11. Get outgoing relationships
19.3.12. Get typed relationships
19.3.13. Get relationships on a node without relationships
19.4. Relationship types
19.4.1. Get relationship types
19.5. Node properties
19.5.1. Set property on node
19.5.2. Update node properties
19.5.3. Get properties for node
19.5.4. Property values can not be null
19.5.5. Property values can not be nested
19.5.6. Delete all properties from node
19.5.7. Delete a named property from a node
19.6. Relationship properties
19.6.1. Update relationship properties
19.6.2. Remove property from a relationship
19.6.3. Remove non-existent property from a relationship
19.6.4. Remove properties from a non-existing relationship
19.6.5. Remove property from a non-existing relationship
19.7. Indexes
19.7.1. Create node index
19.7.2. Create node index with configuration
19.7.3. Delete node index
19.7.4. List node indexes
19.7.5. Add node to index
19.7.6. Remove all entries with a given node from an index
19.7.7. Remove all entries with a given node and key from an index
19.7.8. Remove all entries with a given node, key and value from an
index
19.7.9. Find node by exact match
19.7.10. Find node by query
19.8. Unique Indexes
19.8.1. Create a unique node in an index
19.8.2. Create a unique node in an index (the case where it exists)
19.8.3. Add a node to an index unless a node already exists for the
given mapping
19.8.4. Create a unique relationship in an index
19.8.5. Add a relationship to an index unless a relationship already
exists for the given mapping
19.9. Automatic Indexes
19.9.1. Find node by exact match from an automatic index
19.9.2. Find node by query from an automatic index
19.10. Configurable Automatic Indexing
19.10.1. Create an auto index for nodes with specific configuration
19.10.2. Create an auto index for relationships with specific
configuration
19.10.3. Get current status for autoindexing on nodes
19.10.4. Enable node autoindexing
19.10.5. Lookup list of properties being autoindexed
19.10.6. Add a property for autoindexing on nodes
19.10.7. Remove a property for autoindexing on nodes
19.11. Traversals
19.11.1. Traversal using a return filter
19.11.2. Return relationships from a traversal
19.11.3. Return paths from a traversal
19.11.4. Traversal returning nodes below a certain depth
19.11.5. Creating a paged traverser
19.11.6. Paging through the results of a paged traverser
19.11.7. Paged traverser page size
19.11.8. Paged traverser timeout
19.12. Cypher queries
19.12.1. Send a Query
19.12.2. Return paths
19.12.3. Send queries with parameters
19.12.4. Nested results
19.12.5. Server errors
19.13. Built-in Graph Algorithms
19.13.1. Find all shortest paths
19.13.2. Find one of the shortest paths between nodes
19.13.3. Execute a Dijkstra algorithm with similar weights on
relationships
19.13.4. Execute a Dijkstra algorithm with weights on relationships
19.14. Batch operations
19.14.1. Execute multiple operations in batch
19.14.2. Refer to items created earlier in the same batch job
19.14.3. Execute multiple operations in batch streaming
19.15. Cypher Plugin
19.15.1. Send a Query
19.15.2. Return paths
19.15.3. Send queries with parameters
19.15.4. Server errors
19.16. Gremlin Plugin
19.16.1. Send a Gremlin Script - URL encoded
19.16.2. Load a sample graph
19.16.3. Sort a result using raw Groovy operations
19.16.4. Send a Gremlin Script - JSON encoded with table results
19.16.5. Returning nested pipes
19.16.6. Set script variables
19.16.7. Send a Gremlin Script with variables in a JSON Map
19.16.8. Return paths from a Gremlin script
19.16.9. Send an arbitrary Groovy script - Lucene sorting
19.16.10. Emit a sample graph
19.16.11. HyperEdges - find user roles in groups
19.16.12. Group count
19.16.13. Collect multiple traversal results
19.16.14. Collaborative filtering
19.16.15. Chunking and offsetting in Gremlin
19.16.16. Modify the graph while traversing
19.16.17. Flow algorithms with Gremlin
19.16.18. Script execution errors
20. Python embedded bindings
20.1. Installation
20.1.1. Installation on OSX/Linux
20.1.2. Installation on Windows
20.2. Core API
20.2.1. Getting started
20.2.2. Transactions
20.2.3. Nodes
20.2.4. Relationships
20.2.5. Properties
20.2.6. Paths
20.3. Indexes
20.3.1. Index management
20.3.2. Indexing things
20.3.3. Searching the index
20.4. Cypher Queries
20.4.1. Querying and reading the result
20.4.2. Parameterized and prepared queries
20.5. Traversals
20.5.1. Basic traversals
20.5.2. Traversal results
20.5.3. Uniqueness
20.5.4. Ordering
20.5.5. Evaluators - advanced filtering
Chapter 12. Capabilities
12.1. Data Security
Some data may need to be protected from unauthorized access (e.g., theft,
modification). Neo4j does not deal with data encryption explicitly, but
supports all means built into the Java programming language and the JVM to
protect data by encrypting it before storing.
Furthermore, data can be easily secured by running on an encrypted datastore at
the file system level. Finally, data protection should be considered in the
upper layers of the surrounding system in order to prevent problems with
scraping, malicious data insertion, and other threats.
12.2. Data Integrity
In order to keep data consistent, there need to be mechanisms and structures
that guarantee the integrity of all stored data. In Neo4j, data integrity is
maintained for the core graph engine together with other data sources - see
below.
12.2.1. Core Graph Engine
In Neo4j, the whole data model is stored as a graph on disk and persisted as
part of every committed transaction. In the storage layer, Relationships,
Nodes, and Properties have direct pointers to each other. This maintains
integrity without the need for data duplication between the different backend
store files.
12.2.2. Different Data Sources
In a number of scenarios, the core graph engine is combined with other systems
in order to achieve optimal performance for non-graph lookups. For example,
Apache Lucene is frequently used as an additional index system for text queries
that would otherwise be very processing-intensive in the graph layer.
To keep these external systems in sync with each other, Neo4j provides full
Two Phase Commit transaction management, with rollback support over all data
sources. Failed index insertions into Lucene can therefore be transparently
rolled back in all data sources, keeping the data consistent.
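The general idea of a two-phase commit can be shown with a toy coordinator. The Python sketch below is a generic illustration of the protocol and does not reflect how Neo4j or Lucene implement it; the Resource class, its methods and the two_phase_commit function are all hypothetical.

```python
# A toy two-phase commit coordinator. The Resource class and its methods are
# hypothetical and do not correspond to Neo4j or Lucene APIs.
class Resource:
    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.committed = []   # durable state
        self.pending = None   # state staged during phase one

    def prepare(self, change):
        if self.fail_on_prepare:
            return False
        self.pending = change
        return True

    def commit(self):
        self.committed.append(self.pending)
        self.pending = None

    def rollback(self):
        self.pending = None


def two_phase_commit(resources, change):
    """Commit the change everywhere, or nowhere if any resource votes no."""
    # Phase one: ask every resource to prepare.
    if all(r.prepare(change) for r in resources):
        # Phase two: everyone voted yes, so make the change durable.
        for r in resources:
            r.commit()
        return True
    # At least one resource voted no: discard whatever was staged.
    for r in resources:
        r.rollback()
    return False


graph = Resource("graph store")
index = Resource("index", fail_on_prepare=True)
print(two_phase_commit([graph, index], "add node 42"))
print(graph.committed)
```

In this sketch the failing index prevents the graph store from committing as well, so neither side ends up with a change the other one lacks.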
12.3. Data Integration
Most enterprises rely primarily on relational databases to store their data,
but this may cause performance limitations. In some of these cases, Neo4j can
be used as an extension to supplement search/lookup for faster decision making.
However, in any situation where multiple data repositories contain the same
data, synchronization can be an issue.
In some applications, it is acceptable for the search platform to be slightly
out of sync with the relational database. In others, tight data integrity (e.g.,
between Neo4j and RDBMS) is necessary. Typically, this has to be addressed for
data changing in real-time and for bulk data changes happening in the RDBMS.
A few strategies for synchronizing integrated data follow.
12.3.1. Event-based Synchronization
In this scenario, all data stores, both RDBMS and Neo4j, are fed with
domain-specific events via an event bus. Thus, the data held in the different
backends is not actually synchronized but rather replicated.
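A minimal sketch of this scenario is shown below, assuming a very simple in-process event bus. The EventBus class, the two handler functions and the event layout are hypothetical; plain Python containers stand in for the RDBMS and for Neo4j.

```python
# Minimal event bus sketch: every store subscribes and applies each event itself.
# EventBus, the handlers and the event layout are hypothetical.
class EventBus:
    def __init__(self):
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def publish(self, event):
        for handler in self.handlers:
            handler(event)


rdbms_rows = []                       # stand-in for a relational table
graph = {"nodes": set(), "rels": []}  # stand-in for the graph


def store_in_rdbms(event):
    rdbms_rows.append((event["customer"], event["invoice"]))


def store_in_graph(event):
    graph["nodes"].update([event["customer"], event["invoice"]])
    graph["rels"].append((event["invoice"], "RECIPIENT", event["customer"]))


bus = EventBus()
bus.subscribe(store_in_rdbms)
bus.subscribe(store_in_graph)
bus.publish({"type": "InvoiceSent", "customer": "Acme Inc.", "invoice": "invoice-1"})
print(rdbms_rows, graph)
```

Neither store reads from the other; each one derives its own representation from the same event.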
12.3.2. Periodic Synchronization
Another viable scenario is the periodic export of the latest changes in the
RDBMS to Neo4j via some form of SQL query. This allows a small amount of
latency in the synchronization, but has the advantage of using the RDBMS as the
master for all data purposes. The same process can be applied with Neo4j as the
master data source.
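One way to picture the periodic export is a query that fetches only the rows changed since the previous run. The Python sketch below uses the standard library sqlite3 module as a stand-in for the RDBMS and a dictionary as a stand-in for the graph; the table layout, the version column and the sync_changes function are assumptions made for illustration.

```python
import sqlite3

# Hypothetical table with a change counter column; sqlite3 stands in for the RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Acme Inc.", 1), (2, "Example Ltd.", 2)],
)

graph_nodes = {}  # stand-in for the graph: id -> properties


def sync_changes(connection, nodes, last_version):
    """Copy rows changed after last_version into nodes; return the new high-water mark."""
    rows = connection.execute(
        "SELECT id, name, version FROM customer WHERE version > ? ORDER BY version",
        (last_version,),
    ).fetchall()
    for row_id, name, version in rows:
        nodes[row_id] = {"name": name}
        last_version = version
    return last_version


mark = sync_changes(conn, graph_nodes, 0)     # the first run copies everything
conn.execute("UPDATE customer SET name = 'Acme Corp.', version = 3 WHERE id = 1")
mark = sync_changes(conn, graph_nodes, mark)  # a later run copies only the change
print(mark, graph_nodes)
```

The interval between two runs is the latency mentioned above: until the next run, the graph copy still shows the old value.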
12.3.3. Periodic Full Export/Import of Data
Using the Batch Inserter tools for Neo4j, even large amounts of data can be
imported into the database in very short times. Thus, a full export from the
RDBMS and import into Neo4j becomes possible. If the propagation lag between
the RDBMS and Neo4j is not a big issue, this is a very viable solution.
12.4. Availability and Reliability
Most mission-critical systems require the database subsystem to be accessible
at all times. Neo4j ensures availability and reliability through a few
different strategies.
12.4.1. Operational Availability
In order not to create a single point of failure, Neo4j supports different
approaches which provide transparent fallback and/or recovery from failures.
12.4.1.1. Online backup (Cold spare)
In this approach, a single instance of the master database is used, with Online
Backup enabled. In case of a failure, the backup files can be mounted onto a
new Neo4j instance and reintegrated into the application.
12.4.1.2. Online Backup High Availability (Hot spare)
Here, a Neo4j "backup" instance listens to online transfers of changes from the
master. In the event of a failure of the master, the backup is already running
and can directly take over the load.
12.4.1.3. High Availability cluster
This approach uses a cluster of database instances, with one (read/write)
master and a number of (read-only) slaves. Failing slaves can simply be
restarted and brought back online. Alternatively, a new slave may be added by
cloning an existing one. Should the master instance fail, a new master will be
elected by the remaining cluster nodes.
12.4.2. Disaster Recovery/Resiliency
In cases of a breakdown of a major part of the IT infrastructure, there need to
be mechanisms in place that enable the fast recovery and regrouping of the
remaining services and servers. In Neo4j, there are different components that
are suitable to be part of a disaster recovery strategy.
12.4.2.1. Prevention
* Online Backup High Availability to other locations outside the current data
center.
* Online Backup to different file system locations: this is a simpler form of
backup, applying changes directly to backup files; it is thus more suited
for local backup scenarios.
* Neo4j High Availability cluster: a cluster of one write-master Neo4j server
and a number of read-slaves, getting transaction logs from the master.
Write-master failover is handled by quorum election among the read-slaves
for a new master.
12.4.2.2. Detection
* SNMP and JMX monitoring can be used for the Neo4j database.
12.4.2.3. Correction
* Online Backup: A new Neo4j server can be started directly on the backed-up
files and take over new requests.
* Neo4j High Availability cluster: A broken Neo4j read slave can be
reinserted into the cluster, getting the latest updates from the master.
Alternatively, a new server can be inserted by copying an existing server
and applying the latest updates to it.
12.5. Capacity
12.5.1. File Sizes
Neo4j relies on Java’s Non-blocking I/O subsystem for all file handling.
Furthermore, while the storage file layout is optimized for interconnected
data, Neo4j does not require raw devices. Thus, file sizes are only limited by
the underlying operating system’s capacity to handle large files. Physically,
there is no built-in limit of the file handling capacity in Neo4j.
Neo4j tries to memory-map as much of the underlying store files as possible. If
the available RAM is not sufficient to keep all data in RAM, Neo4j will use
buffers in some cases, reallocating the memory-mapped high-performance I/O
windows to the regions with the most I/O activity dynamically. Thus, ACID speed
degrades gracefully as RAM becomes the limiting factor.
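To illustrate what memory-mapping a region of a file means at the Java level, here is a minimal sketch using only the standard java.nio API. It is not Neo4j's internal windowing code; `MappedWindowSketch` and its methods are hypothetical names, and the file is a small temporary file created for the example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical sketch: map one window of a file into memory and read from it.
class MappedWindowSketch
{
    // Writes the bytes to a fresh temporary file (for illustration only).
    static File tempFileWith( byte[] data )
    {
        try
        {
            File file = File.createTempFile( "mapped-window", ".bin" );
            file.deleteOnExit();
            FileOutputStream out = new FileOutputStream( file );
            try
            {
                out.write( data );
            }
            finally
            {
                out.close();
            }
            return file;
        }
        catch ( IOException e )
        {
            throw new RuntimeException( e );
        }
    }

    // Maps the window [offset, offset + length) read-only and returns its first byte.
    static byte firstByteOfWindow( File file, long offset, int length )
    {
        try
        {
            RandomAccessFile raf = new RandomAccessFile( file, "r" );
            try
            {
                FileChannel channel = raf.getChannel();
                MappedByteBuffer window =
                        channel.map( FileChannel.MapMode.READ_ONLY, offset, length );
                return window.get( 0 );
            }
            finally
            {
                raf.close();
            }
        }
        catch ( IOException e )
        {
            throw new RuntimeException( e );
        }
    }

    public static void main( String[] args )
    {
        File file = tempFileWith( new byte[] { 10, 20, 30, 40 } );
        System.out.println( firstByteOfWindow( file, 2, 2 ) );
    }
}
```

Moving a window then amounts to mapping a different offset of the same file, which is how a limited amount of mapped memory can follow the regions with the most I/O activity.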
12.5.2. Read speed
Enterprises want to optimize the use of hardware to deliver the maximum
business value from available resources. Neo4j’s approach to reading data
provides the best possible usage of all available hardware resources. Neo4j
does not block or lock any read operations; thus, there is no danger of
deadlocks in read operations and no need for read transactions. With a threaded
read access to the database, queries can be run simultaneously on as many
processors as may be available. This provides very good scale-up scenarios with
bigger servers.
12.5.3. Write speed
Write speed is a consideration for many enterprise applications. However, there
are two different scenarios:
1. sustained continuous operation and
2. bulk access (e.g., backup, initial or batch loading).
To support the disparate requirements of these scenarios, Neo4j supports two
modes of writing to the storage layer.
In transactional, ACID-compliant normal operation, isolation level is
maintained and read operations can occur at the same time as the writing
process. At every commit, the data is persisted to disk and can be recovered to
a consistent state upon system failures. This requires disk write access and a
real flushing of data. Thus, the write speed of Neo4j on a single server in
continuous mode is limited by the I/O capacity of the hardware. Consequently,
the use of fast SSDs is highly recommended for production scenarios.
Neo4j has a Batch Inserter that operates directly on the store files. This mode
does not provide transactional security, so it can only be used when there is a
single write thread. Because data is written sequentially, and never flushed to
the logical logs, huge performance boosts are achieved. The Batch Inserter is
optimized for non-transactional bulk import of large amounts of data.
12.5.4. Data size
In Neo4j, data size is mainly limited by the address space of the primary keys
for Nodes, Relationships, Properties and RelationshipTypes. Currently, the
address space is as follows:
* 2^35 (~ 34 billion) nodes
* 2^35 (~ 34 billion) relationships
* 2^36 (~ 68 billion) properties
* 2^15 (~ 32 000) relationship types
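The rounded figures above follow directly from the number of identifier bits. As a quick check, the following sketch computes the exact capacities; `AddressSpaceSketch` is a hypothetical helper written for this illustration, not part of Neo4j.

```java
// Hypothetical helper: exact identifier capacity for a given number of bits.
class AddressSpaceSketch
{
    // Number of distinct identifiers that fit in the given number of bits.
    static long capacity( int bits )
    {
        return 1L << bits;
    }

    public static void main( String[] args )
    {
        System.out.println( "nodes:              " + capacity( 35 ) );
        System.out.println( "relationships:      " + capacity( 35 ) );
        System.out.println( "properties:         " + capacity( 36 ) );
        System.out.println( "relationship types: " + capacity( 15 ) );
    }
}
```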
Chapter 13. Transaction Management
In order to fully maintain data integrity and ensure good transactional
behavior, Neo4j supports the ACID properties:
* atomicity - if any part of a transaction fails, the database state is left
unchanged
* consistency - any transaction will leave the database in a consistent state
* isolation - during a transaction, modified data cannot be accessed by other
operations
* durability - the DBMS can always recover the results of a committed
transaction
Specifically:
* All modifications to Neo4j data must be wrapped in transactions.
* The default isolation level is READ_COMMITTED.
* Data retrieved by traversals is not protected from modification by other
transactions.
* Non-repeatable reads may occur (i.e., only write locks are acquired and
held until the end of the transaction).
* One can manually acquire write locks on nodes and relationships to achieve
a higher level of isolation (SERIALIZABLE).
* Locks are acquired at the Node and Relationship level.
* Deadlock detection is built into the core transaction management.
13.1. Interaction cycle
All write operations that work with the graph must be performed in a
transaction. Transactions are thread confined and can be nested as “flat nested
transactions”. Flat nested transactions means that all nested transactions are
added to the scope of the top level transaction. A nested transaction can mark
the top level transaction for rollback, meaning the entire transaction will be
rolled back. It is not possible to roll back only the changes made in a
nested transaction.
When working with transactions the interaction cycle looks like this:
1. Begin a transaction.
2. Operate on the graph performing write operations.
3. Mark the transaction as successful or not.
4. Finish the transaction.
It is very important to finish each transaction. The transaction will not
release the locks or memory it has acquired until it has been finished. The
idiomatic use of transactions in Neo4j is a try-finally block: start the
transaction, then perform the write operations inside the try block. The last
operation in the try block should mark the transaction as successful while the
finally block should finish the transaction. Finishing the transaction will
perform commit or rollback depending on the success status.
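The control flow of this idiom, together with flat nesting, can be modelled without a database. The sketch below is a stand-in written for illustration: `FlatTxSketch` and its inner `Tx` class are hypothetical and only mimic the `success()`/`finish()` method names mentioned above; they are not the Neo4j `Transaction` type and they do not store any data.

```java
// Hypothetical model of flat nested transactions and the try-finally idiom.
class FlatTxSketch
{
    private int depth = 0;                 // number of unfinished begin() calls
    private boolean rollbackOnly = false;  // set when any level finishes without success
    String lastOutcome = "none";           // outcome of the last top-level finish

    // Handle returned by begin(); nested handles share the top-level scope.
    class Tx
    {
        private boolean success = false;

        void success()
        {
            success = true;
        }

        // Must always be called. Only the top-level finish commits or rolls back.
        void finish()
        {
            if ( !success )
            {
                rollbackOnly = true;
            }
            depth--;
            if ( depth == 0 )
            {
                lastOutcome = rollbackOnly ? "rollback" : "commit";
                rollbackOnly = false;
            }
        }
    }

    Tx begin()
    {
        depth++;
        return new Tx();
    }

    // The idiom: begin, work inside try, mark success last, finish in finally.
    void run( Runnable work )
    {
        Tx tx = begin();
        try
        {
            work.run();
            tx.success();
        }
        finally
        {
            tx.finish();
        }
    }

    public static void main( String[] args )
    {
        FlatTxSketch db = new FlatTxSketch();
        db.run( new Runnable()
        {
            public void run()
            {
                // write operations would go here
            }
        } );
        System.out.println( db.lastOutcome );
    }
}
```

In this model a nested handle that finishes without `success()` marks the whole top-level scope for rollback, which mirrors the statement that a nested transaction cannot be rolled back on its own.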
Caution
All modifications performed in a transaction are kept in memory. This means
that very large updates have to be split into several top level transactions to
avoid running out of memory. It must be a top level transaction since splitting
up the work in many nested transactions will just add all the work to the top
level transaction.
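A simple way to follow this advice is to partition the work into fixed-size batches and run each batch in its own top-level transaction (begin, write, mark successful, finish). The partitioning step can be sketched as follows; `BatchSplitSketch` is a hypothetical helper and the batch size is an assumption to be tuned against the available heap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split a large update into batches, one per top-level transaction.
class BatchSplitSketch
{
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> split( List<T> items, int batchSize )
    {
        if ( batchSize <= 0 )
        {
            throw new IllegalArgumentException( "batchSize must be positive" );
        }
        List<List<T>> batches = new ArrayList<List<T>>();
        for ( int from = 0; from < items.size(); from += batchSize )
        {
            int to = Math.min( from + batchSize, items.size() );
            batches.add( new ArrayList<T>( items.subList( from, to ) ) );
        }
        return batches;
    }

    public static void main( String[] args )
    {
        List<Integer> items = new ArrayList<Integer>();
        for ( int i = 0; i < 10; i++ )
        {
            items.add( i );
        }
        for ( List<Integer> batch : split( items, 4 ) )
        {
            // each batch would be written in its own top-level transaction
            System.out.println( batch );
        }
    }
}
```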
In an environment that makes use of thread pooling other errors may occur when
failing to finish a transaction properly. Consider a leaked transaction that
did not get finished properly. It will be tied to a thread and when that thread
gets scheduled to perform work starting a new (what looks to be a) top level
transaction it will actually be a nested transaction. If the leaked transaction
state is “marked for rollback” (which will happen if a deadlock was detected)
no more work can be performed on that transaction. Trying to do so will result
in an error on each call to a write operation.
13.2. Isolation levels
By default a read operation will read the last committed value unless a local
modification within the current transaction exists. The default isolation level
is very similar to READ_COMMITTED: reads do not block or take any locks so
non-repeatable reads can occur. It is possible to achieve a stronger isolation
level (such as REPEATABLE_READ and SERIALIZABLE) by manually acquiring read and
write locks.
13.3. Default locking behavior
* When adding, changing or removing a property on a node or relationship a
write lock will be taken on the specific node or relationship.
* When creating or deleting a node a write lock will be taken for the
specific node.
* When creating or deleting a relationship a write lock will be taken on the
specific relationship and both its nodes.
The locks will be added to the transaction and released when the transaction
finishes.
13.4. Deadlocks
Since locks are used it is possible for deadlocks to happen. Neo4j will however
detect any deadlock (caused by acquiring a lock) before it happens and throw
an exception. Before the exception is thrown the transaction is marked for
rollback. All locks acquired by the transaction are still being held but will
be released when the transaction is finished (in the finally block as pointed
out earlier). Once the locks are released other transactions that were waiting
for locks held by the transaction causing the deadlock can proceed. The work
performed by the transaction causing the deadlock can then be retried by the
user if needed.
Experiencing frequent deadlocks is an indication of concurrent write requests
happening in such a way that it is not possible to execute them while at the
same time live up to the intended isolation and consistency. The solution is to
make sure concurrent updates happen in a reasonable way. For example given two
specific nodes (A and B), adding or deleting relationships to both these nodes
in random order for each transaction will result in deadlocks when there are
two or more transactions doing that concurrently. One solution is to make sure
that updates always happen in the same order (first A then B). Another
solution is to make sure that no thread/transaction performs writes to a node
or relationship that conflict with those of another concurrent transaction.
This can for example be achieved by letting a single thread do all updates of
a specific type.
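The "always in the same order" rule can be demonstrated with ordinary Java locks. The sketch below is an analogy, not Neo4j code: `LockOrderSketch` and `Entity` are hypothetical, and `ReentrantLock` stands in for the write locks Neo4j takes on nodes. Both threads name the two entities in opposite order, yet every update locks the entity with the lower id first, so no lock cycle can form.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical analogy: consistent lock ordering prevents deadlocks.
class LockOrderSketch
{
    static class Entity
    {
        final long id;
        final ReentrantLock lock = new ReentrantLock();
        int updates = 0;

        Entity( long id )
        {
            this.id = id;
        }
    }

    // Locks both entities in ascending id order, regardless of argument order.
    static void updateBoth( Entity x, Entity y )
    {
        Entity first = x.id <= y.id ? x : y;
        Entity second = x.id <= y.id ? y : x;
        first.lock.lock();
        try
        {
            second.lock.lock();
            try
            {
                first.updates++;
                second.updates++;
            }
            finally
            {
                second.lock.unlock();
            }
        }
        finally
        {
            first.lock.unlock();
        }
    }

    // Runs two threads that pass the entities in opposite order.
    static void runConcurrently( final Entity a, final Entity b, final int rounds )
    {
        Thread t1 = new Thread( new Runnable()
        {
            public void run()
            {
                for ( int i = 0; i < rounds; i++ )
                {
                    updateBoth( a, b );
                }
            }
        } );
        Thread t2 = new Thread( new Runnable()
        {
            public void run()
            {
                for ( int i = 0; i < rounds; i++ )
                {
                    updateBoth( b, a );
                }
            }
        } );
        t1.start();
        t2.start();
        try
        {
            t1.join();
            t2.join();
        }
        catch ( InterruptedException e )
        {
            Thread.currentThread().interrupt();
        }
    }

    public static void main( String[] args )
    {
        Entity a = new Entity( 1 );
        Entity b = new Entity( 2 );
        runConcurrently( a, b, 1000 );
        System.out.println( a.updates + " " + b.updates );
    }
}
```

With Neo4j itself no explicit locks are needed for this; the equivalent is to perform the write operations on A before those on B in every transaction.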
Important
Deadlocks caused by the use of other synchronization than the locks managed by
Neo4j can still happen. Since all operations in the Neo4j API are thread safe
unless specified otherwise, there is no need for external synchronization.
Other code that requires synchronization should be synchronized in such a way
that it never performs any Neo4j operation in the synchronized block.
13.5. Delete semantics
When deleting a node or a relationship all properties for that entity will be
automatically removed but the relationships of a node will not be removed.
Caution
Neo4j enforces a constraint (upon commit) that all relationships must have a
valid start node and end node. In effect this means that trying to delete a
node that still has relationships attached to it will throw an exception upon
commit. It is however possible to choose in which order to delete the node and
the attached relationships as long as no relationships exist when the
transaction is committed.
The delete semantics can be summarized in the following bullets:
* All properties of a node or relationship will be removed when it is
deleted.
* A deleted node can not have any attached relationships when the transaction
commits.
* It is possible to acquire a reference to a deleted relationship or node
that has not yet been committed.
* Any write operation on a node or relationship after it has been deleted
(but not yet committed) will throw an exception
* After commit trying to acquire a new or work with an old reference to a
deleted node or relationship will throw an exception.
13.6. Creating unique nodes
In many use cases, a certain level of uniqueness is desired among entities. You
could for instance imagine that only one user with a certain e-mail address may
exist in a system. If multiple concurrent threads naively try to create the
user, duplicates will be created. There are three main strategies for ensuring
uniqueness, and they all work across HA and single-instance deployments.
13.6.1. Single thread
By using a single thread, no two threads will even try to create a particular
entity simultaneously. On HA, an external single-threaded client can perform
the operations on the cluster.
13.6.2. Get or create
By using put-if-absent functionality, entity uniqueness can be guaranteed using
an index. Here the index acts as the lock and will only lock the smallest part
needed to guarantee uniqueness across threads and transactions. To get the more
high-level get-or-create functionality, make use of UniqueFactory as seen in
the example below.
Example code:
public Node getOrCreateUserWithUniqueFactory( String username, GraphDatabaseService graphDb )
{
    UniqueFactory<Node> factory = new UniqueFactory.UniqueNodeFactory( graphDb, "users" )
    {
        // Called only when a new node had to be created for the key-value pair.
        @Override
        protected void initialize( Node created, Map<String, Object> properties )
        {
            created.setProperty( "name", properties.get( "name" ) );
        }
    };
    return factory.getOrCreate( "name", username );
}
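The put-if-absent principle itself can be shown with a plain `ConcurrentMap` from the JDK. This is an analogy for the concept only: `PutIfAbsentSketch` is a hypothetical class, the map stands in for the index, and the generated numbers stand in for node ids. It says nothing about how UniqueFactory is implemented.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical analogy: get-or-create built on an atomic put-if-absent.
class PutIfAbsentSketch
{
    private final ConcurrentMap<String, Long> usersByName =
            new ConcurrentHashMap<String, Long>();
    private final AtomicLong nextId = new AtomicLong();

    // Returns the id registered for the name, creating one if none exists yet.
    long getOrCreate( String name )
    {
        Long existing = usersByName.get( name );
        if ( existing != null )
        {
            return existing;
        }
        Long candidate = nextId.incrementAndGet();
        // Atomic: only one of several racing threads gets its candidate stored.
        Long winner = usersByName.putIfAbsent( name, candidate );
        return winner != null ? winner : candidate;
    }

    public static void main( String[] args )
    {
        PutIfAbsentSketch users = new PutIfAbsentSketch();
        System.out.println( users.getOrCreate( "alice" ) == users.getOrCreate( "alice" ) );
    }
}
```

A thread that loses the race discards its candidate and returns the winner's value, so all callers end up with the same entity for a given key.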
13.6.3. Pessimistic locking
Important
While this is a working solution, please consider using the preferred
Section 13.6.2, “Get or create” instead.
By using explicit, pessimistic locking, unique creation of entities can be
achieved in a multi-threaded environment. It is most commonly done by locking
on a single common node or a set of common nodes.
One might be tempted to use Java synchronization for this, but it is dangerous.
By mixing locks in the Neo4j kernel and in the Java runtime, it is easy to
produce deadlocks that are not detectable by Neo4j. As long as all locking is
done by Neo4j, all deadlocks will be detected and avoided. Also, a solution
using manual synchronization doesn’t ensure uniqueness in an HA environment.
Example code:
public Node getOrCreateUserPessimistically( String username, GraphDatabaseService graphDb, Node lockNode )
{
    Index<Node> usersIndex = graphDb.index().forNodes( "users" );
    // Fast path: the user already exists, no lock needed.
    Node userNode = usersIndex.get( "name", username ).getSingle();
    if ( userNode != null ) return userNode;
    Transaction tx = graphDb.beginTx();
    try
    {
        tx.acquireWriteLock( lockNode );
        // Check again now that the lock is held; another thread may have created the user.
        userNode = usersIndex.get( "name", username ).getSingle();
        if ( userNode == null )
        {
            userNode = graphDb.createNode();
            userNode.setProperty( "name", username );
            usersIndex.add( userNode, "name", username );
        }
        tx.success();
        return userNode;
    }
    finally
    {
        tx.finish();
    }
}
13.7. Transaction events
Transaction event handlers can be registered to receive Neo4j Transaction
events. Once a handler has been registered at a GraphDatabaseService instance
it will receive events about what has happened in each transaction which is
about to be committed. Handlers won’t get notified about transactions which
haven’t performed any write operation or won’t be committed (either because
Transaction#success() hasn’t been called or because the transaction has been
marked as failed with Transaction#failure()). Right before a transaction is
about to be committed the beforeCommit method is called with the entire diff
of modifications made in the
transaction. At this point the transaction is still running so changes can
still be made. However there’s no guarantee that other handlers will see such
changes since the order in which handlers are executed is undefined. This
method can also throw an exception and will, in such a case, prevent the
transaction from being committed (where a call to afterRollback will follow).
If beforeCommit is successfully executed the transaction will be committed and
the afterCommit method will be called with the same transaction data as well as
the object returned from beforeCommit. This assumes that all other handlers (if
more were registered) also executed beforeCommit successfully.
Chapter 14. Data Import
For high-performance data import, the batch insert facilities described in this
chapter are recommended.
Other ways to import data into Neo4j include using Gremlin graph import (see
Section 19.16.2, “Load a sample graph”) or using the Geoff notation (see
http://geoff.nigelsmall.net/).
14.1. Batch Insertion
Neo4j has a batch insertion facility intended for initial imports, which
bypasses transactions and other checks in favor of performance. This is useful
when you have a big dataset that needs to be loaded once.
Batch insertion is included in the neo4j-kernel component, which is part of all
Neo4j distributions and editions.
Be aware of the following points when using batch insertion:
* The intended use is for initial import of data.
* Batch insertion is not thread safe.
* Batch insertion is non-transactional.
* Unless shutdown is successfully invoked at the end of the import, the
database files will be corrupt.
Warning
Always perform batch insertion in a single thread (or use synchronization to
make only one thread at a time access the batch inserter) and invoke shutdown
when finished.
14.1.1. Batch Inserter Examples
Creating a batch inserter is similar to how you normally create data in the
database, but in this case the low-level BatchInserter interface is used. As we
have already pointed out, you can’t have multiple threads using the batch
inserter concurrently without external synchronization.
Tip
The source code of the examples is found here: BatchInsertExampleTest.java
To get hold of a BatchInserter, use BatchInserters and then go from there:
BatchInserter inserter = BatchInserters.inserter( "target/batchinserter-example" );
Map<String, Object> properties = new HashMap<String, Object>();
properties.put( "name", "Mattias" );
long mattiasNode = inserter.createNode( properties );
properties.put( "name", "Chris" );
long chrisNode = inserter.createNode( properties );
RelationshipType knows = DynamicRelationshipType.withName( "KNOWS" );
// To set properties on the relationship, use a properties map
// instead of null as the last parameter.
inserter.createRelationship( mattiasNode, chrisNode, knows, null );
inserter.shutdown();
To gain good performance you probably want to set some configuration settings
for the batch inserter. Read Section 22.7.2, “Batch insert example” for
information on configuring a batch inserter. This is how to start a batch
inserter with configuration options:
Map<String, String> config = new HashMap<String, String>();
config.put( "neostore.nodestore.db.mapped_memory", "90M" );
BatchInserter inserter = BatchInserters.inserter(
"target/batchinserter-example-config", config );
// Insert data here ... and then shut down:
inserter.shutdown();
In case you have stored the configuration in a file, you can load it like this:
Map<String, String> config = MapUtil.load( new File(
"target/batchinsert-config" ) );
BatchInserter inserter = BatchInserters.inserter(
"target/batchinserter-example-config", config );
// Insert data here ... and then shut down:
inserter.shutdown();
14.1.2. Batch Graph Database
In case you already have code for data import written against the normal Neo4j
API, you could consider using a batch inserter exposing that API.
Note
This will not perform as well as using the BatchInserter API directly.
Also be aware of the following:
* Starting a transaction or invoking Transaction.finish() or
Transaction.success() will do nothing.
* Invoking the Transaction.failure() method will generate a NotInTransaction
exception.
* Node.delete() and Node.traverse() are not supported.
* Relationship.delete() is not supported.
* Event handlers and indexes are not supported.
* GraphDatabaseService.getRelationshipTypes(), getAllNodes() and
getAllRelationships() are not supported.
With these precautions in mind, this is how to do it:
GraphDatabaseService batchDb =
BatchInserters.batchDatabase( "target/batchdb-example" );
Node mattiasNode = batchDb.createNode();
mattiasNode.setProperty( "name", "Mattias" );
Node chrisNode = batchDb.createNode();
chrisNode.setProperty( "name", "Chris" );
RelationshipType knows = DynamicRelationshipType.withName( "KNOWS" );
mattiasNode.createRelationshipTo( chrisNode, knows );
batchDb.shutdown();
Tip
The source code of the example is found here: BatchInsertExampleTest.java
14.1.3. Index Batch Insertion
For general notes on batch insertion, see Section 14.1, “Batch Insertion”.
Indexing during batch insertion is done using BatchInserterIndex instances,
which are provided via BatchInserterIndexProvider. An example:
BatchInserter inserter = BatchInserters.inserter( "target/neo4jdb-batchinsert" );
BatchInserterIndexProvider indexProvider =
new LuceneBatchInserterIndexProvider( inserter );
BatchInserterIndex actors =
indexProvider.nodeIndex( "actors", MapUtil.stringMap( "type", "exact" ) );
actors.setCacheCapacity( "name", 100000 );
Map<String, Object> properties = MapUtil.map( "name", "Keanu Reeves" );
long node = inserter.createNode( properties );
actors.add( node, properties );
// make the changes visible for reading; use this sparingly, it requires IO!
actors.flush();
// Make sure to shut down the index provider as well
indexProvider.shutdown();
inserter.shutdown();
The configuration parameters are the same as mentioned in Section 15.10,
“Configuration and fulltext indexes”.
14.1.3.1. Best practices
Here are some pointers to get the most performance out of BatchInserterIndex:
* Try to avoid flushing too often, because each flush makes all additions
(since the last flush) visible to the querying methods, and publishing those
changes can be a performance penalty.
* Have (as big as possible) phases where one phase is either only writes or
only reads, and don’t forget to flush after a write phase so that those
changes become visible to the querying methods.
* Enable caching for keys you know you’re going to do lookups for later on to
increase performance significantly (though insertion performance may degrade
slightly).
Note
Changes to the index become available for reading only after they are flushed
to disk. Thus, for optimal performance, read and lookup operations should be
kept to a minimum during batch insertion since they involve IO and impact speed
negatively.
Chapter 15. Indexing
Indexing in Neo4j can be done in two different ways:
1. The database itself is a natural index consisting of its relationships of
different types between nodes. For example a tree structure can be layered
on top of the data and used for index lookups performed by a traverser.
2. Separate index engines can be used, with Apache Lucene being the default
backend included with Neo4j.
This chapter demonstrates how to use the second type of indexing, focusing on
Lucene.
15.1. Introduction
Indexing operations are part of the Neo4j index API.
Each index is tied to a unique, user-specified name (for example "first_name"
or "books") and can index either nodes or relationships.
The default index implementation is provided by the neo4j-lucene-index
component, which is included in the standard Neo4j download. It can also be
downloaded separately from
http://repo1.maven.org/maven2/org/neo4j/neo4j-lucene-index/. For Maven users,
the neo4j-lucene-index component has the coordinates
org.neo4j:neo4j-lucene-index and should be used with the same version of
org.neo4j:neo4j-kernel. Different versions of the index and kernel components
are not compatible in the general case. Both components are included
transitively by the org.neo4j:neo4j:pom artifact which makes it simple to keep
the versions in sync.
For initial import of data using indexes, see Section 14.1.3, “Index Batch
Insertion”.
Note
All modifying index operations must be performed inside a transaction, as with
any mutating operation in Neo4j.
15.2. Create
An index is created if it doesn’t exist when you ask for it. Unless you give it
a custom configuration, it will be created with default configuration and
backend.
To set the stage for our examples, let’s create some indexes to begin with:
IndexManager index = graphDb.index();
Index<Node> actors = index.forNodes( "actors" );
Index<Node> movies = index.forNodes( "movies" );
RelationshipIndex roles = index.forRelationships( "roles" );
This will create two node indexes and one relationship index with default
configuration. See Section 15.8, “Relationship indexes” for more information
specific to relationship indexes.
See Section 15.10, “Configuration and fulltext indexes” for how to create
fulltext indexes.
You can also check if an index exists like this:
IndexManager index = graphDb.index();
boolean indexExists = index.existsForNodes( "actors" );
15.3. Delete
Indexes can be deleted. When deleting, the entire contents of the index will be
removed as well as its associated configuration. A new index can be created
with the same name at a later point in time.
IndexManager index = graphDb.index();
Index<Node> actors = index.forNodes( "actors" );
actors.delete();
Note that the actual deletion of the index is made during the commit of the
surrounding transaction. Calls made to such an index instance after delete()
has been called are invalid inside that transaction as well as outside (if the
transaction is successful), but will become valid again if the transaction is
rolled back.
15.4. Add
Each index supports associating any number of key-value pairs with any number
of entities (nodes or relationships), where each association between entity and
key-value pair is performed individually. To begin with, let’s add a few nodes
to the indexes:
// Actors
Node reeves = graphDb.createNode();
reeves.setProperty( "name", "Keanu Reeves" );
actors.add( reeves, "name", reeves.getProperty( "name" ) );
Node bellucci = graphDb.createNode();
bellucci.setProperty( "name", "Monica Bellucci" );
actors.add( bellucci, "name", bellucci.getProperty( "name" ) );
// multiple values for a field, in this case for search only
// and not stored as a property.
actors.add( bellucci, "name", "La Bellucci" );
// Movies
Node theMatrix = graphDb.createNode();
theMatrix.setProperty( "title", "The Matrix" );
theMatrix.setProperty( "year", 1999 );
movies.add( theMatrix, "title", theMatrix.getProperty( "title" ) );
movies.add( theMatrix, "year", theMatrix.getProperty( "year" ) );
Node theMatrixReloaded = graphDb.createNode();
theMatrixReloaded.setProperty( "title", "The Matrix Reloaded" );
theMatrixReloaded.setProperty( "year", 2003 );
movies.add( theMatrixReloaded, "title", theMatrixReloaded.getProperty( "title" ) );
movies.add( theMatrixReloaded, "year", 2003 );
Node malena = graphDb.createNode();
malena.setProperty( "title", "Malèna" );
malena.setProperty( "year", 2000 );
movies.add( malena, "title", malena.getProperty( "title" ) );
movies.add( malena, "year", malena.getProperty( "year" ) );
Note that there can be multiple values associated with the same entity and key.
Next up, we’ll create relationships and index them as well:
// we need a relationship type
DynamicRelationshipType ACTS_IN = DynamicRelationshipType.withName( "ACTS_IN" );
// create relationships
Relationship role1 = reeves.createRelationshipTo( theMatrix, ACTS_IN );
role1.setProperty( "name", "Neo" );
roles.add( role1, "name", role1.getProperty( "name" ) );
Relationship role2 = reeves.createRelationshipTo( theMatrixReloaded, ACTS_IN );
role2.setProperty( "name", "Neo" );
roles.add( role2, "name", role2.getProperty( "name" ) );
Relationship role3 = bellucci.createRelationshipTo( theMatrixReloaded, ACTS_IN );
role3.setProperty( "name", "Persephone" );
roles.add( role3, "name", role3.getProperty( "name" ) );
Relationship role4 = bellucci.createRelationshipTo( malena, ACTS_IN );
role4.setProperty( "name", "Malèna Scordia" );
roles.add( role4, "name", role4.getProperty( "name" ) );
After these operations, our example graph looks like this:
Figure 15.1. Movie and Actor Graph
Movie-and-Actor-Graph-initial.svg
15.5. Remove
Removing from an index is similar to adding, but can be done by supplying one
of the following combinations of arguments:
* entity
* entity, key
* entity, key, value
// completely remove bellucci from the actors index
actors.remove( bellucci );
// remove any "name" entry of bellucci from the actors index
actors.remove( bellucci, "name" );
// remove the "name" -> "La Bellucci" entry of bellucci
actors.remove( bellucci, "name", "La Bellucci" );
15.6. Update
Important
To update an index entry, the old one must be removed and a new one added. For
details on removing index entries, see Section 15.5, “Remove”.
Remember that a node or relationship can be associated with any number of
key-value pairs in an index. This means that you can index a node or
relationship with many key-value pairs that have the same key. In the case
where a property value changes and you’d like to update the index, it’s not
enough to just index the new value - you’ll have to remove the old value as
well.
Here’s a code example that demonstrates how it’s done:
// create a node with a property
// so we have something to update later on
Node fishburn = graphDb.createNode();
fishburn.setProperty( "name", "Fishburn" );
// index it
actors.add( fishburn, "name", fishburn.getProperty( "name" ) );
// update the index entry
// when the property value changes
actors.remove( fishburn, "name", fishburn.getProperty( "name" ) );
fishburn.setProperty( "name", "Laurence Fishburn" );
actors.add( fishburn, "name", fishburn.getProperty( "name" ) );
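Why the removal is required can be seen in a small model of an index that allows several values per entity and key. The sketch is an assumption-laden illustration: `IndexUpdateSketch` is a hypothetical in-memory class for a single key, not the Neo4j or Lucene index implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a multi-valued index for one key.
class IndexUpdateSketch
{
    // value -> ids of the entities indexed under that value
    private final Map<String, Set<Long>> entries = new HashMap<String, Set<Long>>();

    void add( long id, String value )
    {
        Set<Long> ids = entries.get( value );
        if ( ids == null )
        {
            ids = new HashSet<Long>();
            entries.put( value, ids );
        }
        ids.add( id );
    }

    void remove( long id, String value )
    {
        Set<Long> ids = entries.get( value );
        if ( ids != null )
        {
            ids.remove( id );
        }
    }

    // True if a lookup for the value would return the entity.
    boolean matches( long id, String value )
    {
        Set<Long> ids = entries.get( value );
        return ids != null && ids.contains( id );
    }

    public static void main( String[] args )
    {
        IndexUpdateSketch names = new IndexUpdateSketch();
        names.add( 1, "Fishburn" );
        // Adding the new value alone leaves the old entry in place.
        names.add( 1, "Laurence Fishburn" );
        System.out.println( "stale entry present: " + names.matches( 1, "Fishburn" ) );
        names.remove( 1, "Fishburn" );
        System.out.println( "stale entry present: " + names.matches( 1, "Fishburn" ) );
    }
}
```

Adding never replaces anything in such a structure, so a lookup for the old value keeps returning the entity until that entry is removed explicitly.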
15.7. Search
An index can be searched in two ways, get and query. The get method will return
exact matches to the given key-value pair, whereas query exposes querying
capabilities directly from the backend used by the index. For example the
Lucene query syntax can be used directly with the default indexing backend.
15.7.1. Get
This is how to search for a single exact match:
IndexHits<Node> hits = actors.get( "name", "Keanu Reeves" );
Node reeves = hits.getSingle();
IndexHits is an Iterable with some additional useful methods. For example
getSingle() returns the first and only item from the result iterator, or null
if there isn’t any hit.
Here’s how to get a single relationship by exact matching and retrieve its
start and end nodes:
Relationship persephone = roles.get( "name", "Persephone" ).getSingle();
Node actor = persephone.getStartNode();
Node movie = persephone.getEndNode();
Finally, we can iterate over all exact matches from a relationship index:
for ( Relationship role : roles.get( "name", "Neo" ) )
{
// this will give us Reeves twice
Node reeves = role.getStartNode();
}
Important
If you don’t iterate through all the hits, IndexHits.close() must be called
explicitly.
15.7.2. Query
There are two query methods, one which uses a key-value signature where the
value represents a query for values with the given key only. The other method
is more generic and supports querying for more than one key-value pair in the
same query.
Here’s an example using the key-query option:
for ( Node actor : actors.query( "name", "*e*" ) )
{
// This will return Reeves and Bellucci
}
In the following example the query uses multiple keys:
for ( Node movie : movies.query( "title:*Matrix* AND year:1999" ) )
{
// This will return "The Matrix" from 1999 only.
}
Note
Beginning a wildcard search with "*" or "?" is discouraged by Lucene, but will
nevertheless work.
Caution
You can’t have any whitespace in the search term with this syntax. See
Section 15.11.3, “Querying with Lucene Query objects” for how to do that.
15.8. Relationship indexes
An index for relationships is just like an index for nodes, extended by
providing support to constrain a search to relationships with a specific start
and/or end node. These extra methods reside in the RelationshipIndex interface,
which extends Index.
Example of querying a relationship index:
// find relationships filtering on start node
// using exact matches
IndexHits<Relationship> reevesAsNeoHits;
reevesAsNeoHits = roles.get( "name", "Neo", reeves, null );
Relationship reevesAsNeo = reevesAsNeoHits.iterator().next();
reevesAsNeoHits.close();
// find relationships filtering on end node
// using a query
IndexHits<Relationship> matrixNeoHits;
matrixNeoHits = roles.query( "name", "*eo", null, theMatrix );
Relationship matrixNeo = matrixNeoHits.iterator().next();
matrixNeoHits.close();
And here’s an example for the special case of searching for a specific
relationship type:
// find relationships filtering on end node
// using a relationship type.
// this is how to add it to the index:
roles.add( reevesAsNeo, "type", reevesAsNeo.getType().name() );
// Note that to use a compound query, we can't combine committed
// and uncommitted index entries, so we'll commit before querying:
tx.success();
tx.finish();
// and now we can search for it:
IndexHits<Relationship> typeHits;
typeHits = roles.query( "type:ACTS_IN AND name:Neo", null, theMatrix );
Relationship typeNeo = typeHits.iterator().next();
typeHits.close();
Such an index can be useful if your domain has nodes with a very large number
of relationships between them, since it reduces the search time for a
relationship between two nodes. A good example where this approach pays
dividends is in time series data, where we have readings represented as a
relationship per occurrence.
15.9. Scores
The IndexHits interface exposes scoring
so that the index can communicate scores for the hits. Note that the result is
not sorted by the score unless you explicitly specify that. See
Section 15.11.2, “Sorting” for how to sort by score.
IndexHits<Node> hits = movies.query( "title", "The*" );
for ( Node movie : hits )
{
System.out.println( movie.getProperty( "title" ) + " " + hits.currentScore() );
}
15.10. Configuration and fulltext indexes
At the time of creation extra configuration can be specified to control the
behavior of the index and which backend to use. For example to create a Lucene
fulltext index:
IndexManager index = graphDb.index();
Index<Node> fulltextMovies = index.forNodes( "movies-fulltext",
MapUtil.stringMap( IndexManager.PROVIDER, "lucene", "type", "fulltext" ) );
fulltextMovies.add( theMatrix, "title", "The Matrix" );
fulltextMovies.add( theMatrixReloaded, "title", "The Matrix Reloaded" );
// search in the fulltext index
Node found = fulltextMovies.query( "title", "reloAdEd" ).getSingle();
Here’s an example of how to create an exact index which is case-insensitive:
Index<Node> index = graphDb.index().forNodes( "my-case-insensitive-index",
stringMap( "analyzer", LowerCaseKeywordAnalyzer.class.getName() ) );
Node node = graphDb.createNode();
index.add( node, "name", "Thomas Anderson" );
assertContains( index.query( "name", "\"Thomas Anderson\"" ), node );
assertContains( index.query( "name", "\"thoMas ANDerson\"" ), node );
Tip
In order to search for tokenized words, the query method has to be used. The
get method will only match the full string value, not the tokens.
The configuration of the index is persisted once the index has been created.
The provider configuration key is interpreted by Neo4j, but any other
configuration is passed onto the backend index (e.g. Lucene) to interpret.
Table 15.1. Lucene indexing configuration parameters
Parameter      Possible values            Effect
type           exact, fulltext            exact is the default and uses a Lucene
                                          keyword analyzer. fulltext uses a
                                          white-space tokenizer in its analyzer.
to_lower_case  true, false                This parameter goes together with
                                          type: fulltext and converts values to
                                          lower case during both additions and
                                          querying, making the index case
                                          insensitive. Defaults to true.
analyzer       the full class name of an  Overrides the type so that a custom
               Analyzer                   analyzer can be used. Note:
                                          to_lower_case still affects lowercasing
                                          of string queries. If the custom
                                          analyzer uppercases the indexed tokens,
                                          string queries will not match as
                                          expected.
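The difference between the exact and fulltext types comes down to how a value is turned into tokens. The following stand-alone sketch models that difference with plain string handling. It is a simplification under stated assumptions, not the Lucene analyzers themselves, and all names in it are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Simplified model of the "exact" and "fulltext" index types; not Lucene code.
final class AnalyzerSketch
{
    // exact: the whole value is kept as one token, case preserved.
    static List<String> exactTokens( String value )
    {
        return Collections.singletonList( value );
    }

    // fulltext: split on white space; lower-case the tokens if requested.
    static List<String> fulltextTokens( String value, boolean toLowerCase )
    {
        List<String> tokens = new ArrayList<String>();
        for ( String token : value.trim().split( "\\s+" ) )
        {
            if ( !token.isEmpty() )
            {
                tokens.add( toLowerCase ? token.toLowerCase( Locale.ROOT ) : token );
            }
        }
        return tokens;
    }

    // A one-word query matches if its lower-cased form equals some token.
    static boolean fulltextMatches( String indexedValue, String queryTerm )
    {
        return fulltextTokens( indexedValue, true )
                .contains( queryTerm.toLowerCase( Locale.ROOT ) );
    }

    public static void main( String[] args )
    {
        System.out.println( exactTokens( "The Matrix Reloaded" ) );
        System.out.println( fulltextTokens( "The Matrix Reloaded", true ) );
        System.out.println( fulltextMatches( "The Matrix Reloaded", "reloAdEd" ) );
    }
}
```

In this model the query reloAdEd finds "The Matrix Reloaded" in a fulltext index because both sides are lower-cased and compared token by token, while an exact index only holds the full string.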
15.11. Extra features for Lucene indexes
15.11.1. Numeric ranges
Lucene supports smart indexing of numbers, querying for ranges and sorting such
results, and so does its backend for Neo4j. To mark a value so that it is
indexed as a numeric value, we can make use of the ValueContext class, like this:
movies.add( theMatrix, "year-numeric", new ValueContext( 1999 ).indexNumeric() );
movies.add( theMatrixReloaded, "year-numeric", new ValueContext( 2003 ).indexNumeric() );
movies.add( malena, "year-numeric", new ValueContext( 2000 ).indexNumeric() );
int from = 1997;
int to = 1999;
hits = movies.query( QueryContext.numericRange( "year-numeric", from, to ) );
Note
The same type must be used for indexing and querying. That is, you can’t index
a value as a Long and then query the index using an Integer.
By giving null as from/to argument, an open ended query is created. In the
following example we are doing that, and have added sorting to the query as
well:
hits = movies.query(
QueryContext.numericRange( "year-numeric", from, null )
.sortNumeric( "year-numeric", false ) );
The from and to bounds of a range are inclusive by default, but you can change
this behavior by using two extra parameters:
movies.add( theMatrix, "score", new ValueContext( 8.7 ).indexNumeric() );
movies.add( theMatrixReloaded, "score", new ValueContext( 7.1 ).indexNumeric() );
movies.add( malena, "score", new ValueContext( 7.4 ).indexNumeric() );
// include 8.0, exclude 9.0
hits = movies.query( QueryContext.numericRange( "score", 8.0, 9.0, true, false ) );
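The effect of the inclusive/exclusive flags and of a null bound can be checked with a few lines of plain Java. This sketch does not call Neo4j or Lucene. It simply filters the three example scores the way a numeric range is described above, and the class and method names are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a numeric range filter; not the QueryContext API.
final class RangeSketch
{
    // A null bound means the range is open ended on that side.
    static List<String> inRange( Map<String, Double> values, Double from, Double to,
            boolean includeFrom, boolean includeTo )
    {
        List<String> hits = new ArrayList<String>();
        for ( Map.Entry<String, Double> entry : values.entrySet() )
        {
            double v = entry.getValue();
            boolean aboveFrom = from == null || ( includeFrom ? v >= from : v > from );
            boolean belowTo = to == null || ( includeTo ? v <= to : v < to );
            if ( aboveFrom && belowTo )
            {
                hits.add( entry.getKey() );
            }
        }
        return hits;
    }

    // The scores used in the example above.
    static Map<String, Double> sampleScores()
    {
        Map<String, Double> scores = new LinkedHashMap<String, Double>();
        scores.put( "The Matrix", 8.7 );
        scores.put( "The Matrix Reloaded", 7.1 );
        scores.put( "Malena", 7.4 );
        return scores;
    }

    public static void main( String[] args )
    {
        // include 8.0, exclude 9.0
        System.out.println( inRange( sampleScores(), 8.0, 9.0, true, false ) );
    }
}
```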
15.11.2. Sorting
Lucene performs sorting very well, and that is also exposed in the index
backend, through the QueryContext class:
hits = movies.query( "title", new QueryContext( "*" ).sort( "title" ) );
for ( Node hit : hits )
{
// all movies with a title in the index, ordered by title
}
// or
hits = movies.query( new QueryContext( "title:*" ).sort( "year", "title" ) );
for ( Node hit : hits )
{
// all movies with a title in the index, ordered by year, then title
}
We sort the results by relevance (score) like this:
hits = movies.query( "title", new QueryContext( "The*" ).sortByScore() );
for ( Node movie : hits )
{
// hits sorted by relevance (score)
}
15.11.3. Querying with Lucene Query objects
Instead of passing in Lucene query syntax queries, you can instantiate such
queries programmatically and pass them in as an argument, for example:
// a TermQuery will give exact matches
Node actor = actors.query( new TermQuery( new Term( "name", "Keanu Reeves" ) ) ).getSingle();
Note that the TermQuery is basically the same thing as using the
get method on the index.
This is how to perform wildcard searches using Lucene Query Objects:
hits = movies.query( new WildcardQuery( new Term( "title", "The Matrix*" ) ) );
for ( Node movie : hits )
{
System.out.println( movie.getProperty( "title" ) );
}
Note that this allows for whitespace in the search string.
15.11.4. Compound queries
Lucene supports querying for multiple terms in the same query, like so:
hits = movies.query( "title:*Matrix* AND year:1999" );
Caution
Compound queries can’t search across committed index entries and entries that
have not yet been committed at the same time.
15.11.5. Default operator
The default operator (that is whether AND or OR is used in between different
terms) in a query is OR. Changing that behavior is also done via the
QueryContext class:
QueryContext query = new QueryContext( "title:*Matrix* year:1999" )
.defaultOperator( Operator.AND );
hits = movies.query( query );
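What the default operator changes is easiest to see on a tiny data set. The sketch below is plain Java and does not parse Lucene syntax. It hard-codes the two clauses of the query above (title contains "Matrix", year equals 1999) and combines them with either operator. Movie, Operator and the method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of combining two query clauses with AND or OR.
final class DefaultOperatorSketch
{
    enum Operator { AND, OR }

    static final class Movie
    {
        final String title;
        final int year;

        Movie( String title, int year )
        {
            this.title = title;
            this.year = year;
        }
    }

    static boolean matches( Movie movie, Operator operator )
    {
        boolean titleClause = movie.title.contains( "Matrix" );
        boolean yearClause = movie.year == 1999;
        return operator == Operator.AND
                ? titleClause && yearClause
                : titleClause || yearClause;
    }

    static List<String> query( List<Movie> movies, Operator operator )
    {
        List<String> titles = new ArrayList<String>();
        for ( Movie movie : movies )
        {
            if ( matches( movie, operator ) )
            {
                titles.add( movie.title );
            }
        }
        return titles;
    }

    static List<Movie> sampleMovies()
    {
        return Arrays.asList( new Movie( "The Matrix", 1999 ),
                new Movie( "The Matrix Reloaded", 2003 ),
                new Movie( "Malena", 2000 ) );
    }

    public static void main( String[] args )
    {
        System.out.println( "OR:  " + query( sampleMovies(), Operator.OR ) );
        System.out.println( "AND: " + query( sampleMovies(), Operator.AND ) );
    }
}
```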
15.11.6. Caching
If your index lookups become a performance bottleneck, caching can be enabled
for certain keys in certain indexes (key locations) to speed up get requests.
The caching is implemented with an LRU cache so that only the most recently
accessed results are cached (with "results" meaning a query result of a get
request, not a single entity). You can control the size of the cache (the
maximum number of results) per index key.
Index<Node> index = graphDb.index().forNodes( "actors" );
( (LuceneIndex<Node>) index ).setCacheCapacity( "name", 300000 );
Caution
This setting is not persisted after shutting down the database. This means: set
this value after each startup of the database if you want to keep it.
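For readers unfamiliar with LRU caching, the following stand-alone sketch shows the eviction behavior using java.util.LinkedHashMap in access order. It illustrates the general technique only. How the Lucene index backend implements its cache internally is not shown here, and the class name is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic LRU cache built on LinkedHashMap; not the Neo4j implementation.
final class LruSketch<K, V> extends LinkedHashMap<K, V>
{
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruSketch( int capacity )
    {
        super( 16, 0.75f, true ); // true = order entries by last access
        this.capacity = capacity;
    }

    // Called by put(): drop the least recently used entry when over capacity.
    @Override
    protected boolean removeEldestEntry( Map.Entry<K, V> eldest )
    {
        return size() > capacity;
    }

    public static void main( String[] args )
    {
        LruSketch<String, String> cache = new LruSketch<String, String>( 2 );
        cache.put( "Keanu Reeves", "result 1" );
        cache.put( "Monica Bellucci", "result 2" );
        cache.get( "Keanu Reeves" );                  // touch, now most recent
        cache.put( "Carrie-Anne Moss", "result 3" );  // evicts Monica Bellucci
        System.out.println( cache.keySet() );
    }
}
```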
15.12. Automatic Indexing
Neo4j provides a single index for nodes and one for relationships in each
database that automatically follow property values as they are added, deleted
and changed on database primitives. This functionality is called auto indexing
and is controlled both from the database configuration Map and through its own
API.
Caution
This is an experimental feature. Expect changes in the API and do not rely on
it for production data handling.
15.12.1. Configuration
By default Auto Indexing is off for both Nodes and Relationships. To enable it
on database startup set the configuration options Config.NODE_AUTO_INDEXING and
Config.RELATIONSHIP_AUTO_INDEXING to the string "true".
If you just enable auto indexing as above, then still no property will be auto
indexed. To define which property names you want the auto indexer to monitor as
a configuration parameter, set the Config.{NODE,RELATIONSHIP}_KEYS_INDEXABLE
option to a String that is a comma separated concatenation of the property
names you want auto indexed.
/*
* Creating the configuration, adding nodeProp1 and nodeProp2 as
* auto indexed properties for Nodes and relProp1 and relProp2 as
* auto indexed properties for Relationships. Only those will be
* indexed. We also have to enable auto indexing for both these
* primitives explicitly.
*/
GraphDatabaseService graphDb = new GraphDatabaseFactory().
newEmbeddedDatabaseBuilder( storeDirectory ).
setConfig( GraphDatabaseSettings.node_keys_indexable, "nodeProp1,nodeProp2" ).
setConfig( GraphDatabaseSettings.relationship_keys_indexable, "relProp1,relProp2" ).
setConfig( GraphDatabaseSettings.node_auto_indexing, GraphDatabaseSetting.TRUE ).
setConfig( GraphDatabaseSettings.relationship_auto_indexing, GraphDatabaseSetting.TRUE ).
newGraphDatabase();
Transaction tx = graphDb.beginTx();
Node node1 = null, node2 = null;
Relationship rel = null;
try
{
// Create the primitives
node1 = graphDb.createNode();
node2 = graphDb.createNode();
rel = node1.createRelationshipTo( node2,
DynamicRelationshipType.withName( "DYNAMIC" ) );
// Add indexable and non-indexable properties
node1.setProperty( "nodeProp1", "nodeProp1Value" );
node2.setProperty( "nodeProp2", "nodeProp2Value" );
node1.setProperty( "nonIndexed", "nodeProp2NonIndexedValue" );
rel.setProperty( "relProp1", "relProp1Value" );
rel.setProperty( "relPropNonIndexed", "relPropValueNonIndexed" );
// Make things persistent
tx.success();
}
catch ( Exception e )
{
tx.failure();
}
finally
{
tx.finish();
}
15.12.2. Search
The usefulness of the auto indexing functionality comes of course from the
ability to actually query the index and retrieve results. To that end, you can
acquire a ReadableIndex object from the AutoIndexer that exposes all the query
and get methods of a full Index with exactly the same
functionality. Continuing from the previous example, accessing the index is
done like this:
// Get the Node auto index
ReadableIndex<Node> autoNodeIndex = graphDb.index()
.getNodeAutoIndexer()
.getAutoIndex();
// node1 and node2 both had auto indexed properties, get them
assertEquals( node1,
autoNodeIndex.get( "nodeProp1", "nodeProp1Value" ).getSingle() );
assertEquals( node2,
autoNodeIndex.get( "nodeProp2", "nodeProp2Value" ).getSingle() );
// node1 also had a property that should be ignored.
assertFalse( autoNodeIndex.get( "nonIndexed",
"nodeProp2NonIndexedValue" ).hasNext() );
// Get the relationship auto index
ReadableIndex<Relationship> autoRelIndex = graphDb.index()
.getRelationshipAutoIndexer()
.getAutoIndex();
// One property was set for auto indexing
assertEquals( rel,
autoRelIndex.get( "relProp1", "relProp1Value" ).getSingle() );
// The rest should be ignored
assertFalse( autoRelIndex.get( "relPropNonIndexed",
"relPropValueNonIndexed" ).hasNext() );
15.12.3. Runtime Configuration
The same options that are available during database creation via the
configuration can also be set during runtime via the AutoIndexer API.
Gaining access to the AutoIndexer API and adding two Node and one Relationship
properties to auto index is done like so:
// Start without any configuration
GraphDatabaseService graphDb = new GraphDatabaseFactory().
newEmbeddedDatabase( storeDirectory );
// Get the Node AutoIndexer, set nodeProp1 and nodeProp2 as auto
// indexed.
AutoIndexer<Node> nodeAutoIndexer = graphDb.index()
.getNodeAutoIndexer();
nodeAutoIndexer.startAutoIndexingProperty( "nodeProp1" );
nodeAutoIndexer.startAutoIndexingProperty( "nodeProp2" );
// Get the Relationship AutoIndexer
AutoIndexer<Relationship> relAutoIndexer = graphDb.index()
.getRelationshipAutoIndexer();
relAutoIndexer.startAutoIndexingProperty( "relProp1" );
// None of the AutoIndexers are enabled so far. Do that now
nodeAutoIndexer.setEnabled( true );
relAutoIndexer.setEnabled( true );
Parameters passed to the AutoIndexers through the configuration and settings
made through the API are cumulative. So you can set some settings that are
known beforehand, do runtime checks to augment the initial configuration and
then enable the desired auto indexers. The final configuration is the same
regardless of the method used to reach it.
15.12.4. Updating the Automatic Index
Updates to the auto indexed properties happen automatically as you update
them. Removal of properties from the auto index happens for two reasons. One
is that you actually removed the property. The other is that you stopped auto
indexing on a property. When the latter happens, any primitive you touch that
has that property is removed from the auto index, regardless of any operations
on the property. When you start or stop auto indexing on a property, existing
index entries are currently not updated automatically. If you need to change
the set of auto indexed properties and have them re-indexed, you currently
have to do this by hand. The following example illustrates this:
/*
* Creating the configuration
*/
GraphDatabaseService graphDb = new GraphDatabaseFactory().
newEmbeddedDatabaseBuilder( storeDirectory ).
setConfig( GraphDatabaseSettings.node_keys_indexable, "nodeProp1,nodeProp2" ).
setConfig( GraphDatabaseSettings.node_auto_indexing, GraphDatabaseSetting.TRUE ).
newGraphDatabase();
Transaction tx = graphDb.beginTx();
Node node1 = null, node2 = null, node3 = null, node4 = null;
try
{
// Create the primitives
node1 = graphDb.createNode();
node2 = graphDb.createNode();
node3 = graphDb.createNode();
node4 = graphDb.createNode();
// Add indexable and non-indexable properties
node1.setProperty( "nodeProp1", "nodeProp1Value" );
node2.setProperty( "nodeProp2", "nodeProp2Value" );
node3.setProperty( "nodeProp1", "nodeProp3Value" );
node4.setProperty( "nodeProp2", "nodeProp4Value" );
// Make things persistent
tx.success();
}
catch ( Exception e )
{
tx.failure();
}
finally
{
tx.finish();
}
/*
* Here all four nodes are indexed. To demonstrate removal, we stop
* autoindexing nodeProp1.
*/
AutoIndexer<Node> nodeAutoIndexer = graphDb.index().getNodeAutoIndexer();
nodeAutoIndexer.stopAutoIndexingProperty( "nodeProp1" );
tx = graphDb.beginTx();
try
{
/*
* nodeProp1 is no longer auto indexed. It will be
* removed regardless. Note that node3 will remain.
*/
node1.setProperty( "nodeProp1", "nodeProp1Value2" );
/*
* node2 will be auto updated
*/
node2.setProperty( "nodeProp2", "nodeProp2Value2" );
/*
* remove node4 property nodeProp2 from index.
*/
node4.removeProperty( "nodeProp2" );
// Make things persistent
tx.success();
}
catch ( Exception e )
{
tx.failure();
}
finally
{
tx.finish();
}
// Verify
ReadableIndex<Node> nodeAutoIndex = nodeAutoIndexer.getAutoIndex();
// node1 is completely gone
assertFalse( nodeAutoIndex.get( "nodeProp1", "nodeProp1Value" ).hasNext() );
assertFalse( nodeAutoIndex.get( "nodeProp1", "nodeProp1Value2" ).hasNext() );
// node2 is updated
assertFalse( nodeAutoIndex.get( "nodeProp2", "nodeProp2Value" ).hasNext() );
assertEquals( node2,
nodeAutoIndex.get( "nodeProp2", "nodeProp2Value2" ).getSingle() );
/*
* node3 is still there, despite its nodeProp1 property not being monitored
* any more because it was not touched, in contrast with node1.
*/
assertEquals( node3,
nodeAutoIndex.get( "nodeProp1", "nodeProp3Value" ).getSingle() );
// Finally, node4 is removed because the property was removed.
assertFalse( nodeAutoIndex.get( "nodeProp2", "nodeProp4Value" ).hasNext() );
Caution
If you start the database with auto indexing enabled but different auto indexed
properties than the last run, then already auto-indexed entities will be
deleted as you work with them. Make sure that the monitored set is what you
want before enabling the functionality.
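The update rules of this section can be condensed into a small in-memory model. The sketch below is plain Java and deliberately does not use the Neo4j API. It only mimics the documented behavior: an entry is rewritten when a monitored property is set, dropped when the property is removed or when an entity with a no-longer-monitored property is touched, and left alone otherwise. All names are made up for this illustration, and nodes are represented by simple strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory model of the auto index update rules; not the AutoIndexer API.
final class AutoIndexSketch
{
    private final Set<String> monitored = new HashSet<String>();
    // property key -> ( node name -> indexed value )
    private final Map<String, Map<String, String>> entries =
            new HashMap<String, Map<String, String>>();

    void startMonitoring( String key )
    {
        monitored.add( key );
    }

    // Existing entries are left in place; nothing is re-indexed.
    void stopMonitoring( String key )
    {
        monitored.remove( key );
    }

    void setProperty( String node, String key, String value )
    {
        removeProperty( node, key ); // touching the property drops the old entry
        if ( monitored.contains( key ) )
        {
            Map<String, String> byNode = entries.get( key );
            if ( byNode == null )
            {
                byNode = new HashMap<String, String>();
                entries.put( key, byNode );
            }
            byNode.put( node, value );
        }
    }

    void removeProperty( String node, String key )
    {
        Map<String, String> byNode = entries.get( key );
        if ( byNode != null )
        {
            byNode.remove( node );
        }
    }

    Set<String> lookup( String key, String value )
    {
        Set<String> hits = new TreeSet<String>();
        Map<String, String> byNode = entries.get( key );
        if ( byNode != null )
        {
            for ( Map.Entry<String, String> entry : byNode.entrySet() )
            {
                if ( entry.getValue().equals( value ) )
                {
                    hits.add( entry.getKey() );
                }
            }
        }
        return hits;
    }
}
```

Replaying the example of this section against the model gives the same outcome: after monitoring of nodeProp1 stops, setting it on node1 removes node1 from the index, while the untouched node3 stays.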
Chapter 16. Cypher Query Language
Cypher is a declarative graph query language that allows for expressive and
efficient querying and updating of the graph store without having to write
traversals through the graph structure in code. Cypher is still growing and
maturing, and that means that there probably will be breaking syntax changes.
It also means that it has not undergone the same rigorous performance testing
as the other components.
Cypher is designed to be a humane query language, suitable for both developers
and (importantly, we think) operations professionals who want to make ad-hoc
queries on the database. Our guiding goal is to make the simple things simple,
and the complex things possible. Its constructs are based on English prose and
neat iconography, which helps to make it (somewhat) self-explanatory.
Cypher is inspired by a number of different approaches and builds upon
established practices for expressive querying. Most of the keywords like WHERE
and ORDER BY are inspired by SQL. Pattern matching borrows expression
approaches from SPARQL.
Being a declarative language, Cypher focuses on the clarity of expressing what
to retrieve from a graph, not how to do it, in contrast to imperative languages
like Java, and scripting languages like Gremlin
(supported via the Section 19.16, “Gremlin Plugin”) and the JRuby Neo4j
bindings. This makes the concern of how to optimize queries an implementation
detail not exposed to the user.
The query language is comprised of several distinct clauses.
* START: Starting points in the graph, obtained via index lookups or by
element IDs.
* MATCH: The graph pattern to match, bound to the starting points in START.
* WHERE: Filtering criteria.
* RETURN: What to return.
* CREATE: Creates nodes and relationships.
* DELETE: Removes nodes, relationships and properties.
* SET: Set values to properties.
* FOREACH: Performs updating actions once per element in a list.
* WITH: Divides a query into multiple, distinct parts.
Let’s see three of them in action:
Imagine an example graph like
Figure 16.1. Example Graph
Example-Graph-cypher-intro.svg
For example, here is a query which finds a user called John in an index and
then traverses the graph looking for friends of John’s friends (though not his
direct friends) before returning both John and any friends-of-friends that are
found.
START john=node:node_auto_index(name = 'John')
MATCH john-[:friend]->()-[:friend]->fof
RETURN john, fof
Resulting in
+--------------------------------------------+
|john |fof |
|--------------------------------------------|
|2 rows |
|--------------------------------------------|
|3 ms |
|--------------------------------------------|
|Node[4]{name->"John"}|Node[2]{name->"Maria"}|
|---------------------+----------------------|
|Node[4]{name->"John"}|Node[3]{name->"Steve"}|
+--------------------------------------------+
Next up we will add filtering to set all four parts in motion:
In this next example, we take a list of users (by node ID) and traverse the
graph looking for those other users that have an outgoing friend relationship,
returning only those followed users who have a name property starting with S.
START user=node(5,4,1,2,3)
MATCH user-[:friend]->follower
WHERE follower.name =~ /S.*/
RETURN user, follower.name
Resulting in
+-----------------------------------+
|user |follower.name|
|-----------------------------------|
|2 rows |
|-----------------------------------|
|1 ms |
|-----------------------------------|
|Node[5]{name->"Joe"} |"Steve" |
|---------------------+-------------|
|Node[4]{name->"John"}|"Sara" |
+-----------------------------------+
To use Cypher from Java, see Section 4.10, “Execute Cypher Queries from Java”.
For more Cypher examples, see also Chapter 5, Cypher Cookbook.
16.1. Operators
Operators in Cypher are of three different varieties - mathematical, equality
and relationships.
The mathematical operators are +, -, *, / and %. Of these, only the plus-sign
works on strings.
The equality operators are =, <>, <, >, <=, >=.
Since Neo4j is a schema-less graph database, Cypher has two special
operators: ? and !.
They are used on properties, and are used to deal with missing values. A
comparison on a property that does not exist will cause an error. Instead of
having to always check if the property exists before comparing its value with
something else, the question mark makes the comparison always return true if
the property is missing, and the exclamation mark makes the comparison return
false.
This predicate will evaluate to true if n.prop is missing.
WHERE n.prop? = "foo"
This predicate will evaluate to false if n.prop is missing.
WHERE n.prop! = "foo"
Warning: Mixing the two in the same comparison will lead to unpredictable
results.
This is really syntactic sugar that expands to this:
WHERE n.prop? = "foo" ⇒ WHERE (not(has(n.prop)) OR n.prop = "foo")
WHERE n.prop! = "foo" ⇒ WHERE (has(n.prop) AND n.prop = "foo")
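The two expansions above translate directly into boolean logic. The following plain-Java sketch models a node's properties as a Map and evaluates both forms. It is an illustration of the semantics only, not Cypher's implementation, and the names are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Boolean model of the ? and ! property operators; not Cypher internals.
final class NullablePropertySketch
{
    // n.prop? = expected  =>  not(has(n.prop)) OR n.prop = expected
    static boolean questionMark( Map<String, Object> props, String key, Object expected )
    {
        return !props.containsKey( key ) || expected.equals( props.get( key ) );
    }

    // n.prop! = expected  =>  has(n.prop) AND n.prop = expected
    static boolean exclamationMark( Map<String, Object> props, String key, Object expected )
    {
        return props.containsKey( key ) && expected.equals( props.get( key ) );
    }

    public static void main( String[] args )
    {
        Map<String, Object> withoutProp = new HashMap<String, Object>();
        System.out.println( "missing, ?: " + questionMark( withoutProp, "prop", "foo" ) );
        System.out.println( "missing, !: " + exclamationMark( withoutProp, "prop", "foo" ) );
    }
}
```

When the property is present, both forms behave like an ordinary equality comparison. They differ only in the missing-property case.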
16.2. Expressions
An expression in Cypher can be:
* A numeric literal (integer or double) — 13, 40000, 3.14
* A string literal — "Hello", 'World'
* A boolean literal — true, false, TRUE, FALSE
* An identifier — n, x, rel, myFancyIdentifier, `A name with weird stuff in
it[]!`
* A property — n.prop, x.prop, rel.thisProperty, myFancyIdentifier.`(weird
property name)`
* A nullable property — it’s a property, with a question mark or exclamation
mark — n.prop?, rel.thisProperty!
* A parameter — {param}, {0}
* A collection of expressions — ["a", "b"], [1,2,3], ["a", 2, n.property,
{param}], [ ]
* A function call — length(p), nodes(p)
* An aggregate function — avg(x.prop), count(*)
* Relationship types — :REL_TYPE, :`REL TYPE`, :REL1|REL2
16.3. Parameters
Cypher supports querying with parameters. This means developers don’t have to
resort to string building to create a query, and it also makes caching of
execution plans much easier for Cypher.
Parameters can be used for literals in the WHERE clause, for the index key and
index value in the START clause, index queries, and finally for node/
relationship ids.
Accepted names for parameters are letters and numbers, and any combination of
these.
Here follow a few examples of how you can use parameters from Java.
Parameter for node id.
Map<String, Object> params = new HashMap<String, Object>();
params.put( "id", 0 );
ExecutionResult result = engine.execute( "start n=node({id}) return n.name", params );
Parameter for node object.
Map<String, Object> params = new HashMap<String, Object>();
params.put( "node", andreasNode );
ExecutionResult result = engine.execute( "start n=node({node}) return n.name", params );
Parameter for multiple node ids.
Map<String, Object> params = new HashMap<String, Object>();
params.put( "id", Arrays.asList( 0, 1, 2 ) );
ExecutionResult result = engine.execute( "start n=node({id}) return n.name", params );
Parameter for string literal.
Map<String, Object> params = new HashMap<String, Object>();
params.put( "name", "Johan" );
ExecutionResult result =
engine.execute( "start n=node(0,1,2) where n.name = {name} return n", params );
Parameter for index key and value.
Map<String, Object> params = new HashMap<String, Object>();
params.put( "key", "name" );
params.put( "value", "Michaela" );
ExecutionResult result =
engine.execute( "start n=node:people({key} = {value}) return n", params );
Parameter for index query.
Map<String, Object> params = new HashMap<String, Object>();
params.put( "query", "name:Andreas" );
ExecutionResult result = engine.execute( "start n=node:people({query}) return n", params );
Numeric parameters for SKIP and LIMIT.
Map<String, Object> params = new HashMap<String, Object>();
params.put( "s", 1 );
params.put( "l", 1 );
ExecutionResult result =
engine.execute( "start n=node(0,1,2) return n.name skip {s} limit {l}", params );
Parameter for regular expression.
Map<String, Object> params = new HashMap<String, Object>();
params.put( "regex", ".*h.*" );
ExecutionResult result =
engine.execute( "start n=node(0,1,2) where n.name =~ {regex} return n.name", params );
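Why parameters make plan caching easier can be shown with a toy cache. The manual does not describe how Cypher caches execution plans, so the sketch below assumes, purely for illustration, a cache keyed by the query text. With that assumption a parameterized query is compiled once, while queries built by string concatenation produce a new text (and a new compilation) for every value. All names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy plan cache keyed by query text; an assumption, not Cypher's real cache.
final class PlanCacheSketch
{
    private final Map<String, String> plans = new HashMap<String, String>();
    private int compilations = 0;

    // Returns a pretend execution plan, compiling only on a cache miss.
    String planFor( String queryText )
    {
        String plan = plans.get( queryText );
        if ( plan == null )
        {
            compilations++;
            plan = "plan#" + compilations;
            plans.put( queryText, plan );
        }
        return plan;
    }

    int compilations()
    {
        return compilations;
    }

    public static void main( String[] args )
    {
        PlanCacheSketch parameterized = new PlanCacheSketch();
        PlanCacheSketch concatenated = new PlanCacheSketch();
        for ( String name : new String[] { "Johan", "Michaela", "Andreas" } )
        {
            // The value travels separately, so the text never changes.
            parameterized.planFor( "start n=node(0,1,2) where n.name = {name} return n" );
            // The value is baked into the text, so every query is new.
            concatenated.planFor( "start n=node(0,1,2) where n.name = '" + name + "' return n" );
        }
        System.out.println( parameterized.compilations() + " vs " + concatenated.compilations() );
    }
}
```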
16.4. Identifiers
When you reference parts of the pattern, you do so by naming them. The names
you give the different parts are called identifiers.
In this example:
START n=node(1) MATCH n-->b RETURN b
The identifiers are n and b.
Identifiers can be lower or upper case, and may contain underscores. If other
characters are needed, you can quote the identifier using the ` sign. The same
rules apply to property names.
16.5. Comments
To add comments to your queries, use double slash. Examples:
START n=node(1) RETURN n //This is an end of line comment
START n=node(1)
//This is a whole line comment
RETURN n
START n=node(1) WHERE n.property = "//This is NOT a comment" RETURN n
16.6. Updating the graph with Cypher
Cypher can be used for both querying and updating your graph.
16.6.1. Updating query structure
A Cypher query part can’t both match and update the graph at the same time.
Every part can either read and match on the graph, or make updates on it.
16.6.2. Query Parts & Structure
If you read from the graph, and then update the graph, your query implicitly
has two parts — the reading is the first part, and the writing is the second.
If your query is read-only, Cypher will be lazy, and not actually pattern match
until you ask for the results. Here, the semantics are that all the reading
will be done before any writing actually happens. This is very
important — without this it’s easy to find cases where the pattern matcher runs
into data that is being created by the very same query, and all bets are off.
That road leads to Heisenbugs, Brownian motion and cats that are dead and alive
at the same time.
First reading, and then writing, is the only pattern where the query parts are
implicit — any other order and you have to be explicit about your query parts.
The parts are separated using the WITH statement. WITH is like the event
horizon — it’s a barrier between a plan and the finished execution of that
plan.
When you want to filter using aggregated data, you have to chain together two
reading query parts: the first one does the aggregating, and the second one
filters on the results coming from the first one.
START n=node(...)
MATCH n-[:friend]-friend
WITH n, count(friend) as friendsCount
WHERE friendsCount > 3
RETURN n, friendsCount
Using WITH, you specify how you want the aggregation to happen, and that the
aggregation has to be finished before Cypher can start filtering.
You can chain together as many query parts as you have JVM heap for.
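The two-part structure of the aggregation query above can be mirrored in plain Java: first finish the aggregate, then filter on it. The sketch below works on already matched (n, friend) rows and is only an analogy for the query parts, not a description of how Cypher executes them. The names are made up for this illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java analogy for "aggregate, WITH, then filter"; not Cypher internals.
final class AggregateThenFilterSketch
{
    // Part 1 (before WITH): count the friend rows per node.
    static Map<String, Integer> countFriends( String[][] rows )
    {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for ( String[] row : rows )
        {
            Integer old = counts.get( row[0] );
            counts.put( row[0], old == null ? 1 : old + 1 );
        }
        return counts;
    }

    // Part 2 (after WITH): filter on the finished aggregate.
    static Map<String, Integer> moreThan( Map<String, Integer> counts, int threshold )
    {
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for ( Map.Entry<String, Integer> entry : counts.entrySet() )
        {
            if ( entry.getValue() > threshold )
            {
                result.put( entry.getKey(), entry.getValue() );
            }
        }
        return result;
    }

    public static void main( String[] args )
    {
        String[][] rows = {
                { "Anna", "Bob" }, { "Anna", "Cleo" }, { "Anna", "Dan" }, { "Anna", "Eve" },
                { "Bob", "Anna" }, { "Bob", "Cleo" } };
        System.out.println( moreThan( countFriends( rows ), 3 ) );
    }
}
```

The filter cannot run row by row while the counting is still in progress, which is the reason the aggregation has to sit in its own query part.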
16.6.3. Returning data
Any query can return data. If your query is read only, it has to return
data — it serves no purpose if it doesn’t, and it is not a valid Cypher query.
Queries that update the graph don’t have to return anything, but they can.
After all the parts of the query, comes one final RETURN statement. RETURN is
not part of any query part — it is a period symbol after an eloquent statement.
When RETURN is legal, it’s also legal to use SKIP/LIMIT and ORDER BY.
If you return graph elements from a query that has just deleted them — beware,
you are holding a pointer that is no longer valid. Operations on that node
might fail mysteriously and unpredictably.
16.7. Transactions and Cypher
Any query that updates the graph will run in a transaction. An updating query
will always either fully succeed, or not succeed at all.
Cypher will either create a new transaction and commit it once the query
finishes, or, if a transaction already exists in the running context, run the
query inside it. In the latter case nothing will be persisted to disk until
the transaction is successfully committed.
This can be used to have multiple queries be committed as a single transaction:
1. Open a transaction,
2. run multiple updating Cypher queries,
3. and commit all of them in one go.
Note that a query will hold the changes in heap until the whole query has
finished executing. A large query will consequently need a JVM with lots of
heap space.
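The all-or-nothing behavior and the heap usage noted above can be pictured with a minimal buffer model. This is plain Java and not the Neo4j transaction API. It assumes only what the text states: changes are held in memory until commit, and either all of them or none become visible. The names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Minimal model of buffered, all-or-nothing updates; not the Neo4j Transaction API.
final class TxSketch
{
    private final List<String> committed = new ArrayList<String>();
    private final List<String> pending = new ArrayList<String>();

    // An updating query only adds to the in-memory buffer.
    void runUpdatingQuery( String change )
    {
        pending.add( change );
    }

    // All buffered changes become visible in one go.
    void commit()
    {
        committed.addAll( pending );
        pending.clear();
    }

    // None of the buffered changes become visible.
    void rollback()
    {
        pending.clear();
    }

    List<String> committed()
    {
        return Collections.unmodifiableList( committed );
    }

    public static void main( String[] args )
    {
        TxSketch tx = new TxSketch();
        tx.runUpdatingQuery( "create node A" );
        tx.runUpdatingQuery( "create node B" );
        System.out.println( "before commit: " + tx.committed() );
        tx.commit();
        System.out.println( "after commit:  " + tx.committed() );
    }
}
```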
16.8. Start
Every query describes a pattern, and in that pattern one can have multiple
start points. A start point is a relationship or a node that forms the
starting point for a pattern match. You can either introduce start points by
id, or by index lookups. Note that trying to use an index that doesn’t exist
will throw an exception.
Graph
cypher-start-graph.svg
16.8.1. Node by id
Binding a node as a start point is done with the node(*) function.
Query
START n=node(1)
RETURN n
The corresponding node is returned.
Table 16.1. Result
+------------------+
|n |
|------------------|
|1 row |
|------------------|
|0 ms |
|------------------|
|Node[1]{name->"A"}|
+------------------+
16.8.2. Relationship by id
Binding a relationship as a start point is done with the relationship(*)
function, which can also be abbreviated rel(*).
Query
START r=relationship(0)
RETURN r
The relationship with id 0 is returned.
Table 16.2. Result
+------------+
|r |
|------------|
|1 row |
|------------|
|0 ms |
|------------|
|:KNOWS[0] {}|
+------------+
16.8.3. Multiple nodes by id
Multiple nodes are selected by listing them separated by commas.
Query
START n=node(1, 2, 3)
RETURN n
This returns the nodes listed in the START statement.
Table 16.3. Result
+------------------+
|n |
|------------------|
|3 rows |
|------------------|
|0 ms |
|------------------|
|Node[1]{name->"A"}|
|------------------|
|Node[2]{name->"B"}|
|------------------|
|Node[3]{name->"C"}|
+------------------+
16.8.4. All nodes
To get all the nodes, use an asterisk. This can be done with relationships as
well.
Query
START n=node(*)
RETURN n
This query returns all the nodes in the graph.
Table 16.4. Result
+------------------+
|n |
|------------------|
|3 rows |
|------------------|
|0 ms |
|------------------|
|Node[1]{name->"A"}|
|------------------|
|Node[2]{name->"B"}|
|------------------|
|Node[3]{name->"C"}|
+------------------+
16.8.5. Node by index lookup
If the start point can be found by index lookups, it can be done like this:
node:index-name(key = "value"). In this example, there exists a node index
named nodes.
Query
START n=node:nodes(name = "A")
RETURN n
The node indexed with name "A" is returned.
Table 16.5. Result
+------------------+
|n |
|------------------|
|1 row |
|------------------|
|0 ms |
|------------------|
|Node[1]{name->"A"}|
+------------------+
16.8.6. Relationship by index lookup
If the start point can be found by index lookups, it can be done like this:
relationship:index-name(key = "value").
Query
START r=relationship:rels(property = "some_value")
RETURN r
The relationship indexed with property "some_value" is returned.
Table 16.6. Result
+----------------------------------+
|r |
|----------------------------------|
|1 row |
|----------------------------------|
|0 ms |
|----------------------------------|
|:KNOWS[0] {property->"some_value"}|
+----------------------------------+
16.8.7. Node by index query
If the start point can be found by a more complex Lucene query, it can be done
like this: node:index-name("query"). This allows you to write more advanced
index queries.
Query
START n=node:nodes("name:A")
RETURN n
The node indexed with name "A" is returned.
Table 16.7. Result
+------------------+
|n |
|------------------|
|1 row |
|------------------|
|0 ms |
|------------------|
|Node[1]{name->"A"}|
+------------------+
16.8.8. Multiple start points
Sometimes you want to bind multiple start points. Just list them separated by
commas.
Query
START a=node(1), b=node(2)
RETURN a,b
Both the A and the B node are returned.
Table 16.8. Result
+-------------------------------------+
|a |b |
|-------------------------------------|
|1 row |
|-------------------------------------|
|0 ms |
|-------------------------------------|
|Node[1]{name->"A"}|Node[2]{name->"B"}|
+-------------------------------------+
16.9. Match
16.9.1. Introduction
Pattern matching is one of the pillars of Cypher. The pattern is used to
describe the shape of the data that we are looking for. Cypher will then try to
find patterns in the graph — these are called matching subgraphs.
The description of the pattern is made up of one or more paths, separated by
commas. A path is a sequence of nodes and relationships that always starts and
ends in a node. An example path would be: (a)-->(b)
Paths can be of arbitrary length, and the same node may appear in multiple
places in the path. Node identifiers can be used with or without surrounding
parentheses. The following two match clauses are semantically identical; the
difference is purely aesthetic.
MATCH (a)-->(b)
and
MATCH a-->b
Patterns have bound points, or start points. They are the parts of the pattern
that are already “bound” to a set of graph nodes or relationships. All parts of
the pattern must be directly or indirectly connected to a start point — a
pattern where parts of the pattern are not reachable from any start point will
be rejected.
The optional relationship is a way to describe parts of the pattern that can
evaluate to null if they cannot be matched to the graph. It’s the equivalent of
a SQL outer join: if Cypher finds one or more matches, they will be returned. If
no matches are found, Cypher will return a null. Only relationships can be
marked as optional, and it’s done with a question mark.
Optional relationships of the pattern are used to answer queries like this:
START me=node(1)
MATCH me-->friend-[?:parent_of]->children
RETURN friend, children
The query above says “give me all my friends, and their children, if they have
any.”
Optionality is transitive — if a part of the pattern can only be reached from a
bound point through an optional relationship, that part is also optional. In
the pattern above, the only bound point in the pattern is me. Since the
relationship between friend and children is optional, children is an optional
part of the graph.
Named paths that contain optional parts are also optional: if any part
of the path is null, the whole path is null.
In these examples, b and p are both optional and can contain null:
START a=node(1)
MATCH p = a-[?]->b
RETURN b
START a=node(1)
MATCH p = a-[?*]->b
RETURN b
START a=node(1)
MATCH p = a-[?]->x-->b
RETURN b
START a=node(1), x=node(2)
MATCH p = shortestPath( a-[?*]->x )
RETURN p
As a simple example, let’s take the following query, executed on the graph
pictured below.
Query
START me=node(1)
MATCH me-->friend-[?:parent_of]->children
RETURN friend, children
This returns a friend node, and no children, since there are no such
relationships in the graph.
Table 16.9. Result
+--------------------------------+
|friend |children|
|--------------------------------|
|1 row |
|--------------------------------|
|1 ms |
|--------------------------------|
|Node[3]{name->"Anders"}|<null> |
+--------------------------------+
For the examples given in the sections below, the following graph is the base:
Graph
cypher-match-graph.svg
16.9.2. Related nodes
The symbol -- means related to, without regard to type or direction.
Query
START n=node(3)
MATCH (n)--(x)
RETURN x
All nodes related to A (Anders) are returned.
Table 16.10. Result
+------------------------+
|x |
|------------------------|
|3 rows |
|------------------------|
|0 ms |
|------------------------|
|Node[4]{name->"Bossman"}|
|------------------------|
|Node[1]{name->"David"} |
|------------------------|
|Node[5]{name->"Cesar"} |
+------------------------+
16.9.3. Outgoing relationships
When the direction of a relationship is interesting, it is shown by using -->
or <--, like this:
Query
START n=node(3)
MATCH (n)-->(x)
RETURN x
All nodes that A has outgoing relationships to.
Table 16.11. Result
+------------------------+
|x |
|------------------------|
|2 rows |
|------------------------|
|0 ms |
|------------------------|
|Node[4]{name->"Bossman"}|
|------------------------|
|Node[5]{name->"Cesar"} |
+------------------------+
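As a rough illustration of the difference between -- and -->, the following
Python sketch models the graph as a plain edge list and looks up neighbours
both ways. The edge list is reconstructed from the result tables in this
section, and the names EDGES, NAMES, outgoing and related are made up for the
sketch. It is only a model of the semantics, not how Neo4j evaluates patterns.

```python
# Edges as (start, type, end). Reconstructed from the result tables in this
# section, so treat the list as an assumption about the example graph.
EDGES = [
    (3, "KNOWS", 4),
    (3, "BLOCKS", 5),
    (1, "KNOWS", 3),
    (4, "KNOWS", 2),
    (5, "KNOWS", 2),
]
NAMES = {1: "David", 2: "Emil", 3: "Anders", 4: "Bossman", 5: "Cesar"}


def outgoing(node):
    """Model of (n)-->(x): follow relationships that start at node."""
    return [end for start, _type, end in EDGES if start == node]


def related(node):
    """Model of (n)--(x): follow relationships in either direction."""
    found = []
    for start, _type, end in EDGES:
        if start == node:
            found.append(end)
        elif end == node:
            found.append(start)
    return found


print(sorted(NAMES[x] for x in related(3)))   # ['Bossman', 'Cesar', 'David']
print(sorted(NAMES[x] for x in outgoing(3)))  # ['Bossman', 'Cesar']
```

The undirected lookup finds David as well, because David has a relationship
pointing at Anders.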
16.9.4. Directed relationships and identifier
If an identifier is needed, either for filtering on properties of the
relationship, or to return the relationship, this is how you introduce the
identifier.
Query
START n=node(3)
MATCH (n)-[r]->()
RETURN r
All outgoing relationships from node A.
Table 16.12. Result
+-------------+
|r |
|-------------|
|2 rows |
|-------------|
|0 ms |
|-------------|
|:KNOWS[0] {} |
|-------------|
|:BLOCKS[1] {}|
+-------------+
16.9.5. Match by relationship type
When you know the relationship type you want to match on, you can specify it by
using a colon.
Query
START n=node(3)
MATCH (n)-[:BLOCKS]->(x)
RETURN x
All nodes that are BLOCKed by A.
Table 16.13. Result
+----------------------+
|x |
|----------------------|
|1 row |
|----------------------|
|0 ms |
|----------------------|
|Node[5]{name->"Cesar"}|
+----------------------+
16.9.6. Match by multiple relationship types
If multiple types are acceptable, you can specify this by chaining them with
the pipe symbol |.
Query
START n=node(3)
MATCH (n)-[:BLOCKS|KNOWS]->(x)
RETURN x
All nodes that A has an outgoing BLOCKS or KNOWS relationship to.
Table 16.14. Result
+------------------------+
|x |
|------------------------|
|2 rows |
|------------------------|
|0 ms |
|------------------------|
|Node[5]{name->"Cesar"} |
|------------------------|
|Node[4]{name->"Bossman"}|
+------------------------+
16.9.7. Match by relationship type and use an identifier
If you both want to introduce an identifier to hold the relationship, and
specify the relationship type you want, just add them both, like this.
Query
START n=node(3)
MATCH (n)-[r:BLOCKS]->()
RETURN r
All BLOCKS relationships going out from A.
Table 16.15. Result
+-------------+
|r |
|-------------|
|1 row |
|-------------|
|0 ms |
|-------------|
|:BLOCKS[1] {}|
+-------------+
16.9.8. Relationship types with uncommon characters
Sometimes your database will have types with non-letter characters, or with
spaces in them. Use ` (backtick) to escape these.
Query
START n=node(3)
MATCH (n)-[r:`TYPE WITH SPACE IN IT`]->()
RETURN r
This returns a relationship of a type with spaces in it.
Table 16.16. Result
+----------------------------+
|r |
|----------------------------|
|1 row |
|----------------------------|
|0 ms |
|----------------------------|
|:TYPE WITH SPACE IN IT[6] {}|
+----------------------------+
16.9.9. Multiple relationships
Relationships can be expressed by using multiple statements in the form of
()--(), or they can be strung together, like this:
Query
START a=node(3)
MATCH (a)-[:KNOWS]->(b)-[:KNOWS]->(c)
RETURN a,b,c
The three nodes in the path are returned.
Table 16.17. Result
+----------------------------------------------------------------------+
|a |b |c |
|----------------------------------------------------------------------|
|1 row |
|----------------------------------------------------------------------|
|0 ms |
|----------------------------------------------------------------------|
|Node[3]{name->"Anders"}|Node[4]{name->"Bossman"}|Node[2]{name->"Emil"}|
+----------------------------------------------------------------------+
16.9.10. Variable length relationships
Nodes that are a variable number of relationship→node hops away can be found
using -[:TYPE*minHops..maxHops]->.
Query
START a=node(3), x=node(2, 4)
MATCH a-[:KNOWS*1..3]->x
RETURN a,x
Returns the start and end points if there is a path of 1 to 3 KNOWS
relationships between them.
Table 16.18. Result
+------------------------------------------------+
|a |x |
|------------------------------------------------|
|2 rows |
|------------------------------------------------|
|0 ms |
|------------------------------------------------|
|Node[3]{name->"Anders"}|Node[2]{name->"Emil"} |
|-----------------------+------------------------|
|Node[3]{name->"Anders"}|Node[4]{name->"Bossman"}|
+------------------------------------------------+
16.9.11. Relationship identifier in variable length relationships
When the connection between two nodes is of variable length, a relationship
identifier becomes an iterable of relationships.
Query
START a=node(3), x=node(2, 4)
MATCH a-[r:KNOWS*1..3]->x
RETURN r
Returns the relationships, if there is a path between 1 and 3 relationships
away.
Table 16.19. Result
+---------------------------+
|r |
|---------------------------|
|2 rows |
|---------------------------|
|0 ms |
|---------------------------|
|[:KNOWS[0] {},:KNOWS[3] {}]|
|---------------------------|
|[:KNOWS[0] {}] |
+---------------------------+
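To make the *minHops..maxHops notation more concrete, here is a small Python
sketch that enumerates directed paths of a given type and collects the
relationship ids along each path. The relationship list is reconstructed from
the result tables in this section, and RELS and var_length are names invented
for the sketch. It models the semantics only and says nothing about how Neo4j
traverses the graph.

```python
# Relationships as (id, start, type, end). Reconstructed from the result
# tables in this section, so an assumption about the example graph.
RELS = [
    (0, 3, "KNOWS", 4),
    (1, 3, "BLOCKS", 5),
    (2, 1, "KNOWS", 3),
    (3, 4, "KNOWS", 2),
    (4, 5, "KNOWS", 2),
]


def var_length(start, rel_type, min_hops, max_hops, used=()):
    """Return (end node, [relationship ids]) for every directed path of
    min_hops..max_hops relationships of rel_type starting at start.
    A relationship is never used twice within one path."""
    paths = []
    if len(used) >= min_hops:
        paths.append((start, list(used)))
    if len(used) < max_hops:
        for rel_id, rel_start, rel_name, rel_end in RELS:
            if rel_start == start and rel_name == rel_type and rel_id not in used:
                paths.extend(
                    var_length(rel_end, rel_type, min_hops, max_hops, used + (rel_id,))
                )
    return paths


# Model of: START a=node(3), x=node(2, 4) MATCH a-[r:KNOWS*1..3]->x RETURN x, r
matches = [(x, r) for x, r in var_length(3, "KNOWS", 1, 3) if x in (2, 4)]
print(matches)  # [(4, [0]), (2, [0, 3])]
```

Each match carries a list of relationships, which is why the identifier r
becomes an iterable in the query above.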
16.9.12. Zero length paths
When using variable length paths that have the lower bound zero, it means that
two identifiers can point to the same node. If the distance between two nodes
is zero, they are, by definition, the same node.
Query
START a=node(3)
MATCH p1=a-[:KNOWS*0..1]->b, p2=b-[:BLOCKS*0..1]->c
RETURN a,b,c, length(p1), length(p2)
This query will return four paths, some of them with length zero.
Table 16.20. Result
+-----------------------------------------------------------------------------+
|a |b |c |length |length |
| | | |(p1) |(p2) |
|-----------------------------------------------------------------------------|
|4 rows |
|-----------------------------------------------------------------------------|
|0 ms |
|-----------------------------------------------------------------------------|
|Node[3]{name-> |Node[3]{name-> |Node[3]{name-> |0 |0 |
|"Anders"} |"Anders"} |"Anders"} | | |
|-------------------+-------------------+-------------------+--------+--------|
|Node[3]{name-> |Node[3]{name-> |Node[5]{name-> |0 |1 |
|"Anders"} |"Anders"} |"Cesar"} | | |
|-------------------+-------------------+-------------------+--------+--------|
|Node[3]{name-> |Node[4]{name-> |Node[4]{name-> |1 |0 |
|"Anders"} |"Bossman"} |"Bossman"} | | |
|-------------------+-------------------+-------------------+--------+--------|
|Node[3]{name-> |Node[4]{name-> |Node[1]{name-> |1 |1 |
|"Anders"} |"Bossman"} |"David"} | | |
+-----------------------------------------------------------------------------+
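The four rows can be reproduced with a small Python model in which a *0..1
step yields either the same node (length zero) or a neighbour (length one).
The edge list is reconstructed from the result tables in this section; the
BLOCKS relationship from Bossman to David is inferred from the last row of the
table above. EDGES and zero_or_one are names made up for this sketch.

```python
# Edges as (start, type, end); an assumption reconstructed from the tables.
EDGES = [
    (3, "KNOWS", 4),
    (3, "BLOCKS", 5),
    (1, "KNOWS", 3),
    (4, "KNOWS", 2),
    (5, "KNOWS", 2),
    (4, "BLOCKS", 1),  # inferred from the last row of the result table
]


def zero_or_one(node, rel_type):
    """Model of node-[:TYPE*0..1]->other: list of (other, path length)."""
    hops = [(node, 0)]  # zero hops: both identifiers point to the same node
    hops += [(end, 1) for start, t, end in EDGES if start == node and t == rel_type]
    return hops


# Model of: MATCH p1=a-[:KNOWS*0..1]->b, p2=b-[:BLOCKS*0..1]->c with a = node 3
rows = [
    (3, b, c, len1, len2)
    for b, len1 in zero_or_one(3, "KNOWS")
    for c, len2 in zero_or_one(b, "BLOCKS")
]
for row in rows:
    print(row)
```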
16.9.13. Optional relationship
If a relationship is optional, it can be marked with a question mark. This is
similar to how a SQL outer join works. If the relationship is there, it is
returned. If it’s not, null is returned in its place. Remember that anything
hanging off an optional relationship is in turn optional, unless it is
connected with a bound node through some other path.
Query
START a=node(2)
MATCH a-[?]->x
RETURN a,x
A node, and null, since the node has no outgoing relationships.
Table 16.21. Result
+-----------------------------+
|a |x |
|-----------------------------|
|1 row |
|-----------------------------|
|0 ms |
|-----------------------------|
|Node[2]{name->"Emil"} |<null>|
+-----------------------------+
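The outer-join behaviour can be sketched in a few lines of Python. This is a
simplified model, assuming an edge list reconstructed from the result tables
in this section; optional_outgoing is a hypothetical helper, and None stands
in for Cypher's null.

```python
# Edges as (start, type, end); an assumption reconstructed from the tables.
EDGES = [
    (3, "KNOWS", 4),
    (3, "BLOCKS", 5),
    (1, "KNOWS", 3),
    (4, "KNOWS", 2),
    (5, "KNOWS", 2),
]


def optional_outgoing(a):
    """Model of a-[?]->x: one row per outgoing relationship, or a single
    row with None when there is none (like a SQL left outer join)."""
    ends = [end for start, _type, end in EDGES if start == a]
    return [(a, x) for x in ends] or [(a, None)]


print(optional_outgoing(2))  # [(2, None)]
print(optional_outgoing(3))  # [(3, 4), (3, 5)]
```

Without the question mark, the start node with no outgoing relationships
would simply produce no rows at all.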
16.9.14. Optional typed and named relationship
Just as with a normal relationship, you can decide which identifier it goes
into, and what relationship type you need.
Query
START a=node(3)
MATCH a-[r?:LOVES]->()
RETURN a,r
This returns a node, and null, since the node has no outgoing LOVES
relationships.
Table 16.22. Result
+-------------------------------+
|a |r |
|-------------------------------|
|1 row |
|-------------------------------|
|0 ms |
|-------------------------------|
|Node[3]{name->"Anders"} |<null>|
+-------------------------------+
16.9.15. Properties on optional elements
Returning a property from an optional element that is null will also return
null.
Query
START a=node(2)
MATCH a-[?]->x
RETURN x, x.name
This returns the element x (null in this query), and null as its name.
Table 16.23. Result
+-------------+
|x |x.name|
|-------------|
|1 row |
|-------------|
|0 ms |
|-------------|
|<null>|<null>|
+-------------+
16.9.16. Complex matching
Using Cypher, you can also express more complex patterns to match on, like a
diamond shape pattern.
Query
START a=node(3)
MATCH (a)-[:KNOWS]->(b)-[:KNOWS]->(c), (a)-[:BLOCKS]-(d)-[:KNOWS]-(c)
RETURN a,b,c,d
This returns the four nodes in the paths.
Table 16.24. Result
+-----------------------------------------------------------------------------+
|a |b |c |d |
|-----------------------------------------------------------------------------|
|1 row |
|-----------------------------------------------------------------------------|
|0 ms |
|-----------------------------------------------------------------------------|
|Node[3]{name-> |Node[4]{name-> |Node[2]{name-> |Node[5]{name-> |
|"Anders"} |"Bossman"} |"Emil"} |"Cesar"} |
+-----------------------------------------------------------------------------+
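One way to read the diamond pattern is as a set of nested lookups that must
all agree on c. The Python sketch below follows that reading. The edge list
is reconstructed from the result tables in this section, the helper names are
invented for the sketch, and any further rules the query engine applies when
matching are ignored.

```python
# Edges as (start, type, end); an assumption reconstructed from the tables.
EDGES = [
    (3, "KNOWS", 4),
    (3, "BLOCKS", 5),
    (1, "KNOWS", 3),
    (4, "KNOWS", 2),
    (5, "KNOWS", 2),
    (4, "BLOCKS", 1),
]


def directed(start, rel_type):
    """Model of (start)-[:TYPE]->(x)."""
    return [e for s, t, e in EDGES if s == start and t == rel_type]


def undirected(node, rel_type):
    """Model of (node)-[:TYPE]-(x), ignoring direction."""
    out = [e for s, t, e in EDGES if s == node and t == rel_type]
    inc = [s for s, t, e in EDGES if e == node and t == rel_type]
    return out + inc


def diamond(a):
    """(a)-[:KNOWS]->(b)-[:KNOWS]->(c), (a)-[:BLOCKS]-(d)-[:KNOWS]-(c)"""
    rows = []
    for b in directed(a, "KNOWS"):
        for c in directed(b, "KNOWS"):
            for d in undirected(a, "BLOCKS"):
                if c in undirected(d, "KNOWS"):  # both paths must meet in c
                    rows.append((a, b, c, d))
    return rows


print(diamond(3))  # [(3, 4, 2, 5)]
```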
16.9.17. Shortest path
Finding a single shortest path between two nodes is as easy as using the
shortestPath function, like this:
Query
START d=node(1), e=node(2)
MATCH p = shortestPath( d-[*..15]->e )
RETURN p
This means: find a single shortest path between two nodes, as long as the path
is at most 15 relationships long. Inside the parentheses you write a single link
of a path: the starting node, the connecting relationship and the end node.
Characteristics describing the relationship like relationship type, max hops
and direction are all used when finding the shortest path. You can also mark
the path as optional.
Table 16.25. Result
+------------------------------------------------------+
|p |
|------------------------------------------------------|
|1 row |
|------------------------------------------------------|
|0 ms |
|------------------------------------------------------|
|(1)--[KNOWS,2]-->(3)--[KNOWS,0]-->(4)--[KNOWS,3]-->(2)|
+------------------------------------------------------+
16.9.18. All shortest paths
Finds all the shortest paths between two nodes.
Query
START d=node(1), e=node(2)
MATCH p = allShortestPaths( d-[*..15]->e )
RETURN p
This will find the two directed paths between David and Emil.
Table 16.26. Result
+-------------------------------------------------------+
|p |
|-------------------------------------------------------|
|2 rows |
|-------------------------------------------------------|
|0 ms |
|-------------------------------------------------------|
|(1)--[KNOWS,2]-->(3)--[KNOWS,0]-->(4)--[KNOWS,3]-->(2) |
|-------------------------------------------------------|
|(1)--[KNOWS,2]-->(3)--[BLOCKS,1]-->(5)--[KNOWS,4]-->(2)|
+-------------------------------------------------------+
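For readers who want to see what "all shortest paths" means operationally,
here is a minimal breadth-first search in Python. It is a sketch under stated
assumptions: the edge list is reconstructed from the result tables in this
section, all_shortest_paths is a hypothetical helper, and nothing here
reflects how Neo4j implements shortestPath or allShortestPaths.

```python
from collections import deque

# Edges as (start, type, end); an assumption reconstructed from the tables.
EDGES = [
    (3, "KNOWS", 4),
    (3, "BLOCKS", 5),
    (1, "KNOWS", 3),
    (4, "KNOWS", 2),
    (5, "KNOWS", 2),
    (4, "BLOCKS", 1),
]


def all_shortest_paths(source, target, max_hops=15):
    """Breadth-first search over directed edges. Returns every shortest
    path (as a list of node ids) of at most max_hops relationships."""
    found = []
    best = None  # length of the shortest path found so far
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        hops = len(path) - 1
        if best is not None and hops > best:
            break  # everything still queued is longer than the shortest
        if path[-1] == target:
            best = hops
            found.append(path)
            continue
        if hops == max_hops:
            continue
        for start, _type, end in EDGES:
            if start == path[-1] and end not in path:
                queue.append(path + [end])
    return found


print(all_shortest_paths(1, 2))  # [[1, 3, 4, 2], [1, 3, 5, 2]]
```

Taking only the first element of the returned list corresponds to asking for
a single shortest path.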
16.9.19. Named path
If you want to return or filter on a path in your pattern graph, you can
introduce a named path.
Query
START a=node(3)
MATCH p = a-->b
RETURN p
This returns the two paths starting from the first node.
Table 16.27. Result
+---------------------------------------------------------------+
|p |
|---------------------------------------------------------------|
|2 rows |
|---------------------------------------------------------------|
|0 ms |
|---------------------------------------------------------------|
|[Node[3]{name->"Anders"},:KNOWS[0] {},Node[4]{name->"Bossman"}]|
|---------------------------------------------------------------|
|[Node[3]{name->"Anders"},:BLOCKS[1] {},Node[5]{name->"Cesar"}] |
+---------------------------------------------------------------+
16.9.20. Matching on a bound relationship
When your pattern contains a bound relationship, and that relationship pattern
doesn’t specify direction, Cypher will try to match the relationship where the
connected nodes switch sides.
Query
START a=node(3), b=node(2)
MATCH a-[?:KNOWS]-x-[?:KNOWS]-b
RETURN x
This returns the two connected nodes, once as the start node, and once as the
end node.
Table 16.28. Result
+------------------------+
|x |
|------------------------|
|3 rows |
|------------------------|
|0 ms |
|------------------------|
|Node[4]{name->"Bossman"}|
|------------------------|
|Node[5]{name->"Cesar"} |
|------------------------|
|Node[1]{name->"David"} |
+------------------------+
16.10. Where
If you need filtering apart from the pattern of the data that you are looking
for, you can add clauses in the where part of the query.
Graph
cypher-where-graph.svg
16.10.1. Boolean operations
You can use the expected boolean operators AND and OR, and also the boolean
function NOT().
Query
START n=node(3, 1)
WHERE (n.age < 30 and n.name = "Tobias") or not(n.name = "Tobias")
RETURN n
Both nodes are returned: Tobias satisfies the first condition, and Andres
satisfies the second.
Table 16.29. Result
+---------------------------------------------+
|n |
|---------------------------------------------|
|2 rows |
|---------------------------------------------|
|0 ms |
|---------------------------------------------|
|Node[3]{name->"Andres",age->36,belt->"white"}|
|---------------------------------------------|
|Node[1]{name->"Tobias",age->25} |
+---------------------------------------------+
16.10.2. Filter on node property
To filter on a property, write your clause after the WHERE keyword.
Query
START n=node(3, 1)
WHERE n.age < 30
RETURN n
The node named Tobias, the only one with an age below 30, is returned.
Table 16.30. Result
+-------------------------------+
|n |
|-------------------------------|
|1 row |
|-------------------------------|
|0 ms |
|-------------------------------|
|Node[1]{name->"Tobias",age->25}|
+-------------------------------+
16.10.3. Regular expressions
You can match on regular expressions by using =~ /regexp/, like this:
Query
START n=node(3, 1)
WHERE n.name =~ /Tob.*/
RETURN n
The node named Tobias.
Table 16.31. Result
+-------------------------------+
|n |
|-------------------------------|
|1 row |
|-------------------------------|
|0 ms |
|-------------------------------|
|Node[1]{name->"Tobias",age->25}|
+-------------------------------+
16.10.4. Escaping in regular expressions
If you need a forward slash inside of your regular expression, escape it with a
backslash, just like you would expect.
Query
START n=node(3, 1)
WHERE n.name =~ /Some\/thing/
RETURN n
No nodes match this regular expression.
Table 16.32. Result
+--------------+
|n |
|--------------|
|0 row |
|--------------|
|0 ms |
|--------------|
|(empty result)|
+--------------+
16.10.5. Case insensitive regular expressions
By prepending a regular expression with (?i), the whole expression becomes
case insensitive.
Query
START n=node(3, 1)
WHERE n.name =~ /(?i)ANDR.*/
RETURN n
The node with name Andres is returned.
Table 16.33. Result
+---------------------------------------------+
|n |
|---------------------------------------------|
|1 row |
|---------------------------------------------|
|0 ms |
|---------------------------------------------|
|Node[3]{name->"Andres",age->36,belt->"white"}|
+---------------------------------------------+
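The three regular expression examples can be tried out with Python's re
module, which accepts the same (?i) prefix. This is only an approximation of
the filter: the sketch assumes the pattern has to match the whole property
value (modelled here with re.fullmatch), and where_regex and NODES are names
invented for the example. No slash escaping is needed in Python, because the
pattern is not delimited by slashes.

```python
import re

# The two nodes used by the queries in this section.
NODES = [
    {"name": "Andres", "age": 36, "belt": "white"},
    {"name": "Tobias", "age": 25},
]


def where_regex(nodes, key, pattern):
    """Keep the nodes whose property matches the pattern from start to end.
    Whole-value matching is an assumption of this sketch."""
    return [n for n in nodes if re.fullmatch(pattern, n[key])]


print([n["name"] for n in where_regex(NODES, "name", r"Tob.*")])       # ['Tobias']
print([n["name"] for n in where_regex(NODES, "name", r"(?i)ANDR.*")])  # ['Andres']
print([n["name"] for n in where_regex(NODES, "name", r"Some/thing")])  # []
```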
16.10.6. Filtering on relationship type
You can put the exact relationship type in the MATCH pattern, but sometimes you
want to be able to do more advanced filtering on the type. You can use the
TYPE function to compare the type with something else. In this example,
the query does a regular expression comparison with the name of the
relationship type.
Query
START n=node(3)
MATCH (n)-[r]->()
WHERE type(r) =~ /K.*/
RETURN r
The relationships whose type name starts with K.
Table 16.34. Result
+------------+
|r |
|------------|
|2 rows |
|------------|
|0 ms |
|------------|
|:KNOWS[0] {}|
|------------|
|:KNOWS[1] {}|
+------------+
16.10.7. Property exists
To only include nodes/relationships that have a property, use the HAS() function
with the identifier and the property you expect it to have.
Query
START n=node(3, 1)
WHERE has(n.belt)
RETURN n
The node named Andres.
Table 16.35. Result
+---------------------------------------------+
|n |
|---------------------------------------------|
|1 row |
|---------------------------------------------|
|0 ms |
|---------------------------------------------|
|Node[3]{name->"Andres",age->36,belt->"white"}|
+---------------------------------------------+
16.10.8. Default true if property is missing
If you want to compare a property on a graph element, but only if it exists,
use the nullable property syntax. You can use a question mark if you want
a missing property to evaluate to true, like:
Query
START n=node(3, 1)
WHERE n.belt? = 'white'
RETURN n
All nodes with a white belt, and also those without the belt property.
Table 16.36. Result
+---------------------------------------------+
|n |
|---------------------------------------------|
|2 rows |
|---------------------------------------------|
|0 ms |
|---------------------------------------------|
|Node[3]{name->"Andres",age->36,belt->"white"}|
|---------------------------------------------|
|Node[1]{name->"Tobias",age->25} |
+---------------------------------------------+
16.10.9. Default false if property is missing
When you need a missing property to evaluate to false, use the exclamation mark.
Query
START n=node(3, 1)
WHERE n.belt! = 'white'
RETURN n
No nodes without the belt property are returned.
Table 16.37. Result
+---------------------------------------------+
|n |
|---------------------------------------------|
|1 row |
|---------------------------------------------|
|0 ms |
|---------------------------------------------|
|Node[3]{name->"Andres",age->36,belt->"white"}|
+---------------------------------------------+
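The difference between the ? and ! suffixes comes down to what a comparison
evaluates to when the property is absent. The Python sketch below models
nodes as dictionaries; equals_or_missing and equals_and_present are
hypothetical helper names chosen for this illustration.

```python
# The two nodes used by the queries in this section.
NODES = [
    {"name": "Andres", "age": 36, "belt": "white"},
    {"name": "Tobias", "age": 25},
]


def equals_or_missing(node, key, value):
    """Model of n.key? = value: a missing property counts as true."""
    return key not in node or node[key] == value


def equals_and_present(node, key, value):
    """Model of n.key! = value: a missing property counts as false."""
    return key in node and node[key] == value


default_true = [n["name"] for n in NODES if equals_or_missing(n, "belt", "white")]
default_false = [n["name"] for n in NODES if equals_and_present(n, "belt", "white")]
print(default_true)   # ['Andres', 'Tobias']
print(default_false)  # ['Andres']
```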
16.10.10. Filter on null values
Sometimes you might want to test if a value or an identifier is null. This is
done just like SQL does it, with IS NULL. Also like SQL, the negative is IS NOT
NULL, although NOT(x IS NULL) also works.
Query
START a=node(1), b=node(3, 2)
MATCH a<-[r?]-b
WHERE r is null
RETURN b
Nodes that Tobias is not connected to.
Table 16.38. Result
+------------------------------+
|b |
|------------------------------|
|1 row |
|------------------------------|
|0 ms |
|------------------------------|
|Node[2]{name->"Peter",age->34}|
+------------------------------+
16.10.11. Filter on relationships
To filter out subgraphs based on relationships between nodes, you use a limited
part of the iconography in the match clause. You can only describe the
relationship with direction and optional type. These are all valid expressions:
WHERE a-->b
WHERE a<--b
WHERE a<-[:KNOWS]-b
WHERE a-[:KNOWS]-b
Note that you can not introduce new identifiers here. Although it might look
very similar to the MATCH clause, the WHERE clause is all about eliminating
matched subgraphs. MATCH a-->b is very different from WHERE a-->b; the first
will produce a subgraph for every relationship between a and b, and the latter
will eliminate any matched subgraphs where a and b do not have a relationship
between them.
Query
START a=node(1), b=node(3, 2)
WHERE a<--b
RETURN b
Nodes that have an outgoing relationship to Tobias.
Table 16.39. Result
+---------------------------------------------+
|b |
|---------------------------------------------|
|1 row |
|---------------------------------------------|
|0 ms |
|---------------------------------------------|
|Node[3]{name->"Andres",age->36,belt->"white"}|
+---------------------------------------------+
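The distinction between MATCH a-->b and WHERE a-->b can be shown with a tiny
Python model. The data is hypothetical: node 1 has two relationships to
node 2 and none to node 3. match_rows and where_rows are names made up for
the sketch.

```python
# Hypothetical edges as (start, type, end): two relationships from 1 to 2.
EDGES = [(1, "KNOWS", 2), (1, "BLOCKS", 2)]


def match_rows(pairs):
    """Model of MATCH a-->b: one row per relationship between a and b."""
    return [
        (a, b)
        for a, b in pairs
        for start, _type, end in EDGES
        if start == a and end == b
    ]


def where_rows(pairs):
    """Model of WHERE a-->b: keep a row if at least one relationship exists."""
    return [
        (a, b)
        for a, b in pairs
        if any(start == a and end == b for start, _type, end in EDGES)
    ]


pairs = [(1, 2), (1, 3)]  # candidate (a, b) combinations from the start points
print(match_rows(pairs))  # [(1, 2), (1, 2)]
print(where_rows(pairs))  # [(1, 2)]
```

The MATCH model multiplies rows, one per relationship, while the WHERE model
only ever removes rows.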
16.10.12. IN operator
To check if an element exists in a collection, you can use the IN operator.
Query
START a=node(3, 1, 2)
WHERE a.name IN ["Peter", "Tobias"]
RETURN a
This query shows how to check if a property exists in a literal collection.
Table 16.40. Result
+-------------------------------+
|a |
|-------------------------------|
|2 rows |
|-------------------------------|
|0 ms |
|-------------------------------|
|Node[1]{name->"Tobias",age->25}|
|-------------------------------|
|Node[2]{name->"Peter",age->34} |
+-------------------------------+
16.11. Return
In the return part of your query, you define which parts of the pattern you are
interested in. It can be nodes, relationships, or properties on these.
Graph
cypher-return-graph.svg
16.11.1. Return nodes
To return a node, list it in the return statement.
Query
START n=node(2)
RETURN n
The node.
Table 16.41. Result
+------------------+
|n |
|------------------|
|1 row |
|------------------|
|0 ms |
|------------------|
|Node[2]{name->"B"}|
+------------------+
16.11.2. Return relationships
To return a relationship, just include it in the return list.
Query
START n=node(1)
MATCH (n)-[r:KNOWS]->(c)
RETURN r
The relationship.
Table 16.42. Result
+------------+
|r |
|------------|
|1 row |
|------------|
|0 ms |
|------------|
|:KNOWS[0] {}|
+------------+
16.11.3. Return property
To return a property, use the dot separator, like this:
Query
START n=node(1)
RETURN n.name
The value of the property name.
Table 16.43. Result
+------+
|n.name|
|------|
|1 row |
|------|
|0 ms |
|------|
|"A" |
+------+
16.11.4. Return all elements
When you want to return all nodes, relationships and paths found in a query,
you can use the * symbol.
Query
START a=node(1)
MATCH p=a-[r]->b
RETURN *
Returns the two nodes, the relationship and the path used in the query.
Table 16.44. Result
+-----------------------------------------------------------------------------+
|a |b |r |p |
|-----------------------------------------------------------------------------|
|2 rows |
|-----------------------------------------------------------------------------|
|0 ms |
|-----------------------------------------------------------------------------|
|Node[1]{name-> |Node[2] |:KNOWS |[Node[1]{name->"A",happy->"Yes!",age->|
|"A",happy->"Yes! |{name-> |[0] {} |55},:KNOWS[0] {},Node[2]{name->"B"}] |
|",age->55} |"B"} | | |
|--------------------+---------+-------+--------------------------------------|
|Node[1]{name-> |Node[2] |:BLOCKS|[Node[1]{name->"A",happy->"Yes!",age->|
|"A",happy->"Yes! |{name-> |[1] {} |55},:BLOCKS[1] {},Node[2]{name->"B"}] |
|",age->55} |"B"} | | |
+-----------------------------------------------------------------------------+
16.11.5. Identifier with uncommon characters
To introduce a placeholder that is made up of characters that are outside of
the English alphabet, you can use backticks (`) to enclose the identifier, like this:
Query
START `This isn't a common identifier`=node(1)
RETURN `This isn't a common identifier`.happy
The happy property of the node is returned.
Table 16.45. Result
+------------------------------------+
|This isn't a common identifier.happy|
|------------------------------------|
|1 row |
|------------------------------------|
|0 ms |
|------------------------------------|
|"Yes!" |
+------------------------------------+
16.11.6. Column alias
If the name of the column should be different from the expression used, you can
rename it by using AS followed by the new column name.
Query
START a=node(1)
RETURN a.age AS SomethingTotallyDifferent
Returns the age property of a node, but renames the column.
Table 16.46. Result
+-------------------------+
|SomethingTotallyDifferent|
|-------------------------|
|1 row |
|-------------------------|
|0 ms |
|-------------------------|
|55 |
+-------------------------+
16.11.7. Optional properties
If a property might or might not be there, you can select it optionally by
adding a question mark after the property name, like this:
Query
START n=node(1, 2)
RETURN n.age?
The age when the node has that property, or null if the property is not there.
Table 16.47. Result
+------+
|n.age?|
|------|
|2 rows|
|------|
|0 ms |
|------|
|55 |
|------|
|<null>|
+------+
16.11.8. Unique results
DISTINCT retrieves only unique rows depending on the columns that have been
selected to output.
Query
START a=node(1)
MATCH (a)-->(b)
RETURN distinct b
The node named B, but only once.
Table 16.48. Result
+------------------+
|b |
|------------------|
|1 row |
|------------------|
|0 ms |
|------------------|
|Node[2]{name->"B"}|
+------------------+
16.12. Aggregation
16.12.1. Introduction
To calculate aggregated data, Cypher offers aggregation, much like SQL’s GROUP
BY.
Aggregate functions take multiple input values and calculate an aggregated
value from them. Examples are AVG, which calculates the average of multiple
numeric values, and MIN, which finds the smallest numeric value in a set of
values.
Aggregation can be done over all the matching subgraphs, or it can be further
divided by introducing key values. These are non-aggregate expressions that
are used to group the values going into the aggregate functions.
So, if the return statement looks something like this:
RETURN n, count(*)
We have two return expressions: n, and count(*). The first, n, is not an
aggregate function, and so it will be the grouping key. The latter, count(*), is
an aggregate expression. So the matching subgraphs will be divided into different
buckets, depending on the grouping key. The aggregate function will then run on
these buckets, calculating the aggregate values.
The last piece of the puzzle is the DISTINCT keyword. It is used to make all
values unique before running them through an aggregate function.
An example might be helpful:
Query
START me=node(1)
MATCH me-->friend-->friend_of_friend
RETURN count(distinct friend_of_friend), count(friend_of_friend)
In this example we are trying to find all our friends of friends, and count
them. The first aggregate function, count(distinct friend_of_friend), will only
see a friend_of_friend once — DISTINCT removes the duplicates. The latter
aggregate function, count(friend_of_friend), might very well see the same
friend_of_friend multiple times. Since there is no real data in this case, an
empty result is returned. See the sections below for real data.
Table 16.49. Result
+--------------------------------------------------------+
|count(distinct friend_of_friend)|count(friend_of_friend)|
|--------------------------------------------------------|
|0 row |
|--------------------------------------------------------|
|1 ms |
|--------------------------------------------------------|
|(empty result) |
+--------------------------------------------------------+
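The bucketing by grouping key described in this introduction can be modelled
in a few lines of Python. The rows are hypothetical (n, x) pairs standing in
for matched subgraphs, and group_count is a name invented for the sketch.

```python
from collections import defaultdict


def group_count(rows, key_index):
    """Model of RETURN key, count(*): put every row into the bucket of its
    grouping key, then run the aggregate (here: a count) on each bucket."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key_index]].append(row)
    return {key: len(bucket) for key, bucket in buckets.items()}


# Hypothetical matched subgraphs as (n, x) pairs.
rows = [("A", "B"), ("A", "C"), ("B", "C")]
print(group_count(rows, 0))  # {'A': 2, 'B': 1}
```

With no grouping key at all, every row would land in one single bucket and
the query would return one aggregated row.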
The following examples assume the example graph structure below.
Graph
cypher-aggregation-graph.svg
16.12.2. COUNT
COUNT is used to count the number of rows. COUNT can be used in two
forms: COUNT(*), which just counts the number of matching rows, and
COUNT(<identifier>), which counts the number of non-null values in <identifier>.
16.12.3. Count nodes
To count the number of nodes, for example the number of nodes connected to one
node, you can use count(*).
Query
START n=node(2)
MATCH (n)-->(x)
RETURN n, count(*)
The start node and the count of related nodes.
Table 16.50. Result
+----------------------------------------+
|n |count(*)|
|----------------------------------------|
|1 row |
|----------------------------------------|
|0 ms |
|----------------------------------------|
|Node[2]{name->"A",property->13}|3 |
+----------------------------------------+
16.12.4. Group Count Relationship Types
To count the groups of relationship types, return the types and count them with
count(*).
Query
START n=node(2)
MATCH (n)-[r]->()
RETURN type(r), count(*)
The relationship types and their group count.
Table 16.51. Result
+-----------------+
|type(r)|count(*) |
|-----------------|
|1 row |
|-----------------|
|0 ms |
|-----------------|
|"KNOWS"|3 |
+-----------------+
16.12.5. Count entities
Instead of counting the number of results with count(*), it might be more
expressive to include the name of the identifier you care about.
Query
START n=node(2)
MATCH (n)-->(x)
RETURN count(x)
The number of connected nodes from the start node.
Table 16.52. Result
+--------+
|count(x)|
|--------|
|1 row |
|--------|
|0 ms |
|--------|
|3 |
+--------+
16.12.6. Count non-null values
You can count the non-null values by using count(<identifier>).
Query
START n=node(2,3,4,1)
RETURN count(n.property?)
The number of nodes that have a non-null value for the property property.
Table 16.53. Result
+------------------+
|count(n.property?)|
|------------------|
|1 row |
|------------------|
|0 ms |
|------------------|
|3 |
+------------------+
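The two forms of COUNT differ only in how they treat nulls, which a short
Python sketch can show. The values are taken from the result tables in this
section, and the assignment of 33 and 44 to nodes 3 and 4 is an assumption;
None stands in for the null that n.property? yields for the node without the
property. count_all and count_non_null are names chosen for this illustration.

```python
# property values of nodes 2, 3, 4 and 1; node 1 lacks the property.
VALUES = {2: 13, 3: 33, 4: 44, 1: None}


def count_all(values):
    """Model of count(*): every row counts."""
    return len(values)


def count_non_null(values):
    """Model of count(n.property?): nulls are skipped."""
    return sum(1 for v in values if v is not None)


print(count_all(VALUES.values()))       # 4
print(count_non_null(VALUES.values()))  # 3
```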
16.12.7. SUM
The SUM aggregation function simply sums all the numeric values it encounters.
Nulls are silently dropped. This is an example of how you can use SUM.
Query
START n=node(2,3,4)
RETURN sum(n.property)
The sum of all the values in the property property.
Table 16.54. Result
+---------------+
|sum(n.property)|
|---------------|
|1 row |
|---------------|
|0 ms |
|---------------|
|90 |
+---------------+
16.12.8. AVG
AVG calculates the average of a numeric column.
Query
START n=node(2,3,4)
RETURN avg(n.property)
The average of all the values in the property property.
Table 16.55. Result
+---------------+
|avg(n.property)|
|---------------|
|1 row |
|---------------|
|0 ms |
|---------------|
|30.0 |
+---------------+
16.12.9. MAX
MAX finds the largest value in a numeric column.
Query
START n=node(2,3,4)
RETURN max(n.property)
The largest of all the values in the property property.
Table 16.56. Result
+---------------+
|max(n.property)|
|---------------|
|1 row |
|---------------|
|0 ms |
|---------------|
|44 |
+---------------+
16.12.10. MIN
MIN takes a numeric property as input, and returns the smallest value in that
column.
Query
START n=node(2,3,4)
RETURN min(n.property)
The smallest of all the values in the property property.
Table 16.57. Result
+---------------+
|min(n.property)|
|---------------|
|1 row |
|---------------|
|0 ms |
|---------------|
|13 |
+---------------+
16.12.11. COLLECT
COLLECT collects all the values into a list.
Query
START n=node(2,3,4)
RETURN collect(n.property)
Returns a single row, with all the values collected.
Table 16.58. Result
+-------------------+
|collect(n.property)|
|-------------------|
|1 row |
|-------------------|
|0 ms |
|-------------------|
|[13,33,44] |
+-------------------+
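As a cross-check of the results above, the aggregations over nodes 2, 3 and 4 can be reproduced with plain Python. This is a sketch of the arithmetic only, not of how Neo4j evaluates the query. The variable names are assumptions, and the extra None entry is added here purely to illustrate the statement that SUM drops nulls.

```python
# Values of n.property for nodes 2, 3 and 4, taken from the COLLECT result.
values = [13, 33, 44]

# Hypothetical extra null value, used only to show that SUM ignores it.
with_null = values + [None]


def non_null(items):
    """Return the items that are not null (None)."""
    return [item for item in items if item is not None]


total = sum(non_null(with_null))             # model of sum(n.property)
average = sum(values) / float(len(values))   # model of avg(n.property)
largest = max(values)                        # model of max(n.property)
smallest = min(values)                       # model of min(n.property)
collected = list(values)                     # model of collect(n.property)

print(total, average, largest, smallest, collected)
```

The five values correspond to the results in Tables 16.54 to 16.58.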
16.12.12. DISTINCT
All aggregation functions also take the DISTINCT modifier, which removes
duplicates from the values. So, to count the number of unique eye colors from
nodes related to a, this query can be used:
Query
START a=node(2)
MATCH a-->b
RETURN count(distinct b.eyes)
Returns the number of eye colors.
Table 16.59. Result
+----------------------+
|count(distinct b.eyes)|
|----------------------|
|1 row |
|----------------------|
|0 ms |
|----------------------|
|2 |
+----------------------+
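The effect of the DISTINCT modifier can be pictured as removing duplicates before the aggregation runs. The Python sketch below models that idea. The eye colors are hypothetical values chosen so that the outcome agrees with the result above, since the actual property values are only visible in the graph figure, and the helper name count_distinct is an assumption.

```python
# Hypothetical b.eyes values for the three nodes related to a.
eyes = ["blue", "brown", "blue"]


def count_distinct(values):
    """Model of count(distinct x): drop nulls, then count unique values."""
    return len(set(value for value in values if value is not None))


print(len(eyes))             # 3, what a plain count would report
print(count_distinct(eyes))  # 2, duplicates are counted once
```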
16.13. Order by
To sort the output, use the ORDER BY clause. Note that you cannot sort on
nodes or relationships, only on their properties.
Graph
cypher-orderby-graph.svg
16.13.1. Order nodes by property
ORDER BY is used to sort the output.
Query
START n=node(3,1,2)
RETURN n
ORDER BY n.name
The nodes, sorted by their name.
Table 16.60. Result
+--------------------------------------+
|n |
|--------------------------------------|
|3 rows |
|--------------------------------------|
|0 ms |
|--------------------------------------|
|Node[1]{name->"A",age->34,length->170}|
|--------------------------------------|
|Node[2]{name->"B",age->34} |
|--------------------------------------|
|Node[3]{name->"C",age->32,length->185}|
+--------------------------------------+
16.13.2. Order nodes by multiple properties
You can order by multiple properties by listing each of them in the ORDER BY
clause. Cypher will sort the result by the first property listed and, for
equal values, go to the next property in the ORDER BY clause, and so on.
Query
START n=node(3,1,2)
RETURN n
ORDER BY n.age, n.name
The nodes, sorted first by their age, and then by their name.
Table 16.61. Result
+--------------------------------------+
|n |
|--------------------------------------|
|3 rows |
|--------------------------------------|
|0 ms |
|--------------------------------------|
|Node[3]{name->"C",age->32,length->185}|
|--------------------------------------|
|Node[1]{name->"A",age->34,length->170}|
|--------------------------------------|
|Node[2]{name->"B",age->34} |
+--------------------------------------+
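Sorting by several properties behaves like sorting on a composite key: the second property only matters where the first one ties. A minimal Python sketch of that rule, using the three nodes from the result above (the dictionary representation and variable names are assumptions for illustration):

```python
# The three nodes from the example graph, in START order (3, 1, 2).
nodes = [
    {"id": 3, "name": "C", "age": 32, "length": 185},
    {"id": 1, "name": "A", "age": 34, "length": 170},
    {"id": 2, "name": "B", "age": 34},
]

# Model of ORDER BY n.age, n.name: compare age first, then name on ties.
by_age_then_name = sorted(nodes, key=lambda n: (n["age"], n["name"]))

print([n["name"] for n in by_age_then_name])  # ['C', 'A', 'B']
```

Nodes A and B share the age 34, so their relative order is decided by name.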
16.13.3. Order nodes in descending order
By adding DESC[ENDING] after the identifier to sort on, the sort will be done
in reverse order.
Query
START n=node(3,1,2)
RETURN n
ORDER BY n.name DESC
The nodes, sorted by their name in reverse order.
Table 16.62. Result
+--------------------------------------+
|n |
|--------------------------------------|
|3 rows |
|--------------------------------------|
|0 ms |
|--------------------------------------|
|Node[3]{name->"C",age->32,length->185}|
|--------------------------------------|
|Node[2]{name->"B",age->34} |
|--------------------------------------|
|Node[1]{name->"A",age->34,length->170}|
+--------------------------------------+
16.13.4. Ordering null
When sorting the result set, null will always come at the end of the result set
for ascending sorting, and first when doing descending sort.
Query
START n=node(3,1,2)
RETURN n.length?, n
ORDER BY n.length?
The nodes sorted by the length property, with a node without that property
last.
Table 16.63. Result
+------------------------------------------------+
|n.length?|n |
|------------------------------------------------|
|3 rows |
|------------------------------------------------|
|0 ms |
|------------------------------------------------|
|170 |Node[1]{name->"A",age->34,length->170}|
|---------+--------------------------------------|
|185 |Node[3]{name->"C",age->32,length->185}|
|---------+--------------------------------------|
| |Node[2]{name->"B",age->34} |
+------------------------------------------------+
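The rule for null can be modelled by treating null as larger than every real value. The Python sketch below is an illustration of the stated ordering, not Neo4j's implementation. The helper order_by and the dictionary representation of nodes are assumptions.

```python
# Node B has no length property, which corresponds to null in Cypher.
nodes = [
    {"name": "C", "age": 32, "length": 185},
    {"name": "A", "age": 34, "length": 170},
    {"name": "B", "age": 34},
]


def order_by(rows, key, descending=False):
    """Sort rows by key, treating a missing value as greater than any other."""
    def sort_key(row):
        value = row.get(key)
        # The first tuple element places nulls after all real values.
        return (value is None, 0 if value is None else value)
    return sorted(rows, key=sort_key, reverse=descending)


ascending = [n["name"] for n in order_by(nodes, "length")]
descending = [n["name"] for n in order_by(nodes, "length", descending=True)]

print(ascending)   # ['A', 'C', 'B']  null last
print(descending)  # ['B', 'C', 'A']  null first
```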
16.14. Skip
SKIP enables the return of only subsets of the total result. By using SKIP, the
result set will be trimmed from the top. Please note that no guarantees are made
on the order of the result unless the query specifies the ORDER BY clause.
Graph
cypher-skip-graph.svg
16.14.1. Skip first three
To return a subset of the result, starting from the fourth result, use this syntax:
Query
START n=node(3, 4, 5, 1, 2)
RETURN n
ORDER BY n.name
SKIP 3
The first three nodes are skipped, and only the last two are returned.
Table 16.64. Result
+------------------+
|n |
|------------------|
|2 rows |
|------------------|
|0 ms |
|------------------|
|Node[1]{name->"D"}|
|------------------|
|Node[2]{name->"E"}|
+------------------+
16.14.2. Return middle two
To return a subset of the result, starting from somewhere in the middle, use
this syntax:
Query
START n=node(3, 4, 5, 1, 2)
RETURN n
ORDER BY n.name
SKIP 1
LIMIT 2
Two nodes from the middle are returned.
Table 16.65. Result
+------------------+
|n |
|------------------|
|2 rows |
|------------------|
|0 ms |
|------------------|
|Node[4]{name->"B"}|
|------------------|
|Node[5]{name->"C"}|
+------------------+
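SKIP and LIMIT together act like a slice over the ordered result. The following Python sketch models that behaviour on the five node names from the examples above. The helper name skip_limit is an assumption, and the sketch says nothing about how Neo4j evaluates these clauses internally.

```python
def skip_limit(rows, skip=0, limit=None):
    """Drop the first `skip` rows, then keep at most `limit` rows."""
    end = None if limit is None else skip + limit
    return rows[skip:end]


# Names of the five nodes after ORDER BY n.name.
names = sorted(["A", "B", "C", "D", "E"])

print(skip_limit(names, skip=3))           # ['D', 'E']
print(skip_limit(names, skip=1, limit=2))  # ['B', 'C']
```

The slice is only meaningful because the rows were sorted first; without ORDER BY the rows that end up in the subset are not guaranteed.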
16.15. Limit
LIMIT enables the return of only subsets of the total result.
Graph
cypher-limit-graph.svg
16.15.1. Return first part
To return a subset of the result, starting from the top, use this syntax:
Query
START n=node(3, 4, 5, 1, 2)
RETURN n
LIMIT 3
The top three items are returned.
Table 16.66. Result
+------------------+
|n |
|------------------|
|3 rows |
|------------------|
|0 ms |
|------------------|
|Node[3]{name->"A"}|
|------------------|
|Node[4]{name->"B"}|
|------------------|
|Node[5]{name->"C"}|
+------------------+
16.16. With
The ability to chain queries together allows for powerful constructs. In
Cypher, the WITH clause is used to pipe the result from one query to the next.
WITH is also used to separate reading from updating of the graph. Every
sub-query of a query must be either read-only or write-only.
Graph
cypher-with-graph.svg
16.16.1. Filter on aggregate function results
Aggregated results have to pass through a WITH clause before they can be filtered on.
Query
START david=node(1)
MATCH david--otherPerson-->()
WITH otherPerson, count(*) as foaf
WHERE foaf > 1
RETURN otherPerson
The person connected to David who has more than one outgoing
relationship.
Table 16.67. Result
+-----------------------+
|otherPerson |
|-----------------------|
|1 row |
|-----------------------|
|0 ms |
|-----------------------|
|Node[3]{name->"Anders"}|
+-----------------------+
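The two stages of this query, aggregate first and filter afterwards, can be modelled as a group-and-count step followed by a filter. The Python sketch below illustrates the idea only. The list of matched paths is hypothetical (the real graph is in the figure), with "Emil" added as an assumed second person who has a single outgoing relationship.

```python
from collections import Counter

# One entry per matched path david--otherPerson-->(), named by otherPerson.
# Hypothetical data: Anders matches twice, Emil once.
matched_paths = ["Anders", "Anders", "Emil"]

# Model of: WITH otherPerson, count(*) as foaf
foaf = Counter(matched_paths)

# Model of: WHERE foaf > 1 RETURN otherPerson
result = sorted(person for person, count in foaf.items() if count > 1)

print(result)  # ['Anders']
```

The filter can only refer to foaf because the counting has already happened in the previous stage, which is the role WITH plays in the query.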
16.16.2. Alternative syntax of with
If you prefer a more visual way of writing your query, you can use equal-signs
as delimiters before and after the column list. Use at least three before the
column list, and at least three after.
Query
START david=node(1)
MATCH david--otherPerson-->()
========== otherPerson, count(*) as foaf ==========
SET otherPerson.connection_count = foaf
For each person connected to David, the number of outgoing relationships is
stored in the connection_count property. Nothing is returned from this query.
Table 16.68. Result
+-----------------+
|Properties set: 2|
|-----------------|
|2 ms |
|-----------------|
|(empty result) |
+-----------------+
16.17. Create
Graph elements (nodes and relationships) are created with CREATE.
16.17.1. Create single node
Creating a single node is done by issuing the following query.
Query
CREATE n
Nothing is returned from this query, except the count of affected nodes.
Table 16.69. Result
+----------------+
|Nodes created: 1|
|----------------|
|1 ms |
|----------------|
|(empty result) |
+----------------+
16.17.2. Create single node and set properties
The values for the properties can be any scalar expressions.
Query
CREATE n = {name : 'Andres', title : 'Developer'}
Nothing is returned from this query.
Table 16.70. Result
+-----------------+
|Nodes created: 1 |
|-----------------|
|Properties set: 2|
|-----------------|
|1 ms |
|-----------------|
|(empty result) |
+-----------------+
16.17.3. Return created node
To get the newly created node back, add a RETURN clause to the query.
Query
CREATE a = {name : 'Andres'}
RETURN a
The newly created node is returned.
Table 16.71. Result
+-----------------------+
|a |
|-----------------------|
|1 row |
|-----------------------|
|Nodes created: 1 |
|-----------------------|
|Properties set: 1 |
|-----------------------|
|1 ms |
|-----------------------|
|Node[1]{name->"Andres"}|
+-----------------------+
16.17.4. Create a relationship between two nodes
To create a relationship between two nodes, we first get the two nodes. Once
the nodes are loaded, we simply create a relationship between them.
Query
START a=node(1), b=node(2)
CREATE a-[r:REL]->b
RETURN r
The created relationship is returned.
Table 16.72. Result
+------------------------+
|r |
|------------------------|
|1 row |
|------------------------|
|Relationships created: 1|
|------------------------|
|2 ms |
|------------------------|
|:REL[0] {} |
+------------------------+
16.17.5. Create a relationship and set properties
Setting properties on relationships is done in a similar manner to how it’s
done when creating nodes. Note that the values can be any expression.
Query
START a=node(1), b=node(2)
CREATE a-[r:REL {name : a.name + '<->' + b.name }]->b
RETURN r
The newly created relationship is returned.
Table 16.73. Result
+----------------------------------+
|r |
|----------------------------------|
|1 row |
|----------------------------------|
|Relationships created: 1 |
|----------------------------------|
|Properties set: 1 |
|----------------------------------|
|12 ms |
|----------------------------------|
|:REL[0] {name->"Andres<->Michael"}|
+----------------------------------+
16.17.6. Create single node from map
You can also create a graph entity from a Map<String, Object> map. All the
key/value pairs in the map will be set as properties on the created
relationship or node.
Query
CREATE n = {props}
This query can be used in the following fashion:
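// Requires: import java.util.HashMap; import java.util.Map;
// engine is assumed to be an ExecutionEngine created elsewhere.
Map<String, Object> props = new HashMap<String, Object>();
props.put( "name", "Andres" );
props.put( "position", "Developer" );
Map<String, Object> params = new HashMap<String, Object>();
params.put( "props", props );
engine.execute( "create n = {props}", params );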
16.17.7. Create multiple nodes from maps
By providing an iterable of maps (Iterable