This guide is for people who understand SQL. You can use that prior knowledge to quickly get going with Cypher and start exploring Neo4j.
SQL starts with the result you want: we SELECT what we want and then declare how to source it. In Cypher, the START clause is quite a different concept, which specifies starting points in the graph from which the query will execute.
From a SQL point of view, the identifiers in START are like table names that point to a set of nodes or relationships. The set can be listed literally, come via parameters, or, as the following example shows, be defined by an index look-up.
So in fact, rather than being SELECT-like, the START clause is somewhere between the FROM and the WHERE clause in SQL.
SQL Query.
SELECT * FROM "Person" WHERE name = 'Anakin'
NAME | ID | AGE | HAIR |
---|---|---|---|
1 row | |||
Cypher Query.
START person=node:Person(name = 'Anakin') RETURN person
person |
---|
1 row |
32 ms |
Cypher allows multiple starting points. This should not be strange from a SQL perspective: every table in the FROM clause is another starting point.
Unlike SQL, which operates on sets, Cypher predominantly works on sub-graphs. The relational equivalent is the current set of tuples being evaluated during a SELECT query.
The shape of the sub-graph is specified in the MATCH clause.
The MATCH clause is analogous to the JOIN in SQL. A normal a→b relationship is an inner join between nodes a and b: both sides have to have at least one match, or nothing is returned.
We’ll start with a simple example, where we find all email addresses that are connected to the person “Anakin”. This is an ordinary one-to-many relationship.
SQL Query.
SELECT "Email".* FROM "Person" JOIN "Email" ON "Person".id = "Email".person_id WHERE "Person".name = 'Anakin'
ADDRESS | COMMENT | PERSON_ID |
---|---|---|
2 rows | ||
Cypher Query.
START person=node:Person(name = 'Anakin') MATCH person-[:email]->email RETURN email
email |
---|
2 rows |
13 ms |
There is no join table here, but if one is needed, the next example shows how to do it. Writing the pattern relationship as -[r:belongs_to]-> introduces the equivalent of a join table, available as the variable r.
In reality this is a named relationship in Cypher, so we’re saying “join Person to Group via belongs_to.”
To illustrate this, consider this image, comparing the SQL model and Neo4j/Cypher.
And here are example queries:
SQL Query.
SELECT "Group".*, "Person_Group".* FROM "Person" JOIN "Person_Group" ON "Person".id = "Person_Group".person_id JOIN "Group" ON "Person_Group".Group_id="Group".id WHERE "Person".name = 'Bridget'
NAME | ID | BELONGS_TO_GROUP_ID | PERSON_ID | GROUP_ID |
---|---|---|---|---|
1 row | ||||
Cypher Query.
START person=node:Person(name = 'Bridget') MATCH person-[r:belongs_to]->group RETURN group, r
group | r |
---|---|
1 row | |
1 ms | |
An outer join is just as easy. Add a question mark, as in -[?:KNOWS]->, and the relationship between the nodes becomes optional: this is the outer join of Cypher.
Whether it’s a left outer join or a right outer join is defined by which side of the pattern has a starting point. This example is a left outer join, because the bound node is on the left side:
SQL Query.
SELECT "Person".name, "Email".address FROM "Person" LEFT JOIN "Email" ON "Person".id = "Email".person_id
NAME | ADDRESS |
---|---|
3 rows | |
Cypher Query.
START person=node:Person('name: *') MATCH person-[?:email]->email RETURN person.name, email.address?
person.name | email.address? |
---|---|
3 rows | |
47 ms | |
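To make the inner versus outer join difference concrete on the SQL side, here is a minimal sketch using Python's built-in sqlite3 module. The table layout follows the queries above, but the sample rows (two addresses for Anakin, none for Bridget) are invented for illustration and are not the data set behind the results shown in this guide.

```python
import sqlite3

# In-memory database with hypothetical sample data (not the guide's data set).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Person" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "Email" (address TEXT, comment TEXT, person_id INTEGER);
    INSERT INTO "Person" VALUES (1, 'Anakin'), (2, 'Bridget');
    INSERT INTO "Email" VALUES
        ('anakin@example.com', 'home', 1),
        ('anakin@example.org', 'work', 1);
""")

# Inner join: a person without any email address produces no row,
# like a plain person-[:email]->email pattern.
inner_rows = conn.execute("""
    SELECT "Person".name, "Email".address
    FROM "Person" JOIN "Email" ON "Person".id = "Email".person_id
    ORDER BY "Person".name, "Email".address
""").fetchall()

# Left outer join: every person is kept and missing addresses become NULL,
# like the optional person-[?:email]->email pattern.
outer_rows = conn.execute("""
    SELECT "Person".name, "Email".address
    FROM "Person" LEFT JOIN "Email" ON "Person".id = "Email".person_id
    ORDER BY "Person".name, "Email".address
""").fetchall()

print(inner_rows)
print(outer_rows)
```

With this sample data the inner join returns only Anakin's two addresses, while the left outer join adds a row for Bridget whose address is None, the Python rendering of SQL NULL.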
Relationships in Neo4j are first class citizens — it’s like the SQL tables are pre-joined with each other. So, naturally, Cypher is designed to be able to handle highly connected data easily.
One such domain is tree structures — anyone that has tried storing tree structures in SQL knows that you have to work hard to get around the limitations of the relational model. There are even books on the subject.
To find all the groups and sub-groups that Bridget belongs to, this query is enough in Cypher:
Cypher Query.
START person=node:Person('name: Bridget') MATCH person-[:belongs_to*]->group RETURN person.name, group.name
person.name | group.name |
---|---|
3 rows | |
6 ms | |
The * after the relationship type means that there can be multiple hops across belongs_to relationships between the person and the group.
Some SQL dialects have recursive abilities that allow queries like this to be expressed, but you may have a hard time wrapping your head around those.
Without such extensions, expressing something like this in SQL is hugely impractical, if not practically impossible.
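For comparison, this is roughly what the recursive SQL route can look like. It is a sketch only, run through Python's built-in sqlite3 module with a recursive common table expression. The parent_id column on "Group", the group names, and the sample rows are all assumptions made for this example, since this guide does not show how the relational model stores the group hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Person" (id INTEGER PRIMARY KEY, name TEXT);
    -- parent_id is a hypothetical column: it plays the role of the
    -- group-to-group belongs_to relationship in the graph model.
    CREATE TABLE "Group" (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    CREATE TABLE "Person_Group" (person_id INTEGER, group_id INTEGER);

    -- Invented sample data: Developers -> Engineering -> Staff.
    INSERT INTO "Person" VALUES (1, 'Bridget');
    INSERT INTO "Group" VALUES
        (1, 'Staff', NULL), (2, 'Engineering', 1), (3, 'Developers', 2);
    INSERT INTO "Person_Group" VALUES (1, 3);
""")

group_names = [row[0] for row in conn.execute("""
    WITH RECURSIVE membership(group_id) AS (
        -- Anchor: the groups Bridget belongs to directly.
        SELECT pg.group_id
        FROM "Person" p JOIN "Person_Group" pg ON p.id = pg.person_id
        WHERE p.name = 'Bridget'
        UNION
        -- Recursive step: follow each group up to its parent group.
        SELECT g.parent_id
        FROM "Group" g JOIN membership m ON g.id = m.group_id
        WHERE g.parent_id IS NOT NULL
    )
    SELECT g.name
    FROM "Group" g JOIN membership m ON g.id = m.group_id
    ORDER BY g.id DESC
""")]

print(group_names)
```

The anchor query, the recursive step, and the final join together do the work that the single pattern person-[:belongs_to*]->group expresses in the Cypher query above.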
The WHERE clause is the easiest thing to understand: it’s the same animal in both languages. It filters out result sets/subgraphs. Not all predicates have an equivalent in the other language, but the concept is the same.
SQL Query.
SELECT * FROM "Person" WHERE "Person".age > 35 AND "Person".hair = 'blonde'
NAME | ID | AGE | HAIR |
---|---|---|---|
1 row | |||
Cypher Query.
START person=node:Person('name: *') WHERE person.age > 35 AND person.hair = 'blonde' RETURN person
person |
---|
1 row |
2 ms |
RETURN is Cypher’s counterpart to SQL’s SELECT.
We just put it at the end because it felt better to have it there: you do a lot of matching and filtering, and finally, you return something.
Aggregate queries work just like they do in SQL, apart from the fact that there is no explicit GROUP BY clause.
Everything in the RETURN clause that is not an aggregate function will be used as the grouping columns.
SQL Query.
SELECT "Person".name, count(*) FROM "Person" GROUP BY "Person".name ORDER BY "Person".name
NAME | C2 |
---|---|
2 rows | |
Cypher Query.
START person=node:Person('name: *') RETURN person.name, count(*) ORDER BY person.name
person.name | count(*) |
---|---|
2 rows | |
8 ms | |
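The implicit grouping rule can be read back into SQL terms: each non-aggregate expression in RETURN corresponds to a column you would have listed in GROUP BY. The sketch below uses Python's built-in sqlite3 module to show the SQL a given RETURN clause would correspond to. The sample rows, including a second person named Anakin, are hypothetical and chosen only so that the counts differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Person" (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, hair TEXT);
    -- Hypothetical rows, not the data set used elsewhere in this guide.
    INSERT INTO "Person" VALUES
        (1, 'Anakin', 20, 'blonde'),
        (2, 'Bridget', 40, 'blonde'),
        (3, 'Anakin', 45, 'brown');
""")

# Cypher: RETURN person.name, count(*)
# One non-aggregate expression, so one grouping column.
by_name = conn.execute("""
    SELECT name, count(*) FROM "Person"
    GROUP BY name ORDER BY name
""").fetchall()

# Cypher: RETURN person.name, person.hair, count(*)
# Two non-aggregate expressions, so both become grouping columns.
by_name_and_hair = conn.execute("""
    SELECT name, hair, count(*) FROM "Person"
    GROUP BY name, hair ORDER BY name, hair
""").fetchall()

print(by_name)
print(by_name_and_hair)
```

Adding person.hair to the returned expressions changes the grouping in the same way that adding hair to GROUP BY does in SQL, without any separate clause to keep in sync.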
Order by is the same in both languages: ORDER BY expression ASC/DESC.
Nothing weird here.
Copyright © 2012 Neo Technology