15.25. From SQL to Cypher

Prev		Next

15.25.1. Start
15.25.2. Match
15.25.3. Where
15.25.4. Return

This guide is for people who understand SQL. You can use that prior knowledge to quickly get going with Cypher and start exploring Neo4j.

15.25.1. Start

SQL starts with the result you want — we SELECT what we want and then declare how to source it. In Cypher, the START clause is quite a different concept which specifies starting points in the graph from which the query will execute.

From a SQL point of view, the identifiers in START are like table names that point to a set of nodes or relationships. The set can be listed literally, come via parameters, or as I show in the following example, be defined by an index look-up.

So in fact rather than being SELECT-like, the START clause is somewhere between the FROM and the WHERE clause in SQL.

SQL Query.

SELECT *
FROM "Person"
WHERE name = 'Anakin'

NAME	ID	AGE	HAIR
1 rows
`Anakin`	`1`	`20`	`blonde`

Cypher Query.

START person=node:Person(name = 'Anakin')
RETURN person

person
1 row
32 ms
`Node[1]{name:"Anakin",id:1,age:20,hair:"blonde"}`

Cypher allows multiple starting points. This should not be strange from a SQL perspective — every table in the FROM clause is another starting point.

15.25.2. Match

Unlike SQL which operates on sets, Cypher predominantly works on sub-graphs. The relational equivalent is the current set of tuples being evaluated during a SELECT query.

The shape of the sub-graph is specified in the MATCH clause. The MATCH clause is analogous to the JOIN in SQL. A normal a→b relationship is an inner join between nodes a and b — both sides have to have at least one match, or nothing is returned.

We’ll start with a simple example, where we find all email addresses that are connected to the person “Anakin”. This is an ordinary one-to-many relationship.

SQL Query.

SELECT "Email".*
FROM "Person"
JOIN "Email" ON "Person".id = "Email".person_id
WHERE "Person".name = 'Anakin'

ADDRESS	COMMENT	PERSON_ID
2 rows
`anakin@example.com`	`home`	`1`
`anakin@example.org`	`work`	`1`

Cypher Query.

START person=node:Person(name = 'Anakin')
MATCH person-[:email]->email
RETURN email

email
2 rows
13 ms
`Node[7]{address:"anakin@example.com",comment:"home"}`
`Node[8]{address:"anakin@example.org",comment:"work"}`

There is no join table here, but if one is necessary the next example will show how to do that, writing the pattern relationship like so: -[r:belongs_to]-> will introduce (the equivalent of) join table available as the variable r. In reality this is a named relationship in Cypher, so we’re saying “join Person to Group via belongs_to.” To illustrate this, consider this image, comparing the SQL model and Neo4j/Cypher.

And here are example queries:

SQL Query.

SELECT "Group".*, "Person_Group".*
FROM "Person"
JOIN "Person_Group" ON "Person".id = "Person_Group".person_id
JOIN "Group" ON "Person_Group".Group_id="Group".id
WHERE "Person".name = 'Bridget'

NAME	ID	BELONGS_TO_GROUP_ID	PERSON_ID	GROUP_ID
1 rows
`Admin`	`4`	`3`	`2`	`4`

Cypher Query.

START person=node:Person(name = 'Bridget')
MATCH person-[r:belongs_to]->group
RETURN group, r

group	r
1 row
1 ms
`Node[6]{name:"Admin",id:4}`	`:belongs_to[0] {}`

An outer join is just as easy. Add a question mark -[?:KNOWS]-> and it’s an optional relationship between nodes — the outer join of Cypher.

Whether it’s a left outer join, or a right outer join is defined by which side of the pattern has a starting point. This example is a left outer join, because the bound node is on the left side:

SQL Query.

SELECT "Person".name, "Email".address
FROM "Person" LEFT
JOIN "Email" ON "Person".id = "Email".person_id

NAME	ADDRESS
3 rows
`Anakin`	`anakin@example.com`
`Anakin`	`anakin@example.org`
`Bridget`	`<null>`

Cypher Query.

START person=node:Person('name: *')
MATCH person-[?:email]->email
RETURN person.name, email.address?

person.name	email.address?
3 rows
47 ms
`"Anakin"`	`"anakin@example.com"`
`"Anakin"`	`"anakin@example.org"`
`"Bridget"`	`<null>`

Relationships in Neo4j are first class citizens — it’s like the SQL tables are pre-joined with each other. So, naturally, Cypher is designed to be able to handle highly connected data easily.

One such domain is tree structures — anyone that has tried storing tree structures in SQL knows that you have to work hard to get around the limitations of the relational model. There are even books on the subject.

To find all the groups and sub-groups that Bridget belongs to, this query is enough in Cypher:

Cypher Query.

START person=node:Person('name: Bridget')
MATCH person-[:belongs_to*]->group
RETURN person.name, group.name

person.name	group.name
3 rows
6 ms
`"Bridget"`	`"Admin"`
`"Bridget"`	`"Technichian"`
`"Bridget"`	`"User"`

The * after the relationship type means that there can be multiple hops across belongs_to relationships between group and user. Some SQL dialects have recursive abilities, that allow the expression of queries like this, but you may have a hard time wrapping your head around those. Expressing something like this in SQL is hugely impractical if not practically impossible.

15.25.3. Where

This is the easiest thing to understand — it’s the same animal in both languages. It filters out result sets/subgraphs. Not all predicates have an equivalent in the other language, but the concept is the same.

SQL Query.

SELECT *
FROM "Person"
WHERE "Person".age > 35 AND "Person".hair = 'blonde'

NAME	ID	AGE	HAIR
1 rows
`Bridget`	`2`	`40`	`blonde`

Cypher Query.

START person=node:Person('name: *')
WHERE person.age > 35 AND person.hair = 'blonde'
RETURN person

person
1 row
2 ms
`Node[2]{name:"Bridget",id:2,age:40,hair:"blonde"}`

15.25.4. Return

This is SQL’s SELECT. We just put it in the end because it felt better to have it there — you do a lot of matching and filtering, and finally, you return something.

Aggregate queries work just like they do in SQL, apart from the fact that there is no explicit GROUP BY clause. Everything in the return clause that is not an aggregate function will be used as the grouping columns.

SQL Query.

SELECT "Person".name, count(*)
FROM "Person"
GROUP BY "Person".name
ORDER BY "Person".name

NAME	C2
2 rows
`Anakin`	`1`
`Bridget`	`1`

Cypher Query.

START person=node:Person('name: *')
RETURN person.name, count(*)
ORDER BY person.name

person.name	count(*)
2 rows
8 ms
`"Anakin"`	`1`
`"Bridget"`	`1`

Order by is the same in both languages — ORDER BY expression ASC/DESC. Nothing weird here.