15.13. Aggregation

15.13.1. Introduction
15.13.2. COUNT
15.13.3. Count nodes
15.13.4. Group Count Relationship Types
15.13.5. Count entities
15.13.6. Count non-null values
15.13.7. SUM
15.13.8. AVG
15.13.9. MAX
15.13.10. MIN
15.13.11. COLLECT
15.13.12. DISTINCT

15.13.1. Introduction

To calculate aggregated data, Cypher offers aggregation, much like SQL’s GROUP BY.

Aggregate functions take multiple input values and calculate an aggregated value from them. Examples are AVG that calculate the average of multiple numeric values, or MIN that finds the smallest numeric value in a set of values.

Aggregation can be done over all the matching sub graphs, or it can be further divided by introducing key values. These are non-aggregate expressions, that are used to group the values going into the aggregate functions.

So, if the return statement looks something like this:

RETURN n, count(*)

We have two return expressions — n, and count(*). The first, n, is no aggregate function, and so it will be the grouping key. The latter, count(*) is an aggregate expression. So the matching subgraphs will be divided into different buckets, depending on the grouping key. The aggregate function will then run on these buckets, calculating the aggregate values.

The last piece of the puzzle is the DISTINCT keyword. It is used to make all values unique before running them through an aggregate function.

An example might be helpful:

Query

START me=node(1)
MATCH me-->friend-->friend_of_friend
RETURN count(distinct friend_of_friend), count(friend_of_friend)

In this example we are trying to find all our friends of friends, and count them. The first aggregate function, count(distinct friend_of_friend), will only see a friend_of_friend once — DISTINCT removes the duplicates. The latter aggregate function, count(friend_of_friend), might very well see the same friend_of_friend multiple times. Since there is no real data in this case, an empty result is returned. See the sections below for real data.

Result

count(distinct friend_of_friend)count(friend_of_friend)
1 row
2 ms

0

0


Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} START me=node(1) MATCH me-->friend-->friend_of_friend RETURN count(distinct friend_of_friend), count(friend_of_friend)

The following examples are assuming the example graph structure below.

Graph

15.13.2. COUNT

COUNT is used to count the number of rows. COUNT can be used in two forms — COUNT(*) which just counts the number of matching rows, and COUNT(<identifier>), which counts the number of non-null values in <identifier>.

15.13.3. Count nodes

To count the number of nodes, for example the number of nodes connected to one node, you can use count(*).

Query

START n=node(2)
MATCH (n)-->(x)
RETURN n, count(*)

This returns the start node and the count of related nodes.

Result

ncount(*)
1 row
0 ms

Node[2]{name:"A",property:13}

3


Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2) match (n)-->(x) return n, count(*)

15.13.4. Group Count Relationship Types

To count the groups of relationship types, return the types and count them with count(*).

Query

START n=node(2)
MATCH (n)-[r]->()
RETURN type(r), count(*)

The relationship types and their group count is returned by the query.

Result

type(r)count(*)
1 row
0 ms

"KNOWS"

3


Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2) match (n)-[r]->() return type(r), count(*)

15.13.5. Count entities

Instead of counting the number of results with count(*), it might be more expressive to include the name of the identifier you care about.

Query

START n=node(2)
MATCH (n)-->(x)
RETURN count(x)

The example query returns the number of connected nodes from the start node.

Result

count(x)
1 row
0 ms

3


Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2) match (n)-->(x) return count(x)

15.13.6. Count non-null values

You can count the non-null values by using count(<identifier>).

Query

START n=node(2,3,4,1)
RETURN count(n.property?)

The count of related nodes with the property property set is returned by the query.

Result

count(n.property?)
1 row
0 ms

3


Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4,1) return count(n.property?)

15.13.7. SUM

The SUM aggregation function simply sums all the numeric values it encounters. Nulls are silently dropped. This is an example of how you can use SUM.

Query

START n=node(2,3,4)
RETURN sum(n.property)

This returns the sum of all the values in the property property.

Result

sum(n.property)
1 row
0 ms

90


Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4) return sum(n.property)

15.13.8. AVG

AVG calculates the average of a numeric column.

Query

START n=node(2,3,4)
RETURN avg(n.property)

The average of all the values in the property property is returned by the example query.

Result

avg(n.property)
1 row
0 ms

30.0


Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4) return avg(n.property)

15.13.9. MAX

MAX find the largets value in a numeric column.

Query

START n=node(2,3,4)
RETURN max(n.property)

The largest of all the values in the property property is returned.

Result

max(n.property)
1 row
0 ms

44


Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4) return max(n.property)

15.13.10. MIN

MIN takes a numeric property as input, and returns the smallest value in that column.

Query

START n=node(2,3,4)
RETURN min(n.property)

This returns the smallest of all the values in the property property.

Result

min(n.property)
1 row
0 ms

13


Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4) return min(n.property)

15.13.11. COLLECT

COLLECT collects all the values into a list.

Query

START n=node(2,3,4)
RETURN collect(n.property)

Returns a single row, with all the values collected.

Result

collect(n.property)
1 row
0 ms

[13,33,44]


Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4) return collect(n.property)

15.13.12. DISTINCT

All aggregation functions also take the DISTINCT modifier, which removes duplicates from the values. So, to count the number of unique eye colors from nodes related to a, this query can be used:

Query

START a=node(2)
MATCH a-->b
RETURN count(distinct b.eyes)

Returns the number of eye colors.

Result

count(distinct b.eyes)
1 row
0 ms

2


Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start a=node(2) match a-->b return count(distinct b.eyes)