From Beginner to Pro: Mastering Data Filtering with Cypher in Graph Databases
Graph databases are becoming more and more popular as they offer powerful ways to model and query interconnected data. One of the key features that makes graph databases so useful is the ability to filter data easily. In this article, we’ll dive into the fundamentals of query filtering using Cypher, the query language for Neo4j, and explore how to filter data in different ways. Whether you’re a beginner or just getting started with graph databases, by the end of this post, you’ll understand how to apply these filtering techniques to extract meaningful insights from your data.
What You Will Learn
By the end of this guide, you will:
- Master basic query filtering techniques.
- Learn how to use Cypher to filter data based on specific criteria such as equality, inequality, and ranges.
- Understand how to evaluate strings and properties within your queries to retrieve the exact data you need.
Let’s Start with Filtering: The WHERE
Clause
The most commonly used clause for filtering data in Cypher is the WHERE
clause. It's where you define the conditions that the data must meet to be included in your results. Let’s break down some core concepts and show you how to use WHERE
effectively.
1. Testing Equality
The simplest form of filtering involves checking if a property is equal to a given value. For instance, you might want to find all movies an actor starred in from a specific year. Here’s how you can do it:
Example Query:
MATCH (a:Actor)-[:STARRED_IN]->(f:Film)
WHERE a.name = 'Leonardo DiCaprio'
AND f.releaseYear = 2010
RETURN f.title
Explanation:
- MATCH: We’re looking for
Actor
nodes connected toFilm
nodes by theSTARRED_IN
relationship - WHERE: We filter for actors whose name is “Leonardo DiCaprio” and films that were released in 2010.
- RETURN: The query returns the titles of all films that meet these conditions
By using the =
operator, you’re checking for exact matches of the properties, which is a very basic but powerful way to filter results.
Pro Tip: Queries execute faster if your database has indexes on frequently queried property values, such as name
or releaseYear
. This optimization can save you a lot of time in larger datasets.
2. Testing Inequality
What if you want to exclude certain values from your results? This is where inequality comes into play. The <>
operator allows you to filter out specific data. Let’s look at how to exclude an actor from a list:
Example Query:
MATCH (a:Actor)-[:STARRED_IN]->(f:Film)
WHERE a.name <> 'Leonardo DiCaprio'
AND f.title = 'Inception'
RETURN a.name
Explanation:
- MATCH: We look for all
Actor
nodes related toFilm
nodes by theSTARRED_IN
relationship - WHERE: We exclude “Leonardo DiCaprio” from the results while filtering for the movie “Inception”
- RETURN: The query will return the names of the actors who starred in “Inception”, excluding Leonardo DiCaprio
Here, the <>
operator is key for excluding specific values, which is especially useful when you want to refine your results further.
3. Testing Less Than, Greater Than, or Equal to
In addition to checking equality, you can also filter based on numerical values that are less than, greater than, or equal to a specific value. This is very useful for analyzing trends over time, such as finding movies released before or after a certain year.
Example Query:
MATCH (f:Film)
WHERE f.title = 'The Wolf of Wall Street'
RETURN
f.releaseYear < 2014 AS before2014,
f.releaseYear <= 2014 AS beforeOrEqual2014,
f.releaseYear > 2014 AS after2014,
f.releaseYear >= 2014 AS afterOrEqual2014
Explanation:
- MATCH: We search for a
Film
node with the title "The Wolf of Wall Street" - WHERE: We test the
releaseYear
property of the movie with multiple conditions:
- Is it before 2014?
- Is it before or equal to 2014?
- Is it after 2014?
- Is it after or equal to 2014?
- RETURN: This query will return boolean values (
true
orfalse
) for each of the conditions, showing whether the movie's release year satisfies the respective criteria
This kind of filtering is particularly useful when working with time-series data or making comparisons.
4. Filtering by Range
To check for values within a specific range, you can combine multiple conditions using the less than (<
) and greater than (>
) operators. This is especially helpful when you want to find data that falls between two values, such as movies released within a particular decade.
Example Query:
ATCH (a:Actor)-[:STARRED_IN]->(f:Film)
WHERE a.name = 'Leonardo DiCaprio'
AND 2000 <= f.releaseYear <= 2010
RETURN f.title, f.releaseYear
Explanation:
- MATCH: We find all films that Leonardo DiCaprio starred in
- WHERE: We filter these films to only include those released between 2000 and 2010, inclusive
- RETURN: The query returns the title and release year of the films that meet this range condition
You can easily modify the range by changing the <=
and >=
values to fine-tune your search.
5. Testing for Null Property Values
Sometimes, a property might not exist or might be null (i.e., absent). Cypher allows you to test for null properties using IS NULL
or IS NOT NULL
. This can be useful to identify nodes or relationships that are incomplete or missing certain information.
Example Query:
MATCH (a:Actor)
WHERE a.deathYear IS NOT NULL
AND a.birthYear >= 1980
RETURN a.name, a.birthYear, a.deathYear
Explanation:
- MATCH: We search for all
Actor
nodes - WHERE: We filter for actors who have a non-null
deathYear
and were born after 1980 - RETURN: The query will return the names, birth years, and death years of all matching actors
Similarly, if you wanted to find actors without a deathYear
, you would use IS NULL
:
MATCH (a:Actor)
WHERE a.deathYear IS NULL
AND a.birthYear <= 1950
RETURN a.name, a.birthYear, a.deathYear
6. Filtering by Node Labels and Relationship Types
Cypher allows you to filter not just on properties but also on node labels and relationship types. Let’s explore how you can do that.
Filtering by Node Labels:
In Neo4j, each node can have one or more labels that categorize the node. You can filter nodes based on these labels.
Example Query for Node Labels:
MATCH (a:Actor)
WHERE a.birthYear > 1970
AND a:Producer
RETURN a.name, a.birthYear
Explanation:
- MATCH: We search for nodes labeled
Actor
- WHERE: We filter for actors born after 1970 who are also labeled as
Producer
- RETURN: The query returns the names and birth years of these actors
By using the label check (a:Producer
), we ensure that only actors who are also producers are returned.
Filtering by Relationship Types:
Just like nodes, relationships also have types. You can filter based on the types of relationships between nodes.
Example Query for Relationship Types:
MATCH (a:Actor)-[r:STARRED_IN]->(f:Film)
WHERE a.name = 'Tom Hanks'
RETURN f.title, type(r) AS relationshipType
Explanation:
- MATCH: We find actors connected to films through the
STARRED_IN
relationship. - WHERE: We filter for movies that Tom Hanks starred in.
- RETURN: We return the movie titles along with the relationship type (i.e.,
STARRED_IN
).
7. Testing Strings: STARTS WITH, ENDS WITH, and CONTAINS
When the property is a string, Cypher allows you to test for specific patterns or substrings.
STARTS WITH
You can filter nodes whose string property starts with a particular set of characters.
Example Query:
ATCH (m:Movie)
WHERE m.title STARTS WITH 'Toy Story'
RETURN m.title, m.released
Explanation:
This query returns the titles and release years for all movies whose titles begin with “Toy Story”.
ENDS WITH
Similarly, you can filter properties that end with a specific set of characters.
Example Query:
MATCH (m:Movie)
WHERE m.title ENDS WITH ' I'
RETURN m.title, m.released
Explanation:
This query will return movies whose titles end with “ I”.
CONTAINS
Cypher also allows you to test if a substring is present within a property using the CONTAINS
operator.
Example Query:
MATCH (m:Movie)
WHERE m.title CONTAINS 'River'
RETURN m.title, m.released
Explanation:
This query will return movies whose titles contain the substring “River”.
Case-Sensitive String Matching
String matching in Cypher is case-sensitive by default. If case mismatches occur, your queries might miss relevant data.
Example Query for Case-Sensitive Matching:
MATCH (p:Person)
WHERE toLower(p.name) ENDS WITH 'demille'
RETURN p.name
Explanation:
This query ensures that all variations of “DeMille” (like “DeMille” or “demille”) are captured by converting the string to lowercase.
About Indexes for Queries
If you apply transformations like toUpper()
or toLower()
, Cypher disables the use of indexes for those properties.
Example:
EXPLAIN MATCH (p:Person)
WHERE toLower(p.name) = 'demille'
RETURN p.name
This query can’t take full advantage of indexes because the transformation function (toLower()
) prevents Cypher from using an index on the name
property.
Wrapping Up
Filtering is a key aspect of querying in Cypher, and these techniques will help you narrow down your results efficiently. Whether you’re filtering based on exact values, ranges, patterns, or null properties, mastering these filters allows you to gain valuable insights from your graph data.
I’d love to hear your thoughts and suggestions — feel free to share them in the comments. Stay tuned and follow for more articles on data science, graph databases, Cypher, and data management tips!