From Beginner to Pro: Mastering Data Filtering with Cypher in Graph Databases

Parveen Khurana
6 min readJan 10, 2025

--

Graph databases are becoming more and more popular as they offer powerful ways to model and query interconnected data. One of the key features that makes graph databases so useful is the ability to filter data easily. In this article, we’ll dive into the fundamentals of query filtering using Cypher, the query language for Neo4j, and explore how to filter data in different ways. Whether you’re a beginner or just getting started with graph databases, by the end of this post, you’ll understand how to apply these filtering techniques to extract meaningful insights from your data.

Here is a snapshot of how diff. nodes would look like in the database.

What You Will Learn

By the end of this guide, you will:

  • Master basic query filtering techniques.
  • Learn how to use Cypher to filter data based on specific criteria such as equality, inequality, and ranges.
  • Understand how to evaluate strings and properties within your queries to retrieve the exact data you need.

Let’s Start with Filtering: The WHERE Clause

The most commonly used clause for filtering data in Cypher is the WHERE clause. It's where you define the conditions that the data must meet to be included in your results. Let’s break down some core concepts and show you how to use WHERE effectively.

1. Testing Equality

The simplest form of filtering involves checking if a property is equal to a given value. For instance, you might want to find all movies an actor starred in from a specific year. Here’s how you can do it:

Example Query:

MATCH (a:Actor)-[:STARRED_IN]->(f:Film)
WHERE a.name = 'Leonardo DiCaprio'
AND f.releaseYear = 2010
RETURN f.title

Explanation:

  1. MATCH: We’re looking for Actor nodes connected to Film nodes by the STARRED_IN relationship
  2. WHERE: We filter for actors whose name is “Leonardo DiCaprio” and films that were released in 2010.
  3. RETURN: The query returns the titles of all films that meet these conditions

By using the = operator, you’re checking for exact matches of the properties, which is a very basic but powerful way to filter results.

Pro Tip: Queries execute faster if your database has indexes on frequently queried property values, such as name or releaseYear. This optimization can save you a lot of time in larger datasets.

2. Testing Inequality

What if you want to exclude certain values from your results? This is where inequality comes into play. The <> operator allows you to filter out specific data. Let’s look at how to exclude an actor from a list:

Example Query:

MATCH (a:Actor)-[:STARRED_IN]->(f:Film)
WHERE a.name <> 'Leonardo DiCaprio'
AND f.title = 'Inception'
RETURN a.name

Explanation:

  1. MATCH: We look for all Actor nodes related to Film nodes by the STARRED_IN relationship
  2. WHERE: We exclude “Leonardo DiCaprio” from the results while filtering for the movie “Inception”
  3. RETURN: The query will return the names of the actors who starred in “Inception”, excluding Leonardo DiCaprio

Here, the <> operator is key for excluding specific values, which is especially useful when you want to refine your results further.

3. Testing Less Than, Greater Than, or Equal to

In addition to checking equality, you can also filter based on numerical values that are less than, greater than, or equal to a specific value. This is very useful for analyzing trends over time, such as finding movies released before or after a certain year.

Example Query:

MATCH (f:Film)
WHERE f.title = 'The Wolf of Wall Street'
RETURN
f.releaseYear < 2014 AS before2014,
f.releaseYear <= 2014 AS beforeOrEqual2014,
f.releaseYear > 2014 AS after2014,
f.releaseYear >= 2014 AS afterOrEqual2014

Explanation:

  1. MATCH: We search for a Film node with the title "The Wolf of Wall Street"
  2. WHERE: We test the releaseYear property of the movie with multiple conditions:
  • Is it before 2014?
  • Is it before or equal to 2014?
  • Is it after 2014?
  • Is it after or equal to 2014?
  1. RETURN: This query will return boolean values (true or false) for each of the conditions, showing whether the movie's release year satisfies the respective criteria

This kind of filtering is particularly useful when working with time-series data or making comparisons.

4. Filtering by Range

To check for values within a specific range, you can combine multiple conditions using the less than (<) and greater than (>) operators. This is especially helpful when you want to find data that falls between two values, such as movies released within a particular decade.

Example Query:

ATCH (a:Actor)-[:STARRED_IN]->(f:Film)
WHERE a.name = 'Leonardo DiCaprio'
AND 2000 <= f.releaseYear <= 2010
RETURN f.title, f.releaseYear

Explanation:

  1. MATCH: We find all films that Leonardo DiCaprio starred in
  2. WHERE: We filter these films to only include those released between 2000 and 2010, inclusive
  3. RETURN: The query returns the title and release year of the films that meet this range condition

You can easily modify the range by changing the <= and >= values to fine-tune your search.

5. Testing for Null Property Values

Sometimes, a property might not exist or might be null (i.e., absent). Cypher allows you to test for null properties using IS NULL or IS NOT NULL. This can be useful to identify nodes or relationships that are incomplete or missing certain information.

Example Query:

MATCH (a:Actor)
WHERE a.deathYear IS NOT NULL
AND a.birthYear >= 1980
RETURN a.name, a.birthYear, a.deathYear

Explanation:

  1. MATCH: We search for all Actor nodes
  2. WHERE: We filter for actors who have a non-null deathYear and were born after 1980
  3. RETURN: The query will return the names, birth years, and death years of all matching actors

Similarly, if you wanted to find actors without a deathYear, you would use IS NULL:

MATCH (a:Actor)
WHERE a.deathYear IS NULL
AND a.birthYear <= 1950
RETURN a.name, a.birthYear, a.deathYear

6. Filtering by Node Labels and Relationship Types

Cypher allows you to filter not just on properties but also on node labels and relationship types. Let’s explore how you can do that.

Filtering by Node Labels:

In Neo4j, each node can have one or more labels that categorize the node. You can filter nodes based on these labels.

Example Query for Node Labels:

MATCH (a:Actor)
WHERE a.birthYear > 1970
AND a:Producer
RETURN a.name, a.birthYear

Explanation:

  1. MATCH: We search for nodes labeled Actor
  2. WHERE: We filter for actors born after 1970 who are also labeled as Producer
  3. RETURN: The query returns the names and birth years of these actors

By using the label check (a:Producer), we ensure that only actors who are also producers are returned.

Filtering by Relationship Types:

Just like nodes, relationships also have types. You can filter based on the types of relationships between nodes.

Example Query for Relationship Types:

MATCH (a:Actor)-[r:STARRED_IN]->(f:Film)
WHERE a.name = 'Tom Hanks'
RETURN f.title, type(r) AS relationshipType

Explanation:

  1. MATCH: We find actors connected to films through the STARRED_IN relationship.
  2. WHERE: We filter for movies that Tom Hanks starred in.
  3. RETURN: We return the movie titles along with the relationship type (i.e., STARRED_IN).

7. Testing Strings: STARTS WITH, ENDS WITH, and CONTAINS

When the property is a string, Cypher allows you to test for specific patterns or substrings.

STARTS WITH

You can filter nodes whose string property starts with a particular set of characters.

Example Query:

ATCH (m:Movie)
WHERE m.title STARTS WITH 'Toy Story'
RETURN m.title, m.released

Explanation:

This query returns the titles and release years for all movies whose titles begin with “Toy Story”.

ENDS WITH

Similarly, you can filter properties that end with a specific set of characters.

Example Query:

MATCH (m:Movie)
WHERE m.title ENDS WITH ' I'
RETURN m.title, m.released

Explanation:

This query will return movies whose titles end with “ I”.

CONTAINS

Cypher also allows you to test if a substring is present within a property using the CONTAINS operator.

Example Query:

MATCH (m:Movie)
WHERE m.title CONTAINS 'River'
RETURN m.title, m.released

Explanation:

This query will return movies whose titles contain the substring “River”.

Case-Sensitive String Matching

String matching in Cypher is case-sensitive by default. If case mismatches occur, your queries might miss relevant data.

Example Query for Case-Sensitive Matching:

MATCH (p:Person)
WHERE toLower(p.name) ENDS WITH 'demille'
RETURN p.name

Explanation:

This query ensures that all variations of “DeMille” (like “DeMille” or “demille”) are captured by converting the string to lowercase.

About Indexes for Queries

If you apply transformations like toUpper() or toLower(), Cypher disables the use of indexes for those properties.

Example:

EXPLAIN MATCH (p:Person)
WHERE toLower(p.name) = 'demille'
RETURN p.name

This query can’t take full advantage of indexes because the transformation function (toLower()) prevents Cypher from using an index on the name property.

Wrapping Up

Filtering is a key aspect of querying in Cypher, and these techniques will help you narrow down your results efficiently. Whether you’re filtering based on exact values, ranges, patterns, or null properties, mastering these filters allows you to gain valuable insights from your graph data.

I’d love to hear your thoughts and suggestions — feel free to share them in the comments. Stay tuned and follow for more articles on data science, graph databases, Cypher, and data management tips!

--

--

Parveen Khurana
Parveen Khurana

Written by Parveen Khurana

Writing on Data Science, Philosophy, Emotional Health | Grateful for the little moments and every reader | Nature lover at heart | Follow for reflective musings

No responses yet