Filtering

To filter results, pass the filter argument. Its value is the ConnectionFilterInput that corresponds to the entity being retrieved. These inputs follow this shape:

input ExampleConnectionFilterInput {
  _and: [ExampleConnectionFilterInput!]
  _or: [ExampleConnectionFilterInput!]
  url: ConnectionStringFilterInput
  # ...and so on for other field types like Boolean, Date, Int etc.
}

Each field in the ConnectionFilterInput can be filtered by a subset of predicates available for the given data type. The sections below list every predicate grouped by type.
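When a filter is assembled in code rather than written by hand, it has to be rendered as a GraphQL input literal, where object keys are unquoted and enum values are bare names. The Python sketch below shows one way this might be done. `EnumValue` and `to_graphql_literal` are hypothetical helper names introduced for illustration and are not part of the API.

```python
import json


class EnumValue(str):
    """Marks a string that should be emitted as a bare GraphQL enum name."""


def to_graphql_literal(value):
    """Render a Python value as a GraphQL input literal (hypothetical helper)."""
    if isinstance(value, EnumValue):
        return str(value)  # enum names are written without quotes
    if isinstance(value, bool):  # checked before int, since bool is an int subclass
        return "true" if value else "false"
    if value is None:
        return "null"
    if isinstance(value, str):
        return json.dumps(value)  # adds quotes and escapes backslashes
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_graphql_literal(v) for v in value) + "]"
    if isinstance(value, dict):
        fields = ", ".join(f"{k}: {to_graphql_literal(v)}" for k, v in value.items())
        return "{ " + fields + " }"
    raise TypeError(f"unsupported filter value: {value!r}")


example = to_graphql_literal({
    "url": {"contains": "blog"},
    "status": {"in": [EnumValue("Queued"), EnumValue("Running")]},
})
```

The resulting string can be placed after `filter:` in a query. Passing the filter as a GraphQL variable would avoid string building altogether, but that requires the concrete input type name for the entity being queried.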

String predicates

ConnectionStringFilterInput -- used for text fields such as URLs, names, and descriptions.

| Predicate | Type | Description |
| --- | --- | --- |
| eq | String | Exact match. |
| ne | String | Not equal. |
| contains | String | Field contains the substring. |
| notContains | String | Field does not contain the substring. |
| beginsWith | String | Field starts with the value. |
| endsWith | String | Field ends with the value. |
| matchesRegex | String | Field matches the regular expression. |
| notMatchesRegex | String | Field does not match the regular expression. |
| in | [String!] | Field value is one of the provided values. |
| notIn | [String!] | Field value is not one of the provided values. |
| isEmpty | Boolean | When true, matches empty strings. When false, matches non-empty strings. |
| isNull | Boolean | When true, matches null values. When false, matches non-null values. |

Examples:

# URLs containing "blog"
filter: { url: { contains: "blog" } }

# URLs starting with "https://example.com"
filter: { url: { beginsWith: "https://example.com" } }

# Match a regex pattern
filter: { url: { matchesRegex: "^https://[^/]+/products/\\d+" } }

# One of several exact values
filter: { name: { in: ["Google", "Wikipedia", "GitHub"] } }
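To get a feel for what each predicate selects before sending a query, the predicates can be approximated locally. The Python sketch below is only a model: it assumes case-sensitive comparison and an unanchored regular expression search using Python's `re` dialect, neither of which is confirmed here for the API. `matches_string` is a hypothetical helper name.

```python
import re

# Local approximations of several string predicates. Case sensitivity and the
# regex dialect are assumptions, not documented API behaviour.
STRING_PREDICATES = {
    "eq": lambda field, arg: field == arg,
    "ne": lambda field, arg: field != arg,
    "contains": lambda field, arg: arg in field,
    "notContains": lambda field, arg: arg not in field,
    "beginsWith": lambda field, arg: field.startswith(arg),
    "endsWith": lambda field, arg: field.endswith(arg),
    "matchesRegex": lambda field, arg: re.search(arg, field) is not None,
    "in": lambda field, arg: field in arg,
    "notIn": lambda field, arg: field not in arg,
    "isEmpty": lambda field, arg: (field == "") == arg,
}


def matches_string(field, predicates):
    """True when the (non-null) field satisfies every predicate in the dict."""
    return all(STRING_PREDICATES[name](field, arg) for name, arg in predicates.items())


is_product = matches_string(
    "https://example.com/products/42",
    {"matchesRegex": r"^https://[^/]+/products/\d+"},
)
```

Note that the pattern is written with a single backslash in a Python raw string, while the GraphQL string literal needs the backslash doubled, as in the regex example above.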

Numeric predicates (Int, Float, BigInt)

ConnectionIntFilterInput, ConnectionFloatFilterInput, and ConnectionBigIntFilterInput share the same set of predicates. The table lists the types for the Int variant.

| Predicate | Type | Description |
| --- | --- | --- |
| eq | Int | Equal to. |
| ne | Int | Not equal to. |
| gt | Int | Greater than. |
| ge | Int | Greater than or equal to. |
| lt | Int | Less than. |
| le | Int | Less than or equal to. |
| in | [Int!] | Value is one of the provided values. |
| notIn | [Int!] | Value is not one of the provided values. |
| isNull | Boolean | When true, matches null values. |

Examples:

# Pages returning a 404
filter: { httpStatusCode: { eq: 404 } }

# Pages with more than 100 inlinks
filter: { inLinksInternalCount: { gt: 100 } }

# Status codes in a set
filter: { httpStatusCode: { in: [301, 302, 307] } }

# Combine range predicates for a between filter
filter: { pageSize: { ge: 1000, le: 5000 } }

Boolean predicates

ConnectionBooleanFilterInput -- used for true/false fields.

| Predicate | Type | Description |
| --- | --- | --- |
| eq | Boolean | Matches true or false. |
| ne | Boolean | Does not match the given value. |
| isNull | Boolean | When true, matches null values. |

Example:

# Pages without structured data
filter: { hasStructuredData: { eq: false } }

Date predicates

ConnectionDateFilterInput -- used for timestamp fields such as createdAt and updatedAt.

| Predicate | Type | Description |
| --- | --- | --- |
| eq | DateTime | Exact date match. |
| ne | DateTime | Not equal. |
| gt | DateTime | After the given date. |
| ge | DateTime | On or after the given date. |
| lt | DateTime | Before the given date. |
| le | DateTime | On or before the given date. |
| in | [DateTime!] | One of the provided dates. |
| notIn | [DateTime!] | Not one of the provided dates. |
| isNull | Boolean | When true, matches null values. |

Example:

# Projects created after Jan 1 2025
filter: { createdAt: { gt: "2025-01-01T00:00:00Z" } }
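Date values in the example above are written as UTC timestamps in ISO 8601 form. Assuming that is the format the API expects, a timestamp for a filter could be produced from a Python datetime as sketched below. `to_filter_timestamp` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_filter_timestamp(moment):
    """Format an aware datetime as a UTC string such as 2025-01-01T00:00:00Z."""
    if moment.tzinfo is None:
        # Naive datetimes are rejected so the local timezone is never guessed.
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


created_after = {
    "createdAt": {"gt": to_filter_timestamp(datetime(2025, 1, 1, tzinfo=timezone.utc))}
}
```

Converting to UTC before formatting keeps the boundary unambiguous when the calling code runs in a different timezone.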

Enum predicates

Enum fields (e.g. CrawlStatus, CrawlPriority) use dedicated filter inputs that follow the same pattern.

| Predicate | Type | Description |
| --- | --- | --- |
| eq | <EnumType> | Exact match on enum value. |
| ne | <EnumType> | Not equal. |
| in | [<EnumType>!] | Value is one of the provided enum values. |
| notIn | [<EnumType>!] | Value is not one of the provided enum values. |
| isNull | Boolean | When true, matches null values. |

Examples:

# Only finished crawls
filter: { status: { eq: Finished } }

# Crawls that are queued or running
filter: { status: { in: [Queued, Running] } }

Array predicates

ConnectionIntArrayFilterInput and ConnectionStringArrayFilterInput -- used for array-valued fields.

| Predicate | Type | Description |
| --- | --- | --- |
| arrayContains | String / Int | Array includes the given element. |
| arrayNotContains | String / Int | Array does not include the given element. |
| arrayContainsLike | String | Array includes an element matching the pattern (string arrays only). |
| arrayNotContainsLike | String | Array does not include an element matching the pattern (string arrays only). |
| isNull | Boolean | When true, matches null values. |

Example:

# Pages that have a specific tag in their tags array
filter: { tags: { arrayContains: "navigation" } }

Combining filters with _and and _or

The _and and _or arrays are special properties that let you write more complex filters. ConnectionFilterInput objects in the _and array are combined with a logical AND, and those in the _or array are combined with a logical OR. All root-level conditions, including the _and and _or arrays themselves, are always combined with a logical AND. That means a filter such as this:

{
  _and: [
    { sitemapsInCount: { eq: 0 } }
  ],
  _or: [
    { url: { contains: "news" } },
    { url: { contains: "guides" } }
  ],
  hasStructuredData: { eq: false }
}

is exactly the same as:

{
  _and: [
    { sitemapsInCount: { eq: 0 } },
    {
      _or: [
        { url: { contains: "news" } },
        { url: { contains: "guides" } }
      ]
    },
    { hasStructuredData: { eq: false } }
  ]
}
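The equivalence of the two filters above can be checked with a small local model of the combination rules. The Python sketch below is illustrative only: it supports just the eq and contains predicates, and `matches` and the sample pages are hypothetical, not part of the API.

```python
def matches(record, flt):
    """Evaluate a filter dict against a record (supports eq and contains only)."""
    results = []
    for key, value in flt.items():
        if key == "_and":
            results.append(all(matches(record, sub) for sub in value))
        elif key == "_or":
            results.append(any(matches(record, sub) for sub in value))
        else:
            field = record[key]
            for predicate, arg in value.items():
                if predicate == "eq":
                    results.append(field == arg)
                elif predicate == "contains":
                    results.append(arg in field)
                else:
                    raise NotImplementedError(predicate)
    # Root-level conditions, including _and and _or, are combined with AND.
    return all(results)


implicit = {
    "_and": [{"sitemapsInCount": {"eq": 0}}],
    "_or": [{"url": {"contains": "news"}}, {"url": {"contains": "guides"}}],
    "hasStructuredData": {"eq": False},
}
explicit = {
    "_and": [
        {"sitemapsInCount": {"eq": 0}},
        {"_or": [{"url": {"contains": "news"}}, {"url": {"contains": "guides"}}]},
        {"hasStructuredData": {"eq": False}},
    ]
}

# Made-up sample records, each failing a different condition except the first.
pages = [
    {"url": "https://example.com/news/a", "sitemapsInCount": 0, "hasStructuredData": False},
    {"url": "https://example.com/guides/b", "sitemapsInCount": 2, "hasStructuredData": False},
    {"url": "https://example.com/shop/c", "sitemapsInCount": 0, "hasStructuredData": False},
    {"url": "https://example.com/news/d", "sitemapsInCount": 0, "hasStructuredData": True},
]
same = all(matches(p, implicit) == matches(p, explicit) for p in pages)
selected = [p["url"] for p in pages if matches(p, implicit)]
```

Both forms select the same records from the sample, which is why the shorter root-level form is usually enough unless conditions need to be grouped.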

Nesting _and inside _or (and vice versa)

You can nest logical operators to express arbitrarily complex conditions:

filter: {
  _or: [
    {
      _and: [
        { httpStatusCode: { eq: 200 } },
        { pageSize: { gt: 50000 } }
      ]
    },
    {
      _and: [
        { httpStatusCode: { eq: 301 } },
        { redirectUrl: { contains: "legacy" } }
      ]
    }
  ]
}

This returns pages that are either (200 AND large) or (301 AND redirecting to a legacy URL).
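Deeply nested filters are easier to read and less error-prone when built from small helpers. The Python sketch below rebuilds the nested filter above as a plain dict. `all_of`, `any_of`, and `where` are hypothetical helper names chosen for illustration.

```python
def all_of(*conditions):
    """Wrap conditions in an _and array."""
    return {"_and": list(conditions)}


def any_of(*conditions):
    """Wrap conditions in an _or array."""
    return {"_or": list(conditions)}


def where(field, **predicates):
    """Build a single-field condition, e.g. where("pageSize", gt=50000)."""
    return {field: predicates}


nested = any_of(
    all_of(where("httpStatusCode", eq=200), where("pageSize", gt=50000)),
    all_of(where("httpStatusCode", eq=301), where("redirectUrl", contains="legacy")),
)
```

Because `in` is a reserved word in Python, that predicate has to be passed by unpacking a dict, for example `where("httpStatusCode", **{"in": [301, 302, 307]})`.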

Multiple predicates on the same field

You can apply more than one predicate to a single field to create range filters:

filter: {
  httpStatusCode: { ge: 400, lt: 500 }
}

This returns all pages with a 4xx client error status code.
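The range above is half-open: 400 is included and 500 is excluded. The Python sketch below models this locally, treating multiple predicates on one field as all having to hold, which is what the range behaviour implies. `matches_number` is a hypothetical helper name.

```python
import operator

# Comparison predicates mapped to Python operators (local illustration only).
COMPARISONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}


def matches_number(value, predicates):
    """True when the value satisfies every predicate given for the field."""
    return all(COMPARISONS[name](value, bound) for name, bound in predicates.items())


client_error = {"ge": 400, "lt": 500}
codes = [200, 301, 400, 404, 499, 500]
matched = [code for code in codes if matches_number(code, client_error)]
```

Swapping `lt: 500` for `le: 499` selects the same integers, but the half-open form stays correct if the field is ever a Float.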

Example -- Getting Projects with a filter

Operation:

query getAccount($id: ObjectID!) {
  getAccount(id: $id) {
    projects(
      filter: {
        _or: [{ name: { eq: "Google" } }, { name: { eq: "Wikipedia" } }]
      }
    ) {
      nodes {
        name
      }
    }
  }
}

Variables:

{ "id": 123 }

Response Example:

{
  "data": {
    "getAccount": {
      "projects": {
        "nodes": [
          { "name": "Google" },
          { "name": "Wikipedia" }
        ]
      }
    }
  }
}
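Outside the Explorer, an operation like this is typically sent as a JSON body with `query` and `variables` keys, following the common GraphQL-over-HTTP convention. The Python sketch below only builds that body. The endpoint URL and authentication headers are not covered in this section, so they are deliberately left out, and `build_request_body` is a hypothetical helper name.

```python
import json

QUERY = """
query getAccount($id: ObjectID!) {
  getAccount(id: $id) {
    projects(
      filter: {
        _or: [{ name: { eq: "Google" } }, { name: { eq: "Wikipedia" } }]
      }
    ) {
      nodes {
        name
      }
    }
  }
}
"""


def build_request_body(query, variables):
    """Serialize a GraphQL operation and its variables into a JSON request body."""
    return json.dumps({"query": query, "variables": variables})


body = build_request_body(QUERY, {"id": 123})
```

The filter stays inline in the query text here, while the account ID travels separately as a variable.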

Example -- Complex filtering on crawl URLs

The following example combines _and and _or to return crawl URLs that responded with HTTP 200 and whose URL contains either /blog/ or /news/:

Operation:

query FilteredCrawlUrls($crawlId: ObjectID!) {
  getReportStat(
    input: { crawlId: $crawlId, reportTemplateCode: "all_pages" }
  ) {
    crawlUrls(
      first: 5
      filter: {
        _and: [
          { httpStatusCode: { eq: 200 } }
          {
            _or: [
              { url: { contains: "/blog/" } }
              { url: { contains: "/news/" } }
            ]
          }
        ]
      }
    ) {
      nodes {
        url
        httpStatusCode
      }
      totalCount
    }
  }
}

Variables:

{ "crawlId": "TjAwNUNyYXdsMTU4MzI0NQ" }

Response Example:

{
  "data": {
    "getReportStat": {
      "crawlUrls": {
        "nodes": [
          { "url": "https://www.example.com/blog/seo-tips", "httpStatusCode": 200 },
          { "url": "https://www.example.com/news/latest-update", "httpStatusCode": 200 }
        ],
        "totalCount": 42
      }
    }
  }
}
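Once a response like the example comes back, the matching URLs and the total can be read from the nested `data` object. The Python sketch below parses the response example for this query. `summarize_crawl_urls` is a hypothetical helper name, and error handling for a response carrying an `errors` key is omitted.

```python
import json

# Response example for the FilteredCrawlUrls query, as plain JSON text.
RESPONSE_TEXT = """
{
  "data": {
    "getReportStat": {
      "crawlUrls": {
        "nodes": [
          { "url": "https://www.example.com/blog/seo-tips", "httpStatusCode": 200 },
          { "url": "https://www.example.com/news/latest-update", "httpStatusCode": 200 }
        ],
        "totalCount": 42
      }
    }
  }
}
"""


def summarize_crawl_urls(response):
    """Return the URLs on this page of results and the reported total count."""
    connection = response["data"]["getReportStat"]["crawlUrls"]
    urls = [node["url"] for node in connection["nodes"]]
    return urls, connection["totalCount"]


urls, total = summarize_crawl_urls(json.loads(RESPONSE_TEXT))
```

In the example, `totalCount` is larger than the number of nodes, which suggests it counts every match for the filter rather than only the nodes returned for the requested page.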