# Filtering

https://api-docs.lumar.io/docs/graphql/filtering

To filter results, we use the `filter` argument. It takes the `ConnectionFilterInput` corresponding to the entity we're retrieving as a value. These inputs are defined as:

```graphql
input ExampleConnectionFilterInput {
  _and: [ExampleConnectionFilterInput!]
  _or: [ExampleConnectionFilterInput!]
  url: ConnectionStringFilterInput
  # ...and so on for other field types like Boolean, Date, Int etc.
}
```

Each field in the `ConnectionFilterInput` can be filtered by a subset of predicates available for the given data type. The sections below list every predicate grouped by type.

## String predicates

`ConnectionStringFilterInput` -- used for text fields such as URLs, names, and descriptions.

| Predicate         | Type        | Description                                                                  |
| ----------------- | ----------- | ---------------------------------------------------------------------------- |
| `eq`              | `String`    | Exact match.                                                                 |
| `ne`              | `String`    | Not equal.                                                                   |
| `contains`        | `String`    | Field contains the substring.                                                |
| `notContains`     | `String`    | Field does not contain the substring.                                        |
| `beginsWith`      | `String`    | Field starts with the value.                                                 |
| `endsWith`        | `String`    | Field ends with the value.                                                   |
| `matchesRegex`    | `String`    | Field matches the regular expression.                                        |
| `notMatchesRegex` | `String`    | Field does not match the regular expression.                                 |
| `in`              | `[String!]` | Field value is one of the provided values.                                   |
| `notIn`           | `[String!]` | Field value is not one of the provided values.                               |
| `isEmpty`         | `Boolean`   | When `true`, matches empty strings. When `false`, matches non-empty strings. |
| `isNull`          | `Boolean`   | When `true`, matches `null` values. When `false`, matches non-null values.   |

**Examples:**

```graphql
# URLs containing "blog"
filter: { url: { contains: "blog" } }

# URLs starting with "https://example.com"
filter: { url: { beginsWith: "https://example.com" } }

# Match a regex pattern
filter: { url: { matchesRegex: "^https://[^/]+/products/\\d+" } }

# One of several exact values
filter: { name: { in: ["Google", "Wikipedia", "GitHub"] } }
```

## Numeric predicates (Int, Float, BigInt)

`ConnectionIntFilterInput`, `ConnectionFloatFilterInput`, and `ConnectionBigIntFilterInput` share the same set of predicates.

| Predicate | Type      | Description                              |
| --------- | --------- | ---------------------------------------- |
| `eq`      | `Int`     | Equal to.                                |
| `ne`      | `Int`     | Not equal to.                            |
| `gt`      | `Int`     | Greater than.                            |
| `ge`      | `Int`     | Greater than or equal to.                |
| `lt`      | `Int`     | Less than.                               |
| `le`      | `Int`     | Less than or equal to.                   |
| `in`      | `[Int!]`  | Value is one of the provided values.     |
| `notIn`   | `[Int!]`  | Value is not one of the provided values. |
| `isNull`  | `Boolean` | When `true`, matches `null` values.      |

**Examples:**

```graphql
# Pages returning a 404
filter: { httpStatusCode: { eq: 404 } }

# Pages with more than 100 inlinks
filter: { inLinksInternalCount: { gt: 100 } }

# Status codes in a set
filter: { httpStatusCode: { in: [301, 302, 307] } }

# Combine range predicates for a between filter
filter: { pageSize: { ge: 1000, le: 5000 } }
```

## Boolean predicates

`ConnectionBooleanFilterInput` -- used for true/false fields.

| Predicate | Type      | Description                         |
| --------- | --------- | ----------------------------------- |
| `eq`      | `Boolean` | Matches `true` or `false`.          |
| `ne`      | `Boolean` | Does not match the given value.     |
| `isNull`  | `Boolean` | When `true`, matches `null` values. |

**Example:**

```graphql
# Pages without structured data
filter: { hasStructuredData: { eq: false } }
```

## Date predicates

`ConnectionDateFilterInput` -- used for timestamp fields such as `createdAt` and `updatedAt`.


| Predicate | Type          | Description                         |
| --------- | ------------- | ----------------------------------- |
| `eq`      | `DateTime`    | Exact date match.                   |
| `ne`      | `DateTime`    | Not equal.                          |
| `gt`      | `DateTime`    | After the given date.               |
| `ge`      | `DateTime`    | On or after the given date.         |
| `lt`      | `DateTime`    | Before the given date.              |
| `le`      | `DateTime`    | On or before the given date.        |
| `in`      | `[DateTime!]` | One of the provided dates.          |
| `notIn`   | `[DateTime!]` | Not one of the provided dates.      |
| `isNull`  | `Boolean`     | When `true`, matches `null` values. |

**Example:**

```graphql
# Projects created after Jan 1 2025
filter: { createdAt: { gt: "2025-01-01T00:00:00Z" } }
```

## Enum predicates

Enum fields (e.g. `CrawlStatus`, `CrawlPriority`) use dedicated filter inputs that follow the same pattern. In the table, `<EnumType>` stands for the enum type of the field being filtered.

| Predicate | Type            | Description                                   |
| --------- | --------------- | --------------------------------------------- |
| `eq`      | `<EnumType>`    | Exact match on enum value.                    |
| `ne`      | `<EnumType>`    | Not equal.                                    |
| `in`      | `[<EnumType>!]` | Value is one of the provided enum values.     |
| `notIn`   | `[<EnumType>!]` | Value is not one of the provided enum values. |
| `isNull`  | `Boolean`       | When `true`, matches `null` values.           |

**Examples:**

```graphql
# Only finished crawls
filter: { status: { eq: Finished } }

# Crawls that are queued or running
filter: { status: { in: [Queued, Running] } }
```

## Array predicates

`ConnectionIntArrayFilterInput` and `ConnectionStringArrayFilterInput` -- used for array-valued fields.

| Predicate              | Type             | Description                                                                  |
| ---------------------- | ---------------- | ---------------------------------------------------------------------------- |
| `arrayContains`        | `String` / `Int` | Array includes the given element.                                            |
| `arrayNotContains`     | `String` / `Int` | Array does not include the given element.                                    |
| `arrayContainsLike`    | `String`         | Array includes an element matching the pattern (string arrays only).         |
| `arrayNotContainsLike` | `String`         | Array does not include an element matching the pattern (string arrays only). |
| `isNull`               | `Boolean`        | When `true`, matches `null` values.                                          |

**Example:**

```graphql
# Pages that have a specific tag in their tags array
filter: { tags: { arrayContains: "navigation" } }
```

## Combining filters with `_and` and `_or`

The `_and` and `_or` arrays are special properties that let you write more complex filters. `ConnectionFilterInput` objects in the `_and` array are combined using the logical AND operator, and those in the `_or` array are combined using the logical OR operator.

Root-level conditions are always combined using the logical AND operator. That means a filter such as this:

```graphql
{
  _and: [
    { sitemapsInCount: { eq: 0 } }
  ],
  _or: [
    { url: { contains: "news" } },
    { url: { contains: "guides" } }
  ],
  hasStructuredData: { eq: false }
}
```

is exactly the same as:

```graphql
{
  _and: [
    { sitemapsInCount: { eq: 0 } },
    {
      _or: [
        { url: { contains: "news" } },
        { url: { contains: "guides" } }
      ]
    },
    { hasStructuredData: { eq: false } }
  ]
}
```

### Nesting `_and` inside `_or` (and vice versa)

You can nest logical operators to express arbitrarily complex conditions:

```graphql
filter: {
  _or: [
    {
      _and: [
        { httpStatusCode: { eq: 200 } },
        { pageSize: { gt: 50000 } }
      ]
    },
    {
      _and: [
        { httpStatusCode: { eq: 301 } },
        { redirectUrl: { contains: "legacy" } }
      ]
    }
  ]
}
```

This returns pages that are either (200 AND large) or (301 AND redirecting to a legacy URL).

### Multiple predicates on the same field

You can apply more than one predicate to a single field to create range filters. Predicates on the same field are combined with AND:

```graphql
filter: { httpStatusCode: { ge: 400, lt: 500 } }
```

This returns all pages with a 4xx client error status code.

## Example -- Getting Projects with a filter

```graphql
query getAccount($id: ObjectID!) {
  getAccount(id: $id) {
    projects(
      filter: {
        _or: [{ name: { eq: "Google" } }, { name: { eq: "Wikipedia" } }]
      }
    ) {
      nodes {
        name
      }
    }
  }
}
```

**Variables:**

```json
{ "id": 123 }
```

**Response:**

```json
{
  "data": {
    "getAccount": {
      "projects": {
        "nodes": [
          { "name": "Google" },
          { "name": "Wikipedia" }
        ]
      }
    }
  }
}
```

## Example -- Complex filtering on crawl URLs

The following example combines `_and` and `_or` to return crawl URLs that responded with HTTP 200 and whose URL contains either `/blog/` or `/news/`:

```graphql
query FilteredCrawlUrls($crawlId: ObjectID!) {
  getReportStat(
    input: { crawlId: $crawlId, reportTemplateCode: "all_pages" }
  ) {
    crawlUrls(
      first: 5
      filter: {
        _and: [
          { httpStatusCode: { eq: 200 } }
          {
            _or: [
              { url: { contains: "/blog/" } }
              { url: { contains: "/news/" } }
            ]
          }
        ]
      }
    ) {
      nodes {
        url
        httpStatusCode
      }
      totalCount
    }
  }
}
```

**Variables:**

```json
{ "crawlId": "TjAwNUNyYXdsMTU4MzI0NQ" }
```

**Response:**

```json
{
  "data": {
    "getReportStat": {
      "crawlUrls": {
        "nodes": [
          {
            "url": "https://www.example.com/blog/seo-tips",
            "httpStatusCode": 200
          },
          {
            "url": "https://www.example.com/news/latest-update",
            "httpStatusCode": 200
          }
        ],
        "totalCount": 42
      }
    }
  }
}
```
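
## Example -- Checking filter logic locally

The combination rules from "Combining filters with `_and` and `_or`" can be modelled with a small local evaluator, which is sometimes handy for sanity-checking a filter before sending it. The sketch below is not how the API is implemented. It is an illustrative Python model that assumes filters are written as plain dicts, records are plain dicts, and only a handful of the predicates listed above are supported. The names `PREDICATES` and `matches` are hypothetical.

```python
# Hypothetical, simplified model of the filter semantics described above.
# Only a few predicates are covered; the real API evaluates filters server-side.
PREDICATES = {
    "eq": lambda value, arg: value == arg,
    "ne": lambda value, arg: value != arg,
    "gt": lambda value, arg: value is not None and value > arg,
    "ge": lambda value, arg: value is not None and value >= arg,
    "lt": lambda value, arg: value is not None and value < arg,
    "le": lambda value, arg: value is not None and value <= arg,
    "contains": lambda value, arg: value is not None and arg in value,
    "in": lambda value, arg: value in arg,
    "isNull": lambda value, arg: (value is None) == arg,
}


def matches(record, filter_input):
    """Return True if `record` satisfies `filter_input`.

    Every root-level key must pass (logical AND). `_and` requires all
    nested filters to pass, `_or` requires at least one. Several
    predicates on one field are combined with AND.
    Assumption: an empty `_or` list matches nothing in this model.
    """
    for key, condition in filter_input.items():
        if key == "_and":
            ok = all(matches(record, sub) for sub in condition)
        elif key == "_or":
            ok = any(matches(record, sub) for sub in condition)
        else:
            value = record.get(key)
            ok = all(PREDICATES[name](value, arg) for name, arg in condition.items())
        if not ok:
            return False
    return True


# The two equivalent filters from the "Combining filters" section.
flat = {
    "_and": [{"sitemapsInCount": {"eq": 0}}],
    "_or": [{"url": {"contains": "news"}}, {"url": {"contains": "guides"}}],
    "hasStructuredData": {"eq": False},
}
nested = {
    "_and": [
        {"sitemapsInCount": {"eq": 0}},
        {"_or": [{"url": {"contains": "news"}}, {"url": {"contains": "guides"}}]},
        {"hasStructuredData": {"eq": False}},
    ]
}

# Made-up sample records, for illustration only.
pages = [
    {"url": "https://example.com/news/a", "sitemapsInCount": 0, "hasStructuredData": False},
    {"url": "https://example.com/guides/b", "sitemapsInCount": 2, "hasStructuredData": False},
    {"url": "https://example.com/about", "sitemapsInCount": 0, "hasStructuredData": False},
    {"url": "https://example.com/news/c", "sitemapsInCount": 0, "hasStructuredData": True},
]

flat_hits = [p["url"] for p in pages if matches(p, flat)]
nested_hits = [p["url"] for p in pages if matches(p, nested)]
print(flat_hits)    # ['https://example.com/news/a']
print(nested_hits)  # ['https://example.com/news/a']
```

Both forms select the same record, which mirrors the equivalence stated earlier. The same helper also covers range filters: `matches({"httpStatusCode": 404}, {"httpStatusCode": {"ge": 400, "lt": 500}})` evaluates to `True` in this model.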