warning

This is a beta feature. If you would like to participate in the beta, please contact your account manager or support.

Getting started with custom metrics

Extend Lumar with your custom metrics and unlock the full power of extracting data from the web.

Using TypeScript or JavaScript, you can create custom metrics that extract data from your pages. You can use Puppeteer to control the browser and extract data, or use the provided DOM input to read data directly from the page.

Custom metrics can be grouped into custom metric containers. Each custom metric container can contain multiple custom metrics.

Create a new project

Generate a fresh TypeScript custom metric container project and register it with the API.

npx @deepcrawl/oreo@latest metric bootstrap my-metrics/

When bootstrapping a new container, you will be asked to provide a name. This name is used to identify the container in the API and must be globally unique.

You will also have a choice between a DOM or a Puppeteer container. Using Puppeteer requires the Lumar project to be run with JS rendering enabled.

If you want to extract metrics from images or style sheets, you can enable that as well.

You can also provide all of these options via the CLI without being prompted. Refer to our CLI docs for all the available arguments.

npx @deepcrawl/oreo@latest metric bootstrap examples/bootstrap-example3 --name "MyGloballyUniqueMetricContainerName" --inputType=Puppeteer --resourceTypes=Document --description "Example metric extraction container"

After bootstrapping your container project, you will need to install dependencies with your package manager (npm, yarn, pnpm or other).

Change to the directory where you have bootstrapped your container project and run the following command.

npm install

Writing metrics extraction code

Open the container project in your favorite editor. You will find a src/index.ts file with a sample metrics extraction script.

import type {
  IPuppeteerRequestContainerInput,
  MetricScriptBasicOutput,
  MetricScriptHandler,
} from "@deepcrawl/custom-metric-types";

export interface IMetrics extends MetricScriptBasicOutput {
  url: string;
}

export const handler: MetricScriptHandler<IMetrics, IPuppeteerRequestContainerInput> = async (input, _context) => {
  return {
    url: input.page.url(),
  };
};

Your container needs to explicitly specify its return types. This is done by defining an interface that extends the MetricScriptBasicOutput interface and referencing that interface name under the metricsTypeName key in the .oreorc.json or .oreorc.ts file. (Make sure the interface is exported from the entrypoint file.)

You can use either .oreorc.json (legacy) or .oreorc.ts (type-safe) for configuration. The TypeScript option provides better type safety and IntelliSense support.

{
  "id": "XXX",
  "handler": "handler",
  "entrypoint": "src/index.ts",
  "metricsTypeName": "IMetrics"
}
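The same configuration can be expressed in a type-safe .oreorc.ts. This minimal sketch mirrors the JSON above, assuming the IContainerConfigData type exported by @deepcrawl/oreo (shown in later examples):

```typescript
import type { IContainerConfigData } from "@deepcrawl/oreo";

// Type-safe equivalent of the .oreorc.json example above
const config: IContainerConfigData = {
  id: "XXX",
  handler: "handler",
  entrypoint: "src/index.ts",
  metricsTypeName: "IMetrics",
};

export default config;
```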

Your container can export one or more metrics. Each metric needs to have a unique name and a type.

Handler phases

Containers can define multiple handler phases inside the handlers section of .oreorc.json / .oreorc.ts. The request handler is required for URL-level metrics, while preCrawl and postCrawl handlers run exactly once per crawl—ideal for seeding data before pages are processed or for pushing aggregated results after the crawl ends. Each phase can specify its own handler, entrypoint, timeout, and (for request handlers) metricsTypeName.

{
  "id": "XXX",
  "handlers": {
    "request": {
      "handler": "handler",
      "entrypoint": "src/index.ts",
      "metricsTypeName": "IMetrics"
    },
    "preCrawl": {
      "handler": "preCrawlHandler",
      "entrypoint": "src/index.ts"
    },
    "postCrawl": {
      "handler": "postCrawlHandler",
      "entrypoint": "src/post-crawl.ts"
    }
  }
}

Use the specific container input types to author these lifecycle handlers:

import type {
  IPreCrawlContainerInput,
  IPostCrawlContainerInput,
  MetricScriptHandler,
} from "@deepcrawl/custom-metric-types";

export const preCrawlHandler: MetricScriptHandler<{}, IPreCrawlContainerInput> = async input => {
  // Warm up cache, fetch secrets, or emit crawl-level metadata
  return {};
};

export const postCrawlHandler: MetricScriptHandler<{}, IPostCrawlContainerInput> = async input => {
  // Aggregate crawl results or emit final metrics
  return {};
};

preCrawl receives the crawl definition before any URLs are processed, while postCrawl receives the final crawl context (including accumulated stats and failures). Both handlers can share the same entrypoint file or live in separate modules to keep concerns isolated.

.oreorc configuration reference

Your .oreorc.ts (or legacy .json) file is validated against the ContainerConfigData schema used by the CLI and API. Every field below is optional unless stated otherwise, so you can adopt only the parts you need.

Top-level fields

  • id — The CustomMetricContainer ID. Required when you already have a container in Lumar and want the CLI to publish new versions against it.
  • handlers — Object that defines per-phase handlers. request is required; preCrawl and postCrawl are optional (see the next section for the available handler options).
  • secretsTypeName / secretsTypePath — Point to a TypeScript interface that documents the environment variables (secrets) your container expects. The CLI uses this to type-check process.env access and to generate schema hints.
  • paramsTypeName / paramsTypePath — Similar to secrets, but for structured params you pass at runtime. Defining these keeps context.params strongly typed.
  • allowedRenderingResources — Restrict which rendering resource types (values from CustomMetricContainerRenderingResource) Puppeteer may request, e.g. ["Image", "Font"] for leaner crawls.
  • navigationTimeoutMs — Overrides the default page-level navigation timeout for request handlers. Helpful when you expect very slow pages or when you want to fail quickly.
  • reportTemplates — Array of predefined report templates. See Report templates for the structure and usage.
  • targets — Optional record of named deployment targets for multi-environment publishing. Each target specifies id, profile, and optionally apiUrl. See Multi-target deployment.
  • entrypoint, handler, metricsTypeName, metricsMetadata, metricsTypeNames, metricsTypePath, externalPackages, metricsSchema — Legacy single-handler shortcuts. They mirror the per-handler options below and are still read for backwards compatibility, but we recommend moving to the handlers block so you can mix request/preCrawl/postCrawl logic in one file.

Container properties

These fields define the container's identity and behavior in the API. When present in .oreorc, they are automatically synced to the API on every publish and can be managed via metric update and metric pull commands.

  • name — Unique container name (alphanumeric, hyphens, underscores; 3-100 chars). Immutable after creation. When set in .oreorc, it acts as a safety guard — publishing will fail if this name doesn't match the container ID, preventing accidental publishes to the wrong container.
  • displayName — Human-readable display name for the container (3-100 chars).
  • description — Description of what the container does.
  • inputType — Which input type the container uses: "DOM" or "Puppeteer".
  • resourceTypes — Array of resource types the container extracts metrics from: "Document", "Image", "Script", "Stylesheet".
  • containerParams — Default container parameters (JSON object) passed at crawl time via context.params.
  • runFirst — If true, this container runs before other containers in the execution order.
  • requiresAiFeatures — If true, this container requires AI features to be enabled on the project.

These fields are also used by metric create and metric bootstrap as defaults — when present in .oreorc, the CLI skips the interactive prompt for that field.

Admin-only container properties

The following fields require admin API access. They are accepted in .oreorc and synced automatically when the user has admin privileges. For non-admin users, these fields are silently ignored during sync.

  • scope — Container scope: "Container" (default) or "System".
  • executable — Whether the container can be executed.
  • isolate — Whether to isolate container execution.
  • isEssential — Whether this is an essential metric.
  • isGlobal — Whether this container is available to all projects.
  • obeyCSP — Whether to respect Content Security Policy headers.
  • requiresResponseBuffer — Whether the container requires the response buffer.
  • minRequiredCpu — Minimum required CPU: 1, 2, or 4.
  • includedDatasourceCodes — Array of supported datasource codes.
  • linkedExternalSources — Array of external source names linked to this container.
  • supportedFeatures — Array of supported feature flag names.
  • relatedContainerNames — Array of related container names.
  • requiredAddons — Array of required addon names.
  • requiredFeatureFlags — Array of required feature flags.
  • supportedCrawlTypes — Array of supported crawl type codes.
  • supportedUploadTypes — Array of supported upload types.
  • costs — Array of cost entries: [{ cost: number, moduleCode: string }].
  • creditAllocationTypeOverride — Credit allocation type override.

Putting it all together, an example .oreorc.ts:

import type { IContainerConfigData } from "@deepcrawl/oreo";

const config: IContainerConfigData = {
  id: "ccc_123",
  name: "MyMetricContainer",
  displayName: "My Metric Container",
  description: "Extracts custom SEO metrics from crawled pages.",
  inputType: "DOM",
  resourceTypes: ["Document"],
  secretsTypeName: "ContainerSecrets",
  secretsTypePath: "src/secrets.ts",
  paramsTypeName: "RunParams",
  paramsTypePath: "src/params.ts",
  allowedRenderingResources: ["Image"],
  navigationTimeoutMs: 120000,
  handlers: {
    request: {
      entrypoint: "src/index.ts",
      handler: "handler",
      metricsTypeName: "IMetrics",
    },
    postCrawl: {
      entrypoint: "src/post-crawl.ts",
      handler: "postCrawlHandler",
      skipForSpr: true,
    },
  },
};

export default config;

Typing secrets and params

Pairing secretsTypeName / secretsTypePath with paramsTypeName / paramsTypePath lets the CLI point to the exact TypeScript interfaces that describe the secrets and runtime parameters your handlers expect. Secrets defined this way become available at runtime via process.env, while params are exposed on every handler invocation through context.params. By wiring those interfaces into MetricScriptHandler’s generics you get full IntelliSense for both the container input (IRequestContainerInput) and the params structure.

import type {
  IRequestContainerInput,
  MetricScriptHandler,
  MetricScriptParamsType,
  MetricScriptSecretsType,
} from "@deepcrawl/custom-metric-types";
import { MetricScriptBasicOutput } from "@deepcrawl/custom-metric-types";

export interface MySecrets extends MetricScriptSecretsType {
  OPENAI_API_KEY: string | null | undefined;
}

export interface MyParams extends MetricScriptParamsType {
  extractionRegex: string | null | undefined;
}

export interface MyMetrics extends MetricScriptBasicOutput {
  /**
   * @title Page Title
   * @description The title of the page.
   */
  pageTitle: string;
}

export const myHandler: MetricScriptHandler<MyMetrics, IRequestContainerInput, MyParams> = (input, context) => {
  const openAiKey = process.env["OPENAI_API_KEY"];

  if (input.phase === "request" && input.resourceType === "document") {
    return {
      pageTitle: document.title,
    };
  }

  return undefined;
};

Export MySecrets and MyParams from the file referenced by secretsTypePath / paramsTypePath, then reference their names in .oreorc.ts so the CLI can validate your configuration and keep process.env plus context.params strongly typed:

import type { IContainerConfigData } from "@deepcrawl/oreo";

const config: IContainerConfigData = {
  secretsTypeName: "MySecrets",
  secretsTypePath: "src/metrics-types.ts",
  paramsTypeName: "MyParams",
  paramsTypePath: "src/metrics-types.ts",
  handlers: {
    request: {
      entrypoint: "src/index.ts",
      handler: "myHandler",
      metricsTypeName: "MyMetrics",
    },
  },
};

export default config;

Handler options (handlers.request, handlers.preCrawl, handlers.postCrawl)

Each handler entry is validated by CrawlCustomMetricContainerHandlerSchema. These keys are available:

  • handler — Required. The name of the exported function in the entrypoint module.
  • entrypoint — Path to the TypeScript/JavaScript file that exports the handler. If omitted, the CLI falls back to the container-level entrypoint.
  • timeoutMs — Per-handler timeout in milliseconds. Use this to give long-running Puppeteer work more time or to enforce faster failures.
  • metricsTypeName — Name of the TypeScript interface describing this handler’s return shape. Must be exported from the file referenced by metricsTypePath (or the entrypoint if no path override is provided).
  • metricsTypeNames — Record used by multi-output handlers (outputType: "multi-output") to map each logical metric key to a TypeScript type, e.g. { product: "IProductMetrics" }.
  • metricsTypePath — File that exports the type(s) referenced by metricsTypeName or metricsTypeNames. Defaults to the handler’s entrypoint.
  • metricsMetadata — Optional metadata overrides scoped to this handler. It follows the same structure documented in Providing extra metadata for custom metrics for UI.
  • externalPackages — Array of native dependencies (e.g. ["sharp"]) that must be installed alongside the bundle so runtime extraction can load them.
  • tableType — Target storage table when the handler emits a single record type. Choose from CustomMetricContainerTableType (for example, "dc:crawler:project_metrics:item" when producing crawl-level data).
  • tableTypes — Record that lets multi-output handlers map individual metric keys to specific table types, e.g. { summary: "dc:crawler:project_metrics:item" }.
  • outputType — "single-output" (default) means the handler returns one object per URL. "multi-output" signals that the handler returns multiple named metric objects and therefore requires metricsTypeNames / metricsSchemas hints.
  • metricsSchema — Path to a JSON Schema file that describes the handler’s output. This is useful when you author metrics in plain JavaScript and still want type-safe publishing.
  • metricsSchemas — Record of JSON Schema paths for multi-output handlers (one schema per metric key).
  • linksProducer — When true, the returned metrics flow through the crawler's link filtering, deduplication, and enqueueing pipeline rather than being stored as plain custom metrics. Use this when your container discovers URLs that the crawler should follow. See Link-producing containers.
  • linksProducers — Multi-output variant of linksProducer. A record mapping metric keys to booleans, indicating which outputs are treated as links.
  • groupingField — Name of a metric field to use as the grouping key when storing results.
  • groupingFields — Multi-output variant of groupingField. A record mapping metric keys to their respective grouping field names.
  • skipForSpr — Only valid on preCrawl and postCrawl handlers. When true, the handler is not executed during Single Page Requester (SPR) runs so you can keep those requests lightweight.
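As an illustration, a multi-output request handler could be configured as below. This is a hedged sketch: the metric keys product and summary and the type names IProductMetrics / ISummaryMetrics are hypothetical, but the fields used (outputType, metricsTypeNames, tableTypes) are the ones documented above.

```typescript
import type { IContainerConfigData } from "@deepcrawl/oreo";

const config: IContainerConfigData = {
  handlers: {
    request: {
      entrypoint: "src/index.ts",
      handler: "multiHandler",
      // Declare that this handler returns multiple named metric objects
      outputType: "multi-output",
      // Map each logical metric key to the TypeScript type describing it
      metricsTypeNames: {
        product: "IProductMetrics",
        summary: "ISummaryMetrics",
      },
      // Route the summary output to a crawl-level table
      tableTypes: {
        summary: "dc:crawler:project_metrics:item",
      },
    },
  },
};

export default config;
```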

Tip: Handler-level options take precedence over legacy top-level fields, so you can migrate gradually by moving one handler at a time into handlers.

VS Code JSON schema validation

Keep your .oreorc.json files validated inside VS Code by pointing the built-in JSON validation to the schema that the CLI can generate for you.

  1. Generate the schema file (for example in .vscode/container-config-json-schema.json):

     npx @deepcrawl/oreo@latest metric generate-config-schema

  2. Add the schema reference to your .vscode/settings.json:

     {
       "json.schemas": [
         {
           "fileMatch": ["**/.oreorc.json"],
           "url": ".vscode/container-config-json-schema.json"
         }
       ]
     }

Now VS Code will validate .oreorc.json files against the generated schema as you edit them.

Alternatively, set the $schema property inside each .oreorc.json for file-local validation. Point it either to the schema you generated under .vscode or to the schema bundled with the CLI:

{
  "$schema": ".vscode/container-config-json-schema.json"
}

If @deepcrawl/oreo is installed locally, you can re-use the schema stored in node_modules:

{
  "$schema": "./node_modules/@deepcrawl/oreo/container-config-json-schema.json"
}

Multi-target deployment

When you need to publish the same container to multiple API environments (e.g., staging and production) or maintain separate dev/published container copies, you can configure deployment targets and auth profiles in the Oreo CLI.

Setting up profiles

Profiles store per-environment API URLs and authentication credentials. We recommend creating explicit named profiles for each environment.

# Create profiles
npx @deepcrawl/oreo@latest config profile create prod --api-url https://api.lumar.io/graphql
npx @deepcrawl/oreo@latest config profile create staging --api-url https://api.staging.lumar.io/graphql

# Authenticate each profile (opens browser for OAuth)
npx @deepcrawl/oreo@latest login --profile prod
npx @deepcrawl/oreo@latest login --profile staging

# List all profiles
npx @deepcrawl/oreo@latest config profile list

# Delete a profile
npx @deepcrawl/oreo@latest config profile delete staging

The name default is reserved — it refers to the root-level user config (your existing setup) and cannot be created as a named profile. Use "profile": "default" in a target to reference it. Existing setups without profiles continue to work unchanged.

Configuring targets in .oreorc

Add a targets block to your container config. Each target specifies a container ID, a named profile, and optionally an expected API URL for safety verification:

import type { IContainerConfigData } from "@deepcrawl/oreo";

const config: IContainerConfigData = {
  // Root id is optional when all IDs are in targets
  handlers: {
    request: {
      entrypoint: "src/index.ts",
      handler: "handler",
      metricsTypeName: "IMetrics",
    },
  },
  targets: {
    prod: {
      id: "ccc_123",
      profile: "prod",
      apiUrl: "https://api.lumar.io/graphql",
    },
    staging: {
      id: "ccc_456",
      profile: "staging",
      apiUrl: "https://api.staging.lumar.io/graphql",
    },
    "prod-dev": {
      id: "ccc_789",
      profile: "prod",
      apiUrl: "https://api.lumar.io/graphql",
    },
  },
};

export default config;

Or in JSON:

{
  "handlers": {
    "request": {
      "entrypoint": "src/index.ts",
      "handler": "handler",
      "metricsTypeName": "IMetrics"
    }
  },
  "targets": {
    "prod": {
      "id": "ccc_123",
      "profile": "prod",
      "apiUrl": "https://api.lumar.io/graphql"
    },
    "staging": {
      "id": "ccc_456",
      "profile": "staging",
      "apiUrl": "https://api.staging.lumar.io/graphql"
    }
  }
}

Each target entry has the following fields:

  • id — Required. The CustomMetricContainer ID for this target environment.
  • profile — Required. Name of the auth profile to use. Use "default" to reference the root config.
  • apiUrl — Optional. Expected API URL for safety verification. At publish time, the CLI checks that this matches the profile's configured URL and errors if they differ — preventing accidental cross-environment publishes.

You can include a root id alongside targets for backward compatibility. When no --target flag is provided, the root id and default auth are used. If there is no root id and --target is omitted, the CLI will error and list the available targets.

Publishing to a target

Build once, then publish to any configured target:

# Build is target-agnostic
npx @deepcrawl/oreo@latest metric build

# Publish to production
npx @deepcrawl/oreo@latest metric publish-dir dc.out/build/ --target prod

# Publish to staging
npx @deepcrawl/oreo@latest metric publish-dir dc.out/build/ --target staging

The --target flag (shorthand -t) works on all container-ID commands: publish-dir, publish-zip, link, unlink, secret set, global-secret set, and their set-from-dotenv variants.

Using profiles with other commands

Commands that interact with the API but don't use a container ID accept --profile directly:

# Create a container in staging
npx @deepcrawl/oreo@latest metric create --profile staging

# Bootstrap a new project in staging (generates target-aware .oreorc)
npx @deepcrawl/oreo@latest metric bootstrap my-metrics/ --profile staging

# Run a crawl in staging
npx @deepcrawl/oreo@latest crawl create --profile staging

# Create a project in staging
npx @deepcrawl/oreo@latest project create --profile staging

When metric bootstrap is run with --profile, the generated .oreorc.json automatically uses a targets block instead of a flat root id.

Releasing a new version of a CustomMetricContainer

Once you are happy with your container and want to release a new version of it, you need to build and upload it.

npm run build
npm run upload

To publish to a specific target environment, pass the --target flag:

npm run build
npx @deepcrawl/oreo@latest metric publish-dir dc.out/build/ --target staging

At this point you have a container that is published and ready to be linked with a project.

Automatic property syncing on publish

When your .oreorc config includes container properties (such as name, displayName, description, inputType, or resourceTypes), the CLI automatically checks them against the API before uploading a new version.

  • Name validation — If name is set in .oreorc, the CLI compares it against the API container's name. A mismatch triggers a hard error, preventing accidental publishes to the wrong container. This is a safety guard since the container name is immutable after creation.
  • Property sync — Any mutable properties that differ between .oreorc and the API are automatically updated. The CLI logs which fields are being checked and synced:
    Validating container name: "MyMetricContainer"
    Container name matches: "MyMetricContainer"
    Checking container properties: displayName, description
    Syncing container properties: description
    Updated container properties: description

If nothing differs, no update is sent to the API.

Syncing properties without publishing

Use metric update to sync container properties from .oreorc to the API without publishing a new version. This is useful when you want to update metadata like displayName or description without deploying new code.

npx @deepcrawl/oreo@latest metric update

With a specific target:

npx @deepcrawl/oreo@latest metric update --target prod

Pulling properties from the API

Use metric pull to fetch the container's current properties from the API and write them into your .oreorc.json. This is useful when onboarding an existing container or when you want to ensure your local config matches the API state.

npx @deepcrawl/oreo@latest metric pull

With a specific target:

npx @deepcrawl/oreo@latest metric pull --target prod

The command reads from the API and merges the properties into your existing .oreorc.json, preserving any fields the API doesn't manage (like handlers, targets, etc.). If you have admin access, admin-only properties are also pulled automatically.

note

metric pull only works with .oreorc.json files. If your project uses .oreorc.ts, you'll need to update it manually.

Adding custom metrics to a Lumar project

Running the metric link command without any arguments will fetch your default account's projects and the CustomMetricContainer ID from .oreorc.json or .oreorc.ts.

npm run oreo metric link

Linking to multiple projects at the same time:

npm run oreo metric link -- --projectIds 123456,12345,123454

Linking a specific target:

npm run oreo metric link -- --target staging

Fetching metrics via Single Page Requester

Before committing to a full crawl, you can test your custom metrics using the Single Page Requester.

npm run oreo project request-custom-metrics -- --projectId 123456 --url http://example.com/

This will output a table with your custom metrics.

Fetching the metrics from the crawl

Running a crawl

Once a CustomMetricContainer is linked to a project, the next time a crawl is run it will inherit the container and extract custom metrics.

You can start the crawl as you usually would via the UI, or start crawling from the CLI.

npm run oreo crawl create -- --projectId 123456

Fetching the metrics

Once the crawl finishes, you can access custom metrics through the Graph-API Explorer or the Analyze UI.

query FetchCustomMetrics($reportInput: GetReportStatInput!) {
  getReportStat(input: $reportInput) {
    crawlUrls(first: 100) {
      nodes {
        url
        customMetrics
      }
    }
  }
}

Variables:

{
  "reportInput": {
    "crawlId": 1762158,
    "reportTypeCode": "Basic",
    "reportTemplateCode": "all_pages"
  }
}

Advanced topics

Providing extra metadata for custom metrics for UI

You can provide display names for your metrics in the .oreorc.json or .oreorc.ts file. This makes it easier to understand what each metric represents in the UI. You can also provide names for the auto-generated __count metrics.

{
  "metricsMetadata": {
    "pageTitle": {
      "title": "Page Title",
      "description": "The title of the page.",
      "type": "string"
    },
    "myObjects": {
      "title": "My Object With Specific Order",
      "description": "My Object With Specific Order",
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "aString": {
            "title": "String Metric",
            "type": "string"
          },
          "cBoolean": {
            "title": "Boolean Metric",
            "description": "You can also provide description for specific properties in object arrays.",
            "type": "boolean"
          },
          "bNumber": {
            "title": "Number Metric",
            "type": "number"
          },
          "dateString": {
            "title": "My Date Field",
            "type": "string",
            "format": "date-time"
          }
        }
      }
    },
    "extract": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "title": "Extracts",
      "description": "The extracted values"
    },
    "pageSize": {
      "title": "Page Size",
      "description": "Page size in bytes.",
      "type": "number",
      "format": "bytes"
    },
    "myFloat": {
      "type": "number"
    },
    "myInt": {
      "type": "number",
      "format": "integer"
    },
    "extract__count": {
      "title": "Extracts Count",
      "description": "Count of extracted values (auto-generated)"
    }
  }
}

You can also use JSDoc to provide metadata for your metrics:

export interface MyMetrics extends MetricScriptBasicOutput {
  /**
   * @title Page Title
   * @description The title of the page.
   */
  pageTitle: string;
  /**
   * Order of properties will be kept in the UI.
   *
   * @title My Object With Specific Order
   * @description My Object With Specific Order
   */
  myObjects: Array<{
    /**
     * @title String Metric
     */
    aString: string;
    /**
     * @title Boolean Metric
     * @description You can also provide description for specific properties in object arrays.
     */
    cBoolean?: boolean;
    /**
     * @title Number Metric
     */
    bNumber: number;
    /**
     * @title My Date Field
     * @format date-time
     */
    dateString: string;
  }>;
  url: string;
  extract: string[];
  /**
   * @title Page Size
   * @description Page size in bytes.
   * @format bytes
   */
  pageSize: number;
  myFloat: number;
  /**
   * @format integer
   */
  myInt: number;
}

Report templates

You can define report templates directly in your .oreorc.json or .oreorc.ts file. Report templates allow you to create predefined filters and views for your custom metrics, making it easier to analyze specific subsets of your data.

Set reportTemplates to an array of template definitions. Each entry consists of:

  • code: unique identifier using lowercase letters, numbers, or underscores (no spaces) and must be unique across templates
  • filter: filter criteria based on your custom metrics
  • baseReportTemplateCode: the template code of the base report template (typically "all_pages")
  • name (optional): descriptive name for the template
  • description (optional): detailed description of what the template shows
  • orderBy (optional): array of sorting rules for the resulting report, each with a field and direction ("ASC" or "DESC")
  • metricsGroupings (optional): array of arrays that control how metrics are grouped and ordered in the UI
  • reportCategories (optional): array of category definitions used to organise the template in the UI

orderBy entries are applied in sequence, allowing you to define primary, secondary, and further sort keys. Use the column identifiers exposed by the base template or your custom metric paths, such as "customMetrics.randomNumber" or "url".

metricsGroupings define the column arrangement the UI should use when rendering the report. Each inner array represents a group of metrics shown together, in the order provided. Groups are rendered from top to bottom; the first group becomes the default set of columns visible to users.

When you supply multiple categories, list them starting with the deepest category. The first entry is used to build breadcrumbs, and each category can reference its parent via parentCode.

{
  "reportTemplates": [
    {
      "code": "random_above_50",
      "filter": {
        "_and": [
          {
            "customMetrics": {
              "randomNumber": {
                "ge": 0.6
              }
            }
          },
          {
            "url": {
              "beginsWith": "https://example.com"
            }
          }
        ]
      },
      "baseReportTemplateCode": "all_pages",
      "orderBy": [
        { "field": "customMetrics.randomNumber", "direction": "DESC" },
        { "field": "url", "direction": "ASC" }
      ],
      "metricsGroupings": [
        ["pageTitle", "url", "description", "foundAtUrl"],
        ["customMetrics.randomNumber"]
      ],
      "reportCategories": [
        {
          "code": "performance",
          "name": "Performance",
          "parentCode": {
            "code": "seo",
            "name": "SEO"
          }
        }
      ]
    },
    {
      "code": "h1_tags_include_lumar",
      "name": "H1 tags include Lumar",
      "description": "my h1 tags include Lumar",
      "filter": {
        "customMetrics": {
          "h1Tags": {
            "arrayContainsLike": "Lumar"
          }
        }
      },
      "baseReportTemplateCode": "all_pages"
    }
  ]
}

Available filter predicates

The following filter predicates are available based on the metric type:

String predicates:

  • eq - equals
  • ne - not equals
  • contains - contains substring
  • notContains - does not contain substring
  • beginsWith - starts with
  • endsWith - ends with
  • matchesRegex - matches regular expression
  • notMatchesRegex - does not match regular expression
  • in - value is in array
  • notIn - value is not in array
  • isEmpty - is empty string
  • isNull - is null

Number predicates:

  • eq - equals
  • ne - not equals
  • gt - greater than
  • ge - greater than or equal
  • lt - less than
  • le - less than or equal
  • in - value is in array
  • notIn - value is not in array
  • isEmpty - is empty
  • isNull - is null

Array predicates:

  • arrayContains - array contains exact value
  • arrayContainsLike - array contains value (case-insensitive)
  • arrayNotContains - array does not contain exact value
  • arrayNotContainsLike - array does not contain value (case-insensitive)
  • isEmpty - array is empty
  • isNull - array is null

Boolean predicates:

  • eq - equals
  • ne - not equals
  • isNull - is null

Logical predicates:

  • _and - logical AND (matches all filters in the array)
  • _or - logical OR (matches any filter in the array)
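To make the predicate semantics concrete, here is a hedged, self-contained sketch of how a filter like the reportTemplates example above might evaluate against a page. This is not the actual Lumar filter engine, only an illustration of the semantics; it implements just a subset of the predicates listed above, and the Page shape is an assumption for the example.

```typescript
// Illustrative only: a tiny local evaluator for a subset of the predicates above.
type Page = { url: string; customMetrics: Record<string, unknown> };

function applyPredicates(value: unknown, preds: Record<string, any>): boolean {
  // Every predicate on a field must hold for the field to match
  return Object.entries(preds).every(([op, expected]) => {
    switch (op) {
      case "eq": return value === expected;
      case "ne": return value !== expected;
      case "ge": return typeof value === "number" && value >= expected;
      case "le": return typeof value === "number" && value <= expected;
      case "contains": return typeof value === "string" && value.includes(expected);
      case "beginsWith": return typeof value === "string" && value.startsWith(expected);
      case "arrayContainsLike":
        return Array.isArray(value) &&
          value.some(v => String(v).toLowerCase().includes(String(expected).toLowerCase()));
      default: throw new Error(`unsupported predicate: ${op}`);
    }
  });
}

function matches(page: Page, filter: Record<string, any>): boolean {
  if (filter._and) return filter._and.every((f: any) => matches(page, f));
  if (filter._or) return filter._or.some((f: any) => matches(page, f));
  return Object.entries(filter).every(([field, preds]) => {
    if (field === "customMetrics") {
      // Nested metric filters: { customMetrics: { randomNumber: { ge: 0.6 } } }
      return Object.entries(preds as Record<string, any>).every(([metric, p]) =>
        applyPredicates(page.customMetrics[metric], p));
    }
    return applyPredicates((page as any)[field], preds);
  });
}

const page: Page = {
  url: "https://example.com/pricing",
  customMetrics: { randomNumber: 0.7, h1Tags: ["Welcome to Lumar"] },
};

const filter = {
  _and: [
    { customMetrics: { randomNumber: { ge: 0.6 } } },
    { url: { beginsWith: "https://example.com" } },
  ],
};

console.log(matches(page, filter)); // → true
```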

Crawl-level metrics

By default, custom metrics are stored at the URL level, meaning each URL gets its own set of metrics. However, you can configure your container to store metrics at the crawl level instead, which allows you to aggregate data across multiple URLs or store crawl-wide statistics.

To enable crawl-level metrics, you need to specify the tableType in your container configuration and return special metadata fields in your metrics.

{
  "id": "605",
  "handlers": {
    "request": {
      "entrypoint": "src/my-func.ts",
      "handler": "myHandler",
      "metricsTypeName": "MyMetrics",
      "tableType": "dc:crawler:project_metrics:item"
    }
  }
}

When using crawl-level metrics, your handler must return an array of objects instead of a single object. Each object in the array represents a separate metric record and must include special metadata fields:

  • @stepId: The crawl step ID (available as input.id)
  • @itemType: A string identifier for the type of metric being stored
  • @itemKey: A unique key for this specific metric record

export interface MyMetrics extends MetricScriptBasicOutput {
  randomNumber: number;
  "@stepId": string;
  "@itemType": string;
  "@itemKey": string;
}

export const myHandler: MetricScriptHandler<MyMetrics> = input => {
  const randomNumber = Math.random();

  return [
    {
      randomNumber,
      "@stepId": input.id,
      "@itemType": "random-number",
      "@itemKey": `${input.url}`,
    },
  ];
};

Crawl-level metrics are useful for:

  • Storing aggregated statistics across multiple URLs
  • Creating crawl-wide reports and dashboards
  • Tracking metrics that don't belong to specific URLs
  • Building custom analytics that span the entire crawl
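Because the handler returns an array, it can emit several records per URL, each distinguished by its @itemType/@itemKey pair. A minimal sketch (the types and names below are illustrative stand-ins, not part of the SDK):

```typescript
// Illustrative stand-ins for the SDK types; in a real container these come
// from @deepcrawl/custom-metric-types and the handler input.
interface HandlerInputLike {
  id: string;
  url: string;
}

interface CrawlLevelRecord {
  count: number;
  "@stepId": string;
  "@itemType": string;
  "@itemKey": string;
}

// Emit one crawl-level record per counted element type on the page.
function buildElementCountRecords(
  input: HandlerInputLike,
  counts: Record<string, number>,
): CrawlLevelRecord[] {
  return Object.entries(counts).map(([elementType, count]) => ({
    count,
    "@stepId": input.id,
    "@itemType": `element-count-${elementType}`,
    "@itemKey": input.url,
  }));
}

const records = buildElementCountRecords(
  { id: "step-1", url: "https://example.com/" },
  { h1: 1, img: 12 },
);
console.log(records.length); // 2
```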

Native dependencies

You can use native dependencies in your custom metrics by including them in the externalPackages array in the .oreorc.json or .oreorc.ts file. You also need to list them in your package.json dependencies so that the correct version can be installed.

{
  "externalPackages": ["sharp"]
}

Secrets in custom metric containers

Sometimes you need to pass a secret, or other project-specific variables, into your container (for example, OPENAI_APIKEY). You can set secrets for your CustomMetricContainer, which are then accessible via environment variables.

const openaiApiKey = process.env["OPENAI_APIKEY"];

There are two scopes you can work with:

  • Container-level secrets: configured once on the CustomMetricContainer. Every project linked to the container inherits the value by default.
  • Project-level secrets: defined for a specific project. They override any container-level secret with the same name.

Use GraphQL to set container-level values with the setCustomMetricContainerSecret mutation.

mutation setCustomMetricContainerSecret(
  $input: SetCustomMetricContainerSecretInput!
) {
  setCustomMetricContainerSecret(input: $input) {
    customMetricContainerSecret {
      name
    }
  }
}

Variables:

{
  "input": {
    "customMetricContainerId": 1,
    "name": "OPENAI_APIKEY",
    "value": "MY API SECRET KEY"
  }
}

Set a project-level secret from the CLI when you need to override the shared value.

npm run oreo metric secret set -- --name OPENAI_APIKEY --projectId 123456 --value "mySecretKey"

You can also call the GraphQL API directly for project-level overrides.

mutation setCustomMetricContainerProjectSecret(
  $input: SetCustomMetricContainerProjectSecretInput!
) {
  setCustomMetricContainerProjectSecret(input: $input) {
    customMetricContainerProjectSecret {
      name
    }
  }
}

Variables:

{
  "input": {
    "projectId": 1,
    "customMetricContainerId": 1,
    "name": "OPENAI_APIKEY",
    "value": "MY API SECRET KEY"
  }
}
info

Container-level secrets provide the default value for every linked project. Define a project-level secret only when you need a project-specific override.

CI/CD integration

You can integrate your custom metric container with your CI/CD pipeline. For example, you can use GitHub Actions to build and upload your container. To do this, you will need to log in to the CLI programmatically, without user interaction, using a Lumar ACCOUNT_ID, API_KEY_ID, and API_KEY_SECRET.

To create an API_KEY_ID and API_KEY_SECRET, run the CLI command locally, or use the Lumar Accounts app, where you can also find your ACCOUNT_ID.

npm run oreo user-key create

Once you have all three secrets, you can use them in your CI/CD workflow file.

npm run oreo login -- --id ${{ secrets.API_KEY_ID }} --secret ${{ secrets.API_KEY_SECRET }} --accountId ${{ secrets.ACCOUNT_ID }}
npm run build
npm run upload

Programmatic access

If you would like to run your custom metric container programmatically, you can do so using the @deepcrawl/oreo-api-sdk package.

For more information, see Single Page Requester.

Container failures

If your container fails to extract metrics and returns an error, the error details are stored in a separate containerExecutionFailures metric. Failing containers will not stop the crawl.
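To keep one bad extraction from failing the whole handler (and thus recording a containerExecutionFailures entry with no other metrics), you can guard individual extractions. A minimal sketch; `safeExtract` is a hypothetical helper, not part of the SDK:

```typescript
// Hypothetical helper: run one extraction and fall back to a default
// instead of letting the exception fail the whole handler.
function safeExtract<T>(fn: () => T, fallback: T): T {
  try {
    return fn();
  } catch {
    return fallback;
  }
}

// A failing extraction (e.g. a selector that matched nothing) yields the fallback.
const price = safeExtract(() => {
  throw new Error("selector not found");
}, 0);
console.log(price); // 0
```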

Supported types for filtering

Even though custom metric containers can extract and store almost any data type, not all types can be queried via the API or in the UI.

Supported filterable types are:

  • boolean
  • number
  • number[]
  • string
  • string[]

Automatic __count metrics for arrays

If your metric returns an array, we will automatically generate a metric that counts the number of elements in the array. This metric has the same name as the original metric with a __count suffix.
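For example (the metric name `imageAltTexts` is illustrative; the companion metric is generated by Lumar, not by your code):

```typescript
// If a handler returns this metric...
const metrics = {
  imageAltTexts: ["logo", "banner", "icon"],
};

// ...Lumar will automatically store a companion numeric metric named
// "imageAltTexts__count" holding the array length (3 here).
const derivedName = "imageAltTexts" + "__count";
console.log(derivedName, metrics.imageAltTexts.length);
```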

Universal container

A project created by the bootstrap command is typed for either DOM or Puppeteer, but you can create a universal container that handles both by using input.inputType and input.resourceType to narrow the type during extraction.

export interface IMetrics extends MetricScriptBasicOutput {
  isImage: boolean;
  wasJsRenderingEnabled: boolean;
}

export const myHandler: MetricScriptHandler<IMetrics> = input => {
  if (input.resourceType === "document") {
    if (input.inputType === "dom") {
      // do extractions without puppeteer
      return {
        wasJsRenderingEnabled: false,
      };
    } else if (input.inputType === "puppeteer") {
      // do extractions with puppeteer
      return {
        wasJsRenderingEnabled: true,
      };
    }
  } else if (input.resourceType === "image") {
    // do extractions for images
    return {
      isImage: true,
      wasJsRenderingEnabled: input.inputType === "puppeteer",
    };
  }
};

Handler input reference

Every handler receives an input object as its first argument. The shape of this object depends on the handler phase and (for request handlers) the container's input type and resource type.

Common fields (all phases)

All handler inputs share these base fields:

interface ICommonContainerInput {
  id: string; // Step ID — unique identifier for this execution
  projectId: number; // Lumar project ID
  crawlId: number; // Current crawl ID
  phase: "request" | "preCrawl" | "postCrawl";
}

Request handler input

Request handlers receive additional fields depending on the resource type and input type:

interface ICommonRequestContainerInput extends ICommonContainerInput {
  resourceType: "document" | "image" | "script" | "stylesheet";
  inputType: "dom" | "puppeteer";
  url: string;
  response?: {
    statusCode: number;
    headers: Record<string, string>;
    requestDuration: number;
    transferSize?: number;
  };
  parentUrl?: string;
  crawlLevel?: number;
  disallowed?: boolean;
  error?: { errorMessage: string; errorCode: string };
  consoleMessages?: IConsoleMessage[];
  pageErrors?: Error[];
  responses?: IHttpResponse[]; // All HTTP responses captured during page load
}

Puppeteer inputs (IPuppeteerRequestContainerInput) additionally include:

  • input.page — A live Puppeteer Page object for browser interaction

Document inputs additionally include:

interface IDocumentInputContent {
  staticHtml: { text: string; document: Document }; // Pre-render HTML
  renderedHtml: { text: string; document: Document }; // Post-render HTML
  windowExtractions: Record<string, unknown>; // Data from window object
  performance?: {
    navigationTiming?: {
      requestStart?: number;
      responseStart?: number;
      domContentLoadedEventEnd?: number;
      domInteractive?: number;
    };
    paintTiming?: { startTime?: number };
    webVitals?: { lcp?: number; cls?: number };
  };
  renderingTimedOut?: boolean;
}

Image inputs include content?: { body: Buffer }.

Script and StyleSheet inputs include content?: { body: string }.
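As a sketch of handling image resources, a handler might read the body buffer to record its size; the input type and metric name below are illustrative stand-ins for the SDK types:

```typescript
// Illustrative stand-in for the image request input shape described above.
interface ImageInputLike {
  resourceType: string;
  content?: { body: Buffer };
}

// Hypothetical extraction: record the downloaded image size in bytes.
function extractImageMetrics(input: ImageInputLike): { imageBytes: number } | undefined {
  if (input.resourceType !== "image" || !input.content) return undefined;
  return { imageBytes: input.content.body.byteLength };
}

const result = extractImageMetrics({
  resourceType: "image",
  content: { body: Buffer.from([0x89, 0x50, 0x4e, 0x47]) }, // PNG magic bytes
});
console.log(result?.imageBytes); // 4
```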

Document inputs may also include redirect details:

interface IResolvedRedirectDetails {
  resolvedTarget?:
    | { url: string; statusCode: number } // Successful redirect
    | { url: string; errorMessage: string; errorCode: string } // Failed redirect
    | { url: string; targetExclusionReason: string }; // Broken (excluded) redirect
  redirectChain: Array<{
    url: string;
    statusCode: number;
    redirectType: "location" | "refresh" | "meta" | "js";
    redirectsTo: string;
    exclusionReason?: string;
    metaRefreshDuration?: number;
  }>;
}

Pre-crawl and post-crawl handler input

Both IPreCrawlContainerInput and IPostCrawlContainerInput extend the common fields and additionally include:

  • input.page — A Puppeteer Page object for browser interaction (e.g. fetching external data, calling APIs)

Handler context reference

Every handler receives a context object as its second argument. This is the same across all handler phases.

interface IMetricScriptContext<TParams = Record<string, string>> {
  params: Partial<TParams>;
  settings: {
    userAgentToken: string;
    isJsEnabled: boolean;
    domain: {
      primaryDomain: string;
      secondaryDomains: string[];
      startUrlsDomains: string[];
      mobileDomain?: string;
      includeSubdomains: boolean;
      ignoreProtocol: boolean;
      domainAlias?: string;
    };
    duplicatePrecisionIndices: number[];
    aiFeaturesEnabled: boolean;
    ignoreXRobots?: boolean;
  };
  externalSources?: {
    googleSearchConsole?: Array<{ siteUrl: string; refreshToken: string /* ... */ }>;
  };
  keyValueStore: IContainerKeyValueStore;
  graphStore: IContainerGraphStore;
  storeAttachment: (attachment: { name: string; content: Buffer; contentType: string }) => Promise<void>;
  costReporter: { report: (label: string, value: number) => Promise<void> };
  next: (token: { value: string; delaySeconds?: number }) => Promise<void>;
  nextToken?: string;
  crawlStartedAt?: string; // ISO 8601 (e.g. '2024-11-05T10:41:33.077Z')
  launchBrowser: (options?: { args?: string[] }) => Promise<Browser>;
  isInternalUrl: (url: URL | string) => boolean;
  logger?: { debug: Function; info: Function; warn: Function; error: Function };
}

Key context properties

  • context.params — Runtime parameters passed to the container. Strongly typed when you define paramsTypeName / paramsTypePath in your .oreorc config.
  • context.settings — Crawl project settings including domain configuration and rendering options.
  • context.crawlStartedAt — ISO 8601 timestamp of when the crawl started.
  • context.isInternalUrl(url) — Returns true if the given URL belongs to the crawled domain (respects subdomain and protocol settings).
  • context.logger — Structured logger with debug, info, warn, and error methods.
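For instance, `context.isInternalUrl` is handy for splitting extracted links into internal and external sets. A sketch using a stand-in predicate (in a real handler you would pass `context.isInternalUrl` itself):

```typescript
// Partition extracted links using an isInternalUrl-style predicate.
function partitionLinks(
  links: string[],
  isInternalUrl: (url: string) => boolean,
): { internal: string[]; external: string[] } {
  const internal: string[] = [];
  const external: string[] = [];
  for (const link of links) {
    (isInternalUrl(link) ? internal : external).push(link);
  }
  return { internal, external };
}

// Stand-in predicate for this sketch; context.isInternalUrl also respects
// subdomain and protocol settings from the project configuration.
const isInternal = (url: string) => new URL(url).hostname === "example.com";

const { internal, external } = partitionLinks(
  ["https://example.com/a", "https://other.com/b"],
  isInternal,
);
console.log(internal.length, external.length); // 1 1
```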

Storing attachments

Use context.storeAttachment() to save binary files (screenshots, PDFs, exports) alongside crawl results:

await context.storeAttachment({
  name: "screenshot.png",
  content: screenshotBuffer,
  contentType: "image/png",
});

Launching additional browsers

Use context.launchBrowser() to spin up a separate browser instance when your handler needs to navigate to external pages or run parallel browser work:

const browser = await context.launchBrowser({ args: ["--no-sandbox"] });
const page = await browser.newPage();
await page.goto("https://external-api.example.com");
// ... extract data ...
await browser.close();

Reporting costs

Use context.costReporter.report() to track resource consumption (e.g. API calls to external services):

await context.costReporter.report("openai-tokens", 1500);

Key-value store

The key-value store lets handlers persist and share data across handler invocations. It has two scopes:

  • crawl — Data scoped to the current crawl. Automatically cleaned up when the crawl ends. Use this for sharing state between request handlers processing different URLs in the same crawl.
  • project — Data scoped to the project, persisted across crawls. Requires an explicit TTL (max 90 days). Use this for caching expensive external data that doesn't change between crawls.

Basic operations

Both set() and get() return an IKeyValueObject:

interface IKeyValueObject {
  readonly key: string;
  readonly value: string;
  readonly ttl: number;
}

// Crawl-scoped storage
await context.keyValueStore.crawl.set("seen-urls", JSON.stringify(urls));
const result = await context.keyValueStore.crawl.get("seen-urls");
if (result) {
  const urls = JSON.parse(result.value);
  // result.key and result.ttl are also available
}
await context.keyValueStore.crawl.remove("seen-urls");

// Project-scoped storage (TTL in seconds, max 90 days)
const ttl = 30 * 24 * 60 * 60; // 30 days
await context.keyValueStore.project.set("api-cache", JSON.stringify(data), ttl);
const cached = await context.keyValueStore.project.get("api-cache");
await context.keyValueStore.project.remove("api-cache");

Collections

Collections are sets of unique string values. Useful for tracking membership (e.g. "have I seen this URL before?").

// Crawl-scoped collection (no TTL needed)
const crawlCollection = await context.keyValueStore.crawl.collections().set("processed-urls");
await crawlCollection.add("https://example.com/page-1");
const exists = await crawlCollection.has("https://example.com/page-1"); // true
await crawlCollection.remove("https://example.com/page-1");

// Iterate over all members
for await (const member of crawlCollection.streamMembers()) {
  console.log(member);
}

// Project-scoped collection (TTL in seconds required)
const projectCollection = await context.keyValueStore.project.collections().set("known-sitemaps", 2592000);

Maps

Maps are key-value dictionaries. Useful for building lookup tables.

// Crawl-scoped map
const urlMap = await context.keyValueStore.crawl.collections().map("url-to-category");
await urlMap.set("/page-1", "blog");
const category = await urlMap.get("/page-1"); // "blog"
await urlMap.has("/page-1"); // true

// Iterate over all entries
for await (const [key, value] of urlMap.streamMembers()) {
  console.log(`${key} => ${value}`);
}

// Project-scoped map (TTL in seconds required)
const cacheMap = await context.keyValueStore.project.collections().map("external-data", 2592000);

Graph store

The graph store lets you persist and query graph-structured data scoped to the project. It uses an openCypher-compatible API and is useful for modelling relationships between entities (e.g. internal link graphs, content hierarchies).

upsertNode takes a label, a match object (identity properties used to find or create the node), and an optional set object (mutable properties updated on each upsert):

// Upsert a single node — { url } is the identity key, { title } is updated on each upsert
await context.graphStore.project.upsertNode("Page", { url: input.url }, { title: "My Page" });

// Upsert relationships
await context.graphStore.project.upsertRelationship({ label: "Page", match: { url: input.url } }, "LINKS_TO", {
  label: "Page",
  match: { url: targetUrl },
});

// Query with openCypher
const result = await context.graphStore.project.query("MATCH (p:Page) WHERE p.url = $url RETURN p.title", {
  url: input.url,
});

// Delete nodes and relationships
await context.graphStore.project.deleteNode("Page", { url: input.url });
await context.graphStore.project.deleteRelationships({ label: "Page", match: { url: input.url } }, "LINKS_TO");

All graph store writes accept an optional ttl (in seconds, max 90 days, defaults to 60 days):

await context.graphStore.project.upsertNode("Page", { url: input.url }, { title: "My Page" }, { ttl: 2592000 });

Batch operations

For better performance when writing many nodes or relationships, use the batch variants:

// Upsert multiple nodes at once
await context.graphStore.project.upsertNodes("Page", [
  { match: { url: "/page-1" }, set: { title: "Page 1" } },
  { match: { url: "/page-2" }, set: { title: "Page 2" } },
]);

// Upsert multiple relationships at once
await context.graphStore.project.upsertRelationships({ label: "Page", match: { url: input.url } }, "LINKS_TO", [
  { label: "Page", match: { url: "/target-1" } },
  { label: "Page", match: { url: "/target-2" } },
]);

Batch processing and pagination

For preCrawl and postCrawl handlers that need to process large datasets iteratively (e.g. fetching all pages from a sitemap index, paginating through an external API), use context.next() and context.nextToken.

Calling context.next() signals that the handler should be re-invoked with the provided token. On the next invocation, the token is available via context.nextToken. When context.next() is not called, the handler completes.

export const preCrawlHandler: MetricScriptHandler<{}, IPreCrawlContainerInput> = async (input, context) => {
  const BATCH_SIZE = 100;
  const currentOffset = Number(context.nextToken ?? 0);

  const items = await fetchExternalData(currentOffset, BATCH_SIZE);

  for (const item of items) {
    await context.keyValueStore.crawl.set(`item:${item.id}`, JSON.stringify(item));
  }

  // If there are more items, schedule the next batch
  if (items.length === BATCH_SIZE) {
    await context.next({ value: String(currentOffset + BATCH_SIZE) });
  }

  return {};
};

You can also add a delay between invocations to avoid rate-limiting external APIs:

await context.next({ value: String(nextOffset), delaySeconds: 5 });

Producing links

Custom metric containers can discover new URLs during a crawl and feed them back into the crawler's link pipeline. This is useful when you need to extract links from non-HTML resources (e.g. .txt or .md files) or from content that the crawler's built-in parser does not handle.

To enable this, set linksProducer: true on the handler in your .oreorc config:

{
  "id": "1070",
  "handlers": {
    "request": {
      "entrypoint": "src/handler.ts",
      "handler": "handler",
      "metricsTypeName": "ITextLinkOutput",
      "metricsTypePath": "src/output.ts",
      "linksProducer": true
    }
  }
}

The output interface describes the link fields the crawler pipeline expects:

import type { MetricScriptBasicOutput } from "@deepcrawl/custom-metric-types";

export interface ITextLinkOutput extends MetricScriptBasicOutput {
  source: string;
  type: string;
  parentUrl: string;
  isParentNofollow: boolean;
  attributes: Record<string, string>;
}

The handler extracts links from the resource body and returns them. The crawler automatically filters, deduplicates, and enqueues the discovered URLs:

import type { IMetricScriptInput, MetricScriptHandler } from "@deepcrawl/custom-metric-types";
import type { ITextLinkOutput } from "./output";

export const handler: MetricScriptHandler<ITextLinkOutput, IMetricScriptInput> = (input, context) => {
  if (input.phase !== "request") return undefined;
  if (input.resourceType !== "script") return undefined;

  const content = "content" in input ? input.content : undefined;
  const body = content && "body" in content ? content.body : undefined;
  if (!body || typeof body !== "string") return undefined;

  const links = extractLinks(body, {
    parentUrl: "url" in input ? input.url : "",
    isInternalUrl: context.isInternalUrl,
  });

  return links as unknown as ITextLinkOutput[];
};

Note: For link-producing containers targeting non-HTML resources (e.g. .txt files), the crawl project must have Crawl non-HTML URLs enabled so that discovered text URLs are followed.

Google Search Console

If your container needs to connect to Google Search Console (GSC), enable it via linkedExternalSources when creating or updating the custom metric container by including googleSearchConsole in the array. Once enabled, publish a new version of the container code. At runtime, context.externalSources will include the googleSearchConsole configuration.
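A sketch of reading that configuration defensively inside a handler; `ContextLike` and `firstGscSite` are illustrative stand-ins, with the field names following the externalSources shape shown in the context reference above:

```typescript
// Minimal stand-in for the part of the handler context used here.
interface ContextLike {
  externalSources?: {
    googleSearchConsole?: Array<{ siteUrl: string; refreshToken: string }>;
  };
}

// Return the first configured GSC site, or undefined if GSC is not linked.
function firstGscSite(context: ContextLike): string | undefined {
  return context.externalSources?.googleSearchConsole?.[0]?.siteUrl;
}

console.log(
  firstGscSite({
    externalSources: {
      googleSearchConsole: [{ siteUrl: "https://example.com/", refreshToken: "token" }],
    },
  }),
); // https://example.com/
```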

Changing API URL

To point the CLI at a different API environment (for example, staging), update the apiUrl config value:

npx @deepcrawl/oreo@latest config set --name=apiUrl --value=https://api.staging.lumar.io/graphql