This is a beta feature. If you would like to participate in the beta, please contact your account manager or support.
Getting started with custom metrics
Extend Lumar with your custom metrics and unlock the full power of extracting data from the web.
Using TypeScript/JavaScript you can create custom metrics that extract data from your pages. You can use Puppeteer to access the DOM and extract data or use the provided API to extract data from the DOM.
Custom metrics can be grouped into custom metric containers. Each custom metric container can contain multiple custom metrics.
Create a new project
Generate a fresh TypeScript custom metric container project and register it with the API.
- bootstrap into new directory
npx @deepcrawl/oreo@latest metric bootstrap my-metrics/
- bootstrap into current directory
npx @deepcrawl/oreo@latest metric bootstrap .
When bootstrapping a new container, you will be asked to provide a name. This name identifies the container in the API and must be globally unique.
You will also choose between a DOM and a Puppeteer container. Using Puppeteer requires the Lumar project to run with JS rendering enabled.
In case you want to extract metrics from images or style sheets, you can enable that as well.
You can also provide all of these options via the CLI without being prompted. Refer to our CLI docs for all the available arguments.
- providing all options via CLI
npx @deepcrawl/oreo@latest metric bootstrap examples/bootstrap-example3 --name "MyGloballyUniqueMetricContainerName" --inputType=Puppeteer --resourceTypes=Document --description "Example metric extraction container"
After bootstrapping your container project, you will need to install dependencies with your package manager (npm, yarn, pnpm or other).
Change to the directory where you have bootstrapped your container project and run the following command.
npm install
Writing metrics extraction code
Open the container project in your favorite editor. You will find a src/index.ts file with a sample metrics extraction script.
export interface IMetrics extends MetricScriptBasicOutput {
  url: string;
}
export const handler: MetricScriptHandler<IMetrics, IPuppeteerRequestContainerInput> = async (input, _context) => {
  return {
    url: input.page.url(),
  };
};
Your container needs to explicitly specify its return types. To do this, define an interface that extends the MetricScriptBasicOutput interface and reference that interface's name under the metricsTypeName key in your .oreorc.json or .oreorc.ts file. (Make sure the interface is exported from the entrypoint file.)
You can use either .oreorc.json (legacy) or .oreorc.ts (type-safe) for configuration. The TypeScript option provides better type safety and IntelliSense support.
- .oreorc.json
- .oreorc.ts
{
"id": "XXX",
"handler": "handler",
"entrypoint": "src/index.ts",
"metricsTypeName": "IMetrics"
}
import type { IContainerConfigData } from "@deepcrawl/oreo";
const config: IContainerConfigData = {
id: "XXX",
handler: "handler",
entrypoint: "src/index.ts",
metricsTypeName: "IMetrics",
};
export default config;
Your container can export one or more metrics. Each metric needs to have a unique name and a type.
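As a rough illustration of one container returning several named, typed metrics, the sketch below derives three metrics from a raw HTML string. The IPageMetrics interface and the textOf and extractPageMetrics helpers are hypothetical names used only for this example. In a real container the interface would extend MetricScriptBasicOutput, and you would normally read values from the DOM or Puppeteer input rather than from regular expressions.

```typescript
// Hypothetical metric shape: each key is a metric name, each value type is that metric's type.
interface IPageMetrics {
  pageTitle: string;
  h1Tags: string[];
  wordCount: number;
}

// Replace tags with spaces and collapse whitespace so the text can be counted.
function textOf(html: string): string {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Illustrative only: regex-based extraction stands in for real DOM access.
function extractPageMetrics(html: string): IPageMetrics {
  const titlePattern = /<title[^>]*>([\s\S]*?)<\/title>/i;
  const titleMatch = titlePattern.exec(html);

  const h1Tags: string[] = [];
  const h1Pattern = /<h1[^>]*>([\s\S]*?)<\/h1>/gi;
  let match: RegExpExecArray | null;
  while ((match = h1Pattern.exec(html)) !== null) {
    h1Tags.push(textOf(match[1] ?? ""));
  }

  // Count words outside the <title> element.
  const bodyText = textOf(html.replace(titlePattern, ""));
  return {
    pageTitle: titleMatch ? textOf(titleMatch[1] ?? "") : "",
    h1Tags,
    wordCount: bodyText === "" ? 0 : bodyText.split(" ").length,
  };
}
```

Each key of the returned object becomes one metric, so names must not collide within the interface.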
Releasing new version of CustomMetricContainer
Once you are happy with your container and want to release a new version of it, you need to build and upload it.
npm run build
npm run upload
At this point you have a container that is published and ready to be linked with a project.
Adding custom metrics to a Lumar project
Running a metric link command without any arguments will fetch your default account's projects and CustomMetricContainer ID from .oreorc.json or .oreorc.ts.
npm run oreo metric link
npm run oreo metric link -- --projectIds 123456,12345,123454
Fetching metrics via Single Page Requester
Before committing to a full crawl, you can test your custom metrics using the Single Page Requester.
npm run oreo project request-custom-metrics -- --projectId 123456 --url http://example.com/
This will output a table with your custom metrics.
Fetching the metrics from the crawl
Running a crawl
Once a CustomMetricContainer is linked to a project, the next time a crawl is run it will inherit the container and extract custom metrics.
You can start the crawl as you usually would via the UI, or start crawling from the CLI.
npm run oreo crawl create -- --projectId 123456
Fetching the metrics
Once the crawl finishes, you can access custom metrics through the Graph-API Explorer or the Analyze UI.
- Query
- Variables
- cURL
query FetchCustomMetrics($reportInput: GetReportInputType!) {
  getReport(input: $reportInput) {
    crawlUrls(first: 100) {
      nodes {
        url
        customMetrics
      }
    }
  }
}
{
  "reportInput": {
    "crawlId": 1762158,
    "reportTypeCode": "Basic",
    "reportTemplateCode": "all_pages"
  }
}
curl -X POST -H "Content-Type: application/json" -H "apollographql-client-name: docs-example-client" -H "apollographql-client-version: 1.0.0" -H "x-auth-token: YOUR_API_SESSION_TOKEN" --data '{"query":"query FetchCustomMetrics($reportInput: GetReportInputType!) { getReport(input: $reportInput) { crawlUrls(first: 100) { nodes { url customMetrics } } } }","variables":{"reportInput":{"crawlId":1762158,"reportTypeCode":"Basic","reportTemplateCode":"all_pages"}}}' https://api.lumar.io/graphql
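If you prefer to send the same request from code, the sketch below builds the JSON payload and headers used in the cURL example. The buildCustomMetricsRequest and buildHeaders helpers are hypothetical and not part of any Lumar package. The query text, header names, and variable names are copied from the examples above.

```typescript
// Shape of the JSON payload sent by the cURL example.
interface GraphQLRequestBody {
  query: string;
  variables: {
    reportInput: { crawlId: number; reportTypeCode: string; reportTemplateCode: string };
  };
}

const FETCH_CUSTOM_METRICS_QUERY = `query FetchCustomMetrics($reportInput: GetReportInputType!) {
  getReport(input: $reportInput) {
    crawlUrls(first: 100) {
      nodes {
        url
        customMetrics
      }
    }
  }
}`;

// Hypothetical helper: pairs the query with its variables for a given crawl.
function buildCustomMetricsRequest(crawlId: number, reportTemplateCode = "all_pages"): GraphQLRequestBody {
  return {
    query: FETCH_CUSTOM_METRICS_QUERY,
    variables: { reportInput: { crawlId, reportTypeCode: "Basic", reportTemplateCode } },
  };
}

// Hypothetical helper: headers mirror the cURL example; the token is supplied by the caller.
function buildHeaders(sessionToken: string): Record<string, string> {
  return {
    "Content-Type": "application/json",
    "apollographql-client-name": "docs-example-client",
    "apollographql-client-version": "1.0.0",
    "x-auth-token": sessionToken,
  };
}
```

Serialize the body with JSON.stringify and POST it, together with the headers, to https://api.lumar.io/graphql using the HTTP client of your choice.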
Advanced topics
Providing extra metadata for custom metrics for UI
You can provide display names for your metrics in the .oreorc.json or .oreorc.ts file. This makes it easier to understand what each metric represents in the UI. You can also provide names for auto-generated __count metrics.
- .oreorc.json
- .oreorc.ts
{
"metricsMetadata": {
  "pageTitle": {
    "title": "Page Title",
    "description": "The title of the page.",
    "type": "string"
  },
  "myObjects": {
    "title": "My Object With Specific Order",
    "description": "My Object With Specific Order",
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "aString": {
          "title": "String Metric",
          "type": "string"
        },
        "cBoolean": {
          "title": "Boolean Metric",
          "description": "You can also provide description for specific properties in object arrays.",
          "type": "boolean"
        },
        "bNumber": {
          "title": "Number Metric",
          "type": "number"
        },
        "dateString": {
          "title": "My Date Field",
          "type": "string",
          "format": "date-time"
        }
      }
    }
  },
  "extract": {
    "type": "array",
    "items": {
      "type": "string"
    },
    "title": "Extracts",
    "description": "The extracted values"
  },
  "pageSize": {
    "title": "Page Size",
    "description": "Page size in bytes.",
    "type": "number",
    "format": "bytes"
  },
  "myFloat": {
    "type": "number"
  },
  "myInt": {
    "type": "number",
    "format": "integer"
  },
  "extract__count": {
    "title": "Extracts Count",
    "description": "Count of extracted values (auto-generated)"
  }
}
}
import type { IContainerConfigData } from "@deepcrawl/oreo";
const config: IContainerConfigData = {
metricsMetadata: {
pageTitle: {
title: "Page Title",
description: "The title of the page.",
type: "string",
},
myObjects: {
title: "My Object With Specific Order",
description: "My Object With Specific Order",
type: "array",
items: {
type: "object",
properties: {
aString: {
title: "String Metric",
type: "string",
},
cBoolean: {
title: "Boolean Metric",
description: "You can also provide description for specific properties in object arrays.",
type: "boolean",
},
bNumber: {
title: "Number Metric",
type: "number",
},
dateString: {
title: "My Date Field",
type: "string",
format: "date-time",
},
},
},
},
extract: {
type: "array",
items: {
type: "string",
},
title: "Extracts",
description: "The extracted values",
},
pageSize: {
title: "Page Size",
description: "Page size in bytes.",
type: "number",
format: "bytes",
},
myFloat: {
type: "number",
},
myInt: {
type: "number",
format: "integer",
},
extract__count: {
title: "Extracts Count",
description: "Count of extracted values (auto-generated)",
},
},
};
export default config;
You can also use JSDoc to provide metadata for your metrics:
export interface MyMetrics extends MetricScriptBasicOutput {
  /**
   * @title Page Title
   * @description The title of the page.
   */
  pageTitle: string;
  /**
   * Order of properties will be kept in the UI.
   *
   * @title My Object With Specific Order
   * @description My Object With Specific Order
   */
  myObjects: Array<{
    /**
     * @title String Metric
     */
    aString: string;
    /**
     * @title Boolean Metric
     * @description You can also provide description for specific properties in object arrays.
     */
    cBoolean?: boolean;
    /**
     * @title Number Metric
     */
    bNumber: number;
    /**
     * @title My Date Field
     * @format date-time
     */
    dateString: string;
  }>;
  url: string;
  extract: string[];
  /**
   * @title Page Size
   * @description Page size in bytes.
   * @format bytes
   */
  pageSize: number;
  myFloat: number;
  /**
   * @format integer
   */
  myInt: number;
}
Report templates
You can define report templates directly in your .oreorc.json or .oreorc.ts file. Report templates allow you to create predefined filters and views for your custom metrics, making it easier to analyze specific subsets of your data.
Set reportTemplates to an array of template definitions. Each entry consists of:
- code: unique identifier using lowercase letters, numbers, or underscores (no spaces); must be unique across templates
- filter: filter criteria based on your custom metrics
- baseReportTemplateCode: the template code of the base report template (typically "all_pages")
- name (optional): descriptive name for the template
- description (optional): detailed description of what the template shows
- orderBy (optional): array of sorting rules for the resulting report, each with a field and a direction ("ASC" or "DESC")
- metricsGroupings (optional): array of arrays that control how metrics are grouped and ordered in the UI
- reportCategories (optional): array of category definitions used to organise the template in the UI
orderBy entries are applied in sequence, allowing you to define primary, secondary, and further sort keys. Use the column identifiers exposed by the base template or your custom metric paths, such as "customMetrics.randomNumber" or "url".
metricsGroupings define the column arrangement the UI should use when rendering the report. Each inner array represents a group of metrics shown together, in the order provided. Groups are rendered from top to bottom; the first group becomes the default set of columns visible to users.
When you supply multiple categories, list them starting with the deepest category. The first entry is used to build breadcrumbs, and each category can reference its parent via parentCode.
- .oreorc.json
- .oreorc.ts
{
"reportTemplates": [
  {
    "code": "random_above_60",
    "filter": {
      "_and": [
        {
          "customMetrics": {
            "randomNumber": {
              "ge": 0.6
            }
          }
        },
        {
          "url": {
            "beginsWith": "https://example.com"
          }
        }
      ]
    },
    "baseReportTemplateCode": "all_pages",
    "orderBy": [
      { "field": "customMetrics.randomNumber", "direction": "DESC" },
      { "field": "url", "direction": "ASC" }
    ],
    "metricsGroupings": [
      ["pageTitle", "url", "description", "foundAtUrl"],
      ["customMetrics.randomNumber"]
    ],
    "reportCategories": [
      {
        "code": "performance",
        "name": "Performance",
        "parentCode": {
          "code": "seo",
          "name": "SEO"
        }
      }
    ]
  },
  {
    "code": "h1_tags_include_lumar",
    "name": "H1 tags include Lumar",
    "description": "my h1 tags include Lumar",
    "filter": {
      "customMetrics": {
        "h1Tags": {
          "arrayContainsLike": "Lumar"
        }
      }
    },
    "baseReportTemplateCode": "all_pages"
  }
]
}
import type { IContainerConfigData } from "@deepcrawl/oreo";
const config: IContainerConfigData = {
reportTemplates: [
{
code: "random_above_60",
name: "Random number above or equal to 0.6",
description: "my random number is above or equal to 0.6",
filter: {
_and: [
{
customMetrics: {
randomNumber: {
ge: 0.6,
},
},
},
{
url: {
beginsWith: "https://example.com",
},
},
],
},
baseReportTemplateCode: "all_pages",
orderBy: [
{ field: "customMetrics.randomNumber", direction: "DESC" },
{ field: "url", direction: "ASC" },
],
metricsGroupings: [
["pageTitle", "url", "description", "foundAtUrl"],
["customMetrics.randomNumber"],
],
reportCategories: [
{
code: "performance",
name: "Performance",
parentCode: {
code: "seo",
name: "SEO",
},
},
],
},
{
code: "h1_tags_include_lumar",
name: "H1 tags include Lumar",
description: "my h1 tags include Lumar",
filter: {
customMetrics: {
h1Tags: {
arrayContainsLike: "Lumar",
},
},
},
baseReportTemplateCode: "all_pages",
},
],
};
export default config;
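To make the "applied in sequence" behaviour of orderBy concrete, the sketch below sorts rows by a primary key and falls back to the next rule only on ties. It is an illustration of the semantics described above, not Lumar's implementation. The getPath, compareValues, and sortRows helpers are hypothetical names, and the dotted-path lookup is an assumption about how paths such as "customMetrics.randomNumber" resolve.

```typescript
type Direction = "ASC" | "DESC";
interface OrderByRule {
  field: string;
  direction: Direction;
}
type Row = Record<string, unknown>;

// Resolve a dotted path such as "customMetrics.randomNumber" against a row.
function getPath(row: Row, path: string): unknown {
  let current: unknown = row;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Numbers compare numerically; everything else compares as a plain string.
function compareValues(a: unknown, b: unknown): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  const left = String(a);
  const right = String(b);
  return left < right ? -1 : left > right ? 1 : 0;
}

// Apply each rule in order; later rules only break ties left by earlier ones.
function sortRows(rows: Row[], orderBy: OrderByRule[]): Row[] {
  return [...rows].sort((left, right) => {
    for (const rule of orderBy) {
      const result = compareValues(getPath(left, rule.field), getPath(right, rule.field));
      if (result !== 0) return rule.direction === "ASC" ? result : -result;
    }
    return 0;
  });
}
```

With the rules from the example templates (randomNumber descending, then url ascending), two rows that share the same randomNumber are ordered by url.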
Available filter predicates
The following filter predicates are available based on the metric type:
String predicates:
- eq: equals
- ne: not equals
- contains: contains substring
- notContains: does not contain substring
- beginsWith: starts with
- endsWith: ends with
- matchesRegex: matches regular expression
- notMatchesRegex: does not match regular expression
- in: value is in array
- notIn: value is not in array
- isEmpty: is empty string
- isNull: is null
Number predicates:
- eq: equals
- ne: not equals
- gt: greater than
- ge: greater than or equal
- lt: less than
- le: less than or equal
- in: value is in array
- notIn: value is not in array
- isEmpty: is empty
- isNull: is null
Array predicates:
- arrayContains: array contains exact value
- arrayContainsLike: array contains value (case-insensitive)
- arrayNotContains: array does not contain exact value
- arrayNotContainsLike: array does not contain value (case-insensitive)
- isEmpty: array is empty
- isNull: array is null
Boolean predicates:
- eq: equals
- ne: not equals
- isNull: is null
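To show how a filter object nests field paths, predicates, and _and, here is a small local evaluator for a subset of the predicates listed above. It is a sketch for reasoning about filter shapes only: the matches and testPredicate helpers are hypothetical, the predicate semantics are assumed from the short descriptions in this list, and the real filtering happens on Lumar's side.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Subset of predicates covered by this sketch.
const PREDICATES = new Set(["eq", "ne", "gt", "ge", "lt", "le", "contains", "beginsWith", "endsWith", "arrayContains"]);

function isObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function numbers(actual: Json | undefined, expected: Json, test: (a: number, e: number) => boolean): boolean {
  return typeof actual === "number" && typeof expected === "number" && test(actual, expected);
}

function strings(actual: Json | undefined, expected: Json, test: (a: string, e: string) => boolean): boolean {
  return typeof actual === "string" && typeof expected === "string" && test(actual, expected);
}

// Evaluate one predicate, e.g. { ge: 0.6 }, against a single metric value.
function testPredicate(name: string, expected: Json, actual: Json | undefined): boolean {
  switch (name) {
    case "eq": return actual === expected;
    case "ne": return actual !== expected;
    case "gt": return numbers(actual, expected, (a, e) => a > e);
    case "ge": return numbers(actual, expected, (a, e) => a >= e);
    case "lt": return numbers(actual, expected, (a, e) => a < e);
    case "le": return numbers(actual, expected, (a, e) => a <= e);
    case "contains": return strings(actual, expected, (a, e) => a.includes(e));
    case "beginsWith": return strings(actual, expected, (a, e) => a.startsWith(e));
    case "endsWith": return strings(actual, expected, (a, e) => a.endsWith(e));
    case "arrayContains": return Array.isArray(actual) && actual.includes(expected);
    default: throw new Error(`Predicate not covered by this sketch: ${name}`);
  }
}

// Walk the filter: "_and" requires every sub-filter to match; other keys are field names.
function matches(filter: JsonObject, row: JsonObject): boolean {
  return Object.entries(filter).every(([key, condition]) => {
    if (key === "_and") {
      return Array.isArray(condition) && condition.every(sub => isObject(sub) && matches(sub, row));
    }
    if (!isObject(condition)) return false;
    const names = Object.keys(condition);
    if (names.length > 0 && names.every(name => PREDICATES.has(name))) {
      return names.every(name => testPredicate(name, condition[name] ?? null, row[key]));
    }
    const nested = row[key];
    return isObject(nested) && matches(condition, nested);
  });
}
```

Running a filter against a few hand-written rows before uploading a report template can help catch a misplaced field name or a predicate applied to the wrong metric type.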
Crawl-level metrics
By default, custom metrics are stored at the URL level, meaning each URL gets its own set of metrics. However, you can configure your container to store metrics at the crawl level instead, which allows you to aggregate data across multiple URLs or store crawl-wide statistics.
To enable crawl-level metrics, you need to specify the tableType in your container configuration and return special metadata fields in your metrics.
- .oreorc.json
- .oreorc.ts
{
"id": "605",
"handlers": {
  "request": {
    "entrypoint": "src/my-func.ts",
    "handler": "myHandler",
    "metricsTypeName": "MyMetrics",
    "tableType": "dc:crawler:project_metrics:item"
  }
}
}
import type { IContainerConfigData } from "@deepcrawl/oreo";
const config: IContainerConfigData = {
id: "605",
handlers: {
request: {
entrypoint: "src/my-func.ts",
handler: "myHandler",
metricsTypeName: "MyMetrics",
tableType: "dc:crawler:project_metrics:item",
},
},
};
export default config;
When using crawl-level metrics, your handler must return an array of objects instead of a single object. Each object in the array represents a separate metric record and must include special metadata fields:
- @stepId: The crawl step ID (available as input.id)
- @itemType: A string identifier for the type of metric being stored
- @itemKey: A unique key for this specific metric record
export interface MyMetrics extends MetricScriptBasicOutput {
  randomNumber: number;
  [`@stepId`]: string;
  [`@itemType`]: string;
  [`@itemKey`]: string;
}
export const myHandler: MetricScriptHandler<MyMetrics> = input => {
  const randomNumber = Math.random();
  return [
    {
      randomNumber,
      [`@stepId`]: input.id,
      [`@itemType`]: "random-number",
      [`@itemKey`]: `${input.url}`,
    },
  ];
};
Crawl-level metrics are useful for:
- Storing aggregated statistics across multiple URLs
- Creating crawl-wide reports and dashboards
- Tracking metrics that don't belong to specific URLs
- Building custom analytics that span the entire crawl
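When one page yields several crawl-level records, each record still needs its own @itemKey. The sketch below shows one possible way to build such records. The CrawlLevelRecord interface, the toCrawlLevelRecords helper, and the key format (URL plus index) are assumptions made for this example, not a documented convention.

```typescript
// Hypothetical record shape: the three metadata fields plus one illustrative value.
interface CrawlLevelRecord {
  "@stepId": string;
  "@itemType": string;
  "@itemKey": string;
  value: string;
}

// Build one record per value; the key combines URL and position so keys stay unique per page.
function toCrawlLevelRecords(stepId: string, url: string, itemType: string, values: string[]): CrawlLevelRecord[] {
  return values.map((value, index) => ({
    "@stepId": stepId,
    "@itemType": itemType,
    "@itemKey": `${url}#${index}`,
    value,
  }));
}
```

In a handler, the step ID would come from input.id and the returned array would be the handler's return value.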
Native dependencies
You can use native dependencies in your custom metrics by including them in the externalPackages array in the .oreorc.json or .oreorc.ts file. They must also be listed in your package.json dependencies so that the correct version can be installed.
- .oreorc.json
- .oreorc.ts
{
"externalPackages": ["sharp"]
}
import type { IContainerConfigData } from "@deepcrawl/oreo";
const config: IContainerConfigData = {
externalPackages: ["sharp"],
};
export default config;
Secrets in custom metric containers
Sometimes you need to pass a secret, or another value that is unique to a project, into your container (for example, OPENAI_APIKEY). You can set secrets for your CustomMetricContainer, and they will be accessible via environment variables.
const openaiApiKey = process.env["OPENAI_APIKEY"];
You can set the secret from the CLI:
npm run oreo metric secret set -- --name OPENAI_APIKEY --projectId 123456 --value "mySecretKey"
or by using the GraphQL API directly:
- Query
- Variables
- cURL
mutation setCustomMetricContainerProjectSecret(
  $input: SetCustomMetricContainerProjectSecretInput!
) {
  setCustomMetricContainerProjectSecret(input: $input) {
    customMetricContainerProjectSecret {
      name
    }
  }
}
{
  "input": {
    "projectId": 1, 
    "customMetricContainerId": 1, 
    "name": "OPENAI_APIKEY", 
    "value": "MY API SECRET KEY" 
  }
}
curl -X POST -H "Content-Type: application/json" -H "apollographql-client-name: docs-example-client" -H "apollographql-client-version: 1.0.0" -H "x-auth-token: YOUR_API_SESSION_TOKEN" --data '{"query":"mutation setCustomMetricContainerProjectSecret( $input: SetCustomMetricContainerProjectSecretInput! ) { setCustomMetricContainerProjectSecret(input: $input) { customMetricContainerProjectSecret { name } } }","variables":{"input":{"projectId":1,"customMetricContainerId":1,"name":"OPENAI_APIKEY","value":"MY API SECRET KEY"}}}' https://api.lumar.io/graphql
Secrets are set at the project level, so if multiple projects use the same container, each project needs its own secrets set.
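Because a secret can be missing for a project that has not been configured yet, it may help to fail with a clear message rather than pass undefined along. The sketch below is one way to do that. requireSecret and the Env type are hypothetical names, and the environment is passed in as a parameter so the helper can be exercised without real secrets.

```typescript
// Minimal view of an environment: names mapped to optional string values.
type Env = Record<string, string | undefined>;

// Return the secret, or throw a descriptive error when it is unset or empty.
function requireSecret(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing secret ${name}: set it for this project before running the container.`);
  }
  return value;
}
```

Inside a container you would call it as requireSecret(process.env, "OPENAI_APIKEY").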
CI/CD integration
You can integrate your custom metric container with your CI/CD pipeline. For example, you can use GitHub Actions to build and upload your container. For this, you will need to log in to the CLI programmatically, without user interaction, using a Lumar ACCOUNT_ID, API_KEY_ID and API_KEY_SECRET.
To create an API_KEY_ID and API_KEY_SECRET, you can run the following CLI command locally or use the Lumar Accounts app, where you can also find your ACCOUNT_ID.
npm run oreo user-key create
Once you have all the secrets, you can use them in your CI/CD workflow file.
npm run oreo login -- --id ${{ secrets.API_KEY_ID }} --secret ${{ secrets.API_KEY_SECRET }} --accountId ${{ secrets.ACCOUNT_ID }}
npm run build
npm run upload
Programmatic access
If you would like to run your custom metric container programmatically, you can do so using the @deepcrawl/oreo-api-sdk package.
For more information, see Single Page Requester.
Container failures
If your container fails to extract metrics and returns an error, the information is stored as a separate metric containerExecutionFailures. Failing containers will not stop the crawl.
Supported types for filtering
Even though custom metric containers can extract and store almost any data type, not all of them will be queryable via the API in our UI.
Supported filterable types are:
- boolean
- number
- number[]
- string
- string[]
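The list above can be expressed as a runtime check, which is handy when deciding whether a value should be flattened before it is returned as a metric. The sketch below is an assumption-laden illustration: isFilterable and the Filterable type are hypothetical names, and treating mixed-type arrays as not filterable is a guess based on the list, which only names number[] and string[].

```typescript
// The filterable types named in the list above.
type Filterable = boolean | number | string | number[] | string[];

// True for primitives in the list and for arrays whose items are all numbers or all strings.
function isFilterable(value: unknown): value is Filterable {
  if (typeof value === "boolean" || typeof value === "number" || typeof value === "string") {
    return true;
  }
  if (!Array.isArray(value)) return false;
  return value.every(item => typeof item === "number") || value.every(item => typeof item === "string");
}
```

Objects and arrays of objects can still be stored, but under this reading they would not be available as filter targets.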
Automatic __count metrics for arrays
If your metric returns an array, we will automatically generate a metric that counts the number of elements in the array. This metric has the same name as the original metric, with a __count suffix.
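The naming rule can be pictured with a few lines of code. The sketch below only mirrors the convention described above for illustration. The platform generates these metrics for you, and withCounts is a hypothetical helper that you do not need in a container.

```typescript
// Add a "<name>__count" entry for every array-valued metric; other metrics are left as they are.
function withCounts(metrics: Record<string, unknown>): Record<string, unknown> {
  const result: Record<string, unknown> = { ...metrics };
  for (const [name, value] of Object.entries(metrics)) {
    if (Array.isArray(value)) {
      result[`${name}__count`] = value.length;
    }
  }
  return result;
}
```

For a metric named extract holding two strings, the generated companion would be extract__count with the value 2, which is the key used for display names in metricsMetadata.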
Universal container
A project created by the bootstrap command targets a specific input type, either DOM or Puppeteer. You can also create a universal container that handles both by checking input.inputType and input.resourceType to narrow the input type during extraction.
export interface IMetrics extends MetricScriptBasicOutput {
  isImage: boolean;
  wasJsRenderingEnabled: boolean;
}
export const myHandler: MetricScriptHandler<IMetrics> = input => {
  if (input.resourceType === "document") {
    if (input.inputType === "dom") {
      // do extractions without puppeteer
      return {
        isImage: false,
        wasJsRenderingEnabled: false,
      };
    } else if (input.inputType === "puppeteer") {
      // do extractions with puppeteer
      return {
        isImage: false,
        wasJsRenderingEnabled: true,
      };
    }
  } else if (input.resourceType === "image") {
    // do extractions for images
    return {
      isImage: true,
      wasJsRenderingEnabled: input.inputType === "puppeteer",
    };
  }
};
Google Search Console
If your container needs to connect to Google Search Console (GSC), enable it via linkedExternalSources when creating or updating the custom metric container by including googleSearchConsole in the array.
Once enabled, publish a new version of the container code. At runtime, externalSources will then include the googleSearchConsole configuration.
Changing the API URL
npx @deepcrawl/oreo@latest config set --name=apiUrl --value=https://api.staging.lumar.io/graphql