
Tutorial: Export Crawl Data

This tutorial covers how to export crawl data from Lumar, including report downloads for bulk export and pagination for programmatic access.

Option 1: Report downloads

Report downloads generate a downloadable file (CSV or another format) containing the full dataset for a report. This is the most efficient way to export large amounts of data.

Step 1: Create a report download

Use the createReportDownload mutation to request a file. You can specify which metrics (columns) to include and apply filters.

GraphQL

mutation CreateReportDownload($input: CreateReportDownloadInput!) {
  createReportDownload(input: $input) {
    reportDownload {
      ...ReportDownloadDetails
    }
  }
}

fragment ReportDownloadDetails on ReportDownload {
  id
  status
  outputType
  # ...other fields you want to retrieve
}

Variables:

{
  "input": {
    "crawlId": "TjAwNUNyYXdsMTc2NjI0MQ",
    "reportTemplateCode": "all_pages",
    "reportTypeCode": "Basic",
    "selectedMetrics": ["url", "httpStatusCode"],
    "fileName": "url-error-status-codes",
    "filter": {
      "httpStatusCode": { "ge": 500 }
    }
  }
}

Response Example:

{
  "data": {
    "createReportDownload": {
      "reportDownload": {
        "id": "TjAxNFJlcG9ydERvd25sb2FkNzU4Njg3",
        "status": "Generating",
        "outputType": "CsvZip"
      }
    }
  }
}
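The mutation above can be issued from TypeScript with any HTTP client. A minimal sketch, assuming a plain fetch wrapper; the endpoint URL, token, and Bearer auth scheme here are placeholders, not values confirmed by this tutorial:

```typescript
// Minimal GraphQL fetch wrapper. The endpoint URL, token, and auth scheme
// are placeholders -- substitute your real Lumar API credentials.
const API_URL = "https://api.example.com/graphql";
const API_TOKEN = "YOUR_API_TOKEN";

async function executeQuery(query: string, variables: object): Promise<any> {
  const res = await fetch(API_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_TOKEN}`, // auth scheme is an assumption
    },
    body: JSON.stringify({ query, variables }),
  });
  return res.json();
}

// Builds the CreateReportDownload variables from Step 1:
// two columns, 5xx pages only.
function buildDownloadInput(crawlId: string) {
  return {
    input: {
      crawlId,
      reportTemplateCode: "all_pages",
      reportTypeCode: "Basic",
      selectedMetrics: ["url", "httpStatusCode"],
      fileName: "url-error-status-codes",
      filter: { httpStatusCode: { ge: 500 } },
    },
  };
}
```

executeQuery here plays the role of the helper the polling and pagination snippets call; only the variables-building logic is specific to this step.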

Step 2: Poll for download completion

The report download starts in the Generating status. Poll until it reaches Completed, then use the fileURL field to download the file.

GraphQL

query GetReportDownloadStatus($reportDownloadId: ObjectID!) {
  node(id: $reportDownloadId) {
    ... on ReportDownload {
      id
      status
      outputType
      fileURL
      createdAt
    }
  }
}

Variables:

{
  "reportDownloadId": "TjAxNFJlcG9ydERvd25sb2FkNzU4Njg3"
}

Response Example:

{
  "data": {
    "node": {
      "id": "TjAxNFJlcG9ydERvd25sb2FkNzU4Njg3",
      "status": "Completed",
      "outputType": "CsvZip",
      "fileURL": "https://storage.example.com/reports/download.csv.zip",
      "createdAt": "2025-01-15T10:00:00.000Z"
    }
  }
}
Polling loop

async function waitForDownload(reportDownloadId: string): Promise<string> {
  while (true) {
    const result = await executeQuery(STATUS_QUERY, { reportDownloadId });
    const download = result.data.node;

    if (download.status === "Completed") {
      return download.fileURL;
    }

    if (download.status === "Failed") {
      throw new Error("Report download failed");
    }

    console.log(`Status: ${download.status}. Checking again in 10s...`);
    await new Promise(resolve => setTimeout(resolve, 10000));
  }
}

Step 3: Download the file

The fileURL value is a signed URL that you can download with any HTTP client:

curl -o report.csv.zip "SIGNED_FILE_URL_HERE"
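The same download can be done in TypeScript. A sketch using Node's built-in fetch and fs/promises, where the signed URL is whatever the fileURL field returned in Step 2:

```typescript
import { writeFile } from "node:fs/promises";

// Fetches the signed report URL and writes the archive to disk.
// Returns the number of bytes written, which is handy for logging.
async function downloadReportFile(signedUrl: string, outPath: string): Promise<number> {
  const res = await fetch(signedUrl);
  if (!res.ok) {
    throw new Error(`Download failed with HTTP ${res.status}`);
  }
  const bytes = new Uint8Array(await res.arrayBuffer());
  await writeFile(outPath, bytes);
  return bytes.length;
}
```

Signed URLs typically expire, so download the file soon after the report reaches Completed.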

Option 2: Paginated API queries

For smaller datasets or when you need real-time access, paginate through the API directly.

GraphQL

query ExportCrawlUrls($crawlId: ObjectID!, $cursor: String) {
  getReportStat(
    input: { crawlId: $crawlId, reportTemplateCode: "all_pages" }
  ) {
    crawlUrls(first: 500, after: $cursor) {
      pageInfo {
        hasNextPage
        endCursor
      }
      nodes {
        url
        httpStatusCode
        pageTitle
        wordCount
        fetchTime
      }
      totalCount
    }
  }
}

Variables:

{
  "crawlId": "TjAwNUNyYXdsMTU4MzI0NQ",
  "cursor": null
}

Response Example:

{
  "data": {
    "getReportStat": {
      "crawlUrls": {
        "pageInfo": {
          "hasNextPage": true,
          "endCursor": "NTAw"
        },
        "nodes": [
          {
            "url": "https://www.example.com/",
            "httpStatusCode": 200,
            "pageTitle": "Home - Example",
            "wordCount": 1250,
            "fetchTime": 0.42
          }
        ],
        "totalCount": 2186
      }
    }
  }
}

Pagination loop

async function exportAllUrls(crawlId: string): Promise<any[]> {
  const allUrls: any[] = [];
  let cursor: string | null = null;
  let hasNextPage = true;

  while (hasNextPage) {
    const result = await executeQuery(EXPORT_QUERY, { crawlId, cursor });
    const connection = result.data.getReportStat.crawlUrls;

    allUrls.push(...connection.nodes);
    hasNextPage = connection.pageInfo.hasNextPage;
    cursor = connection.pageInfo.endCursor;

    console.log(`Fetched ${allUrls.length} / ${connection.totalCount} URLs`);
  }

  return allUrls;
}
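After collecting the rows, you often want them in the same CSV shape a report download would produce. A small illustrative helper (not part of the Lumar API) that flattens the nodes gathered by the pagination loop:

```typescript
// Illustrative helper: flattens exported URL records into CSV text.
// The column list would typically match the fields selected in ExportCrawlUrls.
function toCsv(rows: Array<Record<string, unknown>>, columns: string[]): string {
  const escape = (value: unknown): string => {
    const s = value === null || value === undefined ? "" : String(value);
    // Quote fields containing commas, quotes, or newlines (RFC 4180 style).
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const header = columns.join(",");
  const lines = rows.map(row => columns.map(col => escape(row[col])).join(","));
  return [header, ...lines].join("\n");
}
```

Quoting follows RFC 4180, so URLs or page titles containing commas survive a round trip through spreadsheet tools.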

Tips for large datasets

  • Use report downloads for datasets over 10,000 URLs. Paginating through tens of thousands of records via the API is slow and consumes your rate limit budget.
  • Select only the metrics you need in selectedMetrics to reduce file size.
  • Apply filters to limit the export to relevant URLs (e.g., only broken pages or a specific segment).
  • Use first: 500 as a reasonable page size when paginating via the API. Larger page sizes increase response time.

Next steps