# Tutorial: Export Crawl Data

https://api-docs.lumar.io/docs/graphql/tutorials/export-crawl-data

This tutorial covers how to export crawl data from Lumar, including report downloads for bulk export and pagination for programmatic access.

## Option 1: Report downloads (recommended for bulk export)

Report downloads generate a downloadable file (CSV or other formats) containing the full dataset for a report. This is the most efficient way to export large amounts of data.

### Step 1: Create a report download

Use the `createReportDownload` mutation to request a file. You can specify which metrics (columns) to include and apply filters.

```graphql
mutation CreateReportDownload($input: CreateReportDownloadInput!) {
  createReportDownload(input: $input) {
    reportDownload {
      ...ReportDownloadDetails
    }
  }
}

fragment ReportDownloadDetails on ReportDownload {
  id
  status
  outputType
  # ...other fields you want to retrieve
}
```

**Variables:**

```json
{
  "input": {
    "crawlId": "TjAwNUNyYXdsMTc2NjI0MQ",
    "reportTemplateCode": "all_pages",
    "reportTypeCode": "Basic",
    "selectedMetrics": ["url", "httpStatusCode"],
    "fileName": "url-error-status-codes",
    "filter": { "httpStatusCode": { "ge": 500 } }
  }
}
```

**Response:**

```json
{
  "data": {
    "createReportDownload": {
      "reportDownload": {
        "id": "TjAxNFJlcG9ydERvd25sb2FkNzU4Njg3",
        "status": "Generating",
        "outputType": "CsvZip"
      }
    }
  }
}
```

### Step 2: Poll for download completion

The report download starts with a `Generating` status. Poll until it reaches `Completed`, then use the `fileURL` to download the file.

```graphql
query GetReportDownloadStatus($reportDownloadId: ObjectID!) {
  node(id: $reportDownloadId) {
    ... on ReportDownload {
      id
      status
      outputType
      fileURL
      createdAt
    }
  }
}
```

**Variables:**

```json
{
  "reportDownloadId": "TjAxNFJlcG9ydERvd25sb2FkNzU4Njg3"
}
```

**Response:**

```json
{
  "data": {
    "node": {
      "id": "TjAxNFJlcG9ydERvd25sb2FkNzU4Njg3",
      "status": "Completed",
      "outputType": "CsvZip",
      "fileURL": "https://storage.example.com/reports/download.csv.zip",
      "createdAt": "2025-01-15T10:00:00.000Z"
    }
  }
}
```

```typescript
async function waitForDownload(reportDownloadId: string): Promise<string> {
  while (true) {
    const result = await executeQuery(STATUS_QUERY, { reportDownloadId });
    const download = result.data.node;

    if (download.status === "Completed") {
      // The field is named fileURL (capital URL) in the schema.
      return download.fileURL;
    }
    if (download.status === "Failed") {
      throw new Error("Report download failed");
    }

    console.log(`Status: ${download.status}. Checking again in 10s...`);
    await new Promise(resolve => setTimeout(resolve, 10000));
  }
}
```

### Step 3: Download the file

The `fileURL` is a signed URL that you can download using any HTTP client:

```bash
curl -o report.csv.zip "SIGNED_FILE_URL_HERE"
```

## Option 2: Paginated API queries

For smaller datasets or when you need real-time access, paginate through the API directly.
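The snippets in both options call an `executeQuery` helper that isn't shown. A minimal sketch using the `fetch` API built into Node 18+ — note that the endpoint URL, the `x-auth-token` header name, and the `LUMAR_API_TOKEN` placeholder below are illustrative assumptions, not documented values:

```typescript
// Assumed values -- substitute the endpoint and credentials for your account.
const GRAPHQL_ENDPOINT = "https://api.lumar.io/graphql";
const LUMAR_API_TOKEN = "YOUR_API_TOKEN";

// Build the JSON body of a GraphQL request (kept pure so it is easy to test).
function buildGraphQLBody(query: string, variables: Record<string, unknown>): string {
  return JSON.stringify({ query, variables });
}

// Minimal executeQuery sketch: POST the query, return the parsed JSON response.
async function executeQuery(query: string, variables: Record<string, unknown>): Promise<any> {
  const response = await fetch(GRAPHQL_ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-auth-token": LUMAR_API_TOKEN, // header name is an assumption
    },
    body: buildGraphQLBody(query, variables),
  });
  if (!response.ok) {
    throw new Error(`GraphQL request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Swap in whatever authentication scheme your account uses; the only contract the loops in this tutorial rely on is that `executeQuery(query, variables)` resolves to the parsed JSON response.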
```graphql
query ExportCrawlUrls($crawlId: ObjectID!, $cursor: String) {
  getReportStat(
    input: { crawlId: $crawlId, reportTemplateCode: "all_pages" }
  ) {
    crawlUrls(first: 500, after: $cursor) {
      pageInfo {
        hasNextPage
        endCursor
      }
      nodes {
        url
        httpStatusCode
        pageTitle
        wordCount
        fetchTime
      }
      totalCount
    }
  }
}
```

**Variables:**

```json
{
  "crawlId": "TjAwNUNyYXdsMTU4MzI0NQ",
  "cursor": null
}
```

**Response:**

```json
{
  "data": {
    "getReportStat": {
      "crawlUrls": {
        "pageInfo": {
          "hasNextPage": true,
          "endCursor": "NTAw"
        },
        "nodes": [
          {
            "url": "https://www.example.com/",
            "httpStatusCode": 200,
            "pageTitle": "Home - Example",
            "wordCount": 1250,
            "fetchTime": 0.42
          }
        ],
        "totalCount": 2186
      }
    }
  }
}
```

### Pagination loop

```typescript
async function exportAllUrls(crawlId: string): Promise<any[]> {
  const allUrls: any[] = [];
  let cursor: string | null = null;
  let hasNextPage = true;

  while (hasNextPage) {
    const result = await executeQuery(EXPORT_QUERY, { crawlId, cursor });
    const connection = result.data.getReportStat.crawlUrls;

    allUrls.push(...connection.nodes);
    hasNextPage = connection.pageInfo.hasNextPage;
    cursor = connection.pageInfo.endCursor;

    console.log(`Fetched ${allUrls.length} / ${connection.totalCount} URLs`);
  }

  return allUrls;
}
```

## Tips for large datasets

- **Use report downloads** for datasets over 10,000 URLs. Paginating through tens of thousands of records via the API is slow and consumes your rate limit budget.
- **Select only the metrics you need** in `selectedMetrics` to reduce file size.
- **Apply filters** to limit the export to relevant URLs (e.g., only broken pages or a specific segment).
- **Use `first: 500`** as a reasonable page size when paginating via the API. Larger page sizes increase response time.

## Next steps

- [Generate Report Downloads](../generate-report-downloads.md) -- detailed reference for the report download workflow.
- [Filtering](../filtering.md) -- apply filters to narrow down exported data.
- [Pagination](../pagination.md) -- full cursor-based pagination reference.
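One footnote on the pagination loop: for exports approaching the 10,000-URL threshold, accumulating every node in an in-memory array can be avoided by serializing each page as it arrives, for example to newline-delimited JSON (NDJSON). A sketch of the serialization step — the `toNdjson` helper is ours, not part of the Lumar API:

```typescript
// Serialize one page of crawl URL nodes as newline-delimited JSON:
// one JSON object per line, each line a complete record.
function toNdjson(nodes: Array<Record<string, unknown>>): string {
  return nodes.map(node => JSON.stringify(node)).join("\n") + "\n";
}
```

Inside the `while (hasNextPage)` loop, append `toNdjson(connection.nodes)` to a file (e.g. with Node's `fs.appendFileSync`) instead of pushing into `allUrls`, so memory use stays constant regardless of crawl size.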