Error Pages with Bot Hits
All pages that are invalid (return an error HTTP status code) but still receive Googlebot hits
Priority: High
Impact: Negative
What issues it may cause
Pages which have changed to an error status, such as 404 or 410, need to be crawled by search engines in order for the error status to be discovered, so the pages can be removed from the index.
Search engines also need to periodically recrawl pages which return an error status to check whether they are working again; otherwise they would not be able to reindex them.
However, if a significant share of crawl budget is spent recrawling permanently removed pages, that budget could be better used on discovering and updating other pages.
How do you fix it
- If a 404 status is being used, consider switching to a 410 status, which is a stronger indication that the page has been permanently removed and will not need to be crawled as often.
- Internal links to the broken pages should be removed.
- The error pages should be removed from any Sitemaps they are included in.
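As an illustration of the first point, a server can signal permanent removal with a 410 (Gone) instead of a 404 (Not Found). A minimal sketch in Python; the path sets and the `status_for` helper are hypothetical, not part of the Lumar API:

```python
# Hypothetical set of paths that have been permanently removed.
REMOVED_PATHS = {"/old-product", "/discontinued-page"}

def status_for(path, known_paths):
    """Decide which HTTP status to return for a requested path."""
    if path in REMOVED_PATHS:
        # 410 Gone: tells crawlers the removal is permanent,
        # so they recrawl the URL less often than a 404.
        return 410
    if path in known_paths:
        return 200
    # 404 Not Found: treated as possibly temporary by crawlers.
    return 404
```

The same mapping can be applied in whatever server or CDN configuration serves the site.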
What is the positive impact
Crawl budget can be saved, allowing other pages to be crawled more frequently, and server load may be reduced.
How to fetch the data for this report template
You will need to run a crawl for this report template to generate the report. Once the report has been generated and you have the crawl ID, you can fetch the report data using the following query:
- Query
- Variables
- cURL
query GetReportStatForCrawl(
  $crawlId: ObjectID!
  $reportTemplateCode: String!
  $after: String
) {
  getReportStat(
    input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}
  ) {
    crawlUrls(after: $after, reportType: Basic) {
      nodes {
        pageTitle
        url
        foundAtUrl
        deeprank
        level
        logRequestsTotal
        httpStatusCode
        indexable
        duplicatePage
        foundInGoogleAnalytics
        foundInGoogleSearchConsole
        foundInBacklinks
        foundInList
        foundInLogSummary
        foundInWebCrawl
        foundInSitemap
      }
      totalCount
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}
{"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"error_pages_with_bot_hits"}
curl -X POST -H "Content-Type: application/json" -H "apollographql-client-name: docs-example-client" -H "apollographql-client-version: 1.0.0" -H "x-auth-token: YOUR_API_SESSION_TOKEN" --data '{"query":"query GetReportStatForCrawl( $crawlId: ObjectID! $reportTemplateCode: String! $after: String ) { getReportStat( input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode} ) { crawlUrls(after: $after, reportType: Basic) { nodes { pageTitle url foundAtUrl deeprank level logRequestsTotal httpStatusCode indexable duplicatePage foundInGoogleAnalytics foundInGoogleSearchConsole foundInBacklinks foundInList foundInLogSummary foundInWebCrawl foundInSitemap } totalCount pageInfo { endCursor hasNextPage } } } }","variables":{"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"error_pages_with_bot_hits"}}' https://api.lumar.io/graphql
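The `pageInfo` fields support cursor pagination: pass the previous page's `endCursor` as `$after` until `hasNextPage` is false. A sketch of that loop in Python; the `execute` callable is a stand-in you supply (e.g. a wrapper that POSTs to https://api.lumar.io/graphql with the headers shown in the cURL example), so the pagination logic can be shown without a live session token:

```python
# Trimmed version of the query above; any of the node fields can be added back.
QUERY = """
query GetReportStatForCrawl($crawlId: ObjectID!, $reportTemplateCode: String!, $after: String) {
  getReportStat(input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}) {
    crawlUrls(after: $after, reportType: Basic) {
      nodes { url httpStatusCode logRequestsTotal }
      totalCount
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""

def fetch_all_urls(execute, crawl_id, report_template_code="error_pages_with_bot_hits"):
    """Collect every node by following endCursor until hasNextPage is false.

    `execute(query, variables)` must return the decoded JSON response
    for one GraphQL request; it is injected by the caller.
    """
    nodes, after = [], None
    while True:
        variables = {
            "crawlId": crawl_id,
            "reportTemplateCode": report_template_code,
            "after": after,
        }
        data = execute(QUERY, variables)
        crawl_urls = data["data"]["getReportStat"]["crawlUrls"]
        nodes.extend(crawl_urls["nodes"])
        page_info = crawl_urls["pageInfo"]
        if not page_info["hasNextPage"]:
            return nodes
        after = page_info["endCursor"]
```

Because `execute` is injected, the same loop works with `requests`, `urllib`, or an existing GraphQL client.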