Non-Indexable Pages with Bot Hits

All pages that are non-indexable but still receive Googlebot hits.

Priority: Low

Impact: Negative

What issues it may cause

Crawl budget is being spent on pages that cannot drive any organic search traffic. That budget could be better spent on higher-value pages or on pages that are updated more frequently.
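To see where the wasted crawl budget is concentrated, the rows returned by this report can be ranked by `logRequestsTotal`. The following is a minimal Python sketch. It assumes each row is a dict containing the `url` and `logRequestsTotal` fields that the report query requests. `summarize_wasted_hits` is a hypothetical helper, not part of any Lumar SDK.

```python
from typing import Any


def summarize_wasted_hits(
    rows: list[dict[str, Any]], top_n: int = 10
) -> tuple[int, list[tuple[str, int]]]:
    """Return the total bot hits across all rows and the top_n URLs by hits.

    Hypothetical helper: assumes each row carries the report's `url` and
    `logRequestsTotal` fields.
    """
    # Treat a missing or null logRequestsTotal as zero hits.
    hits = [(row["url"], row.get("logRequestsTotal") or 0) for row in rows]
    total = sum(count for _, count in hits)
    # Highest hit counts first; ties are broken by URL for a stable order.
    ranked = sorted(hits, key=lambda item: (-item[1], item[0]))
    return total, ranked[:top_n]
```

The pages at the top of this list are the ones where a fix would recover the most crawl activity.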

How do you fix it

The pages can be disallowed in robots.txt to prevent them from being crawled. However, this also prevents any PageRank they receive from backlinks from being passed on to other pages.
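One way to turn a list of report URLs into robots.txt rules is sketched below in Python. This is a sketch under stated assumptions. All URLs must be on the same host, because robots.txt applies per host. Each rule is anchored with `$` so that it blocks only that exact path (and query), not every path that starts with it. `disallow_rules` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def disallow_rules(urls: list[str], user_agent: str = "*") -> str:
    """Build a robots.txt group that disallows each URL's exact path.

    Hypothetical helper: assumes every URL belongs to the same host.
    """
    lines = [f"User-agent: {user_agent}"]
    for url in urls:
        parts = urlsplit(url)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # '*' and '$' are special characters in robots.txt rules, so
        # literal occurrences are percent-encoded (as RFC 9309 suggests).
        path = path.replace("*", "%2A").replace("$", "%24")
        # A trailing '$' marks the end of the pattern, so the rule matches
        # only this path rather than every path that begins with it.
        lines.append(f"Disallow: {path}$")
    return "\n".join(lines) + "\n"
```

Dropping the trailing `$` would make each rule a prefix match. That is broader and may block pages you still want crawled.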

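Internal links to the pages can be removed, or given a nofollow attribute. The nofollow attribute asks search engines not to follow the link, although Google treats it as a hint rather than a strict directive.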
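The report's `foundAtUrl` field appears to record a page on which each non-indexable URL was found. That page's HTML can therefore be scanned for the links to remove or nofollow. The following is a minimal Python sketch using the standard library's `html.parser`. It assumes you have already downloaded the HTML. URLs are compared as exact strings after resolving relative links and dropping `#fragments`; no other normalization is applied. `LinkFinder` and `links_to_targets` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkFinder(HTMLParser):
    """Collect <a href> links whose absolute URL is in a target set."""

    def __init__(self, page_url: str, targets: set[str]) -> None:
        super().__init__()
        self.page_url = page_url
        self.targets = targets
        self.matches: list[tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # Resolve relative links against the page URL and drop #fragments.
        absolute, _ = urldefrag(urljoin(self.page_url, href))
        if absolute in self.targets:
            rel = (attr_map.get("rel") or "").lower().split()
            # Record whether the link already carries rel="nofollow".
            self.matches.append((absolute, "nofollow" in rel))


def links_to_targets(html: str, page_url: str, targets: set[str]) -> list[tuple[str, bool]]:
    """Return (target URL, already nofollowed?) for each matching link."""
    finder = LinkFinder(page_url, targets)
    finder.feed(html)
    finder.close()
    return finder.matches
```

Links reported with `False` are the ones that still pass crawlers to the non-indexable page.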

The pages should also be removed from any sitemaps. Including a URL in a sitemap signals to search engines that the page has value and may have become indexable.
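Removing the URLs from an XML sitemap can be sketched with Python's standard `xml.etree.ElementTree`. This sketch assumes a plain `<urlset>` sitemap in the standard sitemaps.org namespace, not a sitemap index file. `prune_sitemap` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def prune_sitemap(sitemap_xml: str, remove: set[str]) -> str:
    """Return the sitemap XML without <url> entries whose <loc> is in `remove`.

    Hypothetical helper: assumes a <urlset> sitemap, not a sitemap index.
    """
    # Serialize the sitemap namespace as the default (unprefixed) namespace
    # instead of "ns0:". Note that this registration is process-wide.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    ns = {"sm": SITEMAP_NS}
    # findall() returns a list, so removing children while looping is safe.
    for url_el in root.findall("sm:url", ns):
        loc = url_el.findtext("sm:loc", default="", namespaces=ns).strip()
        if loc in remove:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")
```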

What is the positive impact

Less crawl budget may be spent on the non-indexable pages. This frees it up for more important pages, or reduces server costs.

How to fetch the data for this report template

You will need to run a crawl with this report template to generate the report. Once the report has been generated and you have the crawl ID, you can fetch the report data using the following query, variables, and cURL command:

Query:

query GetReportStatForCrawl(
  $crawlId: ObjectID!
  $reportTemplateCode: String!
  $after: String
) {
  getReportStat(
    input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}
  ) {
    crawlUrls(after: $after, reportType: Basic) {
      nodes {
        pageTitle
        url
        foundAtUrl
        logRequestsTotal
        indexable
        httpStatusCode
        noindex
        canonicalizedPage
        nofollowedPage
        disallowedPage
        unavailableAfter
        foundInGoogleAnalytics
        foundInGoogleSearchConsole
        foundInBacklinks
        foundInList
        foundInLogSummary
        foundInWebCrawl
        foundInSitemap
      }
      totalCount
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}

Variables:

{
  "crawlId": "TjAwNUNyYXdsNDAwMA",
  "reportTemplateCode": "non_indexable_pages_with_bot_hits"
}

cURL:

curl -X POST \
  -H "Content-Type: application/json" \
  -H "apollographql-client-name: docs-example-client" \
  -H "apollographql-client-version: 1.0.0" \
  -H "x-auth-token: YOUR_API_SESSION_TOKEN" \
  --data '{"query":"query GetReportStatForCrawl( $crawlId: ObjectID! $reportTemplateCode: String! $after: String ) { getReportStat( input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode} ) { crawlUrls(after: $after, reportType: Basic) { nodes { pageTitle url foundAtUrl logRequestsTotal indexable httpStatusCode noindex canonicalizedPage nofollowedPage disallowedPage unavailableAfter foundInGoogleAnalytics foundInGoogleSearchConsole foundInBacklinks foundInList foundInLogSummary foundInWebCrawl foundInSitemap } totalCount pageInfo { endCursor hasNextPage } } } }","variables":{"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"non_indexable_pages_with_bot_hits"}}' \
  https://api.lumar.io/graphql
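The query takes an `after` argument and returns `pageInfo { endCursor hasNextPage }`, which suggests cursor-based pagination. Fetching every row can therefore be sketched as a loop that passes each `endCursor` back as `after`. The following Python sketch makes several assumptions. It reuses the endpoint and headers from the cURL example, and it trims the selection to a few fields. It also assumes errors arrive in a standard GraphQL `errors` array. `post_graphql` and `fetch_all_rows` are hypothetical names. The HTTP call is injected so that the loop can be exercised without network access.

```python
import json
import urllib.request
from typing import Any, Callable

API_URL = "https://api.lumar.io/graphql"

# Trimmed version of the report query; add back any fields you need.
QUERY = """
query GetReportStatForCrawl($crawlId: ObjectID!, $reportTemplateCode: String!, $after: String) {
  getReportStat(input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}) {
    crawlUrls(after: $after, reportType: Basic) {
      nodes { url foundAtUrl logRequestsTotal foundInSitemap }
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""


def post_graphql(payload: dict[str, Any], token: str) -> dict[str, Any]:
    """POST a GraphQL payload with the headers used in the cURL example."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "apollographql-client-name": "docs-example-client",
            "apollographql-client-version": "1.0.0",
            "x-auth-token": token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_rows(
    crawl_id: str, post: Callable[[dict[str, Any]], dict[str, Any]]
) -> list[dict[str, Any]]:
    """Follow pageInfo.endCursor until hasNextPage is false."""
    rows: list[dict[str, Any]] = []
    after = None  # No cursor on the first request.
    while True:
        variables = {
            "crawlId": crawl_id,
            "reportTemplateCode": "non_indexable_pages_with_bot_hits",
            "after": after,
        }
        body = post({"query": QUERY, "variables": variables})
        if body.get("errors"):
            raise RuntimeError(body["errors"])
        page = body["data"]["getReportStat"]["crawlUrls"]
        rows.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return rows
        after = page["pageInfo"]["endCursor"]
```

Usage: `fetch_all_rows("TjAwNUNyYXdsNDAwMA", lambda p: post_graphql(p, "YOUR_API_SESSION_TOKEN"))`. Injecting `post` also makes it easy to add retries or rate limiting without touching the pagination loop.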
