Disallowed Pages with Bot Hits
Pages that are disallowed but were still crawled by search engine crawlers
Priority: Critical
Impact: Negative
What issues it may cause
These pages may have been disallowed within the timeframe of the log data (in which case you should ensure that they are intentionally disallowed), or the server logs may be recording a different URL to the one which was requested.
How do you fix it
If the pages were disallowed during the timeframe of the log data, they will disappear from this report once the timeframe of the log data changes.
If the pages were not disallowed during the timeframe of the log data, the server logs should be checked to ensure they use the exact requested URLs.
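One way to confirm whether a reported URL really is disallowed is to test it against the robots.txt rules that were live during the log timeframe. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the `Googlebot` user agent, the URLs and the `is_disallowed` helper are all made-up placeholders, not data from this report. The standard-library parser uses simple prefix matching, so it may not agree with every search engine's own interpretation of the rules (wildcards in particular).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute the rules that were live
# during the timeframe covered by your log data.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_disallowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt blocks user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical URLs as they might appear in the server logs.
logged_urls = [
    "https://www.example.com/private/report.html",
    "https://www.example.com/public/index.html",
]

# Keep only the logged URLs that the rules actually block.
disallowed = [u for u in logged_urls if is_disallowed(ROBOTS_TXT, "Googlebot", u)]
```

If a URL from the report does not come back as disallowed here, that points towards the second cause: the logs recording a different URL to the one that was requested.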
What is the positive impact
The log files will provide a more accurate understanding of crawl budget usage.
How to fetch the data for this report template
You will need to run a crawl for the report template to generate the report. Once the report has been generated and you have the crawl ID, you can fetch the data for the report using the following query:
- Query
- Variables
- cURL
query GetReportStatForCrawl(
  $crawlId: ObjectID!
  $reportTemplateCode: String!
  $after: String
) {
  getReportStat(
    input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}
  ) {
    crawlUrls(after: $after, reportType: Basic) {
      nodes {
        pageTitle
        url
        description
        foundAtUrl
        logRequestsTotal
        deeprank
        level
        disallowedPage
        logRequestsDesktop
        logRequestsMobile
        robotsTxtRuleMatch
        foundInGoogleAnalytics
        foundInGoogleSearchConsole
        foundInBacklinks
        foundInList
        foundInLogSummary
        foundInWebCrawl
        foundInSitemap
      }
      totalCount
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}
{"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"disallowed_pages_with_bot_hits"}
curl -X POST -H "Content-Type: application/json" -H "apollographql-client-name: docs-example-client" -H "apollographql-client-version: 1.0.0" -H "x-auth-token: YOUR_API_SESSION_TOKEN" --data '{"query":"query GetReportStatForCrawl( $crawlId: ObjectID! $reportTemplateCode: String! $after: String ) { getReportStat( input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode} ) { crawlUrls(after: $after, reportType: Basic) { nodes { pageTitle url description foundAtUrl logRequestsTotal deeprank level disallowedPage logRequestsDesktop logRequestsMobile robotsTxtRuleMatch foundInGoogleAnalytics foundInGoogleSearchConsole foundInBacklinks foundInList foundInLogSummary foundInWebCrawl foundInSitemap } totalCount pageInfo { endCursor hasNextPage } } } }","variables":{"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"disallowed_pages_with_bot_hits"}}' https://api.lumar.io/graphql
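The query accepts an `$after` cursor and returns `pageInfo { endCursor hasNextPage }`, so a report with many URLs can be read page by page. The following is a minimal Python sketch of that loop, not an official client. The helper names `post_graphql` and `iter_report_urls` are hypothetical, and the response is assumed to follow the usual GraphQL shape, with a top-level `data` object mirroring the query.

```python
import json
import urllib.request
from typing import Callable, Iterator

API_URL = "https://api.lumar.io/graphql"


def post_graphql(query: str, variables: dict, token: str) -> dict:
    """POST one GraphQL request and return the decoded JSON response."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", "x-auth-token": token},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_report_urls(
    send: Callable[[dict], dict], crawl_id: str, report_template_code: str
) -> Iterator[dict]:
    """Yield every node, following endCursor until hasNextPage is false."""
    after = None  # the first page is requested without a cursor
    while True:
        variables = {
            "crawlId": crawl_id,
            "reportTemplateCode": report_template_code,
            "after": after,
        }
        # Assumed response shape: {"data": {"getReportStat": {"crawlUrls": ...}}}
        page = send(variables)["data"]["getReportStat"]["crawlUrls"]
        yield from page["nodes"]
        if not page["pageInfo"]["hasNextPage"]:
            break
        after = page["pageInfo"]["endCursor"]


# Example usage (performs real network requests, so it is left commented out):
# QUERY = "..."  # the GetReportStatForCrawl query text
# send = lambda variables: post_graphql(QUERY, variables, "YOUR_API_SESSION_TOKEN")
# for node in iter_report_urls(send, "TjAwNUNyYXdsNDAwMA", "disallowed_pages_with_bot_hits"):
#     print(node["url"], node["logRequestsTotal"])
```

Passing the transport in as `send` keeps the paging logic separate from the HTTP call, which makes it easy to swap in a different HTTP library or a stub for testing.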