Disallowed Pages with Bot Hits (Uncrawled)
Pages that are disallowed but that received hits from search engine crawlers in the log data
Priority: Critical
Impact: Negative
What issues it may cause
These pages may have been disallowed within the timeframe of the log data (in which case you should ensure that they are intentionally disallowed), or the server logs may be recording a different URL to the one which was requested.
How do you fix it
If the pages were disallowed during the timeframe of the log data, they will disappear from this report when the timeframe of the log data is changed.
If the pages were not disallowed during the timeframe of the log data, the server logs should be checked to ensure they use the exact requested URLs.
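As a rough aid when checking whether a URL is intentionally disallowed, the sketch below tests a URL against robots.txt content using Python's standard-library `urllib.robotparser`. The helper name `is_disallowed`, the sample robots.txt, and the example URLs are hypothetical and for illustration only. The standard-library parser does simple prefix matching and may not mirror how a given search engine interprets wildcard rules, so treat the result as indicative rather than definitive.

```python
from urllib.robotparser import RobotFileParser


def is_disallowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt content blocks `url` for `user_agent`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, not taken from any real site.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for candidate in ("https://example.com/private/page", "https://example.com/public"):
        print(candidate, "disallowed:", is_disallowed(EXAMPLE_ROBOTS, candidate))
```

If a URL in this report is not blocked by the robots.txt that was live during the log timeframe, that points towards the second cause: the logs recording a different URL from the one requested.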
What is the positive impact
The log files will provide a more accurate understanding of crawl budget usage.
How to fetch the data for this report template
You will need to run a crawl for the report template to generate the report. Once the report has been generated and you have the crawl ID, you can fetch the data for the report using the following query:
- Query
- Variables
- cURL
query GetReportStatForCrawl(
  $crawlId: ObjectID!
  $reportTemplateCode: String!
  $after: String
) {
  getReportStat(
    input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}
  ) {
    crawlUncrawledUrls(after: $after, reportType: Basic) {
      nodes {
        url
        foundAtUrl
        foundAtSitemap
        level
        restrictedReason
        robotsTxtRuleMatch
        foundInWebCrawl
        foundInGoogleAnalytics
        foundInGoogleSearchConsole
        foundInBacklinks
        foundInList
        foundInLogSummary
        foundInSitemap
      }
      totalCount
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}
{"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"disallowed_pages_with_bot_hits_uncrawled"}
curl -X POST -H "Content-Type: application/json" -H "apollographql-client-name: docs-example-client" -H "apollographql-client-version: 1.0.0" -H "x-auth-token: YOUR_API_SESSION_TOKEN" --data '{"query":"query GetReportStatForCrawl( $crawlId: ObjectID! $reportTemplateCode: String! $after: String ) { getReportStat( input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode} ) { crawlUncrawledUrls(after: $after, reportType: Basic) { nodes { url foundAtUrl foundAtSitemap level restrictedReason robotsTxtRuleMatch foundInWebCrawl foundInGoogleAnalytics foundInGoogleSearchConsole foundInBacklinks foundInList foundInLogSummary foundInSitemap } totalCount pageInfo { endCursor hasNextPage } } } }","variables":{"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"disallowed_pages_with_bot_hits_uncrawled"}}' https://api.lumar.io/graphql
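The query exposes `after`, `endCursor` and `hasNextPage`, which suggests cursor-based pagination. The following is a minimal Python sketch of fetching every page, assuming that passing `endCursor` back as `after` returns the next page and that the response follows the usual GraphQL shape (`data.getReportStat.crawlUncrawledUrls`, with an optional top-level `errors` list). The function names `post_graphql` and `fetch_all_urls` are hypothetical, and the query is trimmed to a few fields for brevity.

```python
import json
import urllib.request

API_URL = "https://api.lumar.io/graphql"
REPORT_TEMPLATE_CODE = "disallowed_pages_with_bot_hits_uncrawled"

# Same operation as the query above, with a reduced field selection.
QUERY = """
query GetReportStatForCrawl($crawlId: ObjectID!, $reportTemplateCode: String!, $after: String) {
  getReportStat(input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}) {
    crawlUncrawledUrls(after: $after, reportType: Basic) {
      nodes { url restrictedReason robotsTxtRuleMatch }
      totalCount
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""


def post_graphql(payload, token):
    """Send one GraphQL request and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-auth-token": token},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_urls(crawl_id, token, send=post_graphql):
    """Collect nodes from every page; `send` is injectable so it can be stubbed."""
    variables = {
        "crawlId": crawl_id,
        "reportTemplateCode": REPORT_TEMPLATE_CODE,
        "after": None,  # no cursor on the first request
    }
    nodes = []
    while True:
        result = send({"query": QUERY, "variables": variables}, token)
        if result.get("errors"):
            raise RuntimeError(result["errors"])
        page = result["data"]["getReportStat"]["crawlUncrawledUrls"]
        nodes.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return nodes
        # Assumption: the next page is requested by passing endCursor as `after`.
        variables = {**variables, "after": page["pageInfo"]["endCursor"]}
```

Calling `fetch_all_urls("TjAwNUNyYXdsNDAwMA", "YOUR_API_SESSION_TOKEN")` would then return the combined `nodes` list for the crawl ID used in the examples above.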