Error Pages with Bot Hits
All pages that are invalid (return an error HTTP status code) but still receive Googlebot hits
Priority: High
Impact: Negative
What issues it may cause
Pages which have changed to an error status, such as 404 or 410, need to be crawled by search engines in order for the error status to be discovered, so the pages can be removed from the index.
Search engines also need to periodically recrawl pages which return an error status to check whether they are working again; otherwise they would not be able to reindex them.
However, if a significant share of crawl budget is spent recrawling permanently removed pages, that budget could be better used on discovering and updating other pages.
How do you fix it
- If a 404 status is being used, consider switching to a 410 status, which is a stronger indication that the page has been permanently removed and will not need to be crawled as often.
- Internal links to the broken pages should be removed.
- The error pages should be removed from any Sitemaps they are included in.
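As an illustration of the first point, a server can signal permanent removal with a 410 (Gone) instead of a 404 (Not Found). A minimal sketch in Python; the path sets and the `status_for` helper are hypothetical, not part of the Lumar API:

```python
# Hypothetical set of paths that have been permanently removed.
REMOVED_PATHS = {"/old-product", "/discontinued-page"}

def status_for(path, known_paths):
    """Decide which HTTP status to return for a requested path."""
    if path in REMOVED_PATHS:
        # 410 Gone: tells crawlers the removal is permanent,
        # so they recrawl the URL less often than a 404.
        return 410
    if path in known_paths:
        return 200
    # 404 Not Found: treated as possibly temporary by crawlers.
    return 404
```

The same mapping can be applied in whatever server or CDN configuration serves the site.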
What is the positive impact
Crawl budget can be saved, allowing other pages to be crawled more frequently, and server load may be reduced.
How to fetch the data for this report template
You will need to run a crawl for this report template to generate the report. Once the report has been generated and you have the crawl ID, you can fetch the report data using the following query:
- Query
- Variables
- cURL
query GetReportStatForCrawl(
  $crawlId: ObjectID!
  $reportTemplateCode: String!
  $after: String
) {
  getReportStat(
    input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}
  ) {
    crawlUrls(after: $after, reportType: Basic) {
      nodes {
        pageTitle
        url
        foundAtUrl
        deeprank
        level
        logRequestsTotal
        httpStatusCode
        indexable
        duplicatePage
        foundInGoogleAnalytics
        foundInGoogleSearchConsole
        foundInBacklinks
        foundInList
        foundInLogSummary
        foundInWebCrawl
        foundInSitemap
      }
      totalCount
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}
{"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"error_pages_with_bot_hits"}
curl -X POST -H "Content-Type: application/json" -H "apollographql-client-name: docs-example-client" -H "apollographql-client-version: 1.0.0" -H "x-auth-token: YOUR_API_SESSION_TOKEN" --data '{"query":"query GetReportStatForCrawl( $crawlId: ObjectID! $reportTemplateCode: String! $after: String ) { getReportStat( input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode} ) { crawlUrls(after: $after, reportType: Basic) { nodes { pageTitle url foundAtUrl deeprank level logRequestsTotal httpStatusCode indexable duplicatePage foundInGoogleAnalytics foundInGoogleSearchConsole foundInBacklinks foundInList foundInLogSummary foundInWebCrawl foundInSitemap } totalCount pageInfo { endCursor hasNextPage } } } }","variables":{"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"error_pages_with_bot_hits"}}' https://api.lumar.io/graphql
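The `pageInfo` fields support cursor pagination: pass the previous page's `endCursor` as `$after` until `hasNextPage` is false. A sketch of that loop in Python; the `execute` callable is a stand-in you supply (e.g. a wrapper that POSTs to https://api.lumar.io/graphql with the headers shown in the cURL example), so the pagination logic can be shown without a live session token:

```python
# Trimmed version of the query above; any of the node fields can be added back.
QUERY = """
query GetReportStatForCrawl($crawlId: ObjectID!, $reportTemplateCode: String!, $after: String) {
  getReportStat(input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}) {
    crawlUrls(after: $after, reportType: Basic) {
      nodes { url httpStatusCode logRequestsTotal }
      totalCount
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""

def fetch_all_urls(execute, crawl_id, report_template_code="error_pages_with_bot_hits"):
    """Collect every node by following endCursor until hasNextPage is false.

    `execute(query, variables)` must return the decoded JSON response
    for one GraphQL request; it is injected by the caller.
    """
    nodes, after = [], None
    while True:
        variables = {
            "crawlId": crawl_id,
            "reportTemplateCode": report_template_code,
            "after": after,
        }
        data = execute(QUERY, variables)
        crawl_urls = data["data"]["getReportStat"]["crawlUrls"]
        nodes.extend(crawl_urls["nodes"])
        page_info = crawl_urls["pageInfo"]
        if not page_info["hasNextPage"]:
            return nodes
        after = page_info["endCursor"]
```

Because `execute` is injected, the same loop works with `requests`, `urllib`, or an existing GraphQL client.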