Broken/Disallowed Sitemaps

Sitemaps included in the crawl which return a broken status code (400, 404, 410, 500, or 501) or are disallowed by the robots.txt file.

Priority: Critical

Impact: Negative

What issues it may cause

Sitemaps must be crawlable and return a 200 status in order to be processed by search engines.
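
As a quick spot check outside of the crawl, each Sitemap URL can be requested directly to confirm it returns a 200. A minimal sketch in Python, assuming the requests library is available; the Sitemap URLs are placeholders, so substitute the URLs flagged in the report:

    import requests

    # Hypothetical Sitemap URLs; replace with the URLs flagged in the report.
    SITEMAP_URLS = [
        "https://www.example.com/sitemap.xml",
        "https://www.example.com/sitemap-products.xml",
    ]

    # Status codes treated as broken by this report.
    BROKEN_STATUSES = {400, 404, 410, 500, 501}

    for url in SITEMAP_URLS:
        response = requests.get(url, timeout=10, allow_redirects=True)
        if response.status_code in BROKEN_STATUSES:
            print(f"BROKEN  {url} returned {response.status_code}")
        elif response.status_code != 200:
            print(f"CHECK   {url} returned {response.status_code}")
        else:
            print(f"OK      {url}")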

How do you fix it

If the Sitemaps are broken, review them to determine whether they are still expected to work. If they are no longer used, remove them from the crawl, and from the robots.txt file if they are referenced there.

If the Sitemap URLs are disallowed, review the robots.txt file to identify the specific rules that block them, and change those rules so the Sitemaps can be crawled.
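
For example, a broad Disallow rule can unintentionally block a Sitemap URL. A hypothetical robots.txt snippet (the domain and paths are placeholders) showing a rule that blocks a Sitemap, followed by a corrected version that allows it:

    # Before: this rule blocks everything under /sitemaps/, including the Sitemap
    User-agent: *
    Disallow: /sitemaps/

    # After: an Allow rule re-permits the Sitemap while the rest stays blocked
    User-agent: *
    Disallow: /sitemaps/
    Allow: /sitemaps/sitemap.xml
    Sitemap: https://www.example.com/sitemaps/sitemap.xml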

What is the positive impact

The Sitemap can be crawled and processed by search engines.

How to fetch the data for this report template

You will need to run a crawl for the report template to generate the report. Once the report has been generated and you have the crawl ID, you can fetch the data for the report using the following query:

query GetReportForCrawl($crawlId: ObjectID!, $reportTemplateCode: String!) {
  getCrawl(id: $crawlId) {
    reportsByCode(
      input: {
        reportTypeCodes: Basic
        reportTemplateCodes: [$reportTemplateCode]
      }
    ) {
      rows {
        nodes {
          ... on CrawlSitemaps {
            url
            sitemapType
            urlCount
            httpStatusCode
            restrictedReason
            logRequestsTotal
          }
        }
      }
    }
  }
}

Try in explorer
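
Outside of the explorer, the same query can be sent as a standard GraphQL POST request. A minimal sketch in Python using the requests library; the endpoint, token, crawl ID, and report template code shown here are placeholder assumptions, so substitute the values from your own account and API setup:

    import requests

    # Assumed values for illustration only: replace with your actual GraphQL
    # endpoint, API token, crawl ID, and report template code.
    API_ENDPOINT = "https://api.example.com/graphql"
    API_TOKEN = "your-api-token"

    QUERY = """
    query GetReportForCrawl($crawlId: ObjectID!, $reportTemplateCode: String!) {
      getCrawl(id: $crawlId) {
        reportsByCode(
          input: {
            reportTypeCodes: Basic
            reportTemplateCodes: [$reportTemplateCode]
          }
        ) {
          rows {
            nodes {
              ... on CrawlSitemaps {
                url
                sitemapType
                urlCount
                httpStatusCode
                restrictedReason
                logRequestsTotal
              }
            }
          }
        }
      }
    }
    """

    variables = {
        "crawlId": "your-crawl-id",
        "reportTemplateCode": "your-report-template-code",
    }

    response = requests.post(
        API_ENDPOINT,
        json={"query": QUERY, "variables": variables},
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()

    # Print the report data returned for the crawl.
    print(response.json()["data"])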