Disallowed/Malformed URLs in Sitemaps
URLs that were found in sitemaps but could not be crawled because they were disallowed by robots.txt or were malformed.
Priority: Medium
Impact: Negative
What issues it may cause
Disallowed and malformed URLs cannot be crawled by search engines and should not be included within sitemaps.
How do you fix it
Review the robots.txt rules to ensure the URLs have been disallowed correctly, or remove all of the disallowed URLs from sitemaps.
Review the malformed URLs and either remove them from the sitemaps or correct them so that they are valid URLs.
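The review described above can be partly automated before a sitemap is published. The following is a minimal sketch using only the Python standard library; the function names `is_malformed` and `classify_sitemap_urls` are hypothetical, and the "malformed" check is an assumed rough heuristic (absolute http/https URL, has a host, no whitespace), not the crawler's own definition. The standard-library robots.txt parser may also not match every search engine's rule-matching behaviour, so treat the result as a first pass.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def is_malformed(url: str) -> bool:
    """Rough heuristic (an assumption): absolute http(s) URL, host present, no whitespace."""
    if any(ch.isspace() for ch in url):
        return True
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError for some invalid inputs, e.g. a broken IPv6 host
        return True
    return parts.scheme not in ("http", "https") or not parts.netloc


def classify_sitemap_urls(urls, robots_txt, user_agent="*"):
    """Group sitemap URLs into ok / disallowed / malformed buckets."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    result = {"ok": [], "disallowed": [], "malformed": []}
    for url in urls:
        if is_malformed(url):
            result["malformed"].append(url)
        elif not parser.can_fetch(user_agent, url):
            result["disallowed"].append(url)
        else:
            result["ok"].append(url)
    return result
```

URLs that land in the `disallowed` or `malformed` buckets are candidates for removal from the sitemap, or for a correction to the robots.txt rules or to the URL itself.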
What is the positive impact
Clean sitemaps that contain only valid, indexable and unique pages help search engines such as Google crawl, index and update the important pages of your website more efficiently.
How to fetch the data for this report template
You will need to run a crawl for the report template to generate the report. Once the report has been generated and you have the crawl ID, you can fetch the report data using the following query:
- Query
- Variables
- cURL
query GetReportForCrawl($crawlId: ObjectID!, $reportTemplateCode: String!) {
  getCrawl(id: $crawlId) {
    reportsByCode(
      input: {
        reportTypeCodes: Basic
        reportTemplateCodes: [$reportTemplateCode]
      }
    ) {
      rows {
        nodes {
          ... on CrawlUncrawledUrls {
            url
            foundAtUrl
            foundAtSitemap
            rewriteChain
            level
            restrictedReason
            robotsTxtRuleMatch
            foundInWebCrawl
            foundInGoogleAnalytics
            foundInGoogleSearchConsole
            foundInBacklinks
            foundInList
            foundInLogSummary
            foundInSitemap
          }
        }
      }
    }
  }
}
{"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"sitemaps_disallowed_malformed_links"}
curl -X POST -H "Content-Type: application/json" -H "apollographql-client-name: docs-example-client" -H "apollographql-client-version: 1.0.0" -H "x-auth-token: YOUR_API_SESSION_TOKEN" --data '{"query":"query GetReportForCrawl($crawlId: ObjectID!, $reportTemplateCode: String!) { getCrawl(id: $crawlId) { reportsByCode( input: { reportTypeCodes: Basic reportTemplateCodes: [$reportTemplateCode] } ) { rows { nodes { ... on CrawlUncrawledUrls { url foundAtUrl foundAtSitemap rewriteChain level restrictedReason robotsTxtRuleMatch foundInWebCrawl foundInGoogleAnalytics foundInGoogleSearchConsole foundInBacklinks foundInList foundInLogSummary foundInSitemap } } } } } }","variables":{"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"sitemaps_disallowed_malformed_links"}}' https://api.lumar.io/graphql
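The same request can be sent from a script. The following is a minimal sketch in Python using only the standard library; it reuses the endpoint, headers and variables from the cURL example, while the helper names `build_request` and `fetch_report`, and the shortened field selection, are assumptions made for illustration. The shape of the JSON response is not documented here, so the sketch returns the parsed body unchanged.

```python
import json
import urllib.request

API_URL = "https://api.lumar.io/graphql"  # endpoint taken from the cURL example

# Shortened field selection for illustration; add the other fields as needed.
QUERY = """
query GetReportForCrawl($crawlId: ObjectID!, $reportTemplateCode: String!) {
  getCrawl(id: $crawlId) {
    reportsByCode(
      input: { reportTypeCodes: Basic, reportTemplateCodes: [$reportTemplateCode] }
    ) {
      rows {
        nodes {
          ... on CrawlUncrawledUrls {
            url
            foundAtSitemap
            restrictedReason
            robotsTxtRuleMatch
          }
        }
      }
    }
  }
}
"""


def build_request(crawl_id, token,
                  template_code="sitemaps_disallowed_malformed_links"):
    """Build the POST request without sending it (hypothetical helper)."""
    payload = {
        "query": QUERY,
        "variables": {"crawlId": crawl_id, "reportTemplateCode": template_code},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "apollographql-client-name": "docs-example-client",
            "apollographql-client-version": "1.0.0",
            "x-auth-token": token,
        },
        method="POST",
    )


def fetch_report(crawl_id, token):
    """Send the request and return the parsed JSON body as-is."""
    with urllib.request.urlopen(build_request(crawl_id, token)) as resp:
        return json.load(resp)
```

Calling `fetch_report("TjAwNUNyYXdsNDAwMA", "YOUR_API_SESSION_TOKEN")` would send the query with the example variables shown above; replace both values with your own crawl ID and session token.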