Disallowed/Malformed URLs in Sitemaps
URLs that were found in sitemaps but could not be crawled because they were disallowed by robots.txt or malformed.
Priority: Medium
Impact: Negative
What issues it may cause
Disallowed and malformed URLs cannot be crawled by search engines, so they should not be included in sitemaps.
How do you fix it
Review the robots.txt rules to confirm that these URLs are meant to be disallowed. If the rules are correct, remove the disallowed URLs from the sitemaps; if a URL was blocked by mistake, update the robots.txt rules instead.
Review the malformed URLs and either remove them from the sitemap or correct them so they are valid URLs.
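Before editing the sitemap or robots.txt, it can help to reproduce the check locally. The following is a minimal Python sketch, not the crawler's own logic. It assumes a standard sitemaps.org urlset sitemap and the robots.txt for the same host, both already downloaded as strings. The function names is_malformed and audit_sitemap are hypothetical, and the malformed-URL test is a rough heuristic (an absolute http or https URL with a host and no whitespace).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Namespace used by standard <urlset> sitemaps (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def is_malformed(url: str) -> bool:
    """Hypothetical heuristic: treat a URL as malformed unless it is an
    absolute http(s) URL with a host and contains no whitespace."""
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. an unbalanced IPv6 bracket such as "http://[::1"
        return True
    return (
        parts.scheme not in ("http", "https")
        or not parts.netloc
        or any(ch.isspace() for ch in url)
    )


def audit_sitemap(sitemap_xml: str, robots_txt: str, user_agent: str = "*"):
    """Hypothetical helper: split sitemap <loc> entries into URLs disallowed
    by robots.txt and URLs that look malformed. Assumes robots_txt belongs
    to the same host as the sitemap URLs (the parser only compares paths)."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)

    disallowed, malformed = [], []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if is_malformed(url):
            malformed.append(url)
        elif not robots.can_fetch(user_agent, url):
            disallowed.append(url)
    return disallowed, malformed
```

Python's urllib.robotparser uses simple prefix matching and does not implement the * and $ path wildcards that Google supports, so treat its results as a first pass rather than a definitive verdict.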
What is the positive impact
Having clean sitemaps that contain only valid, indexable and unique pages helps search engines like Google crawl, index and update the important pages of your website more efficiently.
How to fetch the data for this report template
To generate this report, you need to run a crawl that includes this report template. Once the report has been generated and you have the crawl ID, you can fetch the report data using the following query:
query GetReportStatForCrawl(
  $crawlId: ObjectID!
  $reportTemplateCode: String!
  $after: String
) {
  getReportStat(
    input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}
  ) {
    crawlUncrawledUrls(after: $after, reportType: Basic) {
      nodes {
        url
        foundAtUrl
        foundAtSitemap
        rewriteChain
        level
        restrictedReason
        robotsTxtRuleMatch
        foundInWebCrawl
        foundInGoogleAnalytics
        foundInGoogleSearchConsole
        foundInBacklinks
        foundInList
        foundInLogSummary
        foundInSitemap
      }
      totalCount
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}
Example variables:
{
  "crawlId": "TjAwNUNyYXdsNDAwMA",
  "reportTemplateCode": "sitemaps_disallowed_malformed_links"
}
The same request with cURL:
curl -X POST https://api.lumar.io/graphql \
  -H "Content-Type: application/json" \
  -H "apollographql-client-name: docs-example-client" \
  -H "apollographql-client-version: 1.0.0" \
  -H "x-auth-token: YOUR_API_SESSION_TOKEN" \
  --data '{"query":"query GetReportStatForCrawl( $crawlId: ObjectID! $reportTemplateCode: String! $after: String ) { getReportStat( input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode} ) { crawlUncrawledUrls(after: $after, reportType: Basic) { nodes { url foundAtUrl foundAtSitemap rewriteChain level restrictedReason robotsTxtRuleMatch foundInWebCrawl foundInGoogleAnalytics foundInGoogleSearchConsole foundInBacklinks foundInList foundInLogSummary foundInSitemap } totalCount pageInfo { endCursor hasNextPage } } } }","variables":{"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"sitemaps_disallowed_malformed_links"}}'
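The query appears to use cursor-based pagination: each page returns pageInfo.endCursor and pageInfo.hasNextPage, and the cursor is passed back through the after variable to request the next page. A minimal Python sketch of that loop, using only the standard library, follows. It assumes the usual GraphQL response envelope (a "data" object plus an optional "errors" list), reuses the endpoint and headers from the cURL example, and requests only a subset of the node fields. The function names post_graphql and iter_uncrawled_urls are hypothetical, and the transport is passed in as a function so the pagination logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional

API_URL = "https://api.lumar.io/graphql"

# Same query as above, trimmed to a few node fields for brevity.
QUERY = """
query GetReportStatForCrawl($crawlId: ObjectID!, $reportTemplateCode: String!, $after: String) {
  getReportStat(input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}) {
    crawlUncrawledUrls(after: $after, reportType: Basic) {
      nodes { url foundAtSitemap restrictedReason robotsTxtRuleMatch }
      totalCount
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""


def post_graphql(payload: dict, token: str) -> dict:
    """Hypothetical transport: send one request with the cURL example's headers."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "apollographql-client-name": "docs-example-client",
            "apollographql-client-version": "1.0.0",
            "x-auth-token": token,
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_uncrawled_urls(crawl_id: str, send: Callable[[dict], dict]) -> Iterator[dict]:
    """Hypothetical helper: yield every node, following pageInfo.endCursor
    until hasNextPage is false."""
    after: Optional[str] = None  # no cursor for the first page
    while True:
        body = send({
            "query": QUERY,
            "variables": {
                "crawlId": crawl_id,
                "reportTemplateCode": "sitemaps_disallowed_malformed_links",
                "after": after,
            },
        })
        if body.get("errors"):
            raise RuntimeError(body["errors"])
        page = body["data"]["getReportStat"]["crawlUncrawledUrls"]
        yield from page["nodes"]
        if not page["pageInfo"]["hasNextPage"]:
            break
        after = page["pageInfo"]["endCursor"]
```

For example, iter_uncrawled_urls("TjAwNUNyYXdsNDAwMA", lambda payload: post_graphql(payload, "YOUR_API_SESSION_TOKEN")) would yield each URL node in turn once the placeholder is replaced with a real API session token.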