Duplicate Pages in Sitemaps

Priority: Low

Impact: Negative

What issues it may cause

Duplicate pages can dilute authority signals, which can negatively impact the rankings of all the pages within the duplicate set.

Including a URL in a Sitemap signals to search engines that the page should be indexed, so listing duplicates may result in the wrong version being indexed. Although search engines will attempt to automatically identify duplicate pages and roll them together, this may not be completely effective, particularly on very large websites or for pages with a short lifespan. Search engines may choose a duplicate version of the page which is not considered the primary version.

The duplicate pages will still be crawled by search engines, wasting crawl budget, incurring additional server costs, and reducing the crawl efficiency of the site.

How do you fix it

The duplicate pages should be reviewed and removed from the Sitemaps.
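As a starting point for the review, you can check a sitemap file for URLs that appear more than once. The sketch below is a minimal illustration and an assumption about scope: it only detects literally repeated `<loc>` entries in a single sitemap file, whereas this report also covers pages whose *content* duplicates another page, which requires a crawl to detect.

```python
# Sketch: list URLs that appear more than once in a sitemap file.
# Note: this only catches repeated <loc> entries, not content duplicates.
import xml.etree.ElementTree as ET
from collections import Counter

# Sitemaps use this XML namespace for all elements.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def duplicate_locs(sitemap_xml: str) -> list[str]:
    """Return URLs listed more than once in the given sitemap XML."""
    root = ET.fromstring(sitemap_xml)
    locs = [
        loc.text.strip()
        for loc in root.iter(f"{{{SITEMAP_NS}}}loc")
        if loc.text
    ]
    counts = Counter(locs)
    return [url for url, n in counts.items() if n > 1]
```

Running this over each sitemap listed in your sitemap index gives a quick first pass before reviewing content-level duplicates in the report itself.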

What is the positive impact

  1. Reducing the number of duplicate pages avoids diluting PageRank, helping the remaining pages rank better and resulting in more traffic and conversions.
  2. Canonicalised or redirected pages will be crawled less often, improving crawl efficiency and saving on server costs.

How to fetch the data for this report template

You will need to run a crawl with this report template enabled to generate the report. Once the report has been generated and you have the crawl ID, you can fetch the report data using the following query:

query GetReportForCrawl($crawlId: ObjectID!, $reportTemplateCode: String!) {
  getCrawl(id: $crawlId) {
    reportsByCode(
      input: {
        reportTypeCodes: Basic
        reportTemplateCodes: [$reportTemplateCode]
      }
    ) {
      rows {
        nodes {
          ... on CrawlUrls {
            pageTitle
            url
            foundAtUrl
            foundAtSitemap
            deeprank
            level
            sitemapsInCount
            httpStatusCode
            indexable
            duplicatePage
            foundInGoogleAnalytics
            foundInGoogleSearchConsole
            foundInBacklinks
            foundInList
            foundInLogSummary
            foundInWebCrawl
            foundInSitemap
          }
        }
      }
    }
  }
}
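Outside the explorer, the query can be sent as a standard GraphQL-over-HTTP POST. The sketch below assumes a `https://api.lumar.io/graphql` endpoint and bearer-token authentication; both are assumptions, so check your account's API documentation for the real endpoint and auth scheme. A trimmed field selection is used for brevity.

```python
# Hypothetical sketch of sending the report query over HTTP.
# The endpoint URL and Authorization scheme are assumptions.
import json
import urllib.request

API_URL = "https://api.lumar.io/graphql"  # assumed endpoint

QUERY = """
query GetReportForCrawl($crawlId: ObjectID!, $reportTemplateCode: String!) {
  getCrawl(id: $crawlId) {
    reportsByCode(
      input: {
        reportTypeCodes: Basic
        reportTemplateCodes: [$reportTemplateCode]
      }
    ) {
      rows {
        nodes {
          ... on CrawlUrls {
            url
            foundAtSitemap
            duplicatePage
          }
        }
      }
    }
  }
}
"""

def build_payload(crawl_id: str, report_template_code: str) -> dict:
    """Standard GraphQL request body: the query plus its variables."""
    return {
        "query": QUERY,
        "variables": {
            "crawlId": crawl_id,
            "reportTemplateCode": report_template_code,
        },
    }

def fetch_report(api_token: str, crawl_id: str, report_template_code: str) -> dict:
    """POST the query and return the decoded JSON response."""
    body = json.dumps(build_payload(crawl_id, report_template_code)).encode()
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The crawl ID comes from the crawl you ran above; the report template code for this report is whatever your account's template listing shows for Duplicate Pages in Sitemaps.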
