# Duplicate Page Sets

Sets of indexable pages that share an identical title and description, and near-identical content, with other pages found in the same crawl.

The page with the highest DeepRank in each set of duplicates is shown as the primary page representing that set. Two examples of duplicate pages from each set are included alongside the primary page.

Canonicalized pages, noindexed pages and pages with reciprocated hreflangs will not be reported as duplicates.

**Priority**: Critical

**Impact**: Negative

## What issues it may cause

[Although search engines will attempt to automatically identify duplicate pages and roll them together](https://www.youtube.com/watch?v=oCNi7dTircw#t=44m11s), this may not be completely effective on very large websites or those with a high churn of URLs.

Duplicate pages can dilute authority signals, which can harm ranking performance. They also reduce the site's crawl efficiency by wasting crawl budget.

## How do you fix it

It's likely that the primary duplicate shown is the main version, which should be kept. You can review the amount of search traffic for each page in the duplicate set to identify whether one has been preferred by search engines. All the remaining duplicates should be eliminated by one of the following:

<ul>
  <li>removing internal links to the URLs</li>
  <li>redirecting all duplicate URLs to the primary URL</li>
  <li>adding canonical tags which point to the primary duplicate</li>
</ul>
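
As a rough illustration of the review step above, the following Python sketch picks the URL to keep from one duplicate set and builds a redirect map for the rest. The function name `plan_redirects`, the `clicks` mapping and the example URLs are all hypothetical; the traffic figures would come from your own analytics or search data, not from this report.

```python
from typing import Dict, List, Tuple


def plan_redirects(
    primary_url: str,
    duplicates: List[str],
    clicks: Dict[str, int],
) -> Tuple[str, Dict[str, str]]:
    """Choose the URL to keep and map every other URL in the set to it.

    The reported primary is kept unless another URL in the set has
    strictly more search clicks (hypothetical input data).
    """
    keep = primary_url
    for url in duplicates:
        # URLs missing from the traffic data are treated as zero clicks.
        if clicks.get(url, 0) > clicks.get(keep, 0):
            keep = url
    candidates = [primary_url, *duplicates]
    redirects = {url: keep for url in candidates if url != keep}
    return keep, redirects


# Example with made-up URLs and click counts.
keep, redirects = plan_redirects(
    "https://example.com/shoes",
    ["https://example.com/shoes?ref=nav", "https://example.com/Shoes"],
    {"https://example.com/shoes": 120, "https://example.com/shoes?ref=nav": 4},
)
```

The resulting `redirects` mapping could equally be used to drive canonical tags instead of redirects; which option fits best depends on whether the duplicate URLs still need to be reachable by users.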

## What is the positive impact

<ol>
  <li>
    Reducing the number of duplicate pages can avoid the dilution of PageRank, helping the remaining pages to rank
    better, resulting in more traffic and conversions.
  </li>
  <li>
    Canonicalised or redirected pages will be crawled less often, improving crawl efficiency and saving on server costs.
  </li>
</ol>

## How to fetch the data for this report template

You will need to run a crawl for this report template to generate the report. Once the report has been generated and you
have the crawl ID, you can fetch the data for the report using the following query:


```graphql
query GetReportStatForCrawl(
  $crawlId: ObjectID!
  $reportTemplateCode: String!
  $after: String
) {
  getReportStat(
    input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}
  ) {
    crawlDuplicateUrls(after: $after, reportType: Basic) {
      nodes {
        pageTitle
        description
        primaryUrl
        exampleDuplicate1
        exampleDuplicate2
        duplicateCount
        deeprank
        level
        duplicateType
      }
      totalCount
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}
```

**Variables:**
```json
{"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"duplicate_pages_2"}
```
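
The query returns one page of results at a time, with `pageInfo.endCursor` and `pageInfo.hasNextPage` indicating whether more pages exist. A minimal Python sketch of walking through all pages is shown below. It deliberately leaves out the HTTP transport and authentication: `execute` is a hypothetical callable, supplied by you, that sends the query text and variables to the GraphQL API and returns the parsed `data` object of the response. The name `iter_duplicate_sets` is also an assumption made for this example.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Assumption: an executor takes (query, variables) and returns the parsed
# "data" object of the GraphQL response as a dictionary.
Executor = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_duplicate_sets(
    execute: Executor,
    query: str,
    crawl_id: str,
    report_template_code: str = "duplicate_pages_2",
) -> Iterator[Dict[str, Any]]:
    """Yield every duplicate set node, following the cursor page by page."""
    after: Optional[str] = None
    while True:
        data = execute(
            query,
            {
                "crawlId": crawl_id,
                "reportTemplateCode": report_template_code,
                "after": after,
            },
        )
        connection = data["getReportStat"]["crawlDuplicateUrls"]
        yield from connection["nodes"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        # Pass the last cursor back as $after to request the next page.
        after = page_info["endCursor"]
```

Passing the query text shown above as `query` and the crawl ID from the variables example as `crawl_id` would yield each node with the fields selected in the query, such as `primaryUrl` and `duplicateCount`.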