Duplicate Pages

Pages that share an identical title, an identical description and near-identical content with other pages found in the same crawl, excluding the primary page from each duplicate set. The primary page in each duplicate set is the one with the highest DeepRank.

Priority: Critical

Impact: Negative

What issues it may cause

Although search engines will attempt to automatically identify duplicate pages and roll them together, this may not be completely effective on very large websites or those with a high churn of URLs.

Duplicate pages can dilute authority signals, which can harm ranking performance. They also reduce the crawl efficiency of the site by wasting crawl budget.

How do you fix it

It's likely that the primary duplicate shown is the main version, which should be kept. You can review the amount of search traffic for each page in the duplicate set to identify whether one has been preferred by search engines. All the remaining duplicates should be eliminated by:

removing internal links to the URLs
redirecting all duplicate URLs to the primary URL
adding canonical tags which point to the primary duplicate
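
As a rough illustration of the last two options, the sketch below turns rows from this report into a redirect map and a canonical link element. It assumes each row is a dict carrying the `url` and `primaryUrl` fields returned by the query for this report. The helper names `build_redirect_map` and `canonical_tag` and the example.com URLs are hypothetical, chosen only for this example.

```python
from html import escape


def build_redirect_map(rows):
    """Map each duplicate URL to the primary URL of its duplicate set."""
    redirects = {}
    for row in rows:
        url = row.get("url")
        primary = row.get("primaryUrl")
        # Skip incomplete rows and any row that already is the primary page.
        if not url or not primary or url == primary:
            continue
        redirects[url] = primary
    return redirects


def canonical_tag(primary_url):
    """Build a canonical link element pointing at the primary URL."""
    return f'<link rel="canonical" href="{escape(primary_url, quote=True)}">'


# Hypothetical sample rows shaped like the `nodes` returned for this report.
rows = [
    {"url": "https://example.com/shoes?ref=nav", "primaryUrl": "https://example.com/shoes"},
    {"url": "https://example.com/shoes/", "primaryUrl": "https://example.com/shoes"},
]

redirect_map = build_redirect_map(rows)
for duplicate, primary in redirect_map.items():
    print(f"{duplicate} -> {primary}")
```

The redirect map could feed whatever redirect mechanism the site already uses, while `canonical_tag` covers duplicates that must stay reachable.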

What is the positive impact

Reducing the number of duplicate pages can avoid the dilution of PageRank, helping the remaining pages to rank better and resulting in more traffic and conversions.
Canonicalised or redirected pages will be crawled less often, improving crawl efficiency and saving on server costs.

How to fetch the data for this report template

You will need to run a crawl for the report template to generate the report. Once the report has been generated and you have the crawl ID, you can fetch the data for the report using the following query and variables:

Variables: {"crawlId":"TjAwNUNyYXdsNDAwMA","reportTemplateCode":"duplicate_pages"}

GraphQL

query GetReportStatForCrawl(
  $crawlId: ObjectID!
  $reportTemplateCode: String!
  $after: String
) {
  getReportStat(
    input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}
  ) {
    crawlUrls(after: $after, reportType: Basic) {
      nodes {
        pageTitle
        url
        primaryUrl
        description
        foundAtUrl
        duplicatePageCount
        deeprank
        level
        duplicatePage
        duplicateTitle
        duplicateDescription
        duplicateBody
        paginatedPage
        foundInGoogleAnalytics
        foundInGoogleSearchConsole
        foundInBacklinks
        foundInList
        foundInLogSummary
        foundInWebCrawl
        foundInSitemap
      }
      totalCount
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}
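
The query returns one page of results at a time, so a client has to follow `pageInfo.endCursor` until `hasNextPage` is false. The sketch below shows one way that loop might look. It is a minimal example under stated assumptions: `execute` is a hypothetical callable that sends the query and variables to the GraphQL endpoint (with whatever authentication your account requires) and returns the `data` object of the response, `fake_execute` is a stand-in that returns canned pages so the loop can be run offline, and the field selection is trimmed to three of the fields listed above.

```python
QUERY = """
query GetReportStatForCrawl(
  $crawlId: ObjectID!
  $reportTemplateCode: String!
  $after: String
) {
  getReportStat(
    input: {crawlId: $crawlId, reportTemplateCode: $reportTemplateCode}
  ) {
    crawlUrls(after: $after, reportType: Basic) {
      nodes { url primaryUrl duplicatePageCount }
      totalCount
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""


def fetch_all_rows(execute, crawl_id, report_template_code="duplicate_pages"):
    """Collect every node by following endCursor until hasNextPage is false."""
    rows = []
    after = None  # None is sent as null, which requests the first page
    while True:
        variables = {
            "crawlId": crawl_id,
            "reportTemplateCode": report_template_code,
            "after": after,
        }
        data = execute(QUERY, variables)
        page = data["getReportStat"]["crawlUrls"]
        rows.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return rows
        after = page["pageInfo"]["endCursor"]


def fake_execute(query, variables):
    """Stand-in transport returning two canned pages (assumption, offline only)."""
    if variables["after"] is None:
        nodes = [{"url": "https://example.com/a?ref=1",
                  "primaryUrl": "https://example.com/a",
                  "duplicatePageCount": 2}]
        page_info = {"endCursor": "cursor-1", "hasNextPage": True}
    else:
        nodes = [{"url": "https://example.com/a/",
                  "primaryUrl": "https://example.com/a",
                  "duplicatePageCount": 2}]
        page_info = {"endCursor": "cursor-2", "hasNextPage": False}
    return {"getReportStat": {"crawlUrls": {
        "nodes": nodes, "totalCount": 2, "pageInfo": page_info}}}


all_rows = fetch_all_rows(fake_execute, "TjAwNUNyYXdsNDAwMA")
print(len(all_rows), "duplicate pages fetched")
```

Passing the transport in as a parameter keeps the pagination logic independent of any particular HTTP client, so `fake_execute` can be swapped for a real request function without changing the loop.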
