Duplicate content publishing – SEO and Open Text Web Solutions


When pages are connected to multiple links all over the project (for example by using keywords), you will end up with one page published multiple times in several folders. This leads to duplicate content, and search engines like Google, Yahoo, Bing, … basically any engine crawling your site, may assume that you are trying to get a higher ranking by duplicating your content and inflating keyword density. In the worst case this can lead to full exclusion from Google’s search index, so that none of the pages within your domain would be found. Disturbing, isn’t it?

Why does it happen?

So why does the publishing of multiple pages happen?
Let’s have a look at the architecture of link elements within the Web Solutions Management Server (RedDot CMS):
The PageBuilder is the core piece which follows the link structure of your Content Management project and verifies every link and page within it. Besides that, the PageBuilder has recently been rewritten in .NET and now shows better performance than before when configured properly.

Different link types

There are two, or more precisely three, types of link elements:

  1. References
    These just point to the original place where a page “lives” within the project.
  2. Connected pages
    These can be expanded in SmartTree and are ‘truly connected’ to the link element.
  3. MainLink
    The one exception is the MainLink: this is the place where the page was usually first created, and it defines where the page really lives in the CMS project.

What happens during publishing?

During the publishing process within WSMS RedDot CMS, referenced links are not followed by the PageBuilder and hence don’t produce a published page for that link.
For connected pages, the PageBuilder picks up the publication package assigned to the link and publishes the page according to the settings in this package. If a page is connected multiple times in a project, not just referenced, this structure creates multiple published pages with the same content.
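
To make the difference tangible, here is a toy model in Python of the behaviour described above. It is only a sketch under assumptions: the `Link` class, the `publish` function and the folder names are made up for illustration and are not how the PageBuilder is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for a link element in the project."""
    page: str    # name of the page the link points to
    kind: str    # "connected" or "reference"
    folder: str  # target folder of the publication package on this link


def publish(links):
    """Return the published file paths, following only connected links."""
    published = []
    for link in links:
        if link.kind == "reference":
            continue  # references are not followed, so nothing is published
        published.append(f"{link.folder}/{link.page}.html")
    return published


links = [
    Link("news-1", "connected", "/news"),      # the main link
    Link("news-1", "connected", "/home"),      # e.g. a teaser list on the homepage
    Link("news-1", "reference", "/products"),  # a plain reference
]
published = publish(links)
print(published)
```

In this model the same page ends up in two folders because it is connected twice, while the reference produces no file at all.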

How do we solve multiple publishing?

We will convince the CMS PageBuilder that it is looking at a reference instead of a connected page link. The code for this is easy:

<%=Replace("<%list_teaser%>","islink=2","islink=10")%>

There are two requirements:

  • The link element needs to be set to ‘insert path- and filename only’.
  • The code block must run in a pre-execute so that the code is executed and its result returned as HTML when published.

The code replaces the link type

  • 2 = link: follow and publish the page based on the publication package attached to this link element
    with
  • 10 = reference: don’t follow this page and don’t publish it, just use the MainLink the page is connected to
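
The replacement itself is plain string substitution. Here is a minimal Python sketch of the same operation, assuming a made-up link string (the real one is produced by the list placeholder and will look different) and a hypothetical helper name `as_reference`:

```python
def as_reference(link_url: str) -> str:
    """Mirror the ASP Replace call: mark a connected link as a reference."""
    return link_url.replace("islink=2", "islink=10")


# Made-up link string for illustration only; not the real CMS format.
sample = "page.asp?islink=2&guid=ABC123"
print(as_reference(sample))
```

Like VBScript's Replace with its default binary comparison, this substitution is case-sensitive, so a parameter spelled `isLink=2` would be left untouched.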

The (almost) ready to go template code

  <ul id="following-pages">
    <!IoRangeList>
    <li>
      <h3><a href="<%!! Context:Pages.GetPage(Guid:<%info_PageGuid%>).GetUrl() !!%>"><%hdl_pagetitle%></a></h3>
      <!IoRangeNoRedDotMode><!IoRangeRedDotMode><!--
                'treat the link as reference not as link to avoid duplicate publishing
                <%=Replace("<%list_teaser%>","islink=2","islink=10")%>
      --><!/IoRangeRedDotMode><!/IoRangeNoRedDotMode>
      <p><%stf_metadescription%></p>
    </li>
    <!/IoRangeList>
  </ul>

Attention: Don’t forget to wrap the code above in a pre-execute block.

EXPLANATION: So what does this all do?

<!IoRangeNoRedDotMode><!IoRangeRedDotMode><!--
  'treat the link as reference not as link to avoid duplicate publishing
  <%=Replace("<%list_teaser%>","islink=2","islink=10")%>
--><!/IoRangeRedDotMode><!/IoRangeNoRedDotMode>

This code hides the link from the HTML output. Hiding it from the HTML does not stop the PageBuilder from following the link and publishing the page, though, so we also have to replace the link type as discussed above.

Render Tag Code? What do we do here with the Render Tag?

<%!! Context:Pages.GetPage(Guid:<%info_PageGuid%>).GetUrl() !!%>

This nice Render Tag sets the link URL to the MainLink of the page. If this is used everywhere, all links will point to the same place, which ensures that you won’t have duplicate content or unwanted pages published in the wrong place.
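
The effect can be pictured as a lookup from a page GUID to one canonical URL. The Python sketch below is an illustration only: the table, the GUID and the `get_url` helper are hypothetical stand-ins and say nothing about how the render tag works internally.

```python
# Hypothetical lookup table: page GUID -> URL of the page at its main link.
MAIN_LINK_URLS = {"ABC123": "/news/news-1.html"}


def get_url(page_guid: str) -> str:
    """Return the one URL under which the page is published at its main link."""
    return MAIN_LINK_URLS[page_guid]


# Two different teaser lists that both show the same page.
teasers = [("Homepage teaser", "ABC123"), ("Sidebar teaser", "ABC123")]
hrefs = {get_url(guid) for _, guid in teasers}
print(hrefs)
```

However many lists show the page, every link resolves to the same single URL, so there is only one published copy for search engines to find.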

Other ways?

I am sure there are other ways to solve this issue, and I am keen to read about them below!


About the author:

Markus Giesen is a Solutions Architect and RedDot CMS Consultant, formerly based in Germany. He travels around the world to find and offer solutions for a better world (in a very web-based sense), and has found a way to do this as part of a Melbourne-based online consultancy. On this blog Markus shares his personal (not his employer’s) thoughts and opinions on CMS and web development. In his spare time you will find him reading, snowboarding or travelling. Also, you should follow him on Twitter!

Discussion

24 comments for “Duplicate content publishing – SEO and Open Text Web Solutions”

  1. Interesting stuff, Markus – I’m probably going to have to read it one or two times over, just to get the gist! The page duplication issue is one we’ve been struggling with for a while now, and my preferred option has been to try to reduce the number of connections, in favour of referencing. Ideally, I’d like every page to be connected in just a single place: its main link. This should completely eliminate the problem, and has the added benefit of simplifying the SmartTree structure quite a bit.
    Regards,
    - Bobby

    P.S. I tried to add this comment to your blog directly, but got an
    error saying your spam filter has been configured to reject all
    comments from behind proxies. Shame!

    Posted by Bobbyjack | October 28, 2009, 1:54 am
  2. Shame indeed! I changed the SPAM settings in our plugin here and this will hopefully solve this issue! I also allowed myself to copy your comment over here if you don’t mind.

    Posted by Markus Giesen | October 28, 2009, 1:58 am
  3. Beside this I believe it takes a fair bit of work to replace all the links in a project and change the settings where required. I found this useful where we use teaser boxes or sub content in the right hand side which is used multiple times in several sections of a website.

    Posted by Markus Giesen | October 28, 2009, 2:03 am
  4. Thanks so much for this article. I ended up using the same concept to link our news releases when they are pulled in by keywords. Now we can pull in news releases from more than one location without forcing them to all publish to the same location as duplicates.

    Posted by Nick Galotti | October 30, 2009, 3:30 pm
  5. No worries, I am glad you like it. Credit for this piece of code goes to Tiffany France from the Virginia Commonwealth University (http://www.ts.vcu.edu). She reminded me of this option and I am happy to be able to share it here.

    If your code should fall over and you have to debug it you should go check out this post on how to debug ASP in the Open Text CMS properly:
    http://www.reddotcmsblog.com/debugging-pre-executed-classic-asp

    Posted by Markus Giesen | October 30, 2009, 3:35 pm
  6. Awesome post. Unfortunately, this workaround stopped working after the latest patch in 7.5, and it doesn’t work in any version of 9 or 10.

    The rendertag .GetUrl() returns Url of type connected.

    Posted by Julio Leger | November 4, 2009, 8:58 pm
  7. Hey Julio, can’t confirm this. I’ve done the above in a recent v9 project, more specifically:
    Management Server 9 Build 9.0.1.29
    I’ve got several pages connected below different main navigation pages in different levels. Those are connected to a teaserlist on the homepage with the described method above and are all pointing to the right place/URL.

    What makes you think this doesn’t work anymore?

    Posted by Markus Giesen | November 4, 2009, 11:39 pm
  8. I’ve previously been doing this in a less than ideal way with RQL so I will happily try your approach instead. thx for sharing :)

    Posted by RustyLogic | November 9, 2009, 5:49 pm
  9. No worries, I believe you were one of the guys who started sharing big time with your .NET wrapper :D

    Posted by Markus Giesen | November 10, 2009, 12:45 am
  10. Does anyone know if this is true of v9 SP1 or v10 of the CMS? I’d be interested as we have potentially hundreds of sites that will use this system.

    Posted by Joel | February 1, 2010, 8:55 pm
  11. Hey Joel, what do you need to know? I can assure you that this method works in v9 SP1 and you can easily give it a try yourself on your v10 environment. Let me know if you need anything else.

    Posted by Markus Giesen | February 1, 2010, 11:09 pm
  12. Thanks for the article! Unfortunately, the pages are not displayed in the target container.
    Do you have any idea what we can do?

    Posted by Thomas Forkert | February 23, 2010, 6:17 pm
  13. Hey Thomas, good point. I am sorry to say that this method only returns the MainLink to a page and doesn’t work with a target container solution. On that note I suggest not using target container because they are (in my opinion) deprecated and only have limited use based on this and other similar limitations. I know that this doesn’t help you but target container in my opinion do more harm than good most of the time.

    Posted by Markus Giesen | February 24, 2010, 12:18 am
  14. Thanks for your answer. Unfortunately, we now use the container variant very heavily. Do you have any idea how to solve the duplicate content problem,
    or how to build complex pages without containers?

    Posted by Thomas Forkert | February 24, 2010, 3:34 pm
  15. dear markus,
    we have solved the ‘target-container-problem’ with your great Replace(…"islink=2","islink=10") solution, this works fine:
    ———————————————————————-
    <%
    strRefLink = Replace("", "islink=2", "islink=10")
    %>
    <a href="">…</a>
    ———————————————————————-
    i’m a little bit confused about your … construct. as far as i understand, this will never be published (or pre-executed). have you seen this code snippet in your pre-execute tmp files (when you switch the flag section in the rdserver.ini +256)? as a reference for the lists to pull some content from the following page, <!-- --> also works.
    best regards
    tobias

    Posted by Tobias Schmidt | February 24, 2010, 9:11 pm
  16. At this stage not really, I would reconsider the target container approach and build the site based on Navigation Manager and simple container constructs. We never had a need for target container usage in our projects so far. Guess that is material for another article.

    Posted by Markus Giesen | February 24, 2010, 10:59 pm
  17. We are using a modified version of this on our site using target containers and it works fine. What we did was not use the render tag and instead use the ASP code that does the replace as the link (currently commented out in the sample code). This turned out to work perfectly and not publish any extra pages.

    Posted by Nick Galotti | February 24, 2010, 11:05 pm
  18. @Tobias, good point. The example above solves the same problem in two different ways. The list is required to get the elements but the ASP code is unnecessary if the render tag approach works for you.
    Either way, thanks to you Thomas and to you Nick for clarifying that the commented code above can also be used to solve the issue when using a target container.

    Posted by Markus Giesen | February 25, 2010, 3:10 am
  19. Thanks that was a very enlightening discussion.

    Posted by Thomas Forkert | February 25, 2010, 11:18 am
  20. Question for you Mark…

    Do you know the different values, or where to find the different values of “isLink” and what they refer to? We are using this code but ran into a situation where if there is a URL in the list, it doesn’t work properly. I am thinking this is regarding how the Page Builder handles the URL in the list, versus a page. Any ideas?

    Posted by Joel Kinzel | May 12, 2010, 5:08 pm
  21. Hey Joel, I know of isLink=2 which is a simple page connect, then there is isLink=10 which is a reference.
    During the publishing process the PageBuilder only follows the ‘truly’ connected pages not any references (unless you set the global publishing settings to follow references too!)

    For your case, maybe you can write an exception and test the URL before you process it. If it’s an internal CMS link it can be processed otherwise it will be skipped.

    Posted by Markus Giesen | May 17, 2010, 7:21 am
  22. I’m trying to use this approach for a project publishing into DeliveryServer, (I’m on CMS 9.0.1.5) but can’t get it to work.

    Can you confirm that it works if you actually publish the page that is connected to the list, (and not just if you publish your entire site on a crawl). I have my list wrapped in the appropriate code, pre-executed, etc, but I still get the child page published into two places when I publish it. I know it’s doing the replace because I kicked out the URL, and I can see that on the replaced URL, it does target the Main link publication folder, not the folder for the package on the list.

    Posted by Wayne | September 2, 2010, 8:01 pm
  23. Just tested the render tag approach again on Management Server 10.1 Build 10.1.1.334. Still works fine.

    Posted by Markus Giesen | November 8, 2011, 9:46 am
  24. Just as a follow-up to my initial question: the URL is a special case and thus it won’t work using this. Additionally, when you pre-assign content classes it breaks the Add URL function completely (it will give a “content class of this type isn’t allowed” warning).

    Also, as another side note, and I know there is some debate in the community about this, but apparently this same effect can be achieved by using info elements (page URL). In our particular set up though, we have not been able to get this working properly.

    Posted by Joel Kinzel | November 8, 2011, 3:49 pm
