How to get a tidy website


Whenever a page or an entire website is published, there is the problem of producing valid XHTML code. Let’s face it: Open Text Websolutions is not (and RedDot CMS was not) the most eager CMS in the world when it comes to this goal.
Some open source solutions (e.g. WordPress ;-) ) do a better job.

So what can we do? Since nearly anyone can easily create a standards-compliant web page, Open Text and we partners face a big challenge in explaining why an expensive system does worse than a free one.

So there are two goals for us:

  • Give the users the freedom of formatting they expect
  • Show your fellow standards-minded developers that you treat the standards as more than loose guidelines.

Are you ready? Let’s go…

Just ASCII for the editors?

First of all, let’s look at where the problems come from.
Assuming that we developers are able to write tidy, standards-compliant templates, there is only one source of all our problems: the text editor!

Uppercase tags, mso styles and incorrect empty tags like <BR> instead of <br /> litter our once beautiful code.

So the simplest way to avoid all of that would be to keep the editors away from anything that lets them produce even a single tag outside our control. In practice, this means denying them the text editor, at least in any mode other than ASCII.
But I have never seen a project like this…

Use tidy

Another good way to publish standards-compliant web pages is to use tidy. Simply activate it in your project variant settings and all problems are gone!

Well, almost all:

  • Because tidy processes the whole page, not just the code from the text editor, sooner or later it will also change some of your template code while publishing.
  • If you publish just static HTML, tidy may help you a lot.
  • If you also publish server-side scripts, it might interfere a lot and make very harmful changes!
  • Tidy checks and corrects the published code that is transferred to the web server. But this code does not necessarily have to be tidy yet: only the code the web server finally sends to the client has to be!
  • Never use it in a LiveServer (Delivery Server) project! It will strip out all of your nice Dynaments because it does not recognize them (at least in XHTML mode; I did not try the other modes). If you declare these tags (tidy is, of course, configurable), it will keep them in the BODY, but not in the HEAD.

At this point, I did not investigate any further.

Use another text editor

Version 9 introduced Telerik RadEditor as a new approach. It seems to be a good, well-integrated tool that produces really nice, tidy code, which makes it a good fit for brand-new projects. Switching a running project can be a bit tricky, though: the main problem is that you would have to open and close every single text element to see the effect.

There are also some known issues with it, which have already been discussed in other posts here.

Use a conversion table

Have a look at the /cms/asp folder of your server. There you should find a file named HtmlConvertTable.txt.

It’s a plain text document containing tab-separated pairs of strings: every match of the left string is replaced by the right string. In ancient times (before Unicode), it was used to convert special characters like our German umlauts into their corresponding HTML entities (e.g. ä became &auml;).
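To see what such a table does, here is a minimal Python sketch. It assumes plain, case-sensitive substring replacement, applied entry by entry in file order; the helper names `load_table` and `convert` are made up for illustration, and this is not how the CMS itself is implemented.

```python
def load_table(text: str) -> list[tuple[str, str]]:
    """Parse 'left<TAB>right' lines into (search, replace) pairs.

    Whitespace is deliberately NOT stripped: leading and trailing
    spaces in an entry are significant.
    """
    pairs = []
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        left, right = line.split("\t", 1)  # split on the first tab only
        if not left:
            continue  # an empty search string would match everywhere
        pairs.append((left, right))
    return pairs


def convert(content: str, pairs: list[tuple[str, str]]) -> str:
    """Apply every pair as a plain string replacement, in file order."""
    for left, right in pairs:
        content = content.replace(left, right)
    return content


table = "ä\t&auml;\nö\t&ouml;\nü\t&uuml;"
print(convert("Grüße aus dem Schwarzwald", load_table(table)))
```

Only the umlaut is touched here; characters without an entry (like ß) pass through unchanged.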

The great advantage of this solution is that it only changes the content of elements, but not one single character of the template code!

In your project variant settings, you can choose between three options (section “Conversion of RedDot content”):

  • Do not convert characters (I believe this actually means: use the standard file HtmlConvertTable.txt, unless the element itself is set to "Do not convert characters to HTML")
  • Convert characters to XML (converts just these five characters: &, ", <, > and ', but every occurrence, everywhere)
  • Convert characters to the following file format (followed by a text entry field)

So let’s create a conversion table for XHTML code, save it to the /cms/asp folder of your server, name it e.g. HtmlConvertTableXhtml.txt and enter this file name into the text entry field.

In this file, list the uppercase tags on the left and the corresponding lowercase tags on the right, separated by a tab.

Here’s a little example (when you copy it, make sure the tabs survive):

<BR>	<br />
<IMG 	<img
<P 	<p
<P>	<p>
</P>	</p>
<A 	<a
</A>	</a>
<STRONG>	<strong>
</STRONG>	</strong>
<EM>	<em>
</EM>	</em>

Very simple, but powerful. What do we see in this example?

  • You must list both the start tags and the end tags.
  • Some start tags appear twice, because they exist both with and without attributes.
    Attention: Write the version with attributes with a trailing space, on both sides of the tab, to avoid converting other tags that start with the same character or string (without the space, <P would also match <PRE>).
  • You can also list any attribute to be converted.
    Attention: Don’t list the href attribute, because this confuses the pagebuilder – internal links will no longer be published!

You can list the ampersand, too, of course, if you do it as follows:

 & 	 &amp; 

Both the left and the right string must have a leading and a trailing space, so that only free-standing ampersands are converted, not those that are already part of an entity.
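The effect of those spaces is easy to check with plain Python string replacement, assuming the CMS performs simple substring replacement (the sample text is made up):

```python
text = "Tom & Jerry &amp; friends &copy; 2009"

# Naive rule: every '&' becomes '&amp;', which double-escapes existing entities.
naive = text.replace("&", "&amp;")

# Rule from the table: only an ampersand surrounded by spaces is converted.
spaced = text.replace(" & ", " &amp; ")

print(naive)   # existing entities are broken
print(spaced)  # existing entities survive
```

The price of this rule is that ampersands without spaces around them, such as in AT&T or at the very start of a text, stay unconverted.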

Last but not least, you can convert deprecated tags into standards-compliant markup:

<NOBR>	<span class="nowrap">
</NOBR>	</span>

The behaviour of the “nowrap” class is then defined using CSS.

In my experience so far, this solution delivers the best results for projects using the built-in RedDot text editor (I’ve tried it with version 7.5), although it cannot convert every tag. For example, you cannot convert empty tags that require attributes (e.g. img) into their XHTML form, because the closing > that would have to become /> comes after attributes that differ from tag to tag.
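A small Python experiment illustrates why, assuming simple substring replacement: fixed strings can lowercase the start of an IMG tag, but the > that closes it cannot be told apart from the > of any other tag, so no table entry can turn it into />. Something beyond fixed strings, such as the regular expression below, can; HtmlConvertTable.txt does not support this, and the pattern is only an illustration, not a robust HTML parser.

```python
import re

html = '<P>Logo: <IMG src="logo.gif" alt="Logo"></P>'

# What the table can do: fixed string replacements.
table_result = (
    html.replace("<P>", "<p>").replace("</P>", "</p>").replace("<IMG ", "<img ")
)

# What it cannot do: close the img tag, because its attributes vary and the
# final '>' looks like the '>' of every other tag. A regex can target it
# (illustration only; it would break on a '>' inside an attribute value).
xhtml = re.sub(r"(<img\b[^>]*?)\s*/?>", r"\1 />", table_result)

print(table_result)  # img is lowercase but still not closed
print(xhtml)         # img is now written as an XHTML empty tag
```

The pattern also leaves an already closed <img ... /> unchanged, so running it twice is harmless.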

That’s it. Now I’d like to hear about your experience with this.

About the author:

Stefan Buchali lives in a small village close to the Black Forest in Germany. He has worked as a CMS and LiveServer developer at SF eBusiness GmbH since 2000 and has known the system in all its depths since the days it was called Infooffice. He is the winner of the silver RedDot Innovation Award 2008.

Discussion

8 comments for “How to get a tidy website”

  1. Nice read, and I agree it took some time (until Web Solutions v9) for OpenText to come up with an out-of-the-box solution for proper code.
    For earlier project versions I can recommend the FCKEditor that gets shipped with RedDot CMS.

    Before version 9 it was really a pain to get proper code, and without a RedDot partner who knew what they were doing you were often lost in dodgy HTML output.

    Posted by Markus Giesen | November 10, 2009, 2:13 am
  2. It’s always been a source of embarrassment for me that our website can’t be built from valid markup. 2009 and there’s still software producing invalid markup? That’s a joke.

    Using tidy is one option, apart from all the drawbacks you highlight; it’s also a giant hack to try to get around a buggy text editor which should be fixed in the first place.

    One of the big problems with RedDot (and, to be fair, some other CMS’s) is that it treats everyone the same: from the novice who’s never written for the web or marked up some content in their life, right through to the professional who understands exactly what makes a document valid (and why it should be), there’s no distinction. If ONLY there were a ‘turn off all bugs RedDot will introduce in my markup’ option, our site would be so much cleaner. Of course, there IS the ASCII text editor, and it’s always a joy to use that to markup content correctly, but as you rightly point out, it’s not for everyone.

    Wordpress is hardly a saint in this area, either – it fails to show paragraph tags correctly in HTML view, and has a tendency to include all that Microsoft crap that gets added when copy+pasting from word, outlook, etc.

    Oh, BTW, is not an “incorrect empty tag”; it’s perfectly valid in HTML4 although I prefer to use the much more palatable .

    - Bobby

    Posted by Five Minute Argument | November 12, 2009, 7:47 pm
  3. P.S. The form has swallowed my HTML; the last sentence should read:

    Oh, BTW, <BR> is not an “incorrect empty tag”; it’s perfectly valid in HTML4 although I prefer to use the much more palatable <br>.

    Posted by Five Minute Argument | November 12, 2009, 7:52 pm
  4. I fully agree. We get around this by creating two separate content classes with exactly the same code. The only difference is the formatting options for the text editor element in the “advanced” content class. That class has a content class authorization package and a preassigned editing authorization package, which allow only editors in the “advanced users” group to use it.
    BTW: <BR> surely is correct HTML4, but I wrote the article from an XHTML point of view, where it is not correct (<br> isn’t either).

    Posted by Stefan Buchali | November 13, 2009, 10:24 am
  5. Ah – I obviously missed that reference to “XHTML” in the first line! ;-) But it’s worth pointing out that creating valid, and semantically sound HTML using RedDot is fraught with the same sorts of problem.

    Thanks for the dual content class tip – if content classes were modular, I’d probably dive headfirst into that, but we’ve seen too many problems arising from having several versions of content class for essentially the same thing – very difficult to keep them all in sync.

    - Bobby

    Posted by Five Minute Argument | November 13, 2009, 11:53 am
  6. Thanks for this, Stefan. I have struggled with Tidy and have been disappointed with the implementation of the Telerik RADEditor, with no support for rtl language variants. Your article clearly explained a solution which I have now set up.

    Posted by Ian | November 18, 2009, 10:59 am
  7. Hi Stefan

    We have discovered that the HtmlConvertTableXhtml.txt does not convert quote marks correctly. This has caused errors on some ASP.NET contact forms.

    I have tried inserting the following in the HtmlConvertTableXhtml.txt but without success:

    <> &raquo:

    Oh well…I hope version 9 sp1 has resolved the issue the Telerik RADEditor had with right to left language variants. I might be able to use this for xhtml compliant text!

    Regards
    Ian

    Posted by Ian | January 5, 2010, 5:28 pm
  8. Hi all,

    I think it’s worth mentioning that after updating the HtmlConvertTable file, the file to be used needs to be specified in the Project Variant settings, as described in this Google Groups post:

    http://www.mail-archive.com/reddot-cms-users@googlegroups.com/msg01054.html

    Posted by Shane Handley | January 8, 2010, 12:00 am
