Whenever a page or an entire website is published, there is the problem of producing valid XHTML code. Let’s face it: Open Text Websolutions is not (and RedDot CMS was not) the most eager CMS in the world to achieve this goal.
Some open source solutions (e.g. WordPress) do a better job.
So what can we do? Since nearly everyone is able to easily create a standards-compliant web page, Open Text and we partners face a big challenge in explaining why an expensive system is not as good as a free one.
So there are two goals for us:
Are you ready? Let’s go…
First of all, let’s have a look at where the problems come from:
Assuming that we developers are able to program tidy and standards compliant templates, there is only one source causing all of our problems: The text editor!
Uppercase tags, mso-styles and incorrect empty tags like <BR> instead of <br /> spread through our code, which was once so beautiful.
So the simplest way to avoid all of that would be to keep the editors away from anything where they could produce a single tag without our control. This ends up in forbidding them to use the text editor, at least when it is not in ASCII mode.
But I have never seen a project like this…
Another good way to publish standard compliant web pages is to use tidy. Simply activate it in your project variant settings and all problems are gone!
Well, almost all:
At this point, I did not investigate any further.
Version 9 introduced Telerik RadEditor as a new approach. It seems to be a good, well-integrated tool that produces really tidy code. Good for brand new projects. Changing a running project can be a bit tricky; the main problem is that you would have to open and close every single text element to see the effect.
And there are some issues around, which have already been discussed in other posts here.
Have a look at the /cms/asp folder of your server. There you should find a file named HtmlConvertTable.txt.
It’s a simple plain text document containing tab-separated strings. Matches of the string on the left are replaced by the text on the right. It was used in ancient times (before Unicode) to convert strange characters like our German umlauts to their corresponding HTML entities (e.g. ä became &auml;).
The great advantage of this solution is that it only changes the content of elements, but not one single character of the template code!
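To make the mechanism concrete, here is a minimal sketch in Python of how such a table could be applied to element content. It assumes plain, case-sensitive replacement applied in file order; how the CMS actually matches is not documented here, and the names `parse_table` and `convert` are made up for illustration.

```python
def parse_table(text):
    """Parse tab-separated 'search<TAB>replacement' lines into pairs."""
    pairs = []
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        search, replacement = line.split("\t", 1)
        pairs.append((search, replacement))
    return pairs


def convert(content, pairs):
    """Apply every replacement in order (assumption: plain, case-sensitive)."""
    for search, replacement in pairs:
        content = content.replace(search, replacement)
    return content


# Hypothetical table content using the umlaut case from the text.
table = "ä\t&auml;\nö\t&ouml;\nü\t&uuml;\n"
pairs = parse_table(table)
result = convert("Käse für Köln", pairs)
print(result)
```

Only the string passed to `convert` is touched, which mirrors the point that template code stays unchanged while element content is converted.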
In your project variant settings, you can choose between three options (section Conversion of RedDot content):
So let’s create a conversion table for XHTML code, save it to the /cms/asp folder of your server, name it e.g. HtmlConvertTableXhtml.txt and enter this file name into the text entry field.
Into this file, write down all uppercase tags to the left and the corresponding lowercase tags to the right, separated by a tabulator.
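If typing the pairs by hand feels error-prone (the tabulators are easy to lose), the lines can also be generated. The following is only a sketch: `build_table` is a hypothetical helper, and it covers just opening and closing tags without attributes, so entries for tags with attributes still have to be added by hand.

```python
def build_table(tags):
    """Return conversion-table text: uppercase tag, a tab, lowercase tag."""
    lines = []
    for tag in tags:
        upper, lower = tag.upper(), tag.lower()
        lines.append(f"<{upper}>\t<{lower}>")    # opening tag, no attributes
        lines.append(f"</{upper}>\t</{lower}>")  # closing tag
    return "\n".join(lines) + "\n"


# Example tag list; extend it with whatever your editors actually produce.
table_text = build_table(["p", "strong", "em"])
print(table_text)
```

The resulting text can be pasted into the conversion file or written to it with an ordinary file write.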
Here’s a little example (when you copy it ensure that you get the tabulators right):
<BR>	<br />
<IMG	<img
<P	<p
<P>	<p>
</P>	</p>
<A	<a
</A>	</a>
<STRONG>	<strong>
</STRONG>	</strong>
<EM>	<em>
</EM>	</em>
Very simple, but powerful. What do we see in this example?
You can list the ampersand, too, of course, if you do it as follows:
 & 	 &amp; 
Both the left and the right one must have a leading and a trailing space to ensure that only the ampersands standing alone will be converted and not those which are already part of an entity.
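A short sketch in Python shows why the surrounding spaces matter. It assumes plain string replacement, which may differ from what the CMS does internally; `convert` is a hypothetical helper.

```python
def convert(content, pairs):
    """Apply each (search, replacement) pair in order as a plain replace."""
    for search, replacement in pairs:
        content = content.replace(search, replacement)
    return content


text = "Fish & Chips f&uuml;r alle"

# With leading and trailing spaces, only the stand-alone ampersand matches.
spaced = convert(text, [(" & ", " &amp; ")])

# Without the spaces, the existing entity gets escaped a second time.
naive = convert(text, [("&", "&amp;")])

print(spaced)
print(naive)
```

The trade-off is that an ampersand without spaces around it (for example in "R&D") is left unconverted by the spaced rule.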
Last but not least you can convert deprecated tags into a standard compliant way:
<NOBR>	<span class="nowrap">
</NOBR>	</span>
The behaviour of the “nowrap” class is then defined using CSS.
In my experience so far, this solution delivers the best results for projects using the built-in RedDot text editor (I’ve tried it with version 7.5), although it’s not possible to convert all tags. For example, you cannot convert empty tags that must have attributes (e.g. img) into their XHTML variant.
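The limitation with img can be illustrated with a small Python sketch, again assuming plain string replacement (the CMS's real matching may differ; `convert` and the sample markup are made up for illustration).

```python
def convert(content, pairs):
    """Apply each (search, replacement) pair in order as a plain replace."""
    for search, replacement in pairs:
        content = content.replace(search, replacement)
    return content


pairs = [("<BR>", "<br />"), ("<IMG", "<img")]
html = 'Text<BR><IMG src="logo.gif" alt="Logo">'
result = convert(html, pairs)
print(result)
```

The tag name of the image is lowercased, but because the attributes vary from element to element, no fixed search string can match the closing `>` and turn it into ` />`.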
That’s it. Now I’d like to hear about your experience with this.
Nice read, and I agree it took some time (until Web Solutions v9) for OpenText to come up with an out-of-the-box solution for proper code.
For earlier project versions I can recommend the FCKEditor that gets shipped with RedDot CMS.
Before version 9 it was really a pain to get proper code, and without a RedDot partner who knew what they were doing you were often lost in dodgy HTML output.
It’s always been a source of embarrassment for me that our website can’t be built from valid markup. 2009 and there’s still software producing invalid markup? That’s a joke.
Using tidy is one option, apart from all the drawbacks you highlight; it’s also a giant hack to try to get around a buggy text editor which should be fixed in the first place.
One of the big problems with RedDot (and, to be fair, some other CMS’s) is that it treats everyone the same: from the novice who’s never written for the web or marked up some content in their life, right through to the professional who understands exactly what makes a document valid (and why it should be), there’s no distinction. If ONLY there were a ‘turn off all bugs RedDot will introduce in my markup’ option, our site would be so much cleaner. Of course, there IS the ASCII text editor, and it’s always a joy to use that to mark up content correctly, but as you rightly point out, it’s not for everyone.
Wordpress is hardly a saint in this area, either – it fails to show paragraph tags correctly in HTML view, and has a tendency to include all that Microsoft crap that gets added when copy+pasting from word, outlook, etc.
Oh, BTW, is not an “incorrect empty tag”; it’s perfectly valid in HTML4 although I prefer to use the much more palatable .
- Bobby
P.S. The form has swallowed my HTML; the last sentence should read:
Oh, BTW, <BR> is not an “incorrect empty tag”; it’s perfectly valid in HTML4 although I prefer to use the much more palatable <br>.
I do fully agree. We get around this by adding two separate content classes with exactly the same code. The only difference is the formatting options for the text editor element in the “advanced” ck. This has a ck authorization packet and a preassigned edit auth. pack. which allows its use only for editors that are in the “advanced users group”.
BTW: <BR> surely is correct HTML4, but I wrote the article from an XHTML point of view, where it is not correct (<br> isn’t either).
Ah – I obviously missed that reference to “XHTML” in the first line!
But it’s worth pointing out that creating valid, and semantically sound HTML using RedDot is fraught with the same sorts of problem.
Thanks for the dual content class tip – if content classes were modular, I’d probably dive headfirst into that, but we’ve seen too many problems arising from having several versions of content class for essentially the same thing – very difficult to keep them all in sync.
- Bobby
Thanks for this, Stefan. I have struggled with Tidy and have been disappointed with the implementation of the Telerik RADEditor, with no support for rtl language variants. Your article clearly explained a solution which I have now set up.
Hi Stefan
We have discovered that the HtmlConvertTableXhtml.txt does not convert quote marks correctly. This has caused errors on some ASP.NET contact forms.
I have tried inserting the following in the HtmlConvertTableXhtml.txt but without success:
<> »:
Oh well…I hope version 9 sp1 has resolved the issue the Telerik RADEditor had with right to left language variants. I might be able to use this for xhtml compliant text!
Regards
Ian
Hi all,
I think it’s worth mentioning that after updating the HtmlConvertTable file, the file to be used needs to be specified in the Project Variant settings as described in this Google Groups post:
http://www.mail-archive.com/reddot-cms-users@googlegroups.com/msg01054.html