This is a guest post by Dale Waldt, an Information Architect consultant with 30 years of experience in XML application design.

Markup languages are more widespread and understood by more people than ever. They can be designed to express the formatting, structure, or semantics of the information being marked up. The web has brought the power of the angle brackets to folks not usually associated with developing content applications. This is good. A little intelligent encoding goes a long way toward communicating information more accurately.

HTML is the most common structured format. While it is widely understood and used, it really describes the presentation format of the content so that a browser can render it for human consumption. The structure of HTML is loosely enforced. Tables and lists need to have the correct structure, but heading levels can occur in any order. You don’t need an <h1> before you use an <h2>, for instance. This flexibility might be part of the reason HTML is so wildly successful. It is also, in my opinion, why it is not optimal for authoring where information needs to follow even a few simple rules.

In more demanding environments we create much stricter, more complicated, semantically oriented markup that allows us to build powerful, scalable content applications. At the simple end of the spectrum are wiki encoding schemes that provide only a few elements and very simple, compact markup, but do try to enforce structure and hierarchy. Below is a sample of Markdown, one of several wiki coding schemes.

Sample Markdown

The left side of the Oxygen editor is the editing frame, where I typed the content and simple Markdown markup. The right side shows the rendered version, which Oxygen updates as you type.
As you can see, the markup in use is very simple and compact. It is a handy way to capture small amounts of information and display it legibly across the Web. You don’t need to be an expert to tag documents this way. But is it a good format to create and manage content?

Well, it depends.

Markdown is a simplified way to express a select set of HTML constructs for rendering in a browser, plus a few rules for hierarchy. It includes simple coding for paragraphs, lists, tables, highlighting, cross-reference links, and images. These elements are useful for many types of documents, but they are not very descriptive. Markdown describes mostly format, but does insist on headings being in the right sequence.

Markdown is Not Smart Enough to Enforce Business Rules

What Markdown cannot do is provide semantic information about your content elements. Semantic markup tells us what the information is, not just what it looks like. Semantic markup improves the ability to label and validate things and to enforce powerful business rules such as:

  • All training documents must have an <objectivesStatement>, a <lesson> and a <test>
  • All invoices must have a <dueDate> and a <totalAmount>
  • A software manual must mark up all <commands> and <menuItems>
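
To make rules like these concrete, here is a minimal Python sketch of how such a check might look once the content carries semantic element names. It uses only the standard library's `xml.etree.ElementTree`. The rule table `REQUIRED`, the function `missing_required`, the root element names `invoice` and `trainingDocument`, and the sample invoice are all assumptions made up for illustration. A production system would more likely express these rules in a schema language than in hand-written code.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: root element name -> elements the document must contain.
REQUIRED = {
    "invoice": ["dueDate", "totalAmount"],
    "trainingDocument": ["objectivesStatement", "lesson", "test"],
}

def missing_required(xml_text):
    """Return the required element names that are absent from the document."""
    root = ET.fromstring(xml_text)
    required = REQUIRED.get(root.tag, [])
    # Compare with None explicitly: an element without children is falsy.
    return [name for name in required if root.find(f".//{name}") is None]

# Illustrative invoice that is missing its total amount.
sample = "<invoice><dueDate>2024-01-31</dueDate></invoice>"
print(missing_required(sample))  # ['totalAmount']
```

The same check has nothing to hold on to in a Markdown file, because a due date and a total amount are both just paragraphs or table cells there.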

If a software manual is tagged in Markdown, the inline commands and menu items could be marked up as bold or italic, but it would be hard to tell them apart based only on the format-oriented markup. You could interpret the actual text of the bold or italic element, but interrogating content to infer meaning is far less accurate than tagging with explicit semantic markup.
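
The difference can be sketched in a few lines of Python. The sample sentence, the singular element names `command` and `menuItem`, and the helper functions `bold_spans` and `labelled_spans` are hypothetical, chosen only to illustrate the point. The regular expression handles simple `**bold**` spans and is not a full Markdown parser.

```python
import re
import xml.etree.ElementTree as ET

# The same instruction, tagged two ways (illustrative content).
markdown_step = "Type **git status**, then choose **File > Save**."
xml_step = (
    "<step>Type <command>git status</command>, then choose "
    "<menuItem>File &gt; Save</menuItem>.</step>"
)

def bold_spans(md_text):
    """All that format-oriented markup yields: bold strings with no label."""
    return re.findall(r"\*\*(.+?)\*\*", md_text)

def labelled_spans(xml_text):
    """Semantic markup yields each inline element together with its name."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text) for el in root.iter() if el is not root]

print(bold_spans(markdown_step))
# ['git status', 'File > Save']
print(labelled_spans(xml_step))
# [('command', 'git status'), ('menuItem', 'File > Save')]
```

With the Markdown version, a program would have to guess which bold string is a command and which is a menu item. With the XML version, the author has already said so.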

Markdown Makes it Difficult to Reuse Content

We often find we need to rearrange our content for multiple uses. For instance, key elements may be extracted from product documentation and reorganized as a catalog of products. Summaries of news articles might be gathered to create a “Highlights” or “Summary” document.

Content tagged in Markdown is harder to parse in order to find and select specific elements for this type of reuse. Without richer markup, repurposing content for new publications requires more manual editing.
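
As a rough sketch of the news example, the Python below gathers titles and summaries from a set of articles into a new “Highlights” document, using only the standard library. The element names (`articles`, `article`, `title`, `summary`, `body`, `highlights`, `item`), the sample text, and the function `build_highlights` are assumptions for illustration and do not come from any particular schema.

```python
import xml.etree.ElementTree as ET

# Illustrative source content with explicitly labelled parts.
articles_xml = """
<articles>
  <article>
    <title>Budget passes</title>
    <summary>Council approves the budget.</summary>
    <body>Full text of the first article.</body>
  </article>
  <article>
    <title>Bridge reopens</title>
    <summary>Repairs finished early.</summary>
    <body>Full text of the second article.</body>
  </article>
</articles>
"""

def build_highlights(xml_text):
    """Copy each article's title and summary into a new highlights document."""
    root = ET.fromstring(xml_text.strip())
    highlights = ET.Element("highlights")
    for article in root.iter("article"):
        item = ET.SubElement(highlights, "item")
        ET.SubElement(item, "title").text = article.findtext("title")
        ET.SubElement(item, "summary").text = article.findtext("summary")
    return highlights

print(ET.tostring(build_highlights(articles_xml), encoding="unicode"))
```

Because each summary is labelled as a summary, the selection is a simple lookup. In Markdown the summary would be just another paragraph, and someone would have to pick it out by position or by hand.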

Markdown is Limited

Content tagged in Markdown may not look the same in different browsers and renderers. Its formatting options are also limited, and the default rendering sometimes looks a little “klunky.” A user can spend a lot of time adjusting the appearance of the content only to hit the ceiling on formatting capabilities, and the content may look different in another browser anyway. This reminds me of how some people spend far too much time formatting a Word document for one use, when stylesheets would provide more consistency and adherence to design standards.

Markdown is Easy and Good for Simple Uses But…

Don’t get me wrong, I don’t dislike Markdown. I use it often for some tasks, like documenting concepts in a wiki in a collaborative development environment like GitHub. It is a quick and easy way to make a few “help pages” to share with my teammates. It comes with the development tools, so it is essentially free. However, business requirements can quickly outstrip its capabilities and convenience.

I would not use Markdown as the basis for valuable corporate documentation or marketing web content. These require more sophisticated markup and processing capabilities. Even when the content structures are fairly simple, as in newspaper articles, content that needs to be classified for reuse, transformed to multiple output formats, or otherwise reorganized calls for more detailed markup than Markdown provides.

The diagram below shows where I think it makes sense to consider Markdown for some documents. As the complexity of the content and its functional requirements grows, and as volume increases, you should consider more powerful structural XML markup. Finally, for robust content ecosystems, only rich, semantic XML will provide the capabilities and automation that are needed.

Three rings with wiki markup, structured XML, and Semantic XML overlapping
