XHTML5 Tutorial


Overview

  XHTML5 is the XML serialization of HTML5. Its vocabulary is defined by the HTML5 specification, but XHTML5 is an application of XML, so its documents must follow XML syntax rules. In other words, HTML5 and XHTML5 have an identical vocabulary but different parsing rules.
  An HTML5 document can also be a well-formed XML document. Such markup is often referred to as “polyglot” markup: the overlap of documents that are HTML5 and XML documents at the same time. The HTML5 and XHTML5 serializations are largely cross-compatible, but XHTML5 has a stricter syntax. Furthermore, some parts of XHTML5 are not valid in HTML5, e.g., processing instructions.
  Documents served with an XML MIME type, such as application/xhtml+xml, are treated as XML documents by browsers, i.e. they are parsed by an XML processor. It is important to keep in mind that XML and HTML are processed differently. Even a minor syntax error prevents an XML document (or a document that claims to be XML) from being rendered correctly, whereas the same error would be silently recovered from in the HTML syntax. A parsing error in an XML document can easily result in a “Yellow Screen of Death”.
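
The difference in error handling can be illustrated outside a browser. The following is a minimal sketch in Python, using the standard library's xml.etree.ElementTree and html.parser modules as stand-ins for a browser's XML and HTML parsers (html.parser is not a full HTML5 parser, so this only approximates browser behaviour). The names BROKEN, parses_as_xml, TagCollector and html_start_tags are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The <p> element is never closed: fatal for an XML parser,
# tolerated by an HTML parser.
BROKEN = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hello</body></html>'


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Records start tags; html.parser does not raise on malformed markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup: str) -> list:
    """Feed the markup to the lenient HTML parser and return the start tags seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print("well-formed XML:", parses_as_xml(BROKEN))
    print("HTML start tags:", html_start_tags(BROKEN))
```

The XML parser rejects the whole document because of the single missing end tag, while the HTML parser still reports every element it encountered.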

Character encoding declarations

The character encoding of an XHTML5 document can be declared in several ways:

  • Using the HTTP header
  • Using in-document declarations (pragma directive / meta charset attribute / XML declaration)
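
For the HTTP header case, the encoding travels in the charset parameter of the Content-Type header. As a rough sketch, assuming the header value is already available as a string, it can be split into media type and charset with Python's standard email.message module. The helper names parse_content_type and is_served_as_xml are hypothetical and exist only for this example.

```python
from email.message import Message


def parse_content_type(header_value: str):
    """Split a Content-Type header value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def is_served_as_xml(header_value: str) -> bool:
    """True for XML media types such as application/xhtml+xml."""
    media_type, _ = parse_content_type(header_value)
    return media_type in ("application/xml", "text/xml") or media_type.endswith("+xml")


if __name__ == "__main__":
    print(parse_content_type("application/xhtml+xml; charset=UTF-8"))
    print(is_served_as_xml("text/html; charset=utf-8"))
```

A document whose header passes is_served_as_xml would be handed to an XML processor, so the in-document meta declarations described below would not be what determines its encoding.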

The older kind of declaration, the pragma directive, should be placed at the top of the head element:

<meta http-equiv="Content-Type"
content="application/xhtml+xml; charset=utf-8" />

HTML5 also specifies a newer meta charset attribute. Either form can be used, but not both in the same document. The whole declaration must also fit within the first 1024 bytes of the page.

<meta charset="utf-8" />

A meta element declaration cannot be used in the head element of XHTML5 documents if the character encoding is UTF-16. Instead, a byte-order mark (BOM) should be present at the beginning of UTF-16 encoded files.
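
Whether a UTF-16 file actually starts with a byte-order mark can be checked on the raw bytes. The sketch below is only an illustration; has_utf16_bom and DOC are names invented here. It relies on the fact that Python's "utf-16" codec prepends a BOM, while the explicit "utf-16-le" codec does not.

```python
import codecs

DOC = '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body/></html>'


def has_utf16_bom(data: bytes) -> bool:
    """True if the bytes start with a little- or big-endian UTF-16 BOM."""
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


if __name__ == "__main__":
    # "utf-16" writes a BOM in the platform's byte order.
    print(has_utf16_bom(DOC.encode("utf-16")))
    # "utf-16-le" fixes the byte order and writes no BOM.
    print(has_utf16_bom(DOC.encode("utf-16-le")))
```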

The encoding declaration of XHTML documents depends on which MIME type they are served with. If they are served as text/html, the pragma directive can be used at the top of the head element:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

XHTML documents served as XML can declare the encoding in the XML declaration on the first line of the document, i.e.

<?xml version="1.0" encoding="utf-8"?>

No other content, not even whitespace, may appear before the declaration (only a byte-order mark is allowed).
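
This placement rule is easy to check mechanically. The following sketch assumes a UTF-8 encoded file and uses a hypothetical helper, declaration_is_first, that accepts an optional UTF-8 byte-order mark and nothing else before the declaration.

```python
import codecs


def declaration_is_first(data: bytes) -> bool:
    """True if the XML declaration opens the document.

    Assumes UTF-8 input; only a UTF-8 byte-order mark may precede
    the declaration.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.startswith(b"<?xml ")


if __name__ == "__main__":
    decl = b'<?xml version="1.0" encoding="utf-8"?>\n<root/>'
    print(declaration_is_first(decl))                     # declaration first
    print(declaration_is_first(codecs.BOM_UTF8 + decl))   # BOM, then declaration
    print(declaration_is_first(b"\n" + decl))             # leading newline
```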

Sample XHTML5 document


<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>An XHTML5 example</title>
</head>
<body>

</body>
</html>
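
As a quick sanity check, the sample document can be run through an XML parser. This is a minimal sketch using Python's xml.etree.ElementTree, not a validator: it only confirms well-formedness and reads the title. SAMPLE, XHTML_NS and document_title are names chosen for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The sample document as bytes, so the parser honours the encoding declaration.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>An XHTML5 example</title>
</head>
<body>

</body>
</html>
"""


def document_title(data: bytes) -> str:
    """Parse the document as XML and return the text of its title element."""
    root = ET.fromstring(data)  # raises ET.ParseError if not well-formed
    title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
    return title.text if title is not None else ""


if __name__ == "__main__":
    print(document_title(SAMPLE))
```

Because of the xmlns attribute on the root element, every element lives in the XHTML namespace, which is why the lookup path has to qualify head and title with it.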
