Tutorial on XHTML
Software Engineering Group 4
Stijn Coene
stijn.coene@vub.ac.be
March 19, 2004
1 Introduction
This tutorial describes the differences between XHTML and HTML
4.0.
2 Basic differences
I have split the differences into two parts. The basic differences are
those that we will use most often. The advanced differences can also be
useful, but I think they're not of crucial importance for our project.
2.1 Documents must be well-formed
Essentially this means that all elements must either have closing
tags or be written in a special form (as described below), and that
all the elements must nest properly.
Example
CORRECT: nested elements.
<p>here is an emphasized <em>paragraph</em>.</p>
INCORRECT: overlapping elements
<p>here is an emphasized <em>paragraph.</p></em>
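The two examples above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser is an acceptable stand-in for "an XML processor"; the helper name is_well_formed is made up for this illustration. It only tests well-formedness, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# The two examples from this section, copied verbatim.
nested = "<p>here is an emphasized <em>paragraph</em>.</p>"
overlapping = "<p>here is an emphasized <em>paragraph.</p></em>"

print("nested:", is_well_formed(nested))
print("overlapping:", is_well_formed(overlapping))
```

The nested version is accepted, while the overlapping version is rejected because </p> arrives while <em> is still open.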
2.2 Element and attribute names must be in lower case
XHTML documents must use lower case for all HTML element and attribute
names. This difference is necessary because XML is case-sensitive,
e.g. <li> and <LI> are different tags.
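To see what case sensitivity means in practice, here is a small sketch, again assuming Python's standard xml.etree.ElementTree parser as the XML processor. The <ul> wrapper and the variable names are assumptions added for the illustration.

```python
import xml.etree.ElementTree as ET

# The parser keeps tag names exactly as written, so li and LI differ.
lower = ET.fromstring("<ul><li>item</li></ul>")
upper = ET.fromstring("<ul><LI>item</LI></ul>")
same_tag = lower[0].tag == upper[0].tag
print(lower[0].tag, upper[0].tag, same_tag)

# Because the names differ, </li> cannot close <LI>.
try:
    ET.fromstring("<ul><LI>item</li></ul>")
    mixed_case_accepted = True
except ET.ParseError:
    mixed_case_accepted = False
print("mixed case accepted:", mixed_case_accepted)
```

Note that the parser accepts <LI> as a well-formed element; it is the XHTML DTD, which only declares the lower-case names, that makes upper-case names invalid.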
2.3 For non-empty elements, end tags are required
In SGML-based HTML 4, certain elements were permitted to omit the end
tag, with the elements that followed implying closure. XML does not
allow end tags to be omitted. All elements other than those declared
in the DTD as EMPTY must have an end tag. Elements that are declared
in the DTD as EMPTY can have an end tag or can use the empty element
shorthand (see section 2.6).
Example
CORRECT: terminated elements
<p>here is a paragraph.</p><p>here is another paragraph.</p>
INCORRECT: unterminated elements
<p>here is a paragraph.<p>here is another paragraph.
2.4 Attribute values must always be quoted
All attribute values must be quoted, even those which appear to be
numeric.
Example
CORRECT: quoted attribute values
<td rowspan="3">
INCORRECT: unquoted attribute values
<td rowspan=3>
2.5 Attribute Minimization
XML does not support attribute minimization. Attribute-value pairs
must be written in full. Attribute names such as compact and checked
cannot occur in elements without their value being specified.
Example
CORRECT: unminimized attributes
<dl compact="compact">
INCORRECT: minimized attributes
<dl compact>
2.6 Empty Elements
Empty elements must either have an end tag or the start tag must end
with />. For instance, <br/> or <hr></hr>.
Example
CORRECT: terminated empty elements
<br/><hr/>
INCORRECT: unterminated empty elements
<br><hr>
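The rules in sections 2.3 to 2.6 are all well-formedness rules, so an XML parser can check them. The sketch below assumes Python's standard xml.etree.ElementTree parser; the helper is_well_formed, the sample labels, and the wrapping <div> elements and added end tags (needed so that each sample is a single complete element) are assumptions added for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Each pair mirrors a CORRECT / INCORRECT example from sections 2.3 to 2.6.
samples = {
    "terminated paragraphs":
        "<div><p>here is a paragraph.</p><p>here is another paragraph.</p></div>",
    "unterminated paragraphs":
        "<div><p>here is a paragraph.<p>here is another paragraph.</div>",
    "quoted attribute": '<td rowspan="3"></td>',
    "unquoted attribute": "<td rowspan=3></td>",
    "unminimized attribute": '<dl compact="compact"></dl>',
    "minimized attribute": "<dl compact></dl>",
    "terminated empty elements": "<div><br/><hr/></div>",
    "unterminated empty elements": "<div><br><hr></div>",
}

results = {label: is_well_formed(markup) for label, markup in samples.items()}
for label, accepted in results.items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```

Every CORRECT form is accepted and every INCORRECT form is rejected before any DTD is consulted, which is why these mistakes break an XHTML page for any XML-based tool.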
3 Advanced differences
3.1 Script and Style elements
In XHTML, the script and style elements are declared as having #PCDATA
content. As a result, < and & will be treated as the start of markup,
and entities such as &lt; and &amp; will be recognized as entity
references by the XML processor to < and & respectively. Wrapping
the content of the script or style element within a CDATA marked section
avoids the expansion of these entities.
<script type="text/javascript">
<![CDATA[
... unescaped script content ...
]]>
</script>
CDATA sections are recognized by the XML processor and appear as nodes
in the Document Object Model, see Section 1.3 of the DOM Level 1 Recommendation
[DOM] (http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html#ID-E067D597).
An alternative is to use external script and style documents.
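The effect of the CDATA section can be sketched as follows, assuming Python's standard xml.etree.ElementTree parser as the XML processor. The script body is an invented one-liner chosen only because it contains < and &. Unlike the DOM, ElementTree does not expose a separate CDATA node; it simply returns the text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical script content containing both < and &.
script_body = "if (a < b && b < c) { run(); }"

open_tag = '<script type="text/javascript">'
close_tag = "</script>"

# 1. Wrapped in a CDATA section: the content is passed through untouched.
wrapped = open_tag + "<![CDATA[" + script_body + "]]>" + close_tag
cdata_text = ET.fromstring(wrapped).text

# 2. Escaped as &lt; and &amp;: the parser expands the references again.
escaped = open_tag + escape(script_body) + close_tag
escaped_text = ET.fromstring(escaped).text

# 3. Neither wrapped nor escaped: < is read as the start of markup.
try:
    ET.fromstring(open_tag + script_body + close_tag)
    bare_accepted = True
except ET.ParseError:
    bare_accepted = False

print(cdata_text == script_body, escaped_text == script_body, bare_accepted)
```

Both the CDATA form and the escaped form give the parser the same script text; the CDATA form just keeps the source readable.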
3.2 SGML exclusions
SGML gives the writer of a DTD the ability to exclude specific elements
from being contained within an element. Such prohibitions (called
"exclusions") are not possible in XML.
For example, the HTML 4 Strict DTD forbids the nesting of an 'a' element
within another 'a' element to any descendant depth. It is not possible
to spell out such prohibitions in XML. Even though these prohibitions
cannot be defined in the DTD, certain elements should not be nested.
A summary of such elements and the elements that should not be nested
in them is found in the normative Element Prohibitions appendix of the
XHTML 1.0 specification.
3.3 The elements with 'id' and 'name' attributes
HTML 4 defined the name attribute for the elements a, applet, form,
frame, iframe, img, and map. HTML 4 also introduced the id attribute.
Both of these attributes are designed to be used as fragment identifiers.
In XML, fragment identifiers are of type ID, and there can only be
a single attribute of type ID per element. Therefore, in XHTML 1.0
the id attribute is defined to be of type ID. In order to ensure that
XHTML 1.0 documents are well-structured XML documents, XHTML 1.0 documents
MUST use the id attribute when defining fragment identifiers on the
elements listed above.
Note that in XHTML 1.0, the name attribute of these elements is formally
deprecated, and will be removed in a subsequent version of XHTML.
3.4 Attributes with pre-defined value sets
HTML 4 and XHTML both have some attributes that have pre-defined and
limited sets of values (e.g. the type attribute of the input element).
In SGML and XML, these are called enumerated attributes. Under HTML
4, the interpretation of these values was case-insensitive, so a value
of TEXT was equivalent to a value of text. Under XML, the interpretation
of these values is case-sensitive, and in XHTML 1 all of these values
are defined in lower-case.
3.5 Entity references as hex values
SGML and XML both permit references to characters by using hexadecimal
values. In SGML these references could be made using either &#Xnn;
or &#xnn;. In XML documents, you must use the lower-case version
(i.e. &#xnn;).
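A short sketch of this rule, assuming Python's standard xml.etree.ElementTree parser as the XML processor; the character A (hexadecimal 41) is an arbitrary choice for the illustration.

```python
import xml.etree.ElementTree as ET

# Lower-case x: accepted and expanded to the character with code 0x41.
lower_text = ET.fromstring("<p>&#x41;</p>").text
print(lower_text)

# Upper-case X: allowed in SGML, but not well-formed XML.
try:
    ET.fromstring("<p>&#X41;</p>")
    upper_accepted = True
except ET.ParseError:
    upper_accepted = False
print("upper-case X accepted:", upper_accepted)
```

Only the x prefix is case-sensitive; the hexadecimal digits themselves may be written in either case.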