HTML5 Microdata



HTML5 Microdata is an HTML5 module defined in a separate specification that extends the core HTML5 vocabulary with attributes for representing structured data. Other machine-readable formats for adding structured data to markup are RDFa (RDF in attributes) and JSON-LD. An earlier and more limited approach, which predates all of these, is microformats.

Global HTML5 Microdata Attributes

HTML5 Microdata represents structured data as groups of name-value pairs. Each group is called an item, and each name-value pair is a property. Items and properties are represented by regular HTML elements: the itemscope attribute creates an item, and the itemprop attribute, placed on a descendant of that element (any element nested inside it, not only a direct child), adds a property to the item, as shown in Listing 1.

Listing 1. A Person’s Description in HTML5 Microdata


<div itemscope="itemscope" itemtype="http://schema.org/Person">
  <span itemprop="name">John Smith</span>
  <img src="johnsmith.jpg" alt="John Smith" itemprop="image" />
  John’s web site:
  <a href="http://www.johnsmithexample.com" itemprop="url">johnsmithexample.com</a>
</div>

Property values are usually strings (sequences of characters), but they can also be web addresses, such as the value of the href attribute on the a element, the value of the src attribute on the img element, or the corresponding attribute of other elements that link to or embed external resources. In Listing 1, for example, the value of the image property is the value of the src attribute on the img element, which is johnsmith.jpg. Similarly, the value of the url property is not the content of the a element (johnsmithexample.com) but the value of its href attribute, http://www.johnsmithexample.com. For elements that do not link to or embed a resource, the value of the property is the text content of the element, such as the value of the name property in this example: John Smith (the text between the <span> and </span> tags).
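The value rules above can be sketched as a small Python function. This is a simplified, hypothetical helper (property_value is an illustrative name, not part of any library), assuming the caller has already extracted the element's tag name, attributes, and text content. Beyond the cases in Listing 1, it also covers a few special cases from the Microdata specification: meta takes its content attribute, data and meter their value attribute, and time its datetime attribute when present. The specification resolves relative addresses against the document's address; the optional base_url parameter, an assumption of this sketch, imitates that with urljoin.

```python
from urllib.parse import urljoin

# Elements whose property value is a URL taken from one of their attributes.
URL_ATTRIBUTE = {
    "a": "href", "area": "href", "link": "href",
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "object": "data",
}


def property_value(tag, attrs, text_content, base_url=""):
    """Return the Microdata value of an element that has itemprop
    (but not itemscope).

    tag          -- lowercase element name, e.g. "img"
    attrs        -- dict of the element's attributes
    text_content -- all text inside the element, concatenated
    base_url     -- document address used to resolve relative URLs
    """
    if tag == "meta":
        return attrs.get("content") or ""
    if tag in URL_ATTRIBUTE:
        raw = attrs.get(URL_ATTRIBUTE[tag])
        # A missing attribute gives the empty string; relative URLs are resolved.
        return urljoin(base_url, raw) if raw is not None else ""
    if tag in ("data", "meter"):
        return attrs.get("value") or ""
    if tag == "time" and attrs.get("datetime") is not None:
        return attrs["datetime"]
    # Every other element: its text content, e.g. "John Smith" for the span.
    return text_content


# The three properties from Listing 1.
print(property_value("span", {"itemprop": "name"}, "John Smith"))
print(property_value("img", {"src": "johnsmith.jpg", "itemprop": "image"}, ""))
print(property_value("a", {"href": "http://www.johnsmithexample.com"},
                     "johnsmithexample.com"))
```

With an empty base_url, the relative johnsmith.jpg is returned unchanged, matching the description above; with a hypothetical base such as https://example.com/people/, it would become https://example.com/people/johnsmith.jpg.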
The type of an item is expressed with the itemtype attribute, whose value is the web address of the external vocabulary term that defines the item and its properties. In our example, we used the Person type from http://schema.org, which defines properties of a person such as familyName, givenName, birthDate, birthPlace, gender, nationality, and so on. The full list of properties is defined at http://schema.org/Person, which is the value of the itemtype attribute. In the example, we declared the person’s name with the name property, a depiction of the person with the image property, and his web site address with the url property. The allowed values and expected formats of these properties are documented at http://schema.org/name, http://schema.org/image, and http://schema.org/url, respectively.
The item type depends on the knowledge domain. To annotate the description of a book rather than a person, for example, the value of the itemtype attribute would be http://schema.org/Book, which collects and defines the properties of books, such as bookFormat, bookEdition, numberOfPages, author, and publisher. If the item has a global identifier (such as the ISBN of a book), it can be annotated with the itemid attribute, as shown in Listing 2.

Listing 2. The Description of a Book in HTML5 Microdata


<div itemscope="itemscope" itemtype="http://schema.org/Book" itemid="urn:isbn:978-1-484208-84-7">
  <img itemprop="image" src="http://www.masteringhtml5css3.com/img/webstandardsbook.jpg" alt="Web Standards" />
  <span itemprop="name">Web Standards: Mastering HTML5, CSS3, and XML</span> by <a itemprop="author" href="http://www.lesliesikos.com">Leslie Sikos</a>
</div>
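The itemtype and itemid attributes come with placement rules in the Microdata specification: itemtype is only allowed on an element that also has itemscope, and itemid only on an element that has both itemscope and itemtype, which is why Listing 2 declares all three on the same div. A minimal, hypothetical checker (check_item_attributes is an illustrative name, not a library function) might look like the following. It assumes the attributes are already available as a dictionary, uses only a rough test for absolute URLs (a scheme must be present), and does not check the further rule that all itemtype tokens belong to the same vocabulary.

```python
from urllib.parse import urlsplit


def check_item_attributes(attrs):
    """Return a list of placement problems with itemscope, itemtype and itemid
    on one element (an empty list means none were found).

    attrs -- dict of the element's attributes
    """
    problems = []
    has_scope = "itemscope" in attrs
    has_type = "itemtype" in attrs
    if has_type and not has_scope:
        problems.append("itemtype requires itemscope on the same element")
    if "itemid" in attrs and not (has_scope and has_type):
        problems.append("itemid requires both itemscope and itemtype")
    tokens = (attrs.get("itemtype") or "").split()
    if has_type and not tokens:
        problems.append("itemtype must list at least one URL")
    for token in tokens:
        # Rough absolute-URL test (an assumption of this sketch): a scheme
        # such as "http" or "urn" must be present.
        if not urlsplit(token).scheme:
            problems.append(f"itemtype is not an absolute URL: {token}")
    return problems


# The attributes of the outer div in Listing 2 pass the checks.
listing_2 = {"itemscope": "itemscope",
             "itemtype": "http://schema.org/Book",
             "itemid": "urn:isbn:978-1-484208-84-7"}
print(check_item_attributes(listing_2))  # []
# itemid on its own is not allowed.
print(check_item_attributes({"itemid": "urn:isbn:978-1-484208-84-7"}))
```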

Although HTML5 Microdata is primarily used for semantic descriptions of people, organizations, events, products, reviews, and links, any other knowledge domain can be annotated with a suitable external vocabulary. A group of name-value pairs can also be nested inside a Microdata property by declaring the itemscope attribute on the same element that declares the property (see Listing 3).

Listing 3. Nesting a Group of Name-Value Pairs


<div itemscope="itemscope">
  <p>Name: <span itemprop="name">Herbie Hancock</span></p>
  <p>Band: <span itemprop="band" itemscope="itemscope"> 
             <span itemprop="name">The Headhunters</span> 
            (<span itemprop="size">7</span> members)
          </span>
  </p> 
</div>

In the preceding example, the outer item (top-level Microdata item) annotates a person, and the inner one represents a jazz band.
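To see how nesting is resolved, the following sketch walks the markup with Python's standard html.parser module and builds each item as a dictionary; an element with both itemprop and itemscope becomes a nested item stored as the value of that property. The output shape (type, id, and properties, with each property holding a list of values) loosely follows the JSON conversion described in the Microdata specification. This is a simplified, hypothetical extractor (MicrodataExtractor is not an existing library class): it ignores itemref, handles only URL-valued elements and text content (not the meta, data, meter, or time special cases), and assumes well-nested markup rather than implementing the full HTML parsing algorithm.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements whose value is taken from a URL attribute instead of their text
# (meta/data/meter/time special cases are omitted in this sketch).
URL_ATTRIBUTE = {"a": "href", "area": "href", "link": "href", "img": "src",
                 "audio": "src", "video": "src", "source": "src",
                 "iframe": "src", "embed": "src", "track": "src",
                 "object": "data"}
# Elements that never get an end tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class MicrodataExtractor(HTMLParser):
    """Simplified extractor for nested items; itemref is not supported."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.items = []  # top-level items, in document order
        self.stack = []  # open elements: (tag, item or None, pending or None)

    def handle_starttag(self, tag, attr_list):
        attrs = dict(attr_list)
        # The nearest open ancestor carrying itemscope receives new properties.
        parent = next((it for _, it, _ in reversed(self.stack) if it is not None), None)
        names = (attrs.get("itemprop") or "").split()
        item = pending = None
        if "itemscope" in attrs:
            item = {}
            types = (attrs.get("itemtype") or "").split()
            if types:
                item["type"] = types
                if attrs.get("itemid"):
                    item["id"] = urljoin(self.base_url, attrs["itemid"])
            item["properties"] = {}
            if not names:
                self.items.append(item)  # no itemprop: a top-level item
            elif parent is not None:
                for name in names:  # the nested item is the property value
                    parent["properties"].setdefault(name, []).append(item)
        elif names and parent is not None:
            # Reserve slots now so values keep document order; fill them later.
            slots = []
            for name in names:
                values = parent["properties"].setdefault(name, [])
                values.append(None)
                slots.append((values, len(values) - 1))
            pending = {"slots": slots, "attrs": attrs, "text": []}
        if tag in VOID_ELEMENTS:
            self._finish(tag, pending)  # void elements have no end tag
        else:
            self.stack.append((tag, item, pending))

    def _finish(self, tag, pending):
        if pending is None:
            return
        if tag in URL_ATTRIBUTE:
            raw = pending["attrs"].get(URL_ATTRIBUTE[tag])
            value = urljoin(self.base_url, raw) if raw is not None else ""
        else:
            value = "".join(pending["text"])  # default: the text content
        for values, index in pending["slots"]:
            values[index] = value

    def handle_endtag(self, tag):
        # Close everything up to the matching open element, if there is one.
        if all(open_tag != tag for open_tag, _, _ in self.stack):
            return
        while self.stack:
            open_tag, _, pending = self.stack.pop()
            self._finish(open_tag, pending)
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Text content includes descendant text, so feed every open property.
        for _, _, pending in self.stack:
            if pending is not None:
                pending["text"].append(data)


LISTING_3 = """<div itemscope="itemscope">
  <p>Name: <span itemprop="name">Herbie Hancock</span></p>
  <p>Band: <span itemprop="band" itemscope="itemscope">
             <span itemprop="name">The Headhunters</span>
            (<span itemprop="size">7</span> members)
          </span>
  </p>
</div>"""

extractor = MicrodataExtractor()
extractor.feed(LISTING_3)
extractor.close()
print(extractor.items)
```

For Listing 3, the single top-level item has a name property and a band property whose value is itself an item with name and size properties. The same extractor also picks up the type and id of the Book item in Listing 2.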

Elements with an itemscope attribute can optionally have an itemref attribute, which gives a list of additional elements to crawl to find the name-value pairs of the item. In other words, properties that are not descendants of the element with the itemscope attribute can be associated with the item through the itemref attribute, whose value is a space-separated list of the IDs of elements elsewhere in the document that hold additional properties (see Listing 4). The itemref attribute is not part of the HTML5 Microdata data model; it is merely a syntactic aid for annotating data that does not follow a convenient tree structure in the markup.

Listing 4. Using the itemref Attribute


<div itemscope="itemscope" id="herbie" itemref="a b"></div>
<p id="a">Name: <span itemprop="name">Herbie Hancock</span></p>
<div id="b" itemprop="band" itemscope="itemscope" itemref="c"></div>
<div id="c">
  <p>Band: <span itemprop="name">The Headhunters</span></p>
  <p>Size: <span itemprop="size">7</span> members</p>
</div>

The first item has two properties: name, which declares the name of jazz keyboardist Herbie Hancock, and band, whose value is a separate item describing his jazz band. That nested item has two properties of its own: name, which gives the name of the band as The Headhunters, and size, which sets the number of members to 7.
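The crawl that itemref triggers can be sketched as a work list, following the property-crawling algorithm of the Microdata specification: start from the item's children plus any elements whose IDs are listed in itemref, visit each element at most once, collect every element with itemprop, do not descend into elements that start an item of their own, and finally put the results back in document order. The sketch below uses a hypothetical minimal element model (Element, with a text field standing in for an element's text content) instead of a real DOM, and crawl_properties and item_to_dict are illustrative names only. It omits the specification's protection against items that reference each other in a cycle.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: elements compare and hash by identity
class Element:
    """Hypothetical stand-in for a DOM element; text is its text content."""
    tag: str
    id: str = ""
    itemprop: str = ""
    itemscope: bool = False
    itemref: str = ""
    text: str = ""
    children: list = field(default_factory=list)


def iter_tree(element):
    """Yield element and all its descendants in tree (document) order."""
    yield element
    for child in element.children:
        yield from iter_tree(child)


def crawl_properties(root, document):
    """Return the elements holding the properties of the item at root."""
    order = {el: i for i, el in enumerate(iter_tree(document))}
    by_id = {}
    for el in order:  # first element with a given ID wins
        if el.id:
            by_id.setdefault(el.id, el)
    memory = {root}
    pending = list(root.children)
    pending += [by_id[ref] for ref in root.itemref.split() if ref in by_id]
    results = []
    while pending:
        current = pending.pop()
        if current in memory:
            continue  # reached twice: a Microdata error, so it is skipped
        memory.add(current)
        if not current.itemscope:
            pending.extend(current.children)  # nested items keep their own properties
        if current.itemprop.split():
            results.append(current)
    return sorted(results, key=order.__getitem__)


def item_to_dict(root, document):
    """Collect name -> list of values, recursing into nested items."""
    properties = {}
    for el in crawl_properties(root, document):
        value = item_to_dict(el, document) if el.itemscope else el.text
        for name in el.itemprop.split():
            properties.setdefault(name, []).append(value)
    return properties


# Listing 4 as a tree: the item's own element is empty; itemref supplies the rest.
herbie = Element("div", id="herbie", itemscope=True, itemref="a b")
body = Element("body", children=[
    herbie,
    Element("p", id="a", children=[
        Element("span", itemprop="name", text="Herbie Hancock")]),
    Element("div", id="b", itemprop="band", itemscope=True, itemref="c"),
    Element("div", id="c", children=[
        Element("p", children=[
            Element("span", itemprop="name", text="The Headhunters")]),
        Element("p", children=[Element("span", itemprop="size", text="7")]),
    ]),
])
print(item_to_dict(herbie, body))
# {'name': ['Herbie Hancock'], 'band': [{'name': ['The Headhunters'], 'size': ['7']}]}
```

Because the crawl stops at div#b (it has its own itemscope), the size property reaches the band item through itemref="c" and is never attached to Herbie Hancock directly.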

HTML5 Microdata DOM API

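The Microdata specification originally defined a DOM API, with document.getItems() as its entry point, for web developers to programmatically access the structured data represented in HTML5 Microdata. This API was later removed from the specification and is not available in current browsers.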

You can read more on HTML5 Microdata in the book Mastering Structured Data on the Semantic Web.