The content of conventional web sites is human-readable only, which makes it unsuitable for automatic processing and inefficient to search for related information. Web datasets are effectively isolated data silos that are not linked to each other. This limitation can be addressed by organizing and publishing data in structured formats that add meaning to the content of web pages and link related data to one another, so that computers can use the data more easily and efficiently.
A typical web page contains structuring elements, formatted text, and in some cases multimedia objects. By default, the headings, text, links, and other web site components created by the web designer are meaningless to computers. While browsers can display web documents based on the markup, only the human mind can interpret the meaning of the information, so there is a huge gap between what computers and humans understand. Even if alternate text is specified for images (an alt attribute with a descriptive value on img elements, including those placed in figure elements), the data is not structured or linked to related data, and the human-readable words of conventional web page paragraphs are not associated with any particular software syntax or structure. Without context, the information provided by web sites can be ambiguous to search engines.
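To make the contrast concrete, the following is a minimal sketch of how a fact that would otherwise sit in a plain paragraph could be expressed as a machine-readable annotation. It assumes JSON-LD with the schema.org vocabulary; the person, the URLs, and the helper name to_script_element are hypothetical placeholders chosen for illustration.

```python
import json

# Hypothetical example data: the name and URLs are placeholders.
annotation = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Doe",
    "jobTitle": "Web designer",
    "url": "https://example.com/jane",
    "sameAs": ["https://example.org/people/jane-doe"],
}


def to_script_element(data: dict) -> str:
    """Serialize a JSON-LD object and wrap it in a script element for embedding in HTML."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


html_snippet = to_script_element(annotation)
print(html_snippet)
```

Unlike the sentence "Jane Doe is a web designer", the annotation states explicitly that the text describes a person, which property holds the name, and which other identifier refers to the same person.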
In contrast to the conventional Web (the “Web of documents”), the Semantic Web includes the “Web of Data,” which connects “things” (representing real-world people and objects) rather than documents that are meaningless to computers. The machine-readable datasets of the Semantic Web are used in a wide variety of applications, such as search engines, data integration, resource discovery and classification, cataloging, intelligent software agents, content rating, intellectual property right descriptions, museum portals, community sites, podcasting, Big Data processing, business process modeling, and medical research. On the Semantic Web, data can be retrieved automatically from seemingly unrelated fields and combined to find relations and make discoveries.
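The idea of combining data from unrelated sources can be sketched as follows. This is a simplified illustration, not a real RDF toolkit: statements are plain Python tuples, predicates are short strings rather than full identifiers, and every identifier, dataset name, and function name is a hypothetical placeholder.

```python
# Two independently published datasets, reduced to (subject, predicate, object) statements.
# All identifiers below are hypothetical placeholders.
museum_data = {
    ("http://example.org/id/painting42", "creator", "http://example.org/id/artist7"),
    ("http://example.org/id/painting42", "title", "Harbour at Dawn"),
}
research_data = {
    ("http://example.org/id/artist7", "bornIn", "http://example.org/id/city3"),
    ("http://example.org/id/city3", "name", "Utrecht"),
}

# Because both datasets use the same identifiers for the same things,
# merging them is a plain set union.
merged = museum_data | research_data


def objects(triples, subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def birthplace_names_of_creators(triples, work):
    """Follow creator -> bornIn -> name links starting from a work."""
    names = set()
    for artist in objects(triples, work, "creator"):
        for city in objects(triples, artist, "bornIn"):
            names |= objects(triples, city, "name")
    return names


print(birthplace_names_of_creators(merged, "http://example.org/id/painting42"))
# prints {'Utrecht'}
```

Neither dataset alone can answer where the painting's creator was born; the answer only becomes reachable once the shared identifier for the artist connects the two.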
Note that the word semantic is used on the Web in other contexts as well. For example, HTML5 has semantic (in other words, meaningful) structuring elements. In that context, the word contrasts the “meaning” of elements such as section (a thematic grouping) with the generic elements of older HTML versions, such as the “meaningless” div. The semantics of markup elements should not be confused with the semantics (in other words, machine-processability) of metadata annotations and web ontologies used on the Semantic Web. The latter can express far richer information than the meaning of a markup element.
Main Components of the Semantic Web
- Knowledge Representation: structured data, such as machine-interpretable metadata written in RDFa, HTML5 Microdata, and JSON-LD annotations
- Knowledge Management: data modeling with RDF, knowledge organization with SKOS, machine-interpretable controlled vocabularies and ontologies in RDFS and OWL suitable for integrity checking (reasoning) and the automatic discovery of relationships between seemingly unrelated structured data (inference); querying RDF with SPARQL
- Linked Data: interlinked machine-readable statements, often published as LOD datasets with an open license (Linked Open Data)
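How these components fit together can be sketched in a few lines. The example below is only an illustration under stated assumptions: statements are plain tuples, "ex:" is a made-up prefix standing in for full identifiers, match is a hypothetical stand-in for a SPARQL-style triple pattern, and infer_types applies just one RDFS-style rule (an instance of a class is also an instance of its superclasses). A real system would use an RDF library and a reasoner.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical statements; "ex:" is a stand-in prefix, not a real vocabulary.
triples = {
    ("ex:Painter", SUBCLASS_OF, "ex:Artist"),
    ("ex:Artist", SUBCLASS_OF, "ex:Person"),
    ("ex:vermeer", RDF_TYPE, "ex:Painter"),
}


def match(triples, s=None, p=None, o=None):
    """Return the statements matching a pattern; None acts as a wildcard."""
    return {
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    }


def infer_types(triples):
    """Add rdf:type statements implied by subclass relations until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for instance, _, cls in match(inferred, p=RDF_TYPE):
            for _, _, parent in match(inferred, s=cls, p=SUBCLASS_OF):
                new = (instance, RDF_TYPE, parent)
                if new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


closure = infer_types(triples)
types = sorted(o for _, _, o in match(closure, s="ex:vermeer", p=RDF_TYPE))
print(types)
# prints ['ex:Artist', 'ex:Painter', 'ex:Person']
```

Only one type was stated explicitly; the other two follow from the vocabulary, which is the kind of automatically discovered relationship that inference refers to.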
Fundamental Semantic Web Features
- Machine-interpretable, structured data repositories that are interlinked with each other and offer free, open access
- Globally edited adaptive information storage, classification, reorganization, and data reuse
- Unique web resource identifiers for every bit of information (e.g., each data cell of a table has its own web resource identifier)
You can master the Semantic Web from the book Mastering Structured Data on the Semantic Web.