Do you want better rankings on search engine result pages? Are you considering hiring SEO experts? Read on to learn what to ask for.
Traditional Search Engine Optimization
Search Engine Optimization (SEO) is the practice of applying various methods to improve the search engine rankings of a website as part of an Internet marketing strategy. Generally, the higher a website ranks on the Search Engine Results Page (SERP) of Google, Yahoo!, AltaVista, Bing, Ask Jeeves, AOL, LookSmart, About, Excite, HotBot, Lycos, and other search engines, and the more frequently it appears in the search results list, the more visitors it will receive from the search engine. This is why increasing the ranking of webpages is so important: it has a significant impact on website traffic.
Search Engine Marketing (SEM) is a practice in Search Engine Optimization to promote websites internally (on-page) and externally (off-page), and through paid or free advertisements (in contrast to Pay-Per-Click (PPC) ads).
Some of the most important Search Engine Optimization tasks are the following:
- Site analysis (existing sites): Google PageRank (PR), Google index, SEMrush rank, Bing index, Alexa rank, domain age (webarchive), Twitter tweets, Facebook likes, Google PlusOne, internal and external links, keyword density, URL, title tags, meta description, level 1 headings, images, text/HTML ratio, code optimality, robots.txt exclusion rules, language declaration, encoding declaration, machine-readable metadata (microformats, RDFa, HTML5 microdata), geo metadata, server response time and uptime
- Getting the webpages indexed by Google, Yahoo!, Bing, and other search engines through properly written content (original content should usually be adjusted or rewritten)
- Submitting the websites to link libraries
- Link building
- Competitor research, whois lookup
- Quality assurance using back-end tools (e.g., Google Analytics, W3C validators)
- Keyword research and keyword density analysis, editing website contents to increase the occurrence of relevant keywords and to expedite search engine indexing
- Website saturation and popularity, online advertisement and promotion to increase the number of backlinks (inbound links)
- Social Media Marketing (Facebook fan page, Twitter, LinkedIn, RSS or Atom news feed channel, etc.)
- Preventing sensitive or undesirable contents from being crawled
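The keyword density analysis mentioned in the list above can be illustrated with a minimal sketch. The function name `keyword_density` and the sample sentence are hypothetical, and the calculation is deliberately naive: it simply divides the number of occurrences of each word by the total number of words, without stop-word removal or stemming, which real SEO tools typically apply.

```python
import re
from collections import Counter


def keyword_density(text):
    """Return a dict mapping each word to its share of all words in the text.

    Hypothetical helper for illustration only: words are lowercased runs of
    letters, digits, and apostrophes; no stop words are removed.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


sample = "Dog collars and dog leads for every dog"
density = keyword_density(sample)

# List the words from the most to the least frequent one.
for word, share in sorted(density.items(), key=lambda item: -item[1]):
    print(f"{word}: {share:.1%}")
```

A very high share for a single keyword is usually a sign of keyword stuffing rather than of good optimization, so a figure like this is better read as a warning indicator than as a target.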
From meta Tags to Automatically Extractable Structured Data
In the 1990s, meta elements had a large effect on web search results. Since then, their significance has been decreasing, partly because of the unethical tricks that have been used to manipulate search engine rankings. A good example is keyword stuffing, which was used to load a web page with popular keywords that were not necessarily relevant to the page content, either in the meta tags or in the content. In the latter case, the keywords were often hidden, but the web page that contained them was indexed by search engines. Such tricks made it possible for developers to achieve higher ranking on search results but significantly increased the number of irrelevant links on search result lists. Although they are less important nowadays, meta tags should still be used to provide information on web page contents for search engines.
The meta tags in HTML/XHTML can define a variety of metadata, for example, content type, author, publication date, keywords, page content description, character encoding, and so on. These tags were introduced in HTML 2.0 and are still current.
The following attributes can be used on the meta element: content, http-equiv, name, and scheme. The first one is the only required attribute. In HTML5, the scheme attribute is not supported on the meta element, and there is another attribute called charset. The meta element attributes can specify the following:
- Alternatives to HTTP headers that are sent by web servers prior to the web page content. For instance, a document expiry date can be provided by the meta tag as
<meta http-equiv="expires" content="Fri, 15 Oct 2010 14:15:00 GMT" />
- Names and associated content attributes describing aspects of (X)HTML pages. For example,
<meta name="keywords" content="standardization, accessibility" />
is a keyword declaration with the meta tag. Meta schemes specify a semantic framework defining the meaning of the key and its value (prior to HTML5). They can also prevent potential ambiguity. For example, <meta name="foo" content="bar" scheme="DC" /> declares the meta scheme as Dublin Core (DC).
The language, keywords, description, and robots values of the name attribute contribute to more precise web searches by defining document language, the most relevant keywords, and a short description. The last one, robots, provides control over search engine behavior to a limited extent. Web pages can be prevented from being indexed (noindex), having their links followed (nofollow), cached (noarchive), described by a snippet (nosnippet), or described according to the Open Directory Project (noodp). The combination of the noindex and nofollow values can be substituted by the value none. This setting can be used, for example, for confidential documents whose content and links should not be indexed by search engines. Web page descriptions retrieved from ODP can be disallowed specifically for Google, Yahoo!, and Bing. The meta name to be applied is Googlebot for Google, Slurp for Yahoo!, and msnbot for Bing, i.e., <meta name="Googlebot" content="noodp" />, <meta name="Slurp" content="noodp" />, and <meta name="msnbot" content="noodp" />, respectively. If you want to prevent the descriptions and titles retrieved from the Yahoo! Directory from being displayed in search results, you can use the noydir value, as in <meta name="robots" content="noydir" />. In spite of the variety of attribute values, using meta tags to prevent search engine indexing or crawling is not the best solution. The robots.txt file should be used instead for this purpose.
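How a crawler interprets robots.txt exclusion rules can be sketched with the `urllib.robotparser` module of the Python standard library. The rules, the `/private/` path, and the `www.example.com` URLs below are assumptions made up for this illustration; they do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to stay out of
# the /private/ directory, everything else may be crawled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a given crawler may fetch a given URL under these rules.
blocked = parser.can_fetch("Googlebot", "http://www.example.com/private/report.html")
allowed = parser.can_fetch("Googlebot", "http://www.example.com/index.html")

print("private page may be crawled:", blocked)
print("home page may be crawled:", allowed)
```

Note that robots.txt is a request addressed to well-behaved crawlers rather than an access control mechanism, so confidential documents still need proper protection on the server.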
To summarize, the typical general metadata provided in the head section of XHTML5 looks like
<meta charset="UTF-8" />
<meta name="robots" content="index, follow" />
<meta name="content-language" content="en" />
<meta name="author" content="John Smith" />
<meta name="keywords" content="My Darling, pet shop, pet accessories, dog, collar, harness, dog lead, dog kennel, dog bowl, dog coats" />
<meta name="description" content="The website of the pet shop My Darling." />
Since the value of the name attribute on the meta element is robots, the value of the content attribute (index, follow) applies to all search engines rather than a specific one.
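The way software reads such a head section can be sketched with Python's standard `html.parser` module. The class `MetaCollector` and the function `robots_directives` are hypothetical names introduced for this illustration; the sketch only covers the charset and name/content forms shown above and expands the none value into noindex and nofollow as described earlier.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the charset and the name/content pairs of meta elements."""

    def __init__(self):
        super().__init__()
        self.charset = None
        self.named = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if "charset" in attrs:
            self.charset = attrs["charset"]
        elif "name" in attrs:
            # Meta names are matched case-insensitively here.
            self.named[attrs["name"].lower()] = attrs.get("content", "")


def robots_directives(content):
    """Split a robots content value into a set of lowercase directives."""
    directives = {part.strip().lower() for part in content.split(",") if part.strip()}
    if "none" in directives:
        # "none" is shorthand for the combination of noindex and nofollow.
        directives = (directives - {"none"}) | {"noindex", "nofollow"}
    return directives


HEAD = """
<meta charset="UTF-8" />
<meta name="robots" content="index, follow" />
<meta name="author" content="John Smith" />
<meta name="description" content="The website of the pet shop My Darling." />
"""

collector = MetaCollector()
collector.feed(HEAD)
print(collector.charset)
print(robots_directives(collector.named["robots"]))
```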
Machine-Readable Metadata Using Semantic Web Standards
To improve the processability of web sites, formal knowledge representation standards are required that can be used not only to annotate markup elements for simple machine-readable data, but also to express complex statements and relationships in a machine-processable manner. On the Semantic Web, structured datasets are usually expressed in, or based on, the Resource Description Framework (RDF), which is the primary modeling language for structured data representations and is suitable for creating a machine-readable description about any kind of web resource. Structured data can be efficiently annotated in the markup, or written in separate, machine-readable metadata files. The very first approach to such metadata annotations was microformats, which have lost their popularity in the past few years. The three most common machine-readable annotations that are recognized and processed by search engines are RDFa, HTML5 Microdata, and JSON-LD, of which HTML5 Microdata is the recommended format. The machine-readable annotations extend the core (X)HTML markup with additional elements and attributes through external vocabularies that contain the terminology and properties of a knowledge representation domain, as well as the relationship between the properties in a machine-readable form. Web ontologies can be used for searching, querying, indexing, and agent or service metadata management, or to improve application and database interoperability. Web ontologies are especially useful for knowledge-intensive applications, where text extraction, decision support, or resource planning are common tasks, as well as in knowledge repositories used for knowledge acquisition. The schemas defining the most common concepts of life and the relationships between them are collected by semantic knowledge bases.
These knowledge organization schemas are the de facto standards used by machine-readable annotations serialized in RDFa, HTML5 Microdata, or JSON-LD, as well as in RDF files of Linked Open Data datasets. The four machine-readable annotation formats can be summarized as follows (in order of introduction):
- Microformats, which publish structured data about basic concepts such as people, places, events, recipes, and audio through core (X)HTML attributes
- RDFa, which expresses RDF in markup attributes that are not part of the core (X)HTML vocabularies
- HTML5 Microdata, which extends the HTML5 markup with structured metadata (an HTML5 Application Programming Interface)
- JSON-LD, which adds structured data to the markup as a JSON data block in a script element
Metadata annotations such as RDFa can effectively contribute to better search results. The more semantic content is provided on the Web, the more reasonable and relevant search results can be expected from search engines. Properly set metadata can help search engines better process and provide personal introductions, contact data, and full descriptions of persons and human relationships in search results. The indexing of brochure-style business cards and personal information described in (X)HTML markup, XML, RDF, FOAF, and DOAC is straightforward. However, semantic contents embedded in conventional markup can be processed only if they are supported by the mechanisms used by web crawlers. Fortunately, search engines can process more and more metadata types.
In spite of the huge potential of metadata implementations in search engine optimization, there are several limitations in real-life applications. For example, image metadata cannot be fully exploited since a large share of social media and photo-sharing web sites either remove all embedded metadata during upload or apply a new, on-the-fly generated file without them (even in another file format). Images uploaded to the Internet by anonymous Wikipedia editors, on the other hand, can be found by their embedded metadata indexed by Google (if available). It is arguable whether this feature is advantageous.
As with any other data, it is important to decide wisely what to publish on the Web. It is no problem at all to publish the ISBN of a book or a link to the DBpedia description of a web site item; however, several types of metadata are risky to publish since they can be abused. E-mail addresses, phone numbers, and instant messenger screen names in particular should be provided with extreme caution.
Metadata embedding goes hand in hand with accessibility. Accessibility guidelines can ensure that alternate content is provided for objects and that the document structure is well organized.
Some SEO practices, such as frequently repeated keywords that decrease human readability, do not contribute to user experience (UX).
The RDF data model is based on statements that describe resources, especially web resources, in the form of subject-predicate-object (resource–property–value) expressions called RDF triples. The predicate (property) describes the relationship between the subject and the object. For example, the statement “Leslie’s homepage is http://www.lesliesikos.com” can be expressed as
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<foaf:Person rdf:about="http://www.lesliesikos.com/metadata/sikos.rdf#lesliesikos">
<foaf:homepage rdf:resource="http://www.lesliesikos.com" />
<foaf:family_name>Sikos</foaf:family_name>
<foaf:givenname>Leslie</foaf:givenname>
</foaf:Person>
</rdf:RDF>
The about attribute of RDF declares the subject of the RDF statement, which is in this case http://www.lesliesikos.com/metadata/sikos.rdf#lesliesikos. The fragment identifier #lesliesikos is used to identify an actual person rather than a document (sikos.rdf). Objects whose value is a web address, such as the homepage of a person, are declared using the resource attribute of RDF, in contrast to those that are string literals (character sequences) such as Sikos (the value of the family_name property from the FOAF vocabulary). The syntax of this example is known as the RDF/XML serialization (RDF/XML), which is the normative syntax of RDF, using the application/rdf+xml Internet media type and the .rdf or .xml file extension. Structured datasets can be written in RDF using a variety of other syntax notations and data serialization formats, for example, RDFa, JSON-LD, Notation3 (N3), Turtle, N-Triples, TriG, and TriX, so the syntax of RDF triples varies from format to format. The N3 syntax, for example, is less verbose than the RDF/XML serialization. In N3, namespace prefixes are declared by the @prefix directive, URIs are delimited by the less than (<) and greater than (>) signs, predicate-object pairs that share the same subject are separated by semicolons (;), and each statement ends with a period (.). For instance, a person can be described in N3 as
@prefix : <http://www.lesliesikos.com/metadata/sikos.rdf#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
:lesliesikos a foaf:Person ;
foaf:givenname "Leslie" ;
foaf:family_name "Sikos" ;
foaf:homepage <http://www.lesliesikos.com> .
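To make the subject-predicate-object structure concrete, the following minimal sketch lists the triples of the RDF/XML example using only Python's standard `xml.etree.ElementTree` module. The function names `qname_to_uri` and `triples_from_rdfxml` are hypothetical, and the code handles only the flat subset of RDF/XML used in this example (typed node elements with rdf:about, and property elements with either rdf:resource or a text literal); a complete RDF/XML parser has to cover much more.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/">
 <foaf:Person rdf:about="http://www.lesliesikos.com/metadata/sikos.rdf#lesliesikos">
  <foaf:homepage rdf:resource="http://www.lesliesikos.com" />
  <foaf:family_name>Sikos</foaf:family_name>
  <foaf:givenname>Leslie</foaf:givenname>
 </foaf:Person>
</rdf:RDF>"""


def qname_to_uri(tag):
    """Turn an ElementTree tag of the form {namespace}local into a full URI."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def triples_from_rdfxml(data):
    """Return (subject, predicate, object) tuples for flat RDF/XML input."""
    root = ET.fromstring(data.encode("utf-8"))
    triples = []
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        # A typed node element such as foaf:Person implies an rdf:type triple.
        if node.tag != f"{{{RDF_NS}}}Description":
            triples.append((subject, RDF_NS + "type", qname_to_uri(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            value = resource if resource is not None else (prop.text or "")
            triples.append((subject, qname_to_uri(prop.tag), value))
    return triples


triples = triples_from_rdfxml(RDF_XML)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

Each printed line corresponds to one statement of the N3 description, which shows that the two serializations carry the same four triples.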
The triples of this description can be represented as an RDF graph, which is a directed, labeled graph where the nodes are the resources and values. The nodes and predicate arcs of the RDF graph correspond to node elements and property elements. The default node element is rdf:Description, which is very frequently used as the generic container of RDF statements in RDF/XML. While the machine-readable RDF files are useful, their primary use is data modelling, so the RDF files are separate from the markup of your web site. You can add structured data directly to markup such as (X)HTML5 by using machine-readable annotations, which can be processed by semantic data extractors and, if needed, converted into RDF. RDFa and JSON-LD can be used in most markup language versions and variants, while HTML5 Microdata can be used in HTML5 and XHTML5 only. All these annotation formats have their own syntax; for example, the vocabulary is declared with the vocab attribute in RDFa, the itemtype attribute in Microdata, and @context in JSON-LD:
Annotation | Data |
---|---|
Markup Without Structured Data |
|
Markup With HTML5 Microdata |
|
Markup With RDFa |
|
Markup With JSON-LD |
|
HTML5 Microdata
HTML5 Microdata is an HTML5 module defined in a separate specification, extending the HTML5 core vocabulary with attributes for representing structured data. HTML5 Microdata represents structured data as a group of name-value pairs. The groups are called items, and each name-value pair is a property. Items and properties are represented by regular elements. To create an item, the itemscope attribute is used. To add a property to an item, the itemprop attribute is used on a descendant of the item (a child element of the container element) as follows:
<div itemscope="itemscope" itemtype="http://schema.org/Person">
<span itemprop="name">Leslie Sikos</span>
<img src="lesliesikos.jpg" alt="Leslie Sikos" itemprop="image" />
Leslie’s web site:
<a href="http://www.lesliesikos.com" itemprop="url">lesliesikos.com</a>
</div>
Property values are usually strings (sequences of characters), but can also be web addresses given as the value of the href attribute on the a element, the value of the src attribute on the img element, or the corresponding attribute of other elements that link to or embed external resources. In the above code, for example, the value of the image item property is the value of the src attribute on the img element, which is lesliesikos.jpg. Similarly, the value of the url item property is not the content of the a element, lesliesikos.com, but the value of the href attribute on the a element, which is http://www.lesliesikos.com. By default, however, the value of a property is the content of the element; for example, the value of the name item property in this example is Leslie Sikos (delimited by the <span> and </span> tag pair). The type of an item and its properties is expressed using the itemtype attribute by declaring the web address of the external vocabulary that defines the corresponding item and properties. In our example, we used the Person vocabulary from http://schema.org that defines properties of a person such as familyName, givenName, birthDate, birthPlace, gender, nationality, and so on. The full list of properties is defined at http://schema.org/Person, which is the value of the itemtype attribute. In the example, we declared the name with the name property, the depiction of the person with the image property, and his website address using the url property. The allowed values and expected format of these properties are available at http://schema.org/name, http://schema.org/image, and http://schema.org/url, respectively.
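The value rules described above (text content by default, href for a elements, src for img elements) can be sketched as a small extractor built on Python's standard `html.parser` module. The class name `FlatMicrodataParser` is hypothetical, and the sketch makes simplifying assumptions: it handles a single, non-nested item, takes URL values as written without resolving them, and does not support itemref or elements of the same name nested inside a text property.

```python
from html.parser import HTMLParser

# Elements whose property value comes from an attribute rather than the text.
URL_ATTRIBUTES = {"a": "href", "area": "href", "link": "href", "img": "src"}


class FlatMicrodataParser(HTMLParser):
    """Collect the properties of one flat Microdata item (no nesting)."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._text_prop = None  # property whose text content is being read
        self._text_tag = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        name = attrs.get("itemprop")
        if name is None:
            return
        if tag in URL_ATTRIBUTES:
            self.properties[name] = attrs.get(URL_ATTRIBUTES[tag])
        else:
            self._text_prop, self._text_tag, self._buffer = name, tag, []

    def handle_data(self, data):
        if self._text_prop is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._text_prop is not None and tag == self._text_tag:
            self.properties[self._text_prop] = "".join(self._buffer).strip()
            self._text_prop = None


MARKUP = """
<div itemscope="itemscope" itemtype="http://schema.org/Person">
<span itemprop="name">Leslie Sikos</span>
<img src="lesliesikos.jpg" alt="Leslie Sikos" itemprop="image" />
Leslie’s web site:
<a href="http://www.lesliesikos.com" itemprop="url">lesliesikos.com</a>
</div>
"""

extractor = FlatMicrodataParser()
extractor.feed(MARKUP)
print(extractor.item_type)
print(extractor.properties)
```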
The item type is different for each knowledge domain, and if you want to annotate the description of a book rather than a person, the value of the itemtype attribute will be http://schema.org/Book, where the properties of books are collected and defined such as bookFormat, bookEdition, numberOfPages, author, publisher, etc. If the item has a global identifier (such as the unique ISBN of a book), it can be annotated using the itemid attribute as follows.
<div itemscope="itemscope" itemtype="http://schema.org/Book"
itemid="urn:isbn:978-1-484208-84-7">
<img itemprop="image" src="http://www.masteringhtml5css3.com/img/webstandardsbook.jpg" alt="Web Standards" />
<span itemprop="name">Web Standards: Mastering HTML5, CSS3, and XML</span>
by <a itemprop="author" href="http://www.lesliesikos.com">Leslie Sikos</a>
</div>
Although HTML5 Microdata is primarily used for semantic descriptions of people, organizations, events, products, reviews, and links, you can annotate any other knowledge domain with the wide variety of external vocabularies. Groups of name-value pairs can be nested in a Microdata property by declaring the itemscope attribute on the element that declared the property, as for example
<div itemscope="itemscope">
<p>Name: <span itemprop="name">Herbie Hancock</span></p>
<p>Band: <span itemprop="band" itemscope="itemscope">
<span itemprop="name">The Headhunters</span>
(<span itemprop="size">7</span> members)
</span>
</p>
</div>
In the above example, the outer item (top-level Microdata item) annotates a person, and the inner one represents a jazz band. An optional attribute of elements with an itemscope attribute is itemref, which gives a list of additional elements to crawl to find the name-value pairs of the item. In other words, properties that are not descendants of the element with the itemscope attribute can be associated with the item using the itemref attribute, providing a list of element identifiers with additional properties elsewhere in the document:
<div itemscope="itemscope" id="herbie" itemref="a b"></div>
<p id="a">Name: <span itemprop="name">Herbie Hancock</span></p>
<div id="b" itemprop="band" itemscope="itemscope" itemref="c"></div>
<div id="c">
<p>Band: <span itemprop="name">The Headhunters</span></p>
<p>Size: <span itemprop="size">7</span> members</p>
</div>
The first item has two properties: name, which declares the name of jazz keyboardist Herbie Hancock, and band, which annotates his jazz band as a separate item. That second item has two further properties, which give the name of the band as The Headhunters and set the number of members to 7 using the size property. The itemref attribute is not part of the HTML5 Microdata data model; it is merely a syntactic construct for attaching properties that are located elsewhere in the document.
The HTML5 Microdata DOM API provides direct access for web developers to structured data expressed in Microdata.
JSON-LD
In contrast to RDFa and HTML5 Microdata, the two other formats to add structured data to the web site markup, JavaScript Object Notation for Linked Data (JSON-LD) is expressed as a JSON data block rather than as markup elements and attributes. As a result, JSON-LD is completely separate from the (X)HTML code. One of the advantages of this lightweight Linked Data format is that it is easy for humans to read and write. JSON-LD transports Linked Data using the JavaScript Object Notation (JSON), an open standard format using human-readable text to transmit attribute-value pairs. If the JSON-LD code is written in a separate file rather than the markup, the de facto file extension is .jsonld. The Internet media type of JSON-LD is application/ld+json and, if written in the markup, the JSON-LD code is delimited by curly braces between the <script> and </script> tags as
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Person",
"image": "lesliesikos.jpg",
"name": "Leslie Sikos",
"url": "http://www.lesliesikos.com"
}
</script>
This example uses the compact syntax of JSON-LD, which can be expanded to the full syntax notation as
[
{
"@type": [
"http://schema.org/Person"
],
"http://schema.org/image": [
{
"@id": "http://www.lesliesikos.com/images/lesliesikos.jpg"
}
],
"http://schema.org/name": [
{
"@value": "Leslie Sikos"
}
],
"http://schema.org/url": [
{
"@id": "http://www.lesliesikos.com"
}
]
}
]
The JSON-LD API provides a way to transform JSON-LD documents to be more easily consumed by specific applications.
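Because JSON-LD is ordinary JSON inside a script element, a consumer can read it without a dedicated library. The following minimal sketch uses only Python's standard `html.parser` and `json` modules; the class name `JsonLdExtractor` is hypothetical, and the code only extracts and parses the embedded data. It does not perform JSON-LD expansion or any other algorithm of the JSON-LD API.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self.documents = []
        self._in_jsonld = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


PAGE = """
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Person",
  "image": "lesliesikos.jpg",
  "name": "Leslie Sikos",
  "url": "http://www.lesliesikos.com"
}
</script>
"""

jsonld = JsonLdExtractor()
jsonld.feed(PAGE)
person = jsonld.documents[0]
print(person["@type"], person["name"], person["url"])
```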