SPARQL


The primary query language for the Resource Description Framework is SPARQL (pronounced “Sparkle”), which can be used to retrieve and manipulate information stored in RDF or in any format that can be retrieved as RDF. The output can be a result set or an RDF graph. It is also possible to update RDF graphs over HTTP through the SPARQL 1.1 Graph Store HTTP Protocol, which was called the SPARQL 1.1 Uniform HTTP Protocol in earlier drafts.

SPARQL uses a Notation3-like syntax. URIs can be written in full between the less-than (<) and greater-than (>) characters, such as <http://example.com>, or abbreviated using the namespace mechanism with the PREFIX keyword, such as PREFIX schema: <http://schema.org/>.
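Prefix expansion amounts to concatenating the namespace URI and the local name. The Python sketch below models only that step; it is not a SPARQL parser, and the prefix table `PREFIXES` and the helper `expand_iri` are hypothetical names chosen for illustration.

```python
# Hypothetical prefix table, as if built from PREFIX declarations.
PREFIXES = {
    "schema": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_iri(token: str, prefixes: dict = PREFIXES) -> str:
    """Turn <full-URI> or prefix:local into a full URI string."""
    if token.startswith("<") and token.endswith(">"):
        return token[1:-1]                      # URI written in full
    prefix, colon, local = token.partition(":")
    if not colon or prefix not in prefixes:
        raise ValueError(f"undeclared prefix in {token!r}")
    return prefixes[prefix] + local             # plain string concatenation


print(expand_iri("schema:Person"))              # http://schema.org/Person
print(expand_iri("<http://example.com>"))       # http://example.com
```

Because expansion is plain concatenation, the namespace URI needs its trailing slash: with schema declared as <http://schema.org>, schema:Person would expand to http://schema.orgPerson, which is a different URI.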

After declaring the Schema.org namespace (http://schema.org/), for example, http://schema.org/Person can be abbreviated as schema:Person. Similar to N3, the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#type (rdf:type) can be abbreviated as a when it is used as a predicate. Literals can be written with or without a language tag or a datatype. Plain string literals are delimited by quotation marks, such as "a plain literal", while plain literals with a language tag end with the @ sign followed by a standard language code, such as "Wagen"@de (the word “car” in German). Typed literals are written the same way as typed literals in RDF, for example "55"^^xsd:integer (55 is an integer number rather than two meaningless characters in a string literal). Frequently used typed literals can be abbreviated: "true"^^xsd:boolean can be written as true, while integer and decimal numbers are automatically assumed to be of type xsd:integer or xsd:decimal, respectively. Consequently, "5"^^xsd:integer can be abbreviated as 5, and "13.1"^^xsd:decimal can be written as 13.1.
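The abbreviation rules above can be sketched as a small Python lookup. The helper names `expand_predicate`, `literal_datatype`, and `to_full_literal` are hypothetical. The numeric patterns follow the integer, decimal, and double forms of the SPARQL 1.1 grammar (a number with an exponent, such as 1.5e3, is an xsd:double), but the sketch simplifies signed numbers and ignores other edge cases.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand_predicate(token: str) -> str:
    """Replace the keyword 'a' (valid in predicate position) with rdf:type."""
    return RDF_TYPE if token == "a" else token


def literal_datatype(token: str) -> str:
    """Return the datatype URI implied by an abbreviated literal."""
    if token in ("true", "false"):
        return XSD + "boolean"
    if re.fullmatch(r"[+-]?[0-9]+", token):                       # e.g. 5
        return XSD + "integer"
    if re.fullmatch(r"[+-]?[0-9]*\.[0-9]+", token):               # e.g. 13.1
        return XSD + "decimal"
    if re.fullmatch(r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)[eE][+-]?[0-9]+", token):
        return XSD + "double"                                     # e.g. 1.5e3
    raise ValueError(f"not an abbreviated literal: {token!r}")


def to_full_literal(token: str) -> str:
    """Rewrite an abbreviated literal in its explicit "lexical"^^<datatype> form."""
    return f'"{token}"^^<{literal_datatype(token)}>'


print(to_full_literal("5"))   # "5"^^<http://www.w3.org/2001/XMLSchema#integer>
```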

Each SPARQL query has a head and a body. The head of a SPARQL query is an expression for constructing the answer to the query. A query is evaluated against an RDF graph by matching the body against the graph, which results in a set of bindings for the variables in the body. These bindings are then processed using relational operators such as projection and duplicate elimination (DISTINCT) to generate the output of the query. The body can be a simple triple pattern or a complex graph pattern built from triple patterns, that is, subject-predicate-object triples in which the subject, the predicate, or the object can be a variable. The body can also contain conjunctions, disjunctions (UNION), optional parts (OPTIONAL), and constraints on variable values (FILTER).
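The sets of bindings produced by the body are often modeled as solution mappings, that is, dictionaries from variables to values. Under that assumption, conjunction, disjunction, optional parts, and value constraints correspond roughly to join, union, left join, and filter. The following is a simplified Python sketch with hypothetical function names; it ignores SPARQL details such as FILTER expressions inside OPTIONAL and expression errors.

```python
# Solution mappings are modeled as dicts from variable names to values.

def compatible(a: dict, b: dict) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left: list, right: list) -> list:
    """Conjunction: merge every compatible pair of solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def union(left: list, right: list) -> list:
    """Disjunction (UNION): keep the solutions of both alternatives."""
    return left + right


def left_join(left: list, right: list) -> list:
    """OPTIONAL: extend a solution where possible, otherwise keep it as is."""
    result = []
    for a in left:
        extended = [{**a, **b} for b in right if compatible(a, b)]
        result.extend(extended or [a])
    return result


def filter_solutions(solutions: list, condition) -> list:
    """FILTER: keep only the solutions for which the condition holds."""
    return [s for s in solutions if condition(s)]


people = [{"?p": "ex:alice"}, {"?p": "ex:bob"}]
emails = [{"?p": "ex:alice", "?mail": "alice@example.org"}]
print(left_join(people, emails))
```

With these hypothetical data, the left join keeps ex:bob even though no e-mail address is bound for him, which is exactly what distinguishes OPTIONAL from a plain conjunction.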

The BASE directive, the namespace declarations (PREFIX), the dataset declaration (FROM, FROM NAMED), and the query modifiers (GROUP BY, HAVING, ORDER BY, LIMIT, OFFSET, and VALUES, which was called BINDINGS in earlier drafts of SPARQL 1.1) are optional. The BASE directive and the list of prefixes are used to abbreviate URIs. The BASE keyword defines the base URI against which all relative URIs in the query are resolved. The list of prefixes can contain an arbitrary number of PREFIX statements. The prefix abbreviation preceding the colon (such as schema in PREFIX schema:) represents the prefix URI, which can be used throughout the SPARQL query, making it unnecessary to repeat long URIs (the standard namespace mechanism). The FROM clause specifies the dataset to search. In many cases, for example when the SPARQL endpoint used for the query is dedicated to the LOD dataset from which you want to retrieve data, the FROM clause can be safely omitted. The WHERE clause specifies the patterns to extract the desired results. The query modifiers, such as ORDER BY or LIMIT, if present, come last in the query.
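The effect of BASE can be approximated with Python's `urllib.parse.urljoin`, which resolves relative references in the RFC 3986 manner. The base URI http://example.org/people/ and the helper `resolve` are hypothetical, and the sketch ignores IRI-specific details that a SPARQL processor may handle differently.

```python
from urllib.parse import urljoin

# Hypothetical base URI, as if declared with BASE <http://example.org/people/>
BASE = "http://example.org/people/"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a relative URI reference (written as <alice> in a query)."""
    return urljoin(base, reference)


print(resolve("alice"))                      # http://example.org/people/alice
print(resolve("../org/acme"))                # http://example.org/org/acme
print(resolve("http://schema.org/Person"))   # absolute URIs stay unchanged
```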

SPARQL 1.1 supports aggregation. To perform aggregation, the solutions are first partitioned into groups based on the expression(s) in the GROUP BY clause. Then the aggregate functions, such as COUNT or SUM, are evaluated to get one result per group. The grouped results can optionally be filtered in a HAVING clause, and the SELECT clause projects the group keys and aggregate values into the output.
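The grouping steps can be modeled in a few lines of Python. This sketch covers only COUNT with a minimum-count HAVING condition. The function name `group_count_having`, the variable names, and the writer data are hypothetical. The sketch also assumes that every solution binds the grouping variable.

```python
from collections import defaultdict


def group_count_having(solutions: list, group_var: str, min_count: int) -> list:
    """Model of: SELECT ?g (COUNT(*) AS ?n) WHERE { ... }
                 GROUP BY ?g HAVING (COUNT(*) >= N)"""
    groups = defaultdict(list)
    for s in solutions:                             # 1. partition by the group key
        groups[s[group_var]].append(s)
    rows = [{group_var: key, "?n": len(members)}    # 2. one aggregate per group
            for key, members in groups.items()]
    return [r for r in rows if r["?n"] >= min_count]  # 3. HAVING filter


writers = [
    {"?writer": "ex:w1", "?country": "US"},
    {"?writer": "ex:w2", "?country": "US"},
    {"?writer": "ex:w3", "?country": "UK"},
]
print(group_count_having(writers, "?country", 2))  # [{'?country': 'US', '?n': 2}]
```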

Query Types

The optional namespace declarations are followed by the query. The four query forms in SPARQL are SELECT, ASK, CONSTRUCT, and DESCRIBE. SELECT queries return the values bound to the variables that match the query pattern. Yes/no queries (ASK queries) return a Boolean value indicating whether the pattern has any match. CONSTRUCT queries create a new RDF graph by substituting these values into a graph template. DESCRIBE queries return an RDF graph describing the matched resources, and the exact content of the description is determined by the SPARQL service. The most frequently used SPARQL queries are the SELECT queries.
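Given the solutions of a pattern, ASK and CONSTRUCT differ only in what they do with them. The Python sketch below is a simplified model with hypothetical function names and data, where triples are tuples of strings and variables start with a question mark. Following SPARQL, a template triple that would contain an unbound variable is left out of the constructed graph.

```python
def ask(solutions: list) -> bool:
    """ASK: true if the pattern produced at least one solution."""
    return len(solutions) > 0


def construct(template: list, solutions: list) -> set:
    """CONSTRUCT: instantiate the triple template once per solution."""
    graph = set()
    for s in solutions:
        for triple in template:
            terms = tuple(s.get(t) if t.startswith("?") else t for t in triple)
            if None not in terms:        # skip triples with unbound variables
                graph.add(terms)
    return graph


sols = [
    {"?person": "ex:alice", "?name": '"Alice"'},
    {"?person": "ex:bob", "?name": '"Bob"'},
]
print(ask(sols))                                           # True
print(len(construct([("?person", "schema:name", "?name")], sols)))  # 2
```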

Pattern Matching

The result clause is followed by the pattern matching. Two different pattern types can be used in SPARQL queries: triple patterns and graph patterns. SPARQL triple patterns are similar to the subject-predicate-object triples of RDF, but they can also include variables. This makes it possible to select those RDF triples of an RDF graph that match the criteria described in the pattern. Any or all of the subject, predicate, and object can be variables, which are identified by a question mark preceding the name, such as ?name. To match an exact RDF triple, write its subject, predicate, and object followed by a period, such as ex:myDataset schema:familyName "Sikos" .

To match one variable, substitute the appropriate triple component (the subject, the predicate, or the object) with a variable, such as ?person schema:familyName "Sikos" . A triple pattern is not limited to one variable: you can substitute any of the triple components with variables, such as ?person schema:familyName ?name . Even all components can be variables. For instance, the triple pattern ?subject ?predicate ?object will match all triples in your RDF graph. Sometimes you need more complex selection rules than a single triple pattern can express. A collection of triple patterns is called a graph pattern and is delimited by curly braces, for example:


{
  ?who schema:name ?name .
  ?who iswc:research_topic ?research_topic .
  ?who foaf:knows ?others .
}
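How triple patterns and graph patterns match can be sketched in Python with a naive nested-loop evaluation. The graph is a hypothetical list of (subject, predicate, object) string tuples, and the function names are illustrative. Real SPARQL engines use indexes and query planning rather than scanning every triple for every pattern.

```python
def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):                 # variable: bind it or check it
            if result.setdefault(p, t) != t:
                return None
        elif p != t:                          # constant: must match exactly
            return None
    return result


def match_graph_pattern(graph, patterns):
    """All solutions for a group of triple patterns that must match together."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            if (extended := match_triple(pattern, triple, binding)) is not None
        ]
    return solutions


# Hypothetical data standing in for an RDF graph.
graph = [
    ("ex:alice", "schema:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "schema:name", '"Bob"'),
]
print(match_graph_pattern(graph, [
    ("?who", "schema:name", "?name"),
    ("?who", "foaf:knows", "?others"),
]))
```

Because the shared variable ?who must take the same value in every pattern, only ex:alice satisfies both patterns, while ?s ?p ?o alone would match all three triples.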

Solution Modifiers

The last optional parts of SPARQL queries are the solution modifiers. Once the output of the pattern has been computed (in the form of a table of variable values), solution modifiers allow you to modify these values by applying standard operators such as projection, DISTINCT (removes duplicates), ORDER BY (sorts the results), OFFSET (skips a given number of results), and LIMIT (sets the maximum number of results returned).
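The following Python sketch applies the modifiers to a table of solutions in the order SPARQL specifies: ordering, projection, duplicate removal, then offset and limit. The function name and data are hypothetical. As a simplification, values are compared as Python strings, and an unbound variable sorts as an empty string, rather than SPARQL's type-aware ordering.

```python
def apply_modifiers(solutions, select_vars, order_by=None, distinct=False,
                    offset=0, limit=None):
    """Apply ORDER BY, projection, DISTINCT, OFFSET and LIMIT, in that order."""
    rows = list(solutions)
    if order_by is not None:
        rows.sort(key=lambda s: s.get(order_by, ""))   # string comparison only
    rows = [{v: s[v] for v in select_vars if v in s} for s in rows]  # projection
    if distinct:
        seen, unique = set(), []
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                unique.append(row)
        rows = unique
    end = None if limit is None else offset + limit
    return rows[offset:end]


sols = [
    {"?name": "Zoe", "?age": "30"},
    {"?name": "Adam", "?age": "41"},
    {"?name": "Adam", "?age": "25"},
]
# Roughly: SELECT DISTINCT ?name ... ORDER BY ?name
print(apply_modifiers(sols, ["?name"], order_by="?name", distinct=True))
```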

SELECT Queries

The most common SPARQL queries are the SELECT queries. The SELECT clause specifies the data items (variable bindings) to be returned by the SPARQL query. Even though LOD datasets can contain thousands or even millions of RDF triples, you can select only those items that meet your criteria. For example, from a dataset of writers you can list those who lived in the twentieth century or those who are American. SPARQL supports a wildcard, so you can select all variables mentioned in the query using SELECT *. If you want to eliminate potential duplicates, use the DISTINCT keyword after SELECT, such as SELECT DISTINCT ?var.

SELECT queries are often used to extract triples through specific variables and expressions. For example, assume we need a query to extract all names declared using foaf:name in someone’s FOAF file. The abbreviated namespace requires a PREFIX declaration. The query is a SELECT query that uses a variable for the names (?name) and a WHERE clause with a triple pattern to find all subjects (?person) and objects (?name) linked by the foaf:name predicate:


PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
    ?person foaf:name ?name .
}
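A query like this is usually sent to a SPARQL endpoint over HTTP. The Python sketch below builds a SPARQL 1.1 Protocol request that passes the query as the `query` parameter of a GET request and asks for the SPARQL 1.1 Query Results JSON Format. It then reads the bindings of one variable from such a result. The endpoint URL and the helper names are hypothetical, and the request is only built here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
    ?person foaf:name ?name .
}"""


def build_request(endpoint: str, query: str) -> Request:
    """Query via GET (SPARQL Protocol), asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def values_of(json_text: str, var: str) -> list:
    """Collect the values bound to `var` (name without '?') in a JSON result."""
    data = json.loads(json_text)
    return [row[var]["value"] for row in data["results"]["bindings"] if var in row]


# Hypothetical endpoint; urllib.request.urlopen(request) would send the query.
request = build_request("https://example.org/sparql", QUERY)
```

Each row of a JSON result maps variable names (without the question mark) to objects carrying a type and a value, which is why `values_of` reads `row[var]["value"]`.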