To leverage the power of RDF, data on the Semantic Web can be stored in graph databases rather than relational databases. A graph database is a database that implements graph structures for semantic queries using nodes, edges, and properties to represent and retrieve data. General graph databases can store any RDF graph. Graph databases that specialize in the storage and retrieval of triples are called triplestores or subject-predicate-object databases. If a graph database also stores the graph name (representing the graph context or provenance information) with each triple, forming quads, the database is called a quadstore. The triplestores that have been built on top of existing commercial relational database engines (such as SQL-based databases) are typically not as efficient as the native triplestores with a database engine built from scratch for storing and retrieving RDF triples. The performance of native triplestores is usually better due to the difficulty of mapping the graph-based RDF model to SQL queries.
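The difference between a triple and a quad can be illustrated with a minimal in-memory sketch. The class name TinyQuadStore, its methods, and the ex: and foaf: sample data are hypothetical names chosen for illustration only; real triplestores and quadstores rely on persistent indexes and query engines rather than a linear scan over a Python set.

```python
from typing import Iterator, Optional, Set, Tuple

# A quad is a triple (subject, predicate, object) plus a graph name.
Quad = Tuple[str, str, str, str]


class TinyQuadStore:
    """Illustrative in-memory quadstore (hypothetical, not a real product's design)."""

    def __init__(self) -> None:
        self._quads: Set[Quad] = set()

    def add(self, s: str, p: str, o: str, graph: str = "default") -> None:
        """Store one statement together with the name of its graph."""
        self._quads.add((s, p, o, graph))

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
        graph: Optional[str] = None,
    ) -> Iterator[Quad]:
        """Yield quads matching the pattern; None acts as a wildcard."""
        pattern = (s, p, o, graph)
        for quad in self._quads:
            if all(want is None or want == got for want, got in zip(pattern, quad)):
                yield quad


store = TinyQuadStore()
store.add("ex:Alice", "foaf:knows", "ex:Bob", graph="ex:graph1")
store.add("ex:Alice", "foaf:name", '"Alice"', graph="ex:graph1")
store.add("ex:Bob", "foaf:name", '"Bob"', graph="ex:graph2")

# Restrict a triple pattern to a single named graph (the fourth element).
names_in_graph1 = sorted(
    o for _, _, o, _ in store.match(p="foaf:name", graph="ex:graph1")
)
print(names_in_graph1)
```

Ignoring the graph argument makes the same structure behave like a plain triplestore, while supplying it limits a query to statements from one context or source.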
The advantages of graph databases derive from the features of RDF, OWL, and SPARQL. RDF data elements are globally unique and linked, leveraging the advantages of the graph structure. Adding a new schema element is as easy as inserting a triple with a new predicate. Graph databases also support ad hoc SPARQL queries. Unlike the column headers, foreign keys, or constraints of relational databases, the entities of graph databases are categorized with classes, predicates are properties or relationships, and they are all part of the data. Because the data is represented in RDF, graph databases support automatic inferencing for knowledge discovery. The data stored in these databases can unify vocabularies, dictionaries, and taxonomies through machine-readable ontologies. Graph databases are commonly used in semantic data integration, social network analysis, and Linked Open Data applications.
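Two of these points, extending the schema by inserting a triple and automatic inferencing, can be sketched in a few lines. The example below is a simplified illustration, not the implementation of any particular database: the ex: resources and the function name infer_types are hypothetical, and only a single RDFS entailment rule (rdfs9, type propagation along rdfs:subClassOf) is applied.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Schema and instance data live side by side as ordinary triples.
triples: Set[Triple] = {
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:Alice", "rdf:type", "ex:Student"),
}

# "Extending the schema" needs no table alteration: a previously unseen
# predicate is introduced simply by adding one more triple.
triples.add(("ex:Alice", "ex:favoriteColor", '"green"'))


def infer_types(data: Set[Triple]) -> Set[Triple]:
    """Apply RDFS rule rdfs9 repeatedly until no new triples appear."""
    inferred = set(data)
    changed = True
    while changed:
        changed = False
        # Snapshot the current facts so the set can be extended safely.
        subclass = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        typed = [(s, o) for s, p, o in inferred if p == "rdf:type"]
        for instance, cls in typed:
            for sub, sup in subclass:
                if cls == sub:
                    new = (instance, "rdf:type", sup)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


closure = infer_types(triples)
alice_types = sorted(
    o for s, p, o in closure if s == "ex:Alice" and p == "rdf:type"
)
print(alice_types)
```

Only the ex:Student membership was asserted for ex:Alice; the ex:Person and ex:Agent memberships are derived from the class hierarchy stored in the same data set.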
Popular RDF Triplestores and Quadstores
AllegroGraph is a widely used graph database designed for loading and querying very large amounts of RDF triples.
Oracle Spatial and Graph
Oracle Spatial and Graph, Oracle’s RDF triplestore and ontology management platform, provides automatic partitioning and data compression, as well as high-performance parallel and direct path loading with the Oracle Database and loading through Jena.
FoundationDB is a multi-model NoSQL database with a distributed computing architecture. FoundationDB stores all data in the Key-Value Store, which can be distributed across many machines and has a simple but powerful key-value API with full ACID transactions.
DataStax is a database platform purpose-built for the Internet of Things, web apps, and mobile apps, which delivers Cassandra, Apache’s open source distributed database management system, as part of the platform. DataStax manages massive volumes of dynamic data for search and analysis of user activity. It supports multi-data center and cloud replication.
Neo4j is one of the world’s leading graph databases. It is claimed to query connected data up to a thousand times faster than relational databases, with query performance that remains constant as the data set grows, and it handles complex hierarchies very flexibly.
Systap’s Bigdata is an open source storage and computing platform. It features a high-performance RDF graph database that supports RDFS and OWL Lite reasoning, as well as SPARQL 1.1 querying. With its quorum-based high-availability (HA) architecture, Bigdata supports robust enterprise deployments that have high uptime and Quality of Service (QoS) demands.
Dydra is a powerful graph database in the cloud, allowing businesses to make the most of highly connected data such as social networks.