Big Semantic Data


Big Data refers to high-volume, high-velocity datasets too large and complex to process using traditional data processing tools, applications, and database systems. Representing petabytes of data, such datasets store billions of hidden values unavailable for efficient and automatic machine processing. Big Data is characterized by three Vs:

  • Volume: huge amounts of data stored in and retrieved from massive datasets. The challenge is to achieve a reasonable processing speed, especially in real-time applications.
  • Velocity: high-rate data flow. The challenge is processing streaming data.
  • Variety: different data structures, data formats, and serializations.

One of the promising approaches to address the issues associated with Big Data is to store data as structured data and represent the datasets as graphs, which allows software agents to query these semantic databases and yield enriched results. Processing Linked Data makes it possible to retrieve information that is currently difficult or impossible to find, and to discover further, related information.
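The graph representation described above can be illustrated with a minimal sketch. It assumes a dataset stored as subject-predicate-object triples in memory; the `ex:` identifiers, the `match` and `related` helpers, and the sample data are all hypothetical and are not taken from any particular triplestore.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical sample graph; every identifier here is made up for illustration.
GRAPH: List[Triple] = [
    ("ex:MovieA", "rdf:type", "ex:ActionMovie"),
    ("ex:MovieA", "ex:director", "ex:DirectorX"),
    ("ex:MovieB", "rdf:type", "ex:ActionMovie"),
    ("ex:MovieB", "ex:director", "ex:DirectorX"),
    ("ex:MovieC", "rdf:type", "ex:Documentary"),
    ("ex:MovieC", "ex:director", "ex:DirectorY"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield the triples that fit the pattern; None acts as a wildcard."""
    for triple in triples:
        if s is not None and triple[0] != s:
            continue
        if p is not None and triple[1] != p:
            continue
        if o is not None and triple[2] != o:
            continue
        yield triple


def related(triples: List[Triple], subject: str) -> List[str]:
    """Return other subjects sharing at least one (predicate, object) pair."""
    features = {(p, o) for _, p, o in match(triples, s=subject)}
    found = {s for s, p, o in triples if s != subject and (p, o) in features}
    return sorted(found)


print(related(GRAPH, "ex:MovieA"))  # ['ex:MovieB']
```

A production system would express the same lookup as a query against an indexed graph database rather than a linear scan; the sketch only shows why a graph structure makes "related information" easy to reach.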

Google Knowledge Graph

A great example of Big Data applications on the Semantic Web is the Knowledge Graph of Google, which was introduced in 2012. The Google Knowledge Graph is a semantic knowledge base to enhance traditional Search Engine Result Pages (SERPs) with semantic search information gathered from a wide variety of sources. The result of a Knowledge Graph search is not only relevant information far more accurate than what you would find with traditional searches, but also related information such as similar resources people search for the most. For example, if you search for the title of an action movie, the results will include similar movies, while searching for a particular inventor will show further inventors with similar research fields and awards. The Knowledge Graph contains more than half a billion objects and over 18 billion facts about relationships between different objects that help software agents “understand” the meaning of the search keywords, and these figures are constantly growing.

Another good example of Big Data implementations on the Semantic Web is social media. The Facebook Open Graph protocol enables any web page to become a rich object in a social graph. Facebook Graph Search is a semantic search engine introduced in 2013 to answer users' natural language queries rather than return a list of links. Facebook Graph Search combines Big Data acquired from over one billion Facebook users with external data in a search engine to provide advanced, accurate, user-specific search results.
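To make the Open Graph idea more concrete: the protocol describes a page through `meta` elements whose `property` attribute starts with `og:` (for example `og:title`, `og:type`, `og:url`, and `og:image`). The following is a minimal sketch of how a software agent might read that metadata using only the Python standard library. The sample page, its URLs, and the `OpenGraphParser` and `extract_open_graph` names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict

# Hypothetical page markup; the title and URLs are placeholders.
SAMPLE_HTML = """
<html>
  <head>
    <title>Example Movie</title>
    <meta property="og:title" content="Example Movie" />
    <meta property="og:type" content="video.movie" />
    <meta property="og:url" content="https://example.com/movies/1" />
    <meta property="og:image" content="https://example.com/movies/1.jpg" />
    <meta name="description" content="Not an Open Graph property" />
  </head>
  <body></body>
</html>
"""


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property")
        content = attr_map.get("content")
        if prop and prop.startswith("og:") and content is not None:
            self.properties[prop] = content


def extract_open_graph(html: str) -> Dict[str, str]:
    """Return a mapping of Open Graph property names to their content."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


for name, value in extract_open_graph(SAMPLE_HTML).items():
    print(name, "=", value)
```

Each extracted property can be read as an edge from the page to a value, which is how an ordinary web page turns into a typed node of a graph.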

Many graph databases are powerful enough for Big Data applications. For instance, both AllegroGraph and Oracle Spatial and Graph are capable of storing, indexing, and retrieving more than 1 trillion triples with an extraordinary load rate of up to 1.4 million triples per second.
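To put these figures in perspective, the short calculation below estimates how long loading such a dataset would take. It assumes, purely for illustration, that the peak load rate is sustained for the whole load, which the figures above do not claim; the `load_time_days` helper is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def load_time_days(triple_count: float, triples_per_second: float) -> float:
    """Estimate the load time in days at a constant load rate."""
    if triples_per_second <= 0:
        raise ValueError("load rate must be positive")
    return triple_count / triples_per_second / SECONDS_PER_DAY


# 1 trillion triples at 1.4 million triples per second (assumed constant).
days = load_time_days(1_000_000_000_000, 1_400_000)
print(f"{days:.1f} days")  # 8.3 days
```

Even at that rate, a trillion-triple load takes over a week, which is why load throughput matters as much as storage capacity for Big Data workloads.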