Semantic Web Technologies in Data Science


Data Science

Data science (data-driven science) is an interdisciplinary field that uses Big Data analytics, machine learning, statistics, and other scientific methods, processes, and systems to extract previously hidden knowledge and gain insight from data in various forms, such as SQL or NoSQL databases and structured or unstructured data. In data science, Semantic Web technologies can be used for data integration, normalization, and visualization, as seen in many data science tools.

In the enterprise, data is often produced by a wide range of software tools, in a variety of file formats, and with greatly varying data quality. Many of the resulting data integration issues can be addressed by expressing the data in RDF, which can be processed automatically and efficiently.
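The integration idea can be illustrated with a minimal, standard-library-only sketch. It is not a full RDF implementation: triples are modelled as plain Python tuples, and the `http://example.org/` namespace, the `employee/` and `prop/` paths, the sample records, and both helper functions are hypothetical. A real project would more likely use a dedicated RDF library.

```python
import csv
import io
import json

# Hypothetical namespace used only for this illustration.
BASE = "http://example.org/"


def triples_from_csv(text):
    """Yield (subject, predicate, object) triples from CSV rows keyed by an 'id' column."""
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + "employee/" + row["id"]
        for key, value in row.items():
            if key != "id" and value:
                yield (subject, BASE + "prop/" + key, value)


def triples_from_json(text):
    """Yield triples from a JSON array of records that each carry an 'id' field."""
    for record in json.loads(text):
        subject = BASE + "employee/" + str(record["id"])
        for key, value in record.items():
            if key != "id" and value is not None:
                yield (subject, BASE + "prop/" + key, str(value))


# Two sources in different formats describing the same (made-up) employee.
hr_csv = "id,name,department\n7,Ada Lovelace,Research\n"
payroll_json = '[{"id": 7, "salary": 5200, "name": "Ada Lovelace"}]'

# Because both sources map to the same identifiers, merging is a set union,
# and the duplicated 'name' statement collapses into a single triple.
graph = set(triples_from_csv(hr_csv)) | set(triples_from_json(payroll_json))

for triple in sorted(graph):
    print(triple)
```

The design choice worth noting is that the merge step needs no format-specific logic: once every source is reduced to subject, predicate, object statements that share identifiers, combining them is a plain union.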

Because of the richness of natural language, a single word may have more than one correct spelling. Semantic Web standards handle this easily: each concept is identified by a unique, dereferenceable identifier, so every spelling variant can refer to the same concept.
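As a rough sketch of this normalization, spelling variants can be mapped to one identifier per concept. The layout below is loosely modelled on the preferred and alternative labels found in SKOS, but the concept IRIs, the dictionary structure, and the function names are assumptions made for illustration only.

```python
# Hypothetical concept table: one IRI per concept, with a preferred label
# and any number of alternative spellings.
CONCEPTS = {
    "http://example.org/concept/color": {
        "prefLabel": "color",
        "altLabels": ["colour"],
    },
    "http://example.org/concept/catalog": {
        "prefLabel": "catalog",
        "altLabels": ["catalogue"],
    },
}


def build_label_index(concepts):
    """Map every known spelling (case-insensitively) to its concept IRI."""
    index = {}
    for iri, labels in concepts.items():
        for label in [labels["prefLabel"], *labels["altLabels"]]:
            index[label.casefold()] = iri
    return index


def normalize(word, index):
    """Return the concept IRI for a spelling, or None if the word is unknown."""
    return index.get(word.casefold())


index = build_label_index(CONCEPTS)

# Both spellings resolve to the same identifier.
print(normalize("Colour", index))
print(normalize("color", index))
```

Downstream processing can then compare identifiers rather than strings, so records that differ only in spelling are treated as referring to the same thing.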

Once data is expressed in RDF, there are many options for data visualization, even for Big Data. For example, Gephi can visualize a graph of tens of millions of nodes, using proportionally sized bubbles, colors, and other visual cues.
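One simple route from triples to a graph visualization tool is an edge list. The sketch below writes triples as CSV with Source and Target columns, a layout that Gephi's spreadsheet import accepts for edge tables. Using the predicate as an edge Label, the `ex:` identifiers, and the function name are assumptions for this example, not a prescribed workflow.

```python
import csv
import io


def triples_to_edge_csv(triples):
    """Serialize (subject, predicate, object) triples as an edge-list CSV string.

    Each triple becomes one edge from subject to object, with the predicate
    kept as the edge label.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Label"])
    for subject, predicate, obj in triples:
        writer.writerow([subject, obj, predicate])
    return buffer.getvalue()


# Made-up triples using a hypothetical 'ex:' prefix.
triples = [
    ("ex:Ada", "ex:worksIn", "ex:Research"),
    ("ex:Research", "ex:partOf", "ex:Acme"),
]

edge_csv = triples_to_edge_csv(triples)
print(edge_csv)
```

Writing the result to a file (for example with `open("edges.csv", "w")`) would give a table that can be loaded as edges. Literal values such as names or numbers are usually better handled as node attributes than as separate nodes, which this sketch does not attempt.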