Multimedia Semantics, Spatiotemporal Reasoning, and 3D Scene Interpretation

Semantic Web Research

Video Semantics for Scene Interpretation

The immense and constantly growing number of videos calls for efficient automated processing mechanisms for multimedia content, which is a real challenge due to the huge Semantic Gap between what computers can automatically interpret from audio and video signals and what humans can comprehend based on cognition, knowledge, and experience. Low-level features, which correspond to local and global characteristics of audio and video signals, and low-level feature aggregates and statistics, such as various histograms based on low-level features, can be represented by low-level feature descriptors. These automatically extractable descriptors, for example dominant color and motion trajectory, are suitable for a limited range of applications only (e.g., machine learning-based classification), and are not connected directly to sophisticated human-interpretable, high-level descriptors, such as the concepts depicted in a video.
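To illustrate how little a low-level descriptor says about meaning, the following minimal sketch computes a simplified dominant color for a frame. It is only a stand-in for a real descriptor: the function name `dominant_color`, the `levels` parameter, and the tiny hand-made frame are assumptions made for this example, and a frame is assumed to be a plain list of RGB tuples.

```python
from collections import Counter

def dominant_color(pixels, levels=4):
    """Return the most frequent quantized RGB color among the pixels.

    Hypothetical helper: each channel is quantized into `levels` equal bins,
    and the center of the most populated bin combination is returned.
    """
    step = 256 // levels

    def quantize(channel):
        # Map a 0-255 channel value to the center of its bin.
        return (channel // step) * step + step // 2

    counts = Counter(tuple(quantize(c) for c in px) for px in pixels)
    return counts.most_common(1)[0][0]

# A made-up four-pixel "frame": three reddish pixels and one bluish pixel.
frame = [(250, 10, 10), (240, 20, 5), (245, 0, 15), (10, 10, 200)]
print(dominant_color(frame))  # (224, 32, 32)
```

The result states that the frame is mostly red, but nothing about whether it depicts a heart, a traffic light, or a sunset, which is exactly the gap that high-level descriptors are meant to close.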

Semantic Gap

To narrow the Semantic Gap, feature extraction and analysis can be complemented by machine-interpretable background knowledge formally grounded in description logics. The depicted concepts and their spatial relationships are usually described in RDF, which can express machine-readable statements in the form of subject-predicate-object triples (RDF triples), e.g., scene-depicts-heart. The formal definitions of the depicted concepts and relationships are derived from controlled vocabularies, ontologies, commonsense knowledge bases, and Linked Open Data (LOD) datasets. Regions of Interest (RoIs) can be annotated using media fragment identifiers. The temporal annotation of actions and video events can be performed using temporal description logics and rule-based mechanisms. The fusion of these descriptors, including descriptors of different modalities, is suitable for the machine-interpretable spatiotemporal annotation of complex video scenes.
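The combination of RDF triples and media fragment identifiers can be sketched in a few lines of plain Python. The sketch serializes the scene-depicts-heart statement as an N-Triples line whose subject is a spatiotemporal fragment of a video, using the `t=` (seconds) and `xywh=` (pixels) dimensions of the W3C Media Fragments URI syntax. All URIs, the `depicts` property, and the helper names `media_fragment` and `ntriple` are hypothetical and serve only as an illustration; a real annotation would reuse terms from an established ontology.

```python
def media_fragment(video_uri, start, end, region=None):
    """Build a media fragment URI for a time interval and an optional RoI.

    `start` and `end` are seconds; `region` is an (x, y, width, height)
    tuple in pixels. Hypothetical helper for illustration only.
    """
    fragment = f"t={start},{end}"
    if region is not None:
        x, y, w, h = region
        fragment += f"&xywh={x},{y},{w},{h}"
    return f"{video_uri}#{fragment}"

def ntriple(subject, predicate, obj):
    """Serialize one subject-predicate-object statement as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."

# Hypothetical identifiers; none of them come from a real dataset.
VIDEO = "http://example.org/videos/surgery.mp4"
DEPICTS = "http://example.org/vocab#depicts"
HEART = "http://example.org/concepts/Heart"

# The RoI is visible from second 10 to second 20 in a 320x240 region.
scene = media_fragment(VIDEO, 10, 20, region=(160, 120, 320, 240))
print(ntriple(scene, DEPICTS, HEART))
```

Because the subject already encodes where and when the concept appears, a single triple carries both the spatial and the temporal part of the annotation.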

Spatiotemporal Medical Video Annotation

3D Model Semantics for Feature-Based 3D Model Retrieval

3D models play an important role in a wide range of applications, including engineering, medicine, 3D printing, scientific visualization, education, and entertainment. However, the automated processing of 3D models is rather limited due to the difficulty of capturing their semantics. For instance, surgeons can use accurate 3D-printed models of human organs in preoperative diagnostics to improve surgical decisions (as opposed to relying on 2D X-ray, ultrasound, and MRI images for surgical planning), but the most appropriate 3D models cannot be retrieved efficiently from an organ repository using the simple keywords or labels of traditional information retrieval methods. 3D ontologies can provide formal definitions for 3D objects, thereby enabling feature-based 3D model retrieval.
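The difference between keyword search and feature-based retrieval can be sketched as follows. The example assumes that every model in a repository has already been annotated with structured features; the class `ModelAnnotation`, the function `retrieve`, the feature names, and all URIs are hypothetical, and exact matching on a dictionary stands in for the richer queries that an ontology-based description would allow.

```python
from dataclasses import dataclass, field

@dataclass
class ModelAnnotation:
    """A 3D model identified by a URI and described by named features."""
    uri: str
    features: dict = field(default_factory=dict)

def retrieve(repository, **required):
    """Return the URIs of all models whose features match every requirement."""
    return [
        model.uri
        for model in repository
        if all(model.features.get(name) == value
               for name, value in required.items())
    ]

# Hypothetical organ repository with made-up annotations.
repository = [
    ModelAnnotation("http://example.org/models/heart-01",
                    {"organ": "heart", "printable": True}),
    ModelAnnotation("http://example.org/models/heart-02",
                    {"organ": "heart", "printable": False}),
    ModelAnnotation("http://example.org/models/liver-01",
                    {"organ": "liver", "printable": True}),
]

print(retrieve(repository, organ="heart", printable=True))
# ['http://example.org/models/heart-01']
```

A keyword query for "heart" would return both heart models; only the structured features make it possible to narrow the result to the model that is suitable for 3D printing.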

Semantic Annotation of a Medical 3D Model