MPEG-7-Aligned Spatiotemporal Video Annotation and Scene Interpretation via an Ontological Framework for Intelligent Applications in Medical Image Sequence and Video Analysis

Over the past decades, the rapid growth of medical technologies has dramatically increased the amount of multimedia data generated in medical imaging, which calls for efficient automated mechanisms for processing video content. This is a major challenge because of the wide gap, known as the semantic gap, between what software agents can obtain from signal processing and what practitioners can comprehend based on cognition, expert knowledge, and experience. Automatically extracted low-level video features typically do not correspond to the concepts (such as human organs and medical devices) and procedural events depicted in medical videos. To narrow the semantic gap, the depicted concepts and their spatial relations can be described in a machine-interpretable form using formal definitions from structured data resources. Rule-based mechanisms are effective at describing the temporal information of actions and video events. The fusion of these structured descriptions with textual and, where available, audio descriptors lends itself to the machine-interpretable spatiotemporal annotation of complex medical video scenes. The resulting structured video annotations can be queried efficiently, either manually or programmatically, and can be used in scene interpretation, video understanding, and content-based video retrieval.

Despite the advantages and potential of Semantic Web technologies in this field, previous approaches failed to exploit rich semantics due to their reliance on proprietary solutions, incorrect and incomplete knowledge-based abstraction of medical domains, and limited temporal and spatial segmentation capacity. Given the successful semantic implementations reported in the literature for a variety of knowledge domains and applications, applying formal knowledge representation and automated reasoning to medical multimedia resources is clearly worthwhile. However, while medical image semantics have been extensively researched, medical video semantics have long been neglected, and research on video semantics has focused mainly on news, sports, and surveillance videos.

In this work, novel formalisms and ontologies are proposed to set new directions in standards-aligned semantic video annotation and spatiotemporal reasoning. These can be applied to knowledge domains of practical importance; the medical domain has been selected as the primary one, but other domains are also mentioned and demonstrated.