Semantic Video Annotations

Text-based video metadata annotations are unstructured data with limited machine-processability. User-added metadata are often ambiguous, misspelt, or too generic to describe the audiovisual content, which makes video retrieval inefficient. Traditional video annotation formats written in XML Schema (XSD) provide semi-structured data that can potentially be mapped to structured data expressed in RDFS or OWL, bridging the “semantic gap” between human- and machine-interpretable content. This review has recently been published as a monograph. For over a decade, the semantic limitations of XML-based annotations were addressed by directly mapping application-specific XML schemas to domain-specific machine-interpretable metadata terms, which left open issues. The majority of newly introduced ontologies cover only a very specific domain because of the limitations of manual ontology engineering. Because of the lack of consensus, implementations vary, and multimedia ontologies rely heavily on the MPEG-7 standard, even though it does not provide comprehensive coverage of high-level video descriptors and has not gained global adoption since its release in 2002. Automatic video metadata processing remains a challenge because the semantics provided for audiovisual content are insufficient. Software agents depend greatly on the textual content accompanying embedded video objects and on user-defined tags. The common annotation standards, such as Dublin Core, MPEG-7, and TV-Anytime, are all based on XML schemas that are machine-readable but not machine-interpretable, which makes these standards inefficient for automated content access, sharing, and reuse.
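
The direct XML-to-RDF mapping mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical XML record that uses Dublin Core elements and a made-up video IRI (http://example.org/video/1); the helper names xml_to_triples and to_ntriples are illustrative only and are not part of any standard or of the author's tools. The mapping is deliberately naive: every Dublin Core element becomes one triple with a plain literal object, which makes the data structured without adding the richer semantics an ontology would provide.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical XML record; the values are made up for illustration.
XML_RECORD = """\
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Harbour at dawn</dc:title>
  <dc:creator>Jane Doe</dc:creator>
  <dc:date>2015-06-01</dc:date>
</record>
"""


def xml_to_triples(xml_text, subject_iri):
    """Map each Dublin Core child element to a (subject, predicate, literal) triple."""
    root = ET.fromstring(xml_text)
    triples = []
    for child in root:
        # ElementTree exposes qualified names as "{namespace}local".
        if child.tag.startswith("{" + DC_NS + "}"):
            local = child.tag.split("}", 1)[1]
            triples.append((subject_iri, DC_NS + local, (child.text or "").strip()))
    return triples


def to_ntriples(triples):
    """Serialize triples with single-line plain literal objects as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        # Naive escaping: only backslashes and double quotes are handled here.
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


triples = xml_to_triples(XML_RECORD, "http://example.org/video/1")
print(to_ntriples(triples))
```

The resulting literals are still uninterpreted strings: "Jane Doe" is not linked to any entity, which is the kind of limitation that direct schema mapping leaves behind.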

Interlinking entities of Linked Open Data (LOD) datasets related to the content can provide a machine-readable description of the location, plot, staff members, depicted persons, camcorder, video mode, video and audio bitrate, and so on. VidOnt, the video ontology proposed by Dr. Sikos, can be deployed as lightweight semantic video annotations in website markup, or as rich semantics through LOD interlinking, suitable for video indexing, storage, web search, faceted search, and automated knowledge discovery. To minimize the tag variation, polysemy, misspelling, ambiguity, subjectivity, and noise issues of manual video tagging, Dr. Sikos proposed combining LOD-enriched video representations with natural language processing algorithms, an approach that promises an even wider range of web-based, cloud-based, and Big Data applications in online video publishing. My research also aims to provide mathematical grounding for multimedia reasoning applications, as well as prototypes of semantic video annotation tools that leverage Semantic Web standards.
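
A lightweight annotation in website markup, interlinked with LOD resources, might look like the following sketch. Because the terms of VidOnt are not listed here, the example substitutes the schema.org vocabulary (VideoObject, contentLocation, about, bitrate) serialized as JSON-LD; the video IRI, the title, the bitrate value, the choice of DBpedia resources, and the helper name build_video_annotation are all assumptions made for illustration.

```python
import json


def build_video_annotation(video_url, name, location_iri, person_iris, bitrate):
    """Return a JSON-LD dict describing a video and linking it to LOD resources."""
    return {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "@id": video_url,
        "name": name,
        "bitrate": bitrate,
        # Linking by IRI rather than by free-text tag sidesteps spelling
        # variants and polysemy: the IRI identifies exactly one entity.
        "contentLocation": {"@type": "Place", "@id": location_iri},
        "about": [{"@type": "Person", "@id": iri} for iri in person_iris],
    }


# Illustrative values only; none of them come from a real video record.
annotation = build_video_annotation(
    video_url="http://example.org/video/1",
    name="Lecture recording",
    location_iri="http://dbpedia.org/resource/Sydney",
    person_iris=["http://dbpedia.org/resource/Ada_Lovelace"],
    bitrate="4500 kbps",
)

# Wrap the JSON-LD in the script element used to embed it in an HTML page.
markup = (
    '<script type="application/ld+json">\n'
    + json.dumps(annotation, indent=2)
    + "\n</script>"
)
print(markup)
```

The same dictionary could be extended with further properties (for example, plot or recording device) once a vocabulary that defines them is chosen.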