By Karsten Schragmann, Head of Product Management, Vidispine, an Arvato Systems Brand
Metadata is becoming more important than ever in media production and distribution, because it is metadata that drives the ROI on our media assets. The reason is simple: the more knowledge we have about an asset, the greater the value we can get from it. A really simple example would be basic descriptive metadata such as title and description, which enables us to find the assets we own (e.g., in the archive) and use and reuse them.
Having structured metadata then enables us to automate processes. Again, a really simple example would be knowing the codec and resolution of a file; based on this metadata we can make run-time decisions in automated workflows about how to handle that asset. As a more advanced example, if the metadata identifies “highlights” in an asset, we can automate the creation of an EDL to send to the editor, or even automate the whole highlights creation process.
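To make that concrete, here is a minimal Python sketch of such a run-time decision; the field names (“codec”, “vertical_resolution”) and the routing rules are illustrative assumptions rather than any particular system’s schema.

```python
# Minimal sketch: route an asset in an automated workflow based on technical metadata.
# The field names and thresholds below are illustrative assumptions, not a real MAM schema.

def route_asset(metadata: dict) -> str:
    codec = metadata.get("codec", "").lower()
    height = metadata.get("vertical_resolution", 0)

    if codec not in ("xdcamhd422", "prores", "dnxhd"):
        return "transcode-to-house-format"   # normalize unsupported codecs first
    if height < 1080:
        return "manual-review"               # flag lower-resolution material for review
    return "direct-ingest"                   # conforming assets go straight to ingest


print(route_asset({"codec": "ProRes", "vertical_resolution": 2160}))  # direct-ingest
```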
So, metadata enables us to drive down the cost of producing and distributing assets while at the same time increasing the attainable value of those same assets. But modern television production does not just require well-defined metadata models; as media production becomes increasingly data-driven, implementing content-based automated metadata generation is also increasingly attractive.
The role of standards
While there’s still a huge amount of variability between departments, facilities, and geographies, the metadata landscape today is a million miles from the “Wild West” of 15-20 years ago. Much of the standardization in the field, though, has come about indirectly, through developments and standardization efforts in other parts of the media lifecycle. For example, the consolidation of file formats used in media workflows (primarily to MXF, further constrained to the application specifications defined by the DPP/AMWA or the ARD-ZDF MXF profiles) has resulted in a standard taxonomy for structural metadata. Similarly, as the need grew to transmit certain metadata, such as captioning, alongside video and audio, there have been further efforts to standardize this and other ancillary data.
With the explosion in AI services and a broad range of vendors entering the space, we face a similar although much less constrained scenario today. Work is underway. A joint ETC-SMPTE task force was set up in the second half of 2020 to look at areas where collaboration and standardization may be beneficial with regard to AI and media. Undoubtedly, metadata created by those processes will be one of the areas the task force identifies when it produces its engineering report. However, at that point, we will still be a long way from any standards or recommendations being published, so it’s possible, or perhaps even likely, that vendor consolidation and market pressure will be the main drivers toward a de facto standard.
In the meantime, systems that offer a single point of entry and unified results from multiple AI services will provide a bridge, enabling users to create best-of-breed solutions.
There are many advantages to AI-based metadata generation, as it allows the machine to find information inside the video and audio frames themselves, in much the same way a human operator would interpret the same content. This, of course, opens up important new possibilities depending on what type of workflow you are managing. A channel distributor can use AI-based metadata generation to automatically find new types of information in huge amounts of media content that could not be processed manually before, and then use or present those insights to the viewer as a program, highlights, suggested shows, or even autogenerated trailers. AI-based services carry this new information as metadata and give your MAM system new and much more granular methods of managing your media files. This is very important in optimizing the performance and capabilities of your evolving media supply chain.
There remain challenges, however. One of the biggest challenges introduced with AI-derived metadata is around “quality”, or more precisely around confidence levels. As AI-based analyzers become commonplace in media workflows, we have moved from a position where we had a relatively small amount of metadata that we trusted to a situation where we have huge amounts of metadata with varying degrees of confidence in its accuracy. Confidence, and confidence thresholds, now play an important role in our workflows, potentially with different thresholds in different workflows or different parts of the organization. Managing these confidence thresholds is key to the usefulness of the metadata.
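As a simple illustration of how such thresholds might be applied, the sketch below filters an analyzer’s detections against per-workflow thresholds before they are written into the MAM; the labels, scores, and threshold values are invented for the example.

```python
# Illustrative sketch: filter AI-generated labels by per-workflow confidence thresholds.
# The detection structure and the threshold values are assumptions made for this example.

ANALYZER_OUTPUT = [
    {"label": "goal celebration", "start": "00:12:03", "end": "00:12:21", "confidence": 0.93},
    {"label": "crowd",            "start": "00:12:03", "end": "00:13:40", "confidence": 0.71},
    {"label": "red card",         "start": "00:47:10", "end": "00:47:15", "confidence": 0.42},
]

# Different workflows can tolerate different error rates.
THRESHOLDS = {
    "auto-publish-highlights": 0.90,   # fully automated, so be strict
    "editor-assist":           0.60,   # a human reviews the result anyway
}

def usable_labels(detections, workflow):
    """Keep only detections that meet the workflow's confidence threshold."""
    threshold = THRESHOLDS[workflow]
    return [d for d in detections if d["confidence"] >= threshold]

print(usable_labels(ANALYZER_OUTPUT, "auto-publish-highlights"))  # only the 0.93 detection
print(usable_labels(ANALYZER_OUTPUT, "editor-assist"))            # the 0.93 and 0.71 detections
```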
A second challenge with AI-based metadata creation is the sheer volume of data it produces. While it is tempting to want complete information at all times, it is important to remember that a detail is only worth understanding and documenting if it adds value to the content or saves cost in the production process. Not everything is relevant, and some form of triage needs to take place to ensure we are only processing useful data. We also need to be able to access that data.
Implications for MAM
So, where does this leave MAM? There needs to be a unified service approach within the MAM architecture. There is a growing number of cognitive service providers, and a MAM system needs to be able not only to make room for additional layers of temporal metadata but also to conform cognitive metadata from many different providers into a common structure. This is important because there will be different trained models for different purposes, and you will want to use and combine cognitive services from different providers to improve your media supply chain’s capabilities and performance.
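As a rough sketch of what that conforming step can look like, the example below maps two invented provider response formats into one common timed-label structure; neither format reflects a real vendor’s API.

```python
# Rough sketch: normalize detections from different cognitive providers into one
# common temporal-metadata structure. Both provider formats are invented for
# illustration; real services will differ.

from dataclasses import dataclass

@dataclass
class TimedLabel:
    source: str        # which cognitive service produced the label
    label: str         # the detected concept
    start_ms: int      # start of the interval on the asset timeline
    end_ms: int        # end of the interval
    confidence: float  # provider-reported confidence, 0..1

def from_provider_a(item: dict) -> TimedLabel:
    # Hypothetical provider A format: {"tag": ..., "span": [start, end], "score": ...}
    return TimedLabel("provider-a", item["tag"], item["span"][0], item["span"][1], item["score"])

def from_provider_b(item: dict) -> TimedLabel:
    # Hypothetical provider B format: {"name": ..., "beginOffsetMs": ..., "endOffsetMs": ..., "probability": ...}
    return TimedLabel("provider-b", item["name"], item["beginOffsetMs"], item["endOffsetMs"], item["probability"])

unified = [
    from_provider_a({"tag": "stadium", "span": [723000, 741000], "score": 0.88}),
    from_provider_b({"name": "stadium", "beginOffsetMs": 722500, "endOffsetMs": 740000, "probability": 0.91}),
]
print(unified)  # one common structure, regardless of which service produced the label
```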
In VidiNet, by way of example, we have defined a standard structure for the cognitive metadata from different providers. Because of this, customers don’t have to worry about how to model and integrate the different metadata results coming back from each provider.
In the future, customers will recognize the importance of service-agnostic basic metadata extraction that offers various types of cognitive recognition models while unifying all of this metadata in a single MAM system and media supply chain to drive business intelligence. Especially in the field of computer vision, there should be a simple way of training your own current and regional concepts with just a few examples of training data, integrated directly in the MAM.
And once all the necessary time-accurate information is available, we can build a wide range of value-added services on top: content intelligence, such as searching and monetizing content; content recommendation based on genealogy patterns; real-time assistance systems based on rights ownership; recommendations when cutting for target program slots, based on rating predictions; content compliance checks that automatically highlight cuts of content; domain-specific archive tagging packages; similarity search with respect to owned licenses; and much, much more.
The result is that customers will start to find their own applications and use cases for cognitive services. There is already a growing number of cognitive service providers out there, with unique or overlapping functionality. How these use cases evolve over the coming years will be fascinating to watch.