What is metadata?
- Richard Kirk
- Sep 29, 2024
- 4 min read
Data about data
The first time I heard the term metadata was when I was a software engineer designing relational database schemas. I had to design the best way to store both small and large amounts of data so that it could be accessed, updated and deleted efficiently by users and system processes. In this context, metadata was a part of the database that held data about the structure of the database. Primarily this was a list of all of the tables and indexes in the database. Those lists are data about the actual data in the database, and they help the database management system navigate the structure of the tables and indexes. They are not the actual data, i.e. the list of people, transactions, TV shows, or whatever data the database is actually there to store and process.
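As a minimal sketch of that distinction, the snippet below uses SQLite through Python's built-in `sqlite3` module. The `tv_shows` table and `idx_tv_shows_title` index are hypothetical names made up for illustration. SQLite keeps its list of tables and indexes in a catalogue table called `sqlite_master`; other database systems expose the same idea under different names.

```python
import sqlite3

# In-memory database with a hypothetical table and index, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tv_shows (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_tv_shows_title ON tv_shows (title)")
conn.execute("INSERT INTO tv_shows (title) VALUES ('The Gentlemen')")

# The actual data: what the database is there to store.
data_rows = conn.execute("SELECT title FROM tv_shows").fetchall()

# The metadata: SQLite's own catalogue of tables and indexes.
catalogue = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()

print(data_rows)   # [('The Gentlemen',)]
print(catalogue)   # [('index', 'idx_tv_shows_title'), ('table', 'tv_shows')]
```

The first query returns a row of content; the second returns rows that only describe how that content is organised.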
Types of metadata
Media metadata is similar but slightly different. Metadata is a catchall term for pretty much anything that describes a piece of content (TV show, movie, short etc). This ranges from technical metadata about the format of the video file itself, through to images and video that provide an understanding of what the video is about and allow decisions to be made based on that metadata. Examples of the different types of content metadata are (this is not an exhaustive list):
Video - the digital encoding format, bit rate, frame accurate duration and other metadata that describes the video file.
Audio - Similar to video, metadata about the audio format and spoken language(s)
Subtitles/Captions - Captions used to be created for accessibility purposes but are now also useful for watching content with the sound off.
Rights - Where the content is licensed to be shown and for what time period, including how many times it can be accessed etc.
Entitlement - Which content is available to which subscribers based on the package they've signed up to
Availability - Which TV and streaming platforms have which content available
Descriptive - Human readable information such as title and description
Cast & Crew - who appears in the content or helped make it
Ratings & reviews - What past viewers of the content thought about it
Age rating and advisories - guidance on which age groups should and shouldn't watch this content and specific warnings around harmful content
Categorisations - The content genre, format, intended audience etc.
Structural - How the series, seasons, episodes, parts fit together
Identifiers - The unique numbers that different industry bodies or individual organisations assign to content
Images - Images that represent the content including stills from the content
Contextual - Descriptors and tags that provide a deeper description of the content such as moods, themes and topics
Time-coded - Metadata about the specific time within the video when things happen. Scene changes, a character appearing and captions are examples
Marketing - Trailers and promos that promote the content
Viewership - The data about how many viewers watched a piece of content and for how long
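To make the list a little more concrete, here is a rough sketch of how a handful of these metadata types might sit together on a single content record. Every class name, field name and value below is an assumption made for illustration; real systems model far more types than this and rarely agree on a single schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VideoMetadata:
    encoding_format: str   # the digital encoding format
    bit_rate_kbps: int
    duration_frames: int   # frame-accurate duration


@dataclass
class RightsWindow:
    territory: str         # where the content is licensed to be shown
    start_date: str        # ISO 8601 dates, kept as strings for brevity
    end_date: str


@dataclass
class ContentMetadata:
    title: str                                                  # descriptive
    description: str                                            # descriptive
    genres: List[str] = field(default_factory=list)             # categorisations
    moods: List[str] = field(default_factory=list)              # contextual
    identifiers: Dict[str, str] = field(default_factory=dict)   # identifiers
    video: Optional[VideoMetadata] = None                       # video
    rights: List[RightsWindow] = field(default_factory=list)    # rights


# Placeholder values only; none of this describes a real title.
episode = ContentMetadata(
    title="Example Show",
    description="A placeholder description.",
    genres=["crime"],
    identifiers={"house_id": "EX-0001"},
    video=VideoMetadata("h264", 5000, 90000),
    rights=[RightsWindow("GB", "2024-01-01", "2024-12-31")],
)

print(episode.title, episode.genres, episode.rights[0].territory)
```

Keeping each type in its own small structure reflects the fact that the types usually come from different sources and change at different rates.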
Humans vs machines
This metadata has at least two main audiences - humans and machines. If you've ever watched TV, streamed content or read a TV guide, you'll have used some of the metadata mentioned previously. Let's take a randomly chosen series from Netflix:

Series - Structural metadata
The Gentlemen - This is the title, which forms part of the descriptive and structural metadata
2024 - Descriptive metadata
8 episodes - structural metadata
HD - Video metadata
AD - Audio described - Accessibility metadata
Captions symbol - Accessibility/Captions metadata
15 - Age ratings and advisories
Language, violence, injury detail, drug misuse - Age ratings and advisories
In this fast-paced crime series... - Descriptive metadata
Cast - Cast & Crew metadata
Genres - Categorisations metadata
This programme is - Contextual metadata
Backdrop image - Images metadata
All of this metadata helps you understand what you're looking at and helps you decide if you want to watch the series or not. If you're part-way through watching the series then the title and image are the instantly recognisable elements of metadata that confirm this is the programme you think it is. In fact, other than some player controls, almost everything on this page of the UI is metadata about the content itself.
Note, though, how much metadata is not visible. There are no rights, entitlements, availability, identifiers or other such types of metadata on show. This is because a lot of the content metadata is processed elsewhere by systems and algorithms, aka machines. A content recommendation algorithm, for example, will use some of the metadata shown in the UI but also many other types of metadata to figure out what to recommend to you, based on your previous viewing habits, the viewing habits of people similar to you, what's most popular and many other factors.
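A toy sketch of that split between human-facing and machine-facing metadata is shown below. It is not how any real recommendation system works: the `recommend` function, the field names and the catalogue entries are all hypothetical. It only illustrates that metadata the viewer never sees (rights and entitlement) can decide which titles are eligible, while metadata the viewer does see (genres) can double as a ranking signal.

```python
def recommend(catalogue, watched_genres, subscriber_package, territory):
    """Rank the titles a viewer is allowed to see by genre overlap."""
    candidates = []
    for item in catalogue:
        # Machine-facing metadata: never shown in the UI, but decides eligibility.
        if territory not in item["rights_territories"]:
            continue
        if subscriber_package not in item["entitled_packages"]:
            continue
        # Human-facing metadata doubles as a ranking signal.
        overlap = len(set(item["genres"]) & set(watched_genres))
        candidates.append((overlap, item["title"]))
    # Highest overlap first; ties broken alphabetically by title.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in candidates]


# Placeholder catalogue; none of these entries describe real titles.
catalogue = [
    {"title": "Show A", "genres": ["crime", "comedy"],
     "rights_territories": {"GB"}, "entitled_packages": {"basic", "premium"}},
    {"title": "Show B", "genres": ["crime"],
     "rights_territories": {"US"}, "entitled_packages": {"basic"}},
    {"title": "Show C", "genres": ["documentary"],
     "rights_territories": {"GB"}, "entitled_packages": {"premium"}},
    {"title": "Show D", "genres": ["drama"],
     "rights_territories": {"GB"}, "entitled_packages": {"basic"}},
]

result = recommend(catalogue, ["crime", "comedy"], "basic", "GB")
print(result)  # ['Show A', 'Show D']
```

Show B and Show C never reach the ranking step, even though the viewer would have no way of knowing why from the UI alone.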
So now that we know what content metadata is, and have seen some examples of where it's used, it starts to become clear just how dependent the whole content ecosystem is on metadata. This means a lot of metadata is required, and its quality and accuracy need to be high; otherwise costs, functionality and the user experience are adversely affected.