The Process of Adding Audio Description to a Video

by Fred Brack, Webmaster

Introduction

This is an overview of how an audio description (AD) track is created and delivered to the end user. Adding an AD track to a movie, DVD, TV show, or streaming video requires the cooperation of multiple entities and professional talent:

The Content Creator
Audio Description Writer
Audio Description Narrator
Audio Engineer
Delivery Vendor
Summary and Notes

The process starts with the studio responsible for creation of the video (e.g., Sony, Amazon Studios, or a TV network like CBS) and ends with the TV network or streaming service which delivers the video to the end user. This article covers how these five elements must work together.

The Content Creator

There must be a commitment to provide audio description (AD) by the contracting studio and the content creator responsible for creation of the video work. For example, movie studios like Sony or Warner Bros. must be willing to contract the necessary talent to create and deliver an audio description track for their video and then include the track in their media. In reality, we must note, the commitment may not be there originally, in which case the responsibility for providing AD falls on the delivery vendor (TV network or streaming service). For example, Netflix has been known to provide AD tracks for TV series it has picked up that originally had no AD.

It is important to note that a single video may be distributed in multiple formats: cinema movie, DVD, broadcast TV, and streaming video. To offer an audio description track for each of these delivery formats, the AD track must be reformatted. In theory this should be easily accomplished, because an audio description track is simply another language track. Most videos produced today offer at least one audio track in a language other than English, so the mechanism to provide the track is already in place. Historically, however, the original AD track was often not included on the DVD, something that still occasionally occurs today. In some of these cases a studio is able to resurrect the AD track for a streaming offering, which is why some old movies are available with AD on streaming services but not on DVD. In instances where no AD track was originally produced, some studios (notably Paramount) will create a new AD track and offer this track to streaming services.

Part of the role of the studio or content creator is to specify the final rendering format of the AD audio file to be delivered; e.g., Dolby Atmos, 5.1 surround, stereo, or mono mix. Historically, TV studios have often chosen mono, and DVDs and many streaming services have chosen stereo, while the sophisticated user would prefer 5.1 or Dolby Atmos! Kudos to Netflix and Apple TV+ for their decision to provide the highest quality audio experience.

Having made the AD commitment, the studio or content creator must contract for the production of the AD track. Over time, relationships are built with several providers of description, often with one favored over the others (as typically happens with television networks, for example). For streaming services producing their own original videos (like Netflix, Prime Video, and Hulu), several vendors will be contracted due to the volume of work. Done right, the studio/content creator will set standards and guidelines and will have a quality control (QC) process in place to evaluate the quality of a vendor's work. Sometimes this may involve approving the AD script prior to voicing, while in other cases it may involve reviewing the final product, either to give constructive feedback for future guidance and/or to request rerecording of certain parts (known as pickups).

I like to tell the story of watching a streaming service's original video and finding that the AD script writer misinterpreted something that happened in one of the most important scenes in the video, then wrote a completely incorrect sentence describing it. When I reported this to the service, they went back and rerecorded that one sentence. This is the sort of problem which could be caught in a QC review.

We offer a list of AD Service Providers on this website which includes both writing and voicing talent.

Audio Description Writer

Now we come to the most important role: writing the actual description. Although the other jobs listed here may be handled by a person who is blind, this role requires sight because it is the job where a person decides what VISUAL aspects of a video must be conveyed to a person with visual impairment to best understand the action and context. (A person who is blind may, however, act as an Engineer or Quality Control Consultant for this role.) On average, it takes 30-60 minutes to produce an initial draft of an AD script for every five minutes of program material.
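The 30-60 minute rule of thumb above translates directly into a rough scheduling estimate. The minimal Python sketch below is illustrative only; the function name estimate_script_hours is hypothetical, and its default rates simply restate the range quoted above rather than any vendor's actual pricing or schedule.

```python
def estimate_script_hours(program_minutes: float,
                          min_per_segment: float = 30.0,
                          max_per_segment: float = 60.0) -> tuple[float, float]:
    """Return (low, high) hours to draft an AD script.

    Assumes the 30-60 minutes of writing per five minutes of program
    material quoted in the text; both rates are overridable.
    """
    segments = program_minutes / 5.0          # number of five-minute blocks
    low = segments * min_per_segment / 60.0   # convert minutes to hours
    high = segments * max_per_segment / 60.0
    return low, high


# A two-hour feature is 24 five-minute blocks: roughly 12 to 24 hours of drafting.
print(estimate_script_hours(120))
```

The estimate covers only the initial draft; review, QC, and revisions described elsewhere in this article would come on top of it.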

This job requires a person who is very observant, skilled with language, and able to determine which key visual elements must be translated into words for best understanding of the work. Because a cardinal rule of description is "say what you see" (not what you think or interpret), language skills are very important for choosing the right words to be as concise yet definitive as possible. Cultural appreciation is also important. As one AD writer commented, "it's really just an individual writer's ability/willingness to recognize, research, and convey with sensitivity any practices/traditions that originate outside their own culture."

It is also important not to "step on lines" or talk over key sounds in the video. You never have time to describe everything, so you must pick and choose. When there is room to speak a few extra words, additional detail ("she darts off to the right, her long blond hair streaming behind her") tells the listener more about a character or the scenery and gives a fuller appreciation of the work. On the other hand, just because there is room to speak and give more information doesn't mean you should! Dramatic pauses, the passage of time without voice, the music score, and subtle sounds in the background should be preserved whenever possible. As one writer recently remarked, if you say a person gets into a car and you hear the car motor start, you don't need to say that the person starts the car! An AD consultant once wrote, "Describe when necessary; don't necessarily describe."

The final document delivered by the writer includes timestamp information about when a line should be spoken and any special nuance or information the ultimate voicer of the script should know. Example:

00:15:59
[BRISKLY]
("...I don't know where I'm going!")
He jerks the steering wheel to the right.

In this example, we have a timestamp from the video, a special instruction to the voicer ("BRISKLY" - in this case because there is very little time to speak), the cue line (in parentheses), and the line of description to speak. It is also helpful for the AD writer to research and indicate pronunciations by transliterating certain words or phrases, especially material in a language other than English.
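Because the script has a regular layout, it can also be read by software, for example to load cues into an editing tool. The following Python sketch parses entries laid out exactly like the example above. It assumes (hypothetically) that every entry begins with an HH:MM:SS timestamp on its own line, that voicing instructions appear in square brackets, that the cue line appears in parentheses, and that all remaining lines are description. Real scripts vary from vendor to vendor, so treat ADCue and parse_script as an illustration, not a standard format.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical layout: a timestamp line such as 00:15:59 starts each entry.
TIMESTAMP = re.compile(r"^(\d{2}):(\d{2}):(\d{2})$")


@dataclass
class ADCue:
    seconds: int                                            # start time in the video
    instructions: list[str] = field(default_factory=list)  # e.g. ["BRISKLY"]
    cue_line: str | None = None                             # dialog to wait for
    description: str = ""                                   # text to be voiced


def parse_script(text: str) -> list[ADCue]:
    cues: list[ADCue] = []
    current: ADCue | None = None
    desc_lines: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            if current is not None:          # close out the previous entry
                current.description = " ".join(desc_lines)
                cues.append(current)
            h, m, s = (int(g) for g in match.groups())
            current = ADCue(seconds=h * 3600 + m * 60 + s)
            desc_lines = []
        elif current is None:
            continue                         # ignore text before the first timestamp
        elif line.startswith("[") and line.endswith("]"):
            current.instructions.append(line[1:-1])
        elif line.startswith("(") and line.endswith(")"):
            current.cue_line = line[1:-1]
        else:
            desc_lines.append(line)
    if current is not None:                  # close out the final entry
        current.description = " ".join(desc_lines)
        cues.append(current)
    return cues


sample = """00:15:59
[BRISKLY]
("...I don't know where I'm going!")
He jerks the steering wheel to the right.
"""
cue = parse_script(sample)[0]
print(cue.seconds, cue.instructions, cue.cue_line)  # 959 seconds = 15 min 59 s
print(cue.description)
```

Storing the start time in seconds makes it easy to compare cues against the video's own timeline.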

Audio Description Narrator

While occasionally the Audio Description Writer and Narrator are the same person (in which case we typically call them the Audio Describer), usually they are different. Upon final approval, the script typically goes to a professional voice-over artist (also known as a voice actor) whom we have of late been calling an Audio Description Narrator (though Voicer is an alternative). This person generally goes to a recording studio where he or she gets a copy of the audio description script and sits in front of a monitor that shows the video, including the timestamps. A professional AD Narrator usually reads the script cold, never having previewed it or the video.

Often there is an audio engineer outside the recording booth controlling the audio/video playback and recording the narrator. There may also be a producer or director in charge of the overall process (and sometimes the Audio Description Writer might act in this role). A producer serves as an extra set of ears, making sure that the narrator voices exactly what is written in the script and nuances it in a manner appropriate to the current tone of the video. Even when voice actors are working from their own studio space (more frequent now due to COVID-19), an engineer and/or producer can be present via a "directed session" through Zoom or other remote recording tools. Note that sometimes neither a producer nor an audio engineer is present, and narrators act as their own audio engineers, recording themselves as well as "self-directing," making decisions that a producer might make.

Usually only one narrator is needed, but when extensive subtitles for foreign-language dialog are involved (or in other rare circumstances), several voice actors may record separate parts that will be brought together in the final editing process. In some cases, the narrator simply reads the lines without regard for matching the timing of the video, and the Audio Engineer merges them one by one into the final production.

The AD Narrator must take into account the intent of the content creator when determining how to deliver his or her lines. Today's standards in the USA call for "conversational" delivery, but with nuanced emotion so that the audience is immersed in the scene and story. This requires special training and attention: you don't want the listener to be distracted by the delivery of the AD. The delivery should reflect the tone of the action at the current moment: more somber for a funeral and more animated for a car chase! Choosing the right AD Narrator is important. Sometimes a female voice will be best, other times a male voice, and diversity should always be a priority. This role may also be filled by a person who is blind. Keep in mind that the quality of the voicing professional is very important! Viewers with visual impairment truly care about this. A top quality narrator does not overemphasize words and keeps the vocal delivery well in tune with the action. There is growing interest in using synthesized voices for this role, but many people are concerned that a human understanding of nuance is necessary to properly control vocal inflection. (Yella Umbrella Ltd. is one company that offers a tool to create synthesized AD.)

Audio Engineer

The Audio Engineer has multiple roles, which could, of course, be split between two or more people.

Recording the session, as mentioned above.
Editing the recorded audio description track. This can include removing any background noises, mouth clicks, or breathing noises in the voice track, or even "compressing" a block of audio so that it fits in between two lines of dialog. Occasionally a skilled engineer will even need to correct the mispronunciations of certain words by using "extreme editing" techniques.
Placing the audio description at the correct timestamps.
Mixing the audio description with the original soundtrack.
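The placement and "compressing" steps can be illustrated with simple arithmetic: given the gap between two lines of dialog and the length of a recorded description line, how much would the line need to be sped up to fit? The sketch below is hypothetical; fit_line and the 10% speed-up ceiling (max_speedup=1.10) are assumptions chosen for illustration, not an industry standard, and actual time compression is done by ear in audio editing software.

```python
from __future__ import annotations


def fit_line(gap_start: float, gap_end: float, line_seconds: float,
             max_speedup: float = 1.10) -> tuple[float, float] | None:
    """Place a voiced AD line in a gap between dialog (times in seconds).

    Returns (start_time, speedup), where speedup is the time-compression
    factor (1.0 means no change), or None when the line would need more
    compression than max_speedup allows and should be rewritten shorter.
    max_speedup=1.10 is an illustrative assumption, not a standard.
    """
    gap = gap_end - gap_start
    if gap <= 0:
        raise ValueError("gap_end must come after gap_start")
    speedup = max(1.0, line_seconds / gap)   # never slow a short line down
    if speedup > max_speedup:
        return None
    return gap_start, speedup


print(fit_line(10.0, 13.0, 2.5))   # fits as recorded
print(fit_line(10.0, 12.0, 2.1))   # needs about 5% compression
print(fit_line(10.0, 12.0, 3.0))   # too long: ask the writer to trim it
```

Returning None rather than forcing the fit mirrors the point made earlier: when a line cannot fit naturally, the better fix is usually a shorter line, not faster speech.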

The engineer must ensure that the voiced audio description lines fit in between the spoken dialog or critical sound elements of the original soundtrack. He or she must also "duck" (lower) the audio of the original soundtrack if it overpowers the added AD. The objective is to make sure the description sounds like it was part of the audio all along. This process must take into account that the studio or content creator might deliver "stems," the individual soundtrack components comprising voice, music, and sound effects. These get mixed into the final description track. Sometimes description is not mixed with the original audio but is sent as a stand-alone track to a studio. In these cases, it might be heard in an app or at a movie theater, where the volume of the description will be controlled by the person listening. As with the narrator role, the person fulfilling this job may be blind.
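Ducking can be sketched as a gain envelope applied to the original soundtrack while each description line plays. This Python example assumes mono audio held as lists of float samples at a common sample rate, with description regions given as hypothetical (start, end) sample indices; duck_gain=0.3 (roughly -10 dB) and the short linear ramp are arbitrary illustrative values. Professional engineers do this in a digital audio workstation, adjusting by ear, not with code like this.

```python
def duck_envelope(n_samples: int, regions: list[tuple[int, int]],
                  duck_gain: float = 0.3, ramp: int = 4) -> list[float]:
    """Gain per sample for the original soundtrack: 1.0 normally, duck_gain
    while a description line plays, with a linear ramp of `ramp` samples
    on each side so the level change is not abrupt."""
    if ramp < 0:
        raise ValueError("ramp must be non-negative")
    gain = [1.0] * n_samples
    for start, end in regions:                # end index is exclusive
        for i in range(max(0, start - ramp), min(n_samples, end + ramp)):
            if start <= i < end:
                g = duck_gain
            elif i < start:                   # fading down before the line
                g = duck_gain + (1.0 - duck_gain) * (start - i) / ramp
            else:                             # fading back up after the line
                g = duck_gain + (1.0 - duck_gain) * (i - end + 1) / ramp
            gain[i] = min(gain[i], g)         # overlapping regions keep the lower gain
    return gain


def mix_with_ducking(original: list[float], ad: list[float],
                     regions: list[tuple[int, int]],
                     duck_gain: float = 0.3, ramp: int = 4) -> list[float]:
    """Duck the original track under the AD regions, then add the AD track."""
    if len(original) != len(ad):
        raise ValueError("tracks must have the same length")
    gain = duck_envelope(len(original), regions, duck_gain, ramp)
    return [o * g + a for o, g, a in zip(original, gain, ad)]


# One description line occupying samples 8-11 of a 20-sample toy track.
envelope = duck_envelope(20, [(8, 12)], duck_gain=0.5, ramp=4)
print([round(g, 3) for g in envelope])
```

In a real session the ramps would span tens or hundreds of milliseconds rather than four samples; the toy sizes just keep the output readable.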

Delivery Vendor

Once the audio engineer has delivered the final audio description track, it is up to the studio or content creator to do a final quality control check and make the AD track available to the end user. In the case of a cinema first-run movie, this means inclusion in the Digital Cinema Package (DCP). For a DVD or Blu-ray disc, it means making space on the disc for the additional language track and including it in the Audio Language menu. For a TV network, it means passing the AD track to their local affiliates via the Second Audio Program (SAP) audio channel, and the local affiliates in turn deliver this signal over their tower antenna directly to homes and to any connected cable or satellite services. For a streaming service, it means two things:

Acquiring the audio description track from the studio licensing agency. This unfortunately has been a problem, albeit one that is slowly improving. Sometimes a TV network or streaming service will forget to request the AD track from the licensing agency, or the agency will fail to include it. The goal, of course, is to always ask for and obtain an AD track when one was originally recorded. And if the streaming service is the studio (like a Netflix, Hulu, or Amazon Original), then they need to commission the audio description in the first place.
Making the AD track available to the user via the service's hardware or software interface. This can mean a menu accessed with a TV remote or a cable box remote, a menu on a media device (like Amazon's Fire TV Stick or an Apple TV), or a setting within the product itself (such as selecting the audio track while watching a movie on Netflix in a web browser). This includes mobile device interfaces too, such as iPhones and Android devices.

For an end-user to receive audio description, the TV network or streaming service must send the AD track, and the user's hardware and/or software interface must be able to receive the AD track and provide easy access to it.

Summary

There are many parts to getting audio description added to media. The good news is that, like many things in life, after doing it once, it becomes a lot easier and smoother over time. We congratulate the studios and streaming services that offer audio description on all their new offerings and acquire or commission as much audio description as they can for existing video offerings (movies and TV series). We look forward to the day when audio description is as common as closed captions.

* * * * *

Notes

If you would like to hear a detailed discussion of many of these same points from another perspective, I would direct you to the 2020 Keynote Speech at the ACB Convention by AD Narrator Roy Samuelson, which covers many aspects of audio description and leads up to a vision of AD quality he called Kevin's Process (now called Kevin's Way).
Additional detailed information about equipment required for the tasks above may be found in ADP Director Dr. Joel Snyder's book, The Visual Made Verbal: A Comprehensive Training Manual and Guide to the History and Applications of Audio Description in Appendix II, "Equipment Needs, Specifications, and Suppliers." The book is available in print in multiple languages, for Kindle, on Bookshare, and in braille and as an audio book from National Library Service for the Blind and Print Disabled cooperating libraries.
You may also wish to consult the ADP's Technology for Audio Description page regarding a subset of the equipment.

If you have comments, corrections, or additions for this article, please send them to the author via the Webmaster link below.

This article was originally published on October 6, 2020, last updated on July 30, 2022.