The Audio Description Project

Video Description Research and Development Center

The Smith-Kettlewell Video Description Research and Development Center (VDRDC) investigates innovative technologies and techniques for making online video more accessible to blind and visually-impaired students and consumers. Through collaboration with a broad array of partners and stakeholders in the Description Leadership Network (DLN), we are developing advanced video annotation methods for use in a wide variety of educational settings, as well as helping educators and other description providers make better use of the tools already available.

The VDRDC is led by Josh Miele, PhD, Associate Director of The Smith-Kettlewell Eye Research Institute (SKI). At SKI his work focuses on research and development of innovative approaches to information accessibility for blind and visually-impaired consumers. Dr. Miele's active research involves the integration of inexpensive software adaptations, existing mainstream technologies, mobile platforms, and the Internet. Current projects include a web-based tool for the automated production of tactile street maps (TMAP); an investigation of an off-the-shelf smartpen as a platform for next-generation audio/tactile graphics; a psychophysical investigation of haptically-integrated sonification techniques; and a virtual, wireless braille keyboard (the WearaBraille) for use with smartphones and other mobile platforms.

The Description Leadership Network (DLN) is a coalition of world-class organizations involved with the practice, policy, and technology of blindness and video accessibility. Each DLN member organization contributes unique resources and perspectives to the partnership. The VDRDC relies upon the DLN members for assistance and advice in its R&D efforts, as well as in its extensive outreach and dissemination activities.

The research and development component of the VDRDC consists of identifying and prototyping promising new technologies and techniques for providing blind and visually-impaired students with improved descriptive access to educational video materials that are delivered via the Internet and hand-held, mobile networked devices. It should come as no surprise that the very technologies we use to bring about this improved video accessibility are themselves made possible by the Internet and modern wireless technologies.

All software developed by the VDRDC will eventually be made available as free and open-source software (FOSS). The projects described below are currently under development. Given the rapidly changing landscape of Internet technologies and mobile platforms, as well as the highly interactive stakeholder feedback mechanisms of the VDRDC, additional promising approaches are likely to present themselves. Please check this page and visit the VDRDC blog for the latest updates on our projects and progress. Current VDRDC research projects include:

Algorithmic Automated Description (AAD) - Algorithmic automated description (AAD) uses existing machine-vision techniques to automate specific aspects of description, such as detecting camera motion and scene changes, identifying faces, and reading printed text. Such events can be identified by computer routines that automatically add annotations to the video, allowing things such as the automated announcement of scene changes or the use of text-to-speech to read on-screen text.

Preliminary work on this project involves identifying the types of information most easily extracted from video using these techniques, as well as understanding, from focus groups and user feedback, how best to present that information. Despite the speed of modern computers, the visual processing for AAD would likely be done in advance, with tagged information stored in a separate descriptive stream. The DVX server is an ideal repository for the storage and retrieval of descriptive tags, and will be used for demonstration and evaluation of AAD techniques.
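To illustrate how one such annotation might be produced, the sketch below flags scene changes by comparing luminance histograms of consecutive frames and emitting a timestamped tag whenever the difference spikes. This is a minimal illustration, not VDRDC code: the frame representation, threshold, and tag format are all assumptions, and a real pipeline would use a proper machine-vision library on decoded video.

```python
def histogram(frame, bins=8):
    """Coarse luminance histogram of a frame (a list of 0-255 pixel values)."""
    h = [0] * bins
    for px in frame:
        h[min(px * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [count / total for count in h]

def scene_changes(frames, fps=30, threshold=0.5):
    """Tag timestamps where consecutive frame histograms differ sharply."""
    tags = []
    prev = histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = histogram(frame)
        # L1 distance between normalized histograms, in [0, 2]
        dist = sum(abs(a - b) for a, b in zip(prev, cur))
        if dist > threshold:
            tags.append({"time": i / fps, "event": "scene change"})
        prev = cur
    return tags

# Two synthetic "shots": five dark frames followed by five bright ones.
dark = [[20] * 100] * 5
bright = [[230] * 100] * 5
print(scene_changes(dark + bright))  # one scene-change tag at the cut, t = 5/30 s
```

Tags of this shape could then be uploaded to a descriptive stream such as the one DVX stores, as described above.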

Choreographed and Orchestrated Video Annotation (COVA) - Choreographed and orchestrated media is a relatively new concept in which different parts of a coordinated media presentation are delivered by different networked devices. In this case, COVA will allow video annotations such as audio descriptions to be played from a personal device such as a smartphone, while the primary video presentation plays on a completely separate device, such as a projector in a theater.

The initial COVA prototype will be based on a purely audio identification and synchronization approach, made famous by services such as Shazam and SoundHound. The smartphone will use its microphone to sample the soundtrack of the video being played and run a rapid matching algorithm that identifies the video against a database. The app can then automatically synchronize a separate, private description channel played through the smartphone.
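The matching step can be sketched in miniature. The toy version below is an illustration only, not the Shazam algorithm or VDRDC code: it fingerprints audio by whether energy rises or falls between fixed-size windows, then scans a database of reference tracks for the offset with the fewest bit mismatches, yielding both the video's identity and the listener's position in it.

```python
def fingerprint(samples, window=4):
    """Compact bit fingerprint: 1 wherever energy rises between windows."""
    energies = [sum(s * s for s in samples[i:i + window])
                for i in range(0, len(samples) - window + 1, window)]
    return [1 if b > a else 0 for a, b in zip(energies, energies[1:])]

def identify(snippet, database, window=4):
    """Return (video_id, window_offset) of the closest fingerprint match."""
    probe = fingerprint(snippet, window)
    best_id, best_off, best_dist = None, None, len(probe) + 1
    for video_id, track in database.items():
        ref = fingerprint(track, window)
        for off in range(len(ref) - len(probe) + 1):
            dist = sum(p != r for p, r in zip(probe, ref[off:off + len(probe)]))
            if dist < best_dist:
                best_id, best_off, best_dist = video_id, off, dist
    return best_id, best_off

# Toy "soundtracks" (one impulse per 4-sample window keeps energies legible).
database = {
    "assembly": [1,0,0,0, 2,0,0,0, 3,0,0,0, 4,0,0,0,
                 5,0,0,0, 6,0,0,0, 7,0,0,0, 8,0,0,0],
    "lecture":  [10,0,0,0, 7,0,0,0, 8,0,0,0, 9,0,0,0,
                 6,0,0,0, 8,0,0,0, 5,0,0,0, 9,0,0,0],
}
snippet = database["lecture"][8:32]   # what the phone's microphone "heard"
print(identify(snippet, database))    # ('lecture', 2): matched two windows in
```

Knowing the offset is what lets the app start the private description channel at the right point in the presentation.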

A technology such as COVA will be especially useful in educational situations such as assemblies or presentations by third-party educators to large student groups. The presenters may not be aware of the availability of, or need for, description, but COVA will allow individual students or teachers to quickly and seamlessly find the available description and synchronize to it through the student's own smartphone. The descriptive and synchronization information can be stored and distributed through an open-source framework such as DVX.

The research and development associated with COVA focuses primarily on the applicability of existing listening and matching algorithms to educational video streams. Focus group feedback will be used to help shape our idea of the potential uses and benefits of such a technology, as well as some of the potential problems with it. Preliminary algorithms and prototype software are being developed in collaboration with the IDEAL Group, a DLN Partner, and the viability of the approach will be evaluated from the standpoints of feasibility and usability. As with all software developed under the VDRDC, the COVA prototype will be made available as free and open-source software (FOSS) at the conclusion of the project.

Crowd-sourced Description for Web-Based Video (CSD) - The Descriptive Video Exchange project (funded by the National Eye Institute of the National Institutes of Health, grant # R01 EY020925-01) focuses on crowd-sourced techniques for describing DVD media. CSD will expand DVX to include Internet-based media such as YouTube, iTunes U, and other streamed video found on a wide variety of websites. Many streamed Internet-based video sources provide well-defined, public APIs for accessing all the information DVX requires. Using these APIs will allow the VDRDC to expand DVX to include streamed content, so that seamless, simple, crowd-sourced descriptions can be added to Internet-based video by volunteers or professionals anywhere.
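One way such crowd-sourced contributions might be merged into a single description track is sketched below. The record fields, the platform-prefixed video identifiers, and the simple best-voted selection rule are all hypothetical, since the actual DVX schema is not specified here.

```python
# Hypothetical crowd-sourced description records, keyed by a
# platform-prefixed video ID; not the real DVX data model.
descriptions = [
    {"video": "youtube:abc123", "start": 12.0,
     "text": "A map of Europe appears.", "votes": 5},
    {"video": "youtube:abc123", "start": 12.0,
     "text": "Map on screen.", "votes": 2},
    {"video": "youtube:abc123", "start": 47.5,
     "text": "The teacher points to France.", "votes": 3},
    {"video": "vimeo:999", "start": 3.0,
     "text": "Title card.", "votes": 1},
]

def timeline(video_id, records):
    """Best-voted description per start time for one video, in play order."""
    best = {}
    for rec in records:
        if rec["video"] != video_id:
            continue
        slot = rec["start"]
        if slot not in best or rec["votes"] > best[slot]["votes"]:
            best[slot] = rec
    return [best[t]["text"] for t in sorted(best)]

print(timeline("youtube:abc123", descriptions))
# ['A map of Europe appears.', 'The teacher points to France.']
```

When two volunteers describe the same moment, community votes pick the version that is played; a production system would of course need richer moderation than this.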

Descriptive Video Exchange (DVX) - Under the support of a three-year grant from the National Eye Institute (NEI grant # R01 EY020925-01), Smith-Kettlewell is currently conducting an exciting new project to investigate the use of crowd-sourcing to provide readily available amateur description for DVD-based video.

The DVX Project is developing software that allows sighted video viewers to seamlessly add audio description to DVDs as they watch. Those descriptions are then automatically shared over the Internet while the video materials remain only on the DVD.
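The core idea, in which descriptions travel over the Internet keyed to a disc and a timestamp while the video itself never leaves the DVD, can be sketched as follows. The disc identifiers, record shapes, and function names are illustrative only, not the DVX protocol.

```python
# Sketch of a description exchange: disc_id -> [(timestamp_seconds, text)].
# Only the descriptions are shared; the video stays on the viewer's DVD.
exchange = {}

def contribute(disc_id, timestamp, text):
    """A sighted viewer adds a description while watching."""
    exchange.setdefault(disc_id, []).append((timestamp, text))

def due_descriptions(disc_id, playhead, window=1.0):
    """Descriptions to speak within `window` seconds of the current playhead."""
    return [text for t, text in exchange.get(disc_id, [])
            if playhead <= t < playhead + window]

contribute("dvd:0451-science-2", 30.0, "A diagram of the water cycle appears.")
contribute("dvd:0451-science-2", 95.5, "The narrator holds up a beaker.")
print(due_descriptions("dvd:0451-science-2", 95.0))
# ['The narrator holds up a beaker.']
```

A player on the blind viewer's side would poll for due descriptions as the disc plays and voice them, for example via text-to-speech.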

The DVX project is evaluating: 1) the effectiveness of crowd-sourced audio description, 2) the effectiveness of automated digital tools for enhancing the presentation of amateur audio description, and 3) the effectiveness of using social networks and online communities for the recruitment of volunteer audio describers.

For more information about the DVX Project, see the ADP's own web page on DVX.

Expanded Populations Research Agenda for Description (EPRAD) - Bridge Multimedia is collaborating with the VDRDC and the DLN to develop EPRAD, a research road map to identify the critical questions that will improve our evidence-based understanding of how description may apply to the education of students with non-visual disabilities such as ADHD and autism.

Bridge's work on the Expanded Populations Research Agenda for Description (EPRAD) will identify the concrete research questions necessary to quantitatively assess the value of description for these expanded populations. EPRAD also dovetails with the Visual-Impairment Research Agenda for Description (VIRAD) being developed by NCAM in collaboration with the Video Description Research and Development Center and the Description Leadership Network.

Remote Real-Time Description (RRTD) - Remote real-time description (RRTD) is a simple technique that will allow a describer anywhere in the world to provide real-time description for a video stream being viewed by a visually-impaired student at home, in the classroom, or on the go. In RRTD, a video feed is streamed to the describer, who passes the audio (and optionally the video) on to the student along with the added live description. This ensures the synchronization of the original audio/video stream with the descriptive commentary. If the webcast or webinar is interactive, any audio and/or video from the student is streamed directly back to the original source, allowing the student to ask questions or make comments.
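The describer-side relay step might look like the following sketch, in which the describer's live audio is mixed over the program audio before being passed on to the student, with the program "ducked" (attenuated) while the describer is speaking. The sample format and ducking factor are assumptions; a real implementation would operate on VoIP audio buffers.

```python
def mix(program, description, duck=0.5):
    """Mix live description over program audio, sample by sample,
    attenuating the program while the describer is speaking."""
    out = []
    for p, d in zip(program, description):
        gain = duck if d != 0 else 1.0   # duck only under speech
        out.append(p * gain + d)
    return out

program = [1000] * 4                 # steady program audio
description = [0, 400, 400, 0]       # describer speaks mid-stream
print(mix(program, description))     # [1000.0, 900.0, 900.0, 1000.0]
```

Because the same relay hears both streams, the description is synchronized with the program audio by construction, which is the property the paragraph above relies on.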

A number of existing technologies will be used in investigating RRTD, and initial research will involve identifying the most appropriate of these. The most promising approaches include voice over IP (VoIP), Internet proxy services, and, of course, web-based streaming video. An existing video conferencing technology (Skype) will be used to conduct simple experiments with this approach. The IDEAL Group - a DLN Partner - will collaborate with the VDRDC to implement the technology as a simple app for an off-the-shelf smartphone.

This approach makes the experimental technology extremely inexpensive and easy to deploy to testers and potential users. As with all of the Center's software technologies, the RRTD prototype app will be made available as free and open-source software (FOSS) at the conclusion of the project.

One of the potential strengths of RRTD is that it would only require a willing describer and reliable Internet connectivity for both parties. This technique would make it possible for a parent at home to provide free description services for a student in the classroom. It would also allow a professional describer in a remote location to provide descriptive services for one or more students with visual disabilities.

Visual-Impairment Research Agenda for Description (VIRAD) - The Smith-Kettlewell Video Description Research and Development Center is proud to be working with the WGBH National Center for Accessible Media (NCAM) to develop the Visual-Impairment Research Agenda for Description (VIRAD) - a systematic road map charting gaps in the quantitative evidence about how description can and should be used to improve video accessibility for the blind. While many focus groups, advisory panels, and expert practitioners have contributed to the accumulation of anecdotal evidence to guide the creation and delivery of description, there remains a surprising dearth of evidence-based best practices in this field. VIRAD provides a guide to help scientists fill gaps in quantitative research on how best to develop and deliver video description to blind and visually-impaired students and other consumers.

NCAM is collaborating with the VDRDC to assemble a community of practice (CoP) consisting of the Description Leadership Network and other description stakeholders and researchers. The research agenda will be synthesized from discussions with the Community of Practice in a series of facilitated meetings and focus groups. The result will provide a prioritized list of research questions that need to be answered in order to improve the state of the science of description for the blind and visually-impaired. NCAM will deliver a written report and presentation to be made available in conjunction with the proceedings of the 2013 annual meeting of the Description Leadership Network in San Francisco.

For more information, visit the VDRDC website at: