Frequently Asked Questions (FAQs) About Video Captioning, Answered

August 14, 2019

The passage of the Americans With Disabilities Act (ADA) in the United States in 1990 was intended to bring about sweeping change for people with physical and cognitive challenges, and in many ways, it did.

Cities, buildings, and businesses invested in structural changes to accommodate people in wheelchairs, for example, and media companies, corporations, and universities began captioning video content more consistently for people with hearing impairments.

Video content, in particular, has changed considerably since the ADA was enacted, and the COVID-19 pandemic has since transformed how we work and learn. With video making up 82% of Internet traffic today, according to Streaming Media, it’s more important than ever to ensure that video content is accessible to all.

With more and more educators and businesses using video as a professional tool for sharing instructional and informational content on-demand, video captioning has become a best practice and spurred a lot of questions, from ADA and Section 508 compliance to “unlimited” captioning services.

In this post, we review some of the most frequently asked questions we receive on the topic.

What is the difference between video transcription and captioning?
What is the difference between subtitles and captioning?
What is the difference between open and closed captioning?
Why is captioning important?
What’s the difference between automatic speech recognition (ASR) captioning and human-generated captions?
How does captioning work in Panopto?
How long does it take to generate captions?
How easy is it to add captions to a video in Panopto?
What are Web Content Accessibility Guidelines (WCAG)?
What captioning quality is required for ADA and Section 508 compliance?
Does Panopto support best practices and ADA and Section 508 compliance for video captioning?
Can you change the style and position of captions in Panopto?
How does pricing for captioning work?
Are unlimited captioning services truly unlimited?

14 Frequently Asked Questions About Video Captioning

Q: What is the difference between video transcription and captioning?

Video transcription is the process of producing a text document from the words spoken in a video. Transcribed text does not have a time value associated with it. In terms of accessibility, transcription works well for audio-only media, but falls short when it comes to audio with moving content on a screen, such as voice-over-PowerPoint slides or video.

Video captioning converts the audio content within a video into text, then synchronizes the transcribed text to the video. When the recording is played, that text will be displayed in segments that are timed to align with specific words as they are spoken. Captioning is required to make video content accessible to viewers who are deaf or hard of hearing. Captions are also considered good universal design for learning as they benefit a wide variety of people in different situations.
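
To make the distinction concrete, here is a minimal Python sketch that produces both a plain transcript and timed captions from the same segments. The `Cue` class, the function names, and the sample text are hypothetical and for illustration only; the timing layout follows the widely used SRT (SubRip) caption format.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_transcript(cues: list[Cue]) -> str:
    """A transcript keeps only the words, with no timing information."""
    return " ".join(cue.text for cue in cues)


def to_srt(cues: list[Cue]) -> str:
    """Captions keep the words plus when each segment appears and disappears."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        timing = f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}"
        blocks.append(f"{number}\n{timing}\n{cue.text}\n")
    return "\n".join(blocks)


# Hypothetical sample segments.
cues = [
    Cue(0.0, 3.5, "Welcome to the lecture."),
    Cue(3.5, 7.25, "Today we cover video captioning."),
]

print(to_transcript(cues))
print(to_srt(cues))
```

The transcript is a single run of text, while each caption cue carries the start and end times a video player needs to display it in sync with the audio.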

Q: What is the difference between subtitles and captioning?

Captions show the words spoken in a video in the same language in which they are spoken, while subtitles show a translation of those words into a different language. The translated text shown on screen during a foreign-language film, for example, consists of subtitles.

Q: What is the difference between open and closed captioning?

There are a few differences between open captioning and closed captioning in videos. Most notably, open captions are always on and in view, whereas closed captions can be turned off by the viewer. Open captions are part of the video itself, and closed captions are delivered by the video player or television (via a decoder). And unlike closed captions, open captions may lose quality when a video is encoded and compressed.

Q: Why is captioning important?

In addition to making video content more accessible to viewers with impaired hearing, captioning can actually improve the effectiveness of video:

Captions improve comprehension by native and foreign language speakers. An Ofcom study showed 80% of people who use video captions don’t even have a hearing disability.
Captions help compensate for poor audio quality or background noise within a video.
Captions make video useful when a person is watching with the sound off or viewing in a noisy environment that obscures the sound.
Captions provide viewers with one way to search inside of videos.
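
The last point can be illustrated with a minimal Python sketch. It is not how any particular video platform implements search; it only assumes that captions are available as (start time in seconds, text) pairs, and shows how timed text lets a viewer jump to the moment a word is spoken. The function name and sample data are hypothetical.

```python
def search_captions(cues, query):
    """Return (start_seconds, text) for every cue containing the query.

    Matching is case-insensitive and looks for the query as a substring.
    """
    needle = query.casefold()
    return [(start, text) for start, text in cues if needle in text.casefold()]


# Hypothetical caption data: (start time in seconds, caption text).
cues = [
    (0.0, "Welcome to the lecture."),
    (12.5, "Captions improve comprehension."),
    (40.0, "Search works on captions too."),
]

for start, text in search_captions(cues, "captions"):
    print(f"{start:>6.1f}s  {text}")
```

Because every match comes with a start time, a player could seek straight to that point in the recording.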

Q: What’s the difference between automatic speech recognition (ASR) captioning and human-generated captions?

There are two important differences between ASR captioning (also referred to as machine-generated captions) and human-generated captions: the quality and the time required to generate captions.

Machine-generated captioning produces captions very quickly. Typically, captions are ready in about one-quarter of the video’s running time. For example, an hour-long video could be captioned using ASR in approximately 15 minutes.

ASR captions are typically 90-95% accurate depending on the audio quality in the recording. As a result, machine-generated captions are primarily intended to enable inside-video search, and by default, they aren’t added to the video as closed captions. Instead, the text is stored in the video platform’s database for use with the video search engine.

Of course, ASR also provides a starting point from which people can manually create 100% accurate captions. In video platforms like Panopto, text generated by ASR can be added to the video as closed captions, which people can then edit.

Human-generated captions take substantially longer to produce but provide results that are at least 99% accurate. In some cases, human-generated captions can be turned around in 24 hours, but typically, you can expect a 2-5 day turnaround.

If you are required to meet ADA and Section 508 requirements for accessibility, you will need to use human-generated captions to guarantee your captions meet the minimum quality standards.

Q: How does captioning work in Panopto?

You can caption videos in Panopto in a couple of different ways.

Automatic Speech Recognition (ASR)

Every video uploaded into Panopto (whether it was created with Panopto or not) is machine-transcribed using ASR technology. This provides the backbone for our video search technology, which means you can search inside the spoken content of every video you upload into Panopto.

Human-generated captioning

Panopto also gives you a few options for human-generated captioning, which meet ADA and Section 508 compliance guidelines for accuracy.

Option 1: Request 508-compliant video captioning right inside Panopto

For individual videos, you can request human-generated captions from within our online video editor. After your Panopto administrator sets up the captioning integration with us or one of our captioning partners, captioning can be requested with just a few clicks. Simply choose the captioning turnaround time and, when the human-generated captions are ready, they will automatically appear inside your video.

Human-generated captions can also be requested for all videos within a particular Panopto folder. Once configured, any new videos added to a folder will be automatically captioned. Similar to captioning requests for individual videos, your administrator can select and set a specific turnaround time for captioning the videos added to that folder.

Option 2: Manually upload human-generated captions to your video in the editor

You may have a different third-party captioning partner that you prefer, or one with which you already have a contract. Or you may even have people on staff to manually edit machine-generated captions for accuracy. In either case, you can easily upload the human-generated caption file through our online video editor. Panopto supports the SRT, ASHX, and DXFP captioning formats.
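
If you prepare caption files in-house, a quick sanity check before uploading can catch formatting problems. The sketch below is a simplified, hypothetical validator for SRT files written in Python; it is not part of Panopto, and it assumes the common SRT layout of a cue number, a timing line, and one or more lines of text, with blank lines between cues.

```python
import re

# Timing line in the common SRT layout: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMING = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def parse_srt(text):
    """Parse SRT text into (start_seconds, end_seconds, caption_text) tuples.

    Raises ValueError on the first cue that is malformed.
    """
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            raise ValueError(f"Incomplete cue: {block!r}")
        match = TIMING.match(lines[1].strip())
        if match is None:
            raise ValueError(f"Bad timing line: {lines[1]!r}")
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
        end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
        if end <= start:
            raise ValueError(f"Cue {lines[0]} does not end after it starts")
        # Join multi-line caption text into a single string.
        cues.append((start, end, " ".join(lines[2:])))
    return cues


# Hypothetical sample file contents.
sample = """1
00:00:00,000 --> 00:00:03,500
Welcome to the lecture.

2
00:00:03,500 --> 00:00:07,250
Today we cover
video captioning.
"""

for start, end, text in parse_srt(sample):
    print(start, end, text)
```

A file that parses cleanly here is not guaranteed to be accepted by every platform, but malformed timing lines and missing text are among the easiest problems to catch early.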

Q: How long does it take to generate captions?

Machine-generated captions can be produced in about a quarter of the time it takes to play an individual video. If your organization chooses to process videos in batches overnight to conserve network bandwidth during peak hours, your captions may not be ready until the following day.

Human-generated captions are typically generated within two to five days, depending on the requested turnaround time and service options. Some human-generated captioning services can expedite 508-compliant captioning, turning it around in just 24 hours.

Q: How easy is it to add captions to a video in Panopto?

Generating or importing captions in Panopto can be done with just a few clicks of your mouse.

Adding machine-generated captions:

You can add machine-generated captions within seconds in our online video editor. Simply select “Captions” from the menu on the left side and choose “Import automatic captions” from the drop-down. Once the ASR-generated captions have populated, you can either edit them to ensure they are 100% accurate or simply click “Publish”.

[Screenshot: editing automatically generated captions in Panopto]

Adding human-generated captions from one of our partners:

You can request human-generated captioning from us or one of our captioning partners just as easily. Depending on turnaround time, you can caption your videos through Panopto for as little as $1/minute. Faster turnaround will cost slightly more, but we pass our discounts along to you, ensuring you get affordable video captioning rates. All you need to do is choose the turnaround time from the drop-down next to “Service level.”

And with Panopto it’s easy to automate video captioning at scale — your administrator simply enables captioning for a specific folder or for your entire video library.

Uploading human-generated captions from another source:

If you are using another third-party captioning service or editing your captions in-house, you can upload a human-edited caption file just as easily as you can request captioning. In this scenario, you click “Captions” in the left-hand menu, choose the caption file you need to upload from your computer and select “Upload Captions.”

Q: What are Web Content Accessibility Guidelines (WCAG) and how do they relate to captioning?

The Web Content Accessibility Guidelines, also known as the WCAG standard, is the most detailed and widely adopted guide for creating accessible web content. While WCAG does not yet dictate legal requirements for making online video accessible in the United States, it is often followed by educational institutions and businesses as a best practice, and has been referenced by laws in several European countries.

WCAG 2.0 generally asks that online content meet four principles that improve accessibility for people with disabilities and also adhere to a certain level of compliance. Both are summarized below:

WCAG Design Principles:

Perceivable: All relevant information in your content must be presented in ways the user can perceive.
Operable: Users must be able to operate interface components and navigation successfully.
Understandable: Users must be able to understand both the information in your content and how to operate the user interface.
Robust: Content must be robust enough that it can be interpreted by users, including those using assistive technologies (such as screen readers).

WCAG Compliance Levels for Online Video:

Level A: Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such.
Level AA: In addition to Level A compliance, captions are provided for all live audio content in synchronized media.
Level AAA: In addition to Levels A and AA compliance, sign language interpretation is provided for all prerecorded audio content in synchronized media.

To learn more about WCAG 2.0 guidelines visit W3C.org.

Q: What captioning quality is required for ADA and Section 508 compliance?

Section 508 is an amendment to the federal Rehabilitation Act of 1973 that broadens the original act’s application to include online video content. While Section 508 itself generally applies only to federal agencies, many states have passed laws that extend its requirements to federally funded organizations, such as colleges, research facilities, and arts institutions.

Quality standards for television captioning set the precedent for online video captioning that aims to improve accessibility. The quality standards for captioning online video include the following:

Accuracy: Captions must be 99% accurate when relaying the speaker’s exact words, including correct spelling, punctuation, and grammar, with no paraphrasing.
Synchronization: Captions must be time-synchronized so they align with the words spoken in the video, and must remain visible long enough for the viewer to read (3 to 7 seconds per caption frame).
Completeness: Videos must include captions from beginning to end.
Styling and Placement: Font and size should be easy to read, and the placement on the screen should not block important content.
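
The accuracy and timing criteria lend themselves to simple automated checks. The Python sketch below is illustrative only: this article does not specify how the 99% figure is measured, so the code assumes accuracy is one minus the word error rate (word-level edit distance divided by the number of reference words), and it treats the 3 to 7 second guideline as inclusive bounds. All function names are hypothetical.

```python
def word_accuracy(reference, candidate):
    """Return 1 - word error rate of candidate against a trusted reference.

    Words are compared exactly, so case and punctuation differences count
    as errors. The result is clamped so it never drops below 0.0.
    """
    ref = reference.split()
    hyp = candidate.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    # Dynamic-programming edit distance over words (insert, delete, substitute).
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    errors = previous[-1]
    return max(0.0, 1 - errors / len(ref))


def cues_outside_duration(cues, min_seconds=3.0, max_seconds=7.0):
    """Return the indexes of cues whose on-screen time falls outside the range."""
    return [
        index
        for index, (start, end) in enumerate(cues)
        if not min_seconds <= end - start <= max_seconds
    ]


# Hypothetical example: one wrong word out of four.
score = word_accuracy("the quick brown fox", "the quick brown box")
print(f"accuracy: {score:.0%}, meets 99% target: {score >= 0.99}")

# Hypothetical cue timings as (start_seconds, end_seconds).
print(cues_outside_duration([(0.0, 3.5), (3.5, 4.0), (4.0, 12.0)]))
```

A check like this needs a trusted reference transcript to compare against, which is one reason human review remains part of producing compliant captions.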

Q: Does Panopto support best practices and ADA and Section 508 compliance for video captioning?

Yes. Panopto includes features that support the production of ADA and Section 508-compliant video captions. You can request human-generated captions from any of our captioning partners right within the online editor, or upload captions from another source, whether they come from an external captioning service or are ASR captions edited in-house.

Panopto includes support for other accessibility features too, including screen reader support and keyboard navigation.

Q: Can you change the style and position of captions in Panopto?

Yes. In addition to making it easier for video creators to add captions that meet the quality standards for compliance under Section 508 of the Federal Rehabilitation Act, Panopto’s video player enables viewers to change the styling and placement of captioning to suit their individual needs. You can learn more about our configurable caption styling on our support site.

Q: How does pricing for captioning work?

For video platforms that support machine-generated captions, that feature is generally included at no additional cost.

Pricing for human-generated, ADA and Section 508-compliant captions varies quite a bit depending on the scale (there are often discounts for high-volume contracts) and turnaround time (24 hours, 2 days, 5 days, etc.). You can request human-edited captioning through Panopto’s video platform for as little as $1/minute with just a few clicks.
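
As a rough illustration of how per-minute pricing adds up, the Python sketch below estimates the cost of captioning a set of videos. Only the $1/minute figure comes from this article, and it is a starting price; the function name and the sample lecture lengths are hypothetical, and real rates depend on your contract and turnaround time.

```python
from decimal import Decimal


def captioning_cost(video_minutes, rate_per_minute="1.00"):
    """Estimate the captioning cost in dollars for a list of video lengths.

    video_minutes: lengths in minutes (ints or floats).
    rate_per_minute: price per minute as a string, e.g. "1.00".
    """
    total_minutes = sum(Decimal(str(minutes)) for minutes in video_minutes)
    return (total_minutes * Decimal(rate_per_minute)).quantize(Decimal("0.01"))


# Hypothetical course: three recorded lectures of 50, 62.5, and 45 minutes.
lecture_minutes = [50, 62.5, 45]
estimate = captioning_cost(lecture_minutes)
print(f"Estimated cost at $1.00/minute: ${estimate}")
```

Running the same estimate with the rate quoted for a faster turnaround makes it easy to compare service levels before enabling captioning for a whole folder.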

Q: Are unlimited captioning services truly unlimited?

Unlimited machine-generated captioning is generally provided by video platforms that support ASR.

However, you should beware of video platforms that claim to provide unlimited Section 508-compliant, human-generated captioning. There is always a cost to human-generated captioning services, so if you’re offered free or unlimited human-generated captioning:

Ask to see the actual captioning costs that are being baked into your contract elsewhere. Some video platforms claim to support unlimited human-generated captioning, but in fact simply offset the captioning costs by increasing the price you pay for video hosting, storage, or streaming.
Be sure to ask about usage caps. Some video platforms that claim to offer unlimited human-generated captioning restrict this captioning to videos shorter than 30 minutes. For universities recording hour-long lectures or businesses capturing 45-minute employee town hall events, such a cap makes the offer impractical.
