Smarter Video Search: ASR, OCR, and Transcription — What’s the Difference?

Search engines today have made a science of indexing text. Modern spiders find and record every last written word, and they return results so efficiently that some productivity experts now recommend that people give up their email filing systems and web browser favorites bars and simply rely on search to turn up what they need.

But for most organizations, that depth of search capability is reserved for text alone. Video in particular remains a black box, searchable only through manually entered metadata like titles and tags.

It’s a problem that needs to be solved.

According to a study by McKinsey and IDC, the average knowledge worker now spends nearly 20% of their time, roughly one whole day every week, just searching for the information they need to do their job effectively. As businesses share more and more using video, that wasted time will only grow without a video search solution in place.

That’s why today, more and more video platforms are expanding their video search capabilities. Yet as the field of solutions expands, it’s becoming more difficult for organizations to navigate. Why? Because not all video search is created equal.

Forrester Research recently commended Panopto as having “the best support for video search.” It’s easy to see why: no one goes deeper or broader than Panopto when it comes to video search, as shown on the following chart.

[Chart: Panopto Inside Video Search, Search Capabilities Chart 2015]

If a video is worth recording and storing, it’s worth finding. You want video search capabilities that can rise to that task. Modern video platforms are now finding creative ways to index the content inside videos by capturing metadata, audio inputs, and visual content.

So what capabilities should a business look for in video search?

Fundamentally, if a video search tool is going to index your videos, it should be able to find and return all the words spoken and shown on-screen.

While there are a number of technical strategies to get at this information, they tend to fall into two groups — automated or manual.

Automated video indexing relies on one or more technologies to capture and discern what’s happening in your video. These tools can often be applied to a video the instant recording is completed, expediting the process of indexing the content.

Common automated video indexing systems include automatic speech recognition (ASR), optical character recognition (OCR), and slide content ingestion. These three systems do very different things, so let’s look into each a little more closely.

  • Automatic Speech Recognition (ASR) is a technology used to identify each word that is spoken in a recording. Once identified, the words are time-stamped and added to a search index. Users can then search for spoken words, find the precise moment in the video when they were mentioned, and fast-forward to that point in the video. Since many viewers will be searching for a moment based on an idea or phrase they remember, ASR is an incredibly helpful part of your video search engine.
  • Optical Character Recognition (OCR) is a technology used to recognize text shown on-screen within videos. In today’s presentations, a speaker will often switch between slides, live on-screen content, and even other videos. Without OCR, any text shown as part of those presentations cannot be indexed, because search engines like Google cannot recognize text that’s saved as an image. OCR technology, however, is designed to identify and decipher those words, allowing your viewers to search for literally any word that appears on-screen anywhere in a video.
  • Slide Content Ingestion refers to the technology that imports and indexes your actual PowerPoint or Keynote presentation slides when used in your video. Content ingestion differs from OCR in that it programmatically extracts the actual text strings from your slides, rather than taking a picture of the slide and attempting to identify words. Slide ingestion also extracts additional information that isn’t shown on-screen, such as speaker’s notes, so that your team can always find precise moments in video based on any word contained on any slide.
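
To make the idea of a time-stamped search index concrete, here is a minimal sketch in Python. It assumes the ASR, OCR, and slide ingestion steps have already produced lists of (seconds, word) pairs. The class names, the source labels, and the sample data are all hypothetical illustrations, and nothing here describes how any particular video platform builds its index.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    video_id: str
    seconds: float  # offset into the video where the word occurs
    source: str     # hypothetical labels: "asr", "ocr", or "slide"


class VideoIndex:
    """Toy inverted index: each word maps to the moments where it occurs."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, video_id, source, timed_words):
        # timed_words is an iterable of (seconds, word) pairs.
        for seconds, word in timed_words:
            self._postings[word.lower()].append(Hit(video_id, seconds, source))

    def search(self, word):
        # Case-insensitive lookup, ordered by video and then by time.
        hits = self._postings.get(word.lower(), [])
        return sorted(hits, key=lambda h: (h.video_id, h.seconds))


# Made-up sample data standing in for real ASR, OCR, and slide output.
index = VideoIndex()
index.add("town-hall", "asr", [(12.5, "quarterly"), (13.1, "revenue")])
index.add("town-hall", "ocr", [(95.0, "Revenue")])
index.add("town-hall", "slide", [(240.0, "roadmap")])

for hit in index.search("revenue"):
    print(f"{hit.video_id} at {hit.seconds}s (found via {hit.source})")
```

Because every hit keeps its timestamp and its source, a player could jump straight to the matching moment and also tell the viewer whether the word was spoken or shown on-screen.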

Manual video indexing, on the other hand, relies on human intervention that takes place after a video is completed in order to help index video content.

The usefulness of manual indexing processes varies based on the amount of information they can add. Some processes are quite comprehensive, while others are much more limited. Let’s take a look at the two most common manual inputs:

  • Manual Metadata refers to the information added to a video file, such as a title, author, and description. Viewer notes and comments may also be added here. These are a fundamental part of video search, but for business videos, which often last 30-60 minutes or more and cover a range of topics, manual metadata almost never provides enough description to be useful by itself.
  • Transcripts are a more comprehensive approach, done by simply appending an actual video transcript to your video files for indexing. Transcript production is an evolving field — while many services still produce these files manually, the process can be automated. However you develop it, the quality of your input is essential — complete transcripts will be more valuable than partial transcripts, and those transcripts that also include notes about the content shown on-screen will be more valuable than those that only recite the dialog.

So Which Should I Prefer — Automatic or Manual Indexing?

If you need to make the choice, consider your needs. Automatic systems that rely on technology offer faster results and can often be applied to every video, but the accuracy isn’t 100% with ASR and OCR. Manual, human-based approaches such as transcription typically offer improved accuracy, but take longer to produce and often come at an added cost.

Fortunately, you don’t have to choose.

Panopto’s Smart Search video search technology is the industry’s most comprehensive inside-video search engine. With Panopto, you can search through your video library the same way you’d search across the internet, or through your email.

  • By any keyword spoken in your videos, with ASR,
  • By any word that ever appears on-screen or anywhere else in your video, with OCR and Slide Content Ingestion,
  • By traditional and advanced metadata, including tags, titles, viewer notes, and comments,
  • And optionally by complete manual transcriptions of your video content.

Try it out for yourself!

Ready to see what your video search has been missing? Contact our team today to try Panopto free.

Published: April 19, 2016
