EchoVideo has partnered with several transcription service companies to provide transcription services for audio and video media in EchoVideo. ASR stands for "automatic speech recognition" and uses computers to translate speech into text, and sync the text with the video.

If your institution prefers to do manual transcriptions and not use the ASR service, you can still use the EchoVideo Transcript Editor to upload transcripts created outside of EchoVideo, then perform further edits and apply transcripts as closed captions if appropriate.

There are several institution-level ASR features that you can set, all available from the Institution Settings > Features page:

ASR Course Media - sends all video/audio/interactive media for transcriptions at the time they are posted into a class in a section (if the media does not already have a transcription).
ASR All New Media - sends all video/audio media for transcriptions at the time it is added to EchoVideo (and successfully processed). This includes user file uploads, captures generated by EchoVideo capture appliances, and all universal capture uploads. It also applies to all users, and includes media added by students.
ASR Language Settings - Allows you to select the primary language used in your media recordings, and instructs the transcription service to use that language for transcribing the speech.
Automatic Push to Closed Captions - Allows you to set a confidence score for your automated transcripts, and tell EchoVideo to automatically apply those ASR transcripts to the closed captioning track if the transcript meets or exceeds the level you set.
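
As a rough illustration of the Automatic Push to Closed Captions rule described above, the check amounts to a "meets or exceeds" comparison. This is only a sketch: the function name and the percentage scale are assumptions made for the example, not EchoVideo's actual implementation or API.

```python
def should_push_to_captions(confidence_pct: float, threshold_pct: float) -> bool:
    """Return True when a transcript's confidence meets or exceeds the set level.

    Both values are assumed to be percentages (0-100); the scale EchoVideo
    uses internally is not stated in this article.
    """
    return confidence_pct >= threshold_pct


# Hypothetical institution setting of 90, with three example transcripts.
for score in (95, 90, 82):
    print(score, should_push_to_captions(score, 90))
```

A transcript that scores exactly at the configured level is still pushed, since the rule is "meets or exceeds".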

If you choose NOT to enable the Course Media or All Media ASR options for your institution, you can still request transcripts for individual pieces of media. This allows you to take advantage of the automated transcription service without having media transcribed on a large scale.

Remember that these toggles, and any reference to ASR here or elsewhere in the EchoVideo documentation, refer to automatic 3rd-party-generated transcriptions of media.

Once media has an ASR transcription, it is not RE-sent for transcription as long as the media has not been edited (the audio file has not changed in any way). This means that if both toggles are turned on, and an uploaded video was transcribed when it was uploaded, it is not RE-transcribed when it is published. It retains the original transcription. This also applies to individual media transcription requests; if the media has an automated transcript and it has not been edited, the media will not be sent for re-transcription.

Turning OFF any of these toggles has no effect on existing transcriptions. Transcriptions remain with the media throughout its lifecycle, regardless of how they were generated.

Manual transcription and upload for video/audio media is always available, as is the Transcript Editor. You may also want to refer to the student-access features for transcripts: Allow Students to Edit Transcripts for Class-Published Media and/or Allow Students to Edit Transcripts for Library Media. Some institutions assign media transcription and editing as a work-study task; these features allow students to take on this work.

The below sections provide some of the more technical details about the transcription service.

What is the difference between Transcriptions and Closed Captions

For end users (e.g., instructors or students), see Transcriptions vs. Closed Captions for the difference in how each is presented in the media players.

Transcriptioning differs from "captioning" in several ways. Transcriptioning is "speech to text" and does not include the sound effects and non-speech elements often included in closed captions. Furthermore, an automatic speech recognition transcription service is unlikely to meet the accuracy levels required of closed captions for hearing impaired individuals. You will find this particularly true for low-volume captures, for captures where background noise interferes with the audio, and possibly for non-native speakers whose accent is thick enough to cause word-recognition problems for the transcription service.

All that being said, transcriptions can be applied much faster than closed captions and typically cost less to generate. Transcriptions may provide a "good enough" solution for providing both visual and audio content for lectures. In particular, transcriptions may work as an interim "visual-text" measure during the interval while closed captions are being generated and applied (sometimes a day or more, depending on required accuracy levels).

Alternatively, because both closed caption files and transcription files use the WEBVTT standard, it is possible to have the automatic transcriptions generated, then edited for accuracy, then applied to the media as closed captions. You can also tell EchoVideo to automatically push transcripts to the closed captioning track if the transcription service applies a high enough confidence score to the transcripts.
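
For reference, a WEBVTT file is plain text: a "WEBVTT" header line followed by cues, each with a start and end timestamp and the text to show. The sketch below builds a minimal file of that shape. The helper names and the sample cue are made up for illustration, and a file exported from EchoVideo may carry additional fields.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues):
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]  # header, then a blank line before the first cue
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


print(build_webvtt([(0.0, 2.5, "Welcome to the lecture.")]))
```

Because the transcript and the caption track share this format, an edited transcript can be applied as captions without conversion.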

When do transcriptions get applied

WHEN a piece of video/audio media gets transcriptions requested and applied depends on the configuration of the feature toggles noted above. It can happen either when the media is added to the system (ASR All New Media) or as each media is posted to a class in a course (ASR Course Media).

As for timing, understand that transcripts take at least 30 minutes to be returned, even when requested for individual media (by an admin). They take longer if many requests are being processed at once, and, logically, longer recordings take longer to transcribe.

ASR Course Media

Turning on the ASR Course Media toggle tells EchoVideo to send media for transcription when it is posted to a class in a course. In the case of capture schedules that auto-publish to a section, or an ad hoc recording that is published directly to a course, the creation of the media and the publishing occur together. Media uploaded to a user's library, and captures from schedules or ad hoc recordings that do not auto-publish to a section, are not sent for transcripts until they are posted to an EchoVideo course.

For example, if an instructor generates an ad hoc capture but selects "Library" as the Publish-to location, that video will not be transcribed until it is published to a class in a course. If an instructor uploads a video directly to the Class List in a section, that video is sent for automatic transcriptions and will display them when they are finished.

This feature can be turned on for the whole institution, or it can be set so that Organizations, Departments, and individual course sections can have it on or off as needed.

Also keep in mind that "availability" of a capture is not the same as "publishing". You can publish a video while making it not yet available to students. Since the video is still published, it will trigger an automated transcription at that time (if ASR Course Media is turned on).

If a video is edited while it is currently published, it will be sent for transcription again after the edits are complete and the user clicks Save. This means that the video may be transcribed twice (once on initial posting to the class; once after editing it while published).

Important: The connection to make between the above two points is that if you/your instructors' standard procedure is to generate a capture, auto-publish it to a section but not make it available to students for some period while the instructor edits the video, that video will be transcribed twice; once when it is initially published, then again after the edits are complete and saved.
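
The interaction described above can be modeled in a few lines. This is a simplified sketch for reasoning about how many transcription requests a workflow generates. The class and function names are hypothetical and do not correspond to any EchoVideo API.

```python
from dataclasses import dataclass


@dataclass
class Media:
    published: bool = False
    available_to_students: bool = False
    has_current_asr_transcript: bool = False
    transcription_requests: int = 0


def _send_for_transcription(media: Media) -> None:
    media.transcription_requests += 1
    media.has_current_asr_transcript = True


def publish(media: Media, asr_course_media_on: bool, available: bool = False) -> None:
    """Publishing triggers ASR regardless of student availability."""
    media.published = True
    media.available_to_students = available
    if asr_course_media_on and not media.has_current_asr_transcript:
        _send_for_transcription(media)


def save_edits(media: Media, asr_course_media_on: bool) -> None:
    """Saved edits change the audio, so any existing transcript is out of date."""
    media.has_current_asr_transcript = False
    if asr_course_media_on and media.published:
        _send_for_transcription(media)


# Workflow A: auto-publish (hidden from students), then edit while published.
a = Media()
publish(a, asr_course_media_on=True, available=False)
save_edits(a, asr_course_media_on=True)
print(a.transcription_requests)  # two requests

# Workflow B: edit in the library first, then publish.
b = Media()
save_edits(b, asr_course_media_on=True)
publish(b, asr_course_media_on=True)
print(b.transcription_requests)  # one request
```

In this model, editing before publishing results in a single transcription, which is why the order of operations matters for transcription usage.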

ASR All New Media

Turning on the ASR All New Media toggle tells EchoVideo to send media for transcription as soon as it is added and completes media processing. This applies to all video/audio media, added to EchoVideo in any way, by all users, including students. The MP3 file created during media processing is sent for transcriptions as soon as it is available. NOTE - this also means that as an Admin you can select to Reprocess captures, which re-generates the MP3 file. If this feature toggle is turned on, those re-generated MP3 files are sent out for transcriptions. This is one way to get transcriptions for older items or previously un-published media that might need them.

Transcription of all media also applies to EDITED media. For example, if an instructor edits a video that already has transcriptions, then clicks Save, the edited video is processed and the newly generated MP3 file is sent for transcription. This is because the audio file changed with the edits and needs to be re-transcribed. This applies even if the edits included removing a silent section of the video; the audio file itself changed so EchoVideo sends the new file for transcriptions. In the Transcript Editor the video also has at least two versions of the transcript: the original and Version 1 for the edited version of the video.

If a Manual Transcription is applied to a capture/video before it is published, and THEN the video is published, the video will still be sent for automatic transcription. In this case, the uploaded transcription is considered the "original" and the automated one is an update. Reverting to the original would return the originally uploaded transcription. While we don't expect this to be a common use-case, we wanted to note it here for you.

How long does it take for a transcription to appear

It takes at least 30 minutes for a video to receive automatic transcriptions, longer for videos that are more than an hour in length and/or if the transcription service is processing a large number of requests at the time.

Currently transcriptions are not visible in the media details playback. If you need to know whether or not an item has transcripts, refer to the Transcript entry in the Details tab of the Media Details page. The Transcript entry will read "Add" if there is no transcript, and will read "Update" if there is one.

Alternately as an Admin, select Edit Transcript from the chevron menu for any completed capture entry; the Transcript editor opens and displays a message if the item does not have a transcript file.

In what instances are videos NOT automatically transcribed

Transcriptions are not "back-applied" to existing captures. Captures and videos already in the system at the time an ASR toggle is turned on must be either published/re-published OR reprocessed (depending on which ASR toggles are enabled), or sent individually to request transcripts.

NOTE: If you remove then re-publish a capture to a class to obtain transcriptions, you will remove student video view data from the section analytics for that class/video. BETTER OPTION: Instructors can create a "holding class" solely for temporarily publishing videos (or use an expired section). Publish the older, non-transcribed videos to the class, then remove them. The act of publishing will trigger automatic transcription if the ASR Course Media toggle is on; the video does not need to be left in the class for more than several seconds. All currently published versions of the video will see the transcriptions.
Alternatively, as an Administrator, you can select Request Transcript for any piece of media on the Captures page.

If Amazon is your transcription provider, recordings longer than 4 hours are NOT transcribed. This is an Amazon restriction. You can manually transcribe these videos and upload the VTT file to apply it to the media. Alternately, you can edit longer videos into shorter segments, which will then be transcribed whenever the ASR feature toggles indicate (either on save after editing, or when posted to a course).

If Speechmatics is your transcription provider, there is a maximum file size for the audio file that can be transcribed (roughly 2GB). However, that corresponds to a recording of almost 14 hours, so the limit is more theoretical than practical. If you do have very long media that needs to be transcribed, you can edit it into shorter segments, then have those transcribed.
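
The two provider limits above can be summarized as a simple pre-check. This is an illustrative sketch only: the provider keys, constant names, and functions are invented for the example, "roughly 2GB" is treated as exactly 2,000,000,000 bytes, and "almost 14 hours" as exactly 14.

```python
AMAZON_MAX_HOURS = 4.0                 # recordings longer than this are not transcribed
SPEECHMATICS_MAX_BYTES = 2 * 1000**3   # "roughly 2GB", taken here as decimal gigabytes


def can_auto_transcribe(provider: str, duration_hours: float, audio_bytes: int) -> bool:
    """Apply the limit that matters for each provider (duration or file size)."""
    if provider == "amazon":
        return duration_hours <= AMAZON_MAX_HOURS
    if provider == "speechmatics":
        return audio_bytes <= SPEECHMATICS_MAX_BYTES
    raise ValueError(f"unknown provider: {provider}")


def implied_bitrate_kbps(audio_bytes: int, duration_hours: float) -> float:
    """Average audio bitrate implied by a file size and a duration."""
    return audio_bytes * 8 / (duration_hours * 3600) / 1000


print(can_auto_transcribe("amazon", 4.5, 500_000_000))        # False: over 4 hours
print(can_auto_transcribe("speechmatics", 4.5, 500_000_000))  # True: under the size cap
print(round(implied_bitrate_kbps(SPEECHMATICS_MAX_BYTES, 14)))
```

The last line works out the average bitrate at which a 2GB audio file would run for 14 hours, about 317 kbps, which shows how the size limit and the quoted duration relate.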

Captures/videos that have already been auto-transcribed are NOT sent for automatic transcription again, as long as the file has not changed. The ASR service sees that the video has an automated transcription and compares the audio file byte-for-byte with the one originally submitted. As long as the two are the same (the media has not been edited), the media is not re-submitted for transcriptions.
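
The re-submission rule above comes down to a comparison of the audio bytes. A minimal sketch follows. The function name and arguments are hypothetical, and how the service performs the comparison internally is not described in this article.

```python
def eligible_for_resubmission(has_asr_transcript: bool,
                              submitted_audio: bytes,
                              current_audio: bytes) -> bool:
    """Return True when media should be sent (or re-sent) for transcription.

    Media with no automated transcript is always eligible. Media that has one
    is eligible only if its audio differs from what was originally submitted.
    """
    if not has_asr_transcript:
        return True
    return submitted_audio != current_audio  # byte-for-byte comparison


original = b"\x00\x01\x02\x03"
trimmed = b"\x00\x01\x02"  # stands in for audio after an edit

print(eligible_for_resubmission(True, original, original))  # unchanged audio
print(eligible_for_resubmission(True, original, trimmed))   # edited audio
```

Any change to the audio, including trimming silence, makes the files differ and so makes the media eligible again.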

How is the automatic transcription service paid for

EchoVideo's ASR offering is a paid service, and each customer/institution has an allocation of transcription hours included as part of its EchoVideo contract. Allocations are based on the annual contract period, and are reset each year.

ASR usage is based on the number of capture hours being transcribed. If your contractual allocation does not provide sufficient transcription coverage, you can pre-purchase additional hours by contacting your regional account team.

EchoVideo contract-allocated transcription hours will not roll over year-over-year. However, any additional hours you purchase above the contract-granted ones that are not consumed will not expire or reset.
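
The difference between the two kinds of hours can be shown with a small worked model. Note the assumption: this sketch draws down contract hours before purchased hours, and the order in which EchoVideo actually consumes them is not stated here. All names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class AsrAllocation:
    contract_hours_per_year: float
    contract_remaining: float
    purchased_remaining: float = 0.0

    def consume(self, hours: float) -> None:
        """Use transcription hours (ASSUMPTION: contract hours are used first)."""
        if hours > self.contract_remaining + self.purchased_remaining:
            raise ValueError("not enough transcription hours available")
        from_contract = min(hours, self.contract_remaining)
        self.contract_remaining -= from_contract
        self.purchased_remaining -= hours - from_contract

    def start_new_contract_year(self) -> None:
        """Contract hours reset without rollover; purchased hours carry over."""
        self.contract_remaining = self.contract_hours_per_year


alloc = AsrAllocation(contract_hours_per_year=100.0,
                      contract_remaining=100.0,
                      purchased_remaining=20.0)
alloc.consume(110.0)
alloc.start_new_contract_year()
print(alloc.contract_remaining, alloc.purchased_remaining)
```

In this example the institution starts the new contract year with its full 100 contract hours plus the 10 purchased hours it had not yet used.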

If/When you reach your ASR Allocation limit, your EchoVideo account representative will notify you as soon as possible. At that point, you will be asked if you wish to purchase more hours to continue using the service. If you do not, the ASR service will be turned off for your institution.

Existing automated transcriptions will always remain with the media they have been applied to; they are not removed regardless of whether you continue to use the ASR service or not. The ASR Allocation simply determines whether you have transcription hours available in your account, and therefore whether or not more media can be sent for automatic transcription. Manual transcription and upload is always available.
