
Elhuyar – Speech Recognition (ASR) and Machine Translation (MT) in audiovisual media for automatic subtitling and enhanced reach

Elhuyar 

Sector: Services

Business Case

For reasons of inclusion and accessibility, subtitling requirements in the audiovisual sector are becoming increasingly stringent. In addition, offering subtitles in multiple languages opens the content up to a global audience. However, creating and translating subtitles manually is so time-consuming and costly that it is rarely done.

Objectives

Use Automatic Speech Recognition (ASR) and Machine Translation (MT) technology for the (semi-)automatic creation of subtitles and their (semi-)automatic translation.

Use case

The speech recognition system was adapted to the media domain, achieving a higher recognition rate on informal conversations, dialects, etc. Automatic subtitling and subtitle translation were integrated into the broadcasters' multimedia content managers: when audiovisual content is uploaded and catalogued in the manager, subtitles are created automatically; they can be corrected in the same manager, sent for translation (and likewise corrected), and published together with the content on the website, on social networks, etc.
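The workflow above, from timestamped ASR output to a publishable subtitle file, can be sketched as follows. This is a minimal illustration, not Elhuyar's actual implementation: `Segment` is a hypothetical stand-in for the ASR service's output, and only the SRT formatting step is concrete.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One timestamped unit of ASR output (times in seconds)."""
    start: float
    end: float
    text: str

def to_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[Segment]) -> str:
    """Render timestamped segments as a SubRip (.srt) subtitle file."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{to_timestamp(seg.start)} --> {to_timestamp(seg.end)}\n{seg.text}"
        )
    return "\n\n".join(blocks) + "\n"

# Demo with stubbed ASR output (Basque greeting):
segments = [
    Segment(0.0, 2.5, "Kaixo, ongi etorri."),
    Segment(2.5, 5.0, "Gaurko saioan..."),
]
print(to_srt(segments))
```

The resulting `.srt` text is what a content manager would expose for manual correction, and what the MT step would translate segment by segment before republication.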

Infrastructure

On-premise or cloud, according to customer requirements

Technology

Machine learning and deep learning, text mining, speech recognition

Data

Several hundred hours of transcribed audio recordings.

Resources

Researchers specialising in NLP, particularly in speech recognition and machine translation. Server infrastructure to host the developed recognition and translation systems (when an on-premise installation is not required). Developers of APIs for remote calls to the recognition and machine translation systems. Front-end developers to integrate the recognition, machine translation and manual correction systems.

Difficulties and learning

Speech recognition applied to media content has specific requirements: informal conversations, dialects, distinguishing and marking the different speakers in the subtitles, dividing subtitles into segments of a specific length at logical break points, etc. This called for training with in-domain datasets and training specific subsystems.
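One of the requirements mentioned, dividing subtitles into segments of a specific length at logical points, can be illustrated with a simple greedy splitter. This is a hedged sketch, not the system described here: the 42-character limit is an illustrative convention from subtitling style guides, and the "logical points" heuristic is reduced to sentence punctuation.

```python
def segment_subtitles(text: str, max_chars: int = 42) -> list[str]:
    """Split a transcript into subtitle lines of at most max_chars
    characters, preferring to break right after sentence punctuation
    ("logical points"), otherwise at word boundaries."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            # Close the line early at a natural sentence boundary.
            if current.endswith((".", "!", "?", ":")):
                lines.append(current)
                current = ""
        else:
            if current:
                lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines
```

A production segmenter would also balance line lengths, respect reading-speed limits, and keep segments aligned with the ASR timestamps, but the core break-point logic follows this shape.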

KPIs (business impact and metrics of the model)

Implementation in Hamaika Telebista, Goiena, Antxeta Irratia, Teknopolis, etc. Increased accessibility of content. Increased reach of content to a global audience through translation.

Funding

Hazitek, Applied Artificial Intelligence

Collaborators, Partners

Hamaika Telebista, Goiena, Antxeta Irratia
