Elhuyar – Personalised speech synthesis (TTS) in print media for accessibility and multi-modality

Elhuyar

Sector: Services

Business Case

News consumption in text format has been declining, with a progressive shift towards audiovisual and even audio-only formats. It is increasingly rare to see people reading a newspaper on the street and increasingly common to see them wearing headphones, listening to podcasts, radio and similar content. When people do read, they read less on paper and more on devices such as mobile phones, which are hard to read outdoors or on public transport because of poor contrast, small font sizes, the movement of walking, the vibration of the vehicle and so on. Blind people and people with visual disabilities face additional barriers when reading text.

Objectives

Use speech synthesis technology (TTS, or text-to-speech) to produce voice-overs of media texts. A written medium can then be listened to instead of, or in addition to, being read. This makes texts easier to consume on a mobile phone in different situations (walking along the street, travelling by public transport) and more accessible to people with various disabilities. The service covers several languages (Basque, Spanish, English, French, Catalan, Galician, etc.).

Use case

A customised speech synthesis system is created for each media outlet, with one or more unique voices capable of speaking multiple languages, generated from a few minutes of recordings of the outlet’s own speakers in a single language. Each news item or article on the outlet’s website includes a player bar; pressing play reads the item aloud in the voices of the outlet’s own announcers. The entire article can also be downloaded as an audio file for later listening or for creating podcasts.

Infrastructure

Cloud

Technology

Machine learning or deep learning, text mining, voice recognition

Data

Recordings made with media broadcasters (about 10 minutes)

Resources

Legal and security consulting for drafting contracts and for storing personal data such as voice recordings. Researchers specialising in NLP, particularly speech synthesis. Server infrastructure to host the synthesis systems developed (when an on-premises installation is not required). API developers to enable remote calls to the synthesis system. Front-end developers to add the audio player bar to the media website.
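
As an illustration of the remote call mentioned above, the following is a minimal sketch of how a client might prepare a request to a synthesis service. It only builds the request and does not send it. The `/synthesize` path, the JSON field names (`text`, `voice`, `language`) and the function name are assumptions made for this example and do not describe the actual Elhuyar API.

```python
import json
from urllib.request import Request


def build_synthesis_request(base_url, text, voice, language):
    """Build (but do not send) a POST request for a hypothetical TTS endpoint.

    The endpoint path and payload fields are illustrative assumptions.
    """
    payload = {"text": text, "voice": voice, "language": language}
    return Request(
        url=f"{base_url.rstrip('/')}/synthesize",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A player bar on the article page would trigger a call of this kind and play the audio returned by the service.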

Difficulties and learning

New toponyms, proper names, technical terms and similar words appear constantly in the media, and pronouncing them correctly is difficult. This led us to design a system that continuously updates a database of such words together with their pronunciations.
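
The idea of a continuously updated pronunciation database can be sketched as a lookup applied to the text before synthesis. This is a minimal illustration, not the system actually built: it assumes pronunciations are stored as respellings that the synthesiser reads correctly, and the function names and the sample entry are hypothetical.

```python
import re

# Matches runs of word characters; hyphenated or apostrophised forms
# are treated as separate words in this simplified sketch.
_WORD = re.compile(r"\w+")


def add_entry(lexicon, word, pronunciation):
    """Add or update a word in the lexicon (case-insensitive key)."""
    lexicon[word.casefold()] = pronunciation


def apply_lexicon(text, lexicon):
    """Replace every word found in the lexicon with its stored respelling."""
    def substitute(match):
        word = match.group(0)
        return lexicon.get(word.casefold(), word)

    return _WORD.sub(substitute, text)


# Hypothetical usage: an editor registers a new proper name once,
# and every later article containing it is respelled before synthesis.
lexicon = {}
add_entry(lexicon, "Reuters", "roiters")
prepared = apply_lexicon("Reuters reports from Bilbao.", lexicon)
```

Keeping the lexicon separate from the synthesis model means new words can be added without retraining the voices.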

KPIs (business impact and metrics of the model)

Deployed at Berria, the only daily newspaper in Basque, and across all Tokikom media (a network of Basque-language local media comprising more than 78 outlets). Increased accessibility of written content. Greater media multi-modality, by adding the option of listening to written media.

Funding

Hazitek, Applied Artificial Intelligence

Collaborators, Partners

Berria, Goiena, Tokikom
