Elhuyar – Personalised speech synthesis (TTS) in print media for accessibility and multi-modality
Elhuyar
Sector: Services
Business Case
In recent years, news consumption in text format has declined, with a steady shift towards audiovisual and even audio-only formats. It is increasingly rare to see people on the street reading a newspaper and increasingly common to see them wearing headphones, listening to podcasts, the radio, etc. Likewise, when people do read, they read less on paper and more on devices such as cell phones, which are hard to read outdoors or on public transport because of poor contrast, small font sizes, the motion of walking, the vibration of vehicles, etc. On top of this, blind and visually impaired people face their own barriers to reading text.
Objectives
Use speech synthesis technology (TTS, Text-To-Speech) to voice media texts, so that a written outlet can be listened to instead of, or in addition to, being read. This makes texts easier to consume on a cell phone in a variety of situations (walking down the street, travelling on public transport) and more accessible to people with disabilities. The system should work in several languages (Basque, Spanish, English, French, Catalan, Galician, etc.).
Use case
A customised speech synthesis system is created for each media outlet, with one or more unique voices that can speak in multiple languages, even though they are generated from just a few minutes of recordings of the outlet's own speakers in a single language. Each news item or article on the outlet's website includes a player bar: pressing play reads the item aloud in the voices of the outlet's own announcers. The entire article can also be downloaded as an audio file for later listening or for creating podcasts.
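The case study does not describe how an article becomes a single playable or downloadable file. A minimal Python sketch, assuming a per-voice synthesis function that returns raw 16-bit mono PCM for one chunk of text (the hypothetical `synthesize` callable below), splits the article into sentence-based chunks, synthesizes each one and joins them with short pauses into one WAV file using the standard `wave` module. The chunk size, pause length and sample rate are illustrative values, not Elhuyar's.

```python
import io
import re
import wave
from typing import Callable, List

# Hypothetical per-voice synthesizer: takes one text chunk and returns raw
# 16-bit mono PCM bytes. A real deployment would call the outlet's voice model.
SynthesizeFn = Callable[[str], bytes]

SAMPLE_RATE = 22050  # assumed output sample rate in Hz
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)


def split_into_chunks(text: str, max_chars: int = 300) -> List[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def article_to_wav(text: str, synthesize: SynthesizeFn,
                   max_chars: int = 300, pause_ms: int = 300) -> bytes:
    """Synthesize every chunk and join them, separated by silence, into one WAV file."""
    pause_frames = SAMPLE_RATE * pause_ms // 1000
    silence = b"\x00" * (pause_frames * SAMPLE_WIDTH)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for i, chunk in enumerate(split_into_chunks(text, max_chars)):
            if i > 0:
                wav.writeframes(silence)  # short pause between chunks
            wav.writeframes(synthesize(chunk))
    return buffer.getvalue()
```

Splitting on sentence boundaries is a common choice because neural TTS models tend to handle short inputs more reliably than whole articles; the same WAV bytes can then be streamed to the player bar or offered as a download.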
Infrastructure
Cloud
Technology
Machine learning or deep learning, text mining, voice recognition
Data
Recordings made with the media outlets' own announcers (about 10 minutes)
Resources
Legal and security consulting for drafting contracts and for storing personal data such as voice recordings. Researchers specialising in NLP, particularly speech synthesis. Server infrastructure to host the synthesis systems (unless an on-premises installation is required). API developers to enable remote calls to the synthesis system. Front-end developers to add the audio player bar to the media websites.
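The API layer for remote calls could look roughly like the following. This is a hypothetical sketch, not Elhuyar's API: the field names (`outlet_id`, `voice_id`, `language`, `text`), the status codes and the in-memory cache are all assumptions. It validates a request against the languages listed in the Objectives and caches audio by a hash of the request, so each article is synthesized once however many readers press play.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Language codes for the languages listed in the Objectives (ISO 639-1).
SUPPORTED_LANGUAGES = {"eu", "es", "en", "fr", "ca", "gl"}
REQUIRED_FIELDS = ("outlet_id", "voice_id", "language", "text")


@dataclass(frozen=True)
class SynthesisRequest:
    outlet_id: str  # hypothetical: which media outlet's voice set to use
    voice_id: str   # hypothetical: one of that outlet's custom voices
    language: str
    text: str


def validate(payload: dict) -> SynthesisRequest:
    """Turn a decoded JSON payload into a request, or raise ValueError."""
    missing = [field for field in REQUIRED_FIELDS if not payload.get(field)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    language = str(payload["language"])
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError("unsupported language: " + language)
    return SynthesisRequest(*(str(payload[field]) for field in REQUIRED_FIELDS))


class SynthesisService:
    """Framework-agnostic handler; an HTTP layer would map (status, body) to a response."""

    def __init__(self, synthesize: Callable[[SynthesisRequest], bytes]) -> None:
        self._synthesize = synthesize
        self._cache: Dict[str, bytes] = {}  # in-memory only, for illustration

    @staticmethod
    def cache_key(req: SynthesisRequest) -> str:
        # Joining with a separator (assumed absent from the fields) keeps
        # ("ab", "c") and ("a", "bc") distinct.
        raw = "\x1f".join((req.outlet_id, req.voice_id, req.language, req.text))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def handle(self, payload: dict) -> Tuple[int, bytes]:
        try:
            req = validate(payload)
        except ValueError as exc:
            return 400, str(exc).encode("utf-8")
        key = self.cache_key(req)
        if key not in self._cache:
            self._cache[key] = self._synthesize(req)  # synthesize once per unique request
        return 200, self._cache[key]
```

Keeping the handler independent of any web framework makes it easy to test and to expose either from a cloud server or an on-premises installation; a production cache would more likely live in shared storage than in process memory.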
Difficulties and learning
Correctly pronouncing the new place names, proper names, technical terms, etc. that constantly appear in the news proved difficult. This led us to design a system that continuously updates a database of such words and their pronunciations.
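The case study does not describe how the pronunciation database feeds the synthesizer. One common approach, assumed here, is to look up each word before synthesis and wrap known ones in SSML `<phoneme>` tags (a W3C SSML element; whether Elhuyar's engine accepts SSML is not stated). The `PronunciationLexicon` class and its IPA entries are hypothetical, the output is simplified SSML (no version or namespace attributes), and only single-word entries are handled, so multi-word names would need phrase matching.

```python
import re
from typing import Dict
from xml.sax.saxutils import escape, quoteattr


class PronunciationLexicon:
    """Hypothetical store of word -> IPA transcription, updated continuously."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def update(self, new_entries: Dict[str, str]) -> None:
        # New or corrected entries overwrite old ones; keys are case-folded
        # so "NVIDIA" and "Nvidia" share one entry.
        for word, ipa in new_entries.items():
            self._entries[word.casefold()] = ipa

    def to_ssml(self, text: str) -> str:
        """Wrap known single words in SSML <phoneme> tags and escape the rest."""
        out = []
        # The capturing group keeps both word tokens and the text between them.
        for part in re.split(r"(\w+)", text):
            ipa = self._entries.get(part.casefold()) if part else None
            if ipa is not None:
                out.append(f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(part)}</phoneme>')
            else:
                out.append(escape(part))
        return "<speak>" + "".join(out) + "</speak>"
```

For example, after `lex.update({"Nvidia": "ɛnˈvɪdiə"})` (an illustrative transcription), `lex.to_ssml("Nvidia & AMD.")` returns `<speak><phoneme alphabet="ipa" ph="ɛnˈvɪdiə">Nvidia</phoneme> &amp; AMD.</speak>`. Because updates simply overwrite entries, newly appearing names can be added as they show up in the news without retraining the voices.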
KPIs (business impact and metrics of the model)
Deployed in Berria, the only daily newspaper in Basque, and in all Tokikom media (a network of more than 78 local Basque-language media outlets). Increased accessibility of written content. Greater multi-modality, by adding the option of listening to written media.
Funding
Hazitek, Applied Artificial Intelligence
Collaborators, Partners
Berria, Goiena, Tokikom