
Elhuyar – Media asset management (MAM) with automatic metadata generation using artificial intelligence and natural language processing

Elhuyar 

Sector: Services

Business Case

Digitalisation and information technologies have driven exponential growth in files and multimedia content, so modern organisations need to manage these assets appropriately. Media Asset Management (MAM) systems were developed for this purpose: to store, classify, organise, optimise, maintain and preserve these assets.

Objectives

Apply artificial intelligence to improve audiovisual content management in MAM systems by combining automatic audio transcription with thematic segmentation and automatic generation of semantic metadata.
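The objective describes a chain of stages: transcribe the audio, split the transcript into thematic segments, then attach semantic metadata to each segment. The sketch below shows one possible way to wire those stages together. It assumes the transcript arrives as a list of `(start_s, end_s, text)` tuples from a speech recognition system; the record types `Segment` and `AssetMetadata` and the function `enrich_asset` are hypothetical, since the case study does not describe its metadata schema or interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


# Hypothetical record types: the case study does not publish its metadata schema.
@dataclass
class Segment:
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds
    text: str       # transcript text covered by the segment
    entities: list[str] = field(default_factory=list)
    descriptors: list[str] = field(default_factory=list)
    summary: str = ""


@dataclass
class AssetMetadata:
    asset_id: str
    language: str  # e.g. "eu", "es" or "en"
    segments: list[Segment]


def enrich_asset(
    asset_id: str,
    language: str,
    transcript: list[tuple[float, float, str]],  # assumed ASR output format
    segment_fn: Callable[[list[tuple[float, float, str]]], list[Segment]],
    entity_fn: Callable[[str], list[str]],
    descriptor_fn: Callable[[str], list[str]],
    summary_fn: Callable[[str], str],
) -> AssetMetadata:
    """Segment a transcript, then attach entities, descriptors and a summary to each segment."""
    segments = segment_fn(transcript)
    for seg in segments:
        seg.entities = entity_fn(seg.text)
        seg.descriptors = descriptor_fn(seg.text)
        seg.summary = summary_fn(seg.text)
    return AssetMetadata(asset_id=asset_id, language=language, segments=segments)
```

Because each stage is passed in as a function, a fine-tuned encoder for entities or a generative model for summaries could be swapped in without changing the pipeline itself.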

Use case

Adapting neural language models (encoder, encoder-decoder and decoder architectures) to perform thematic segmentation, named entity extraction, thematic descriptor extraction and automatic summary generation.
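The project uses adapted neural language models for thematic descriptor extraction. As a point of comparison only, the sketch below shows a simple non-neural baseline for the same task: ranking each segment's terms by TF-IDF so that words frequent in one segment but rare across the collection rise to the top. This is a swapped-in technique, not the project's method; the stopword list and the function names `tokenize` and `top_descriptors` are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Tiny illustrative English stopword list (an assumption; a real system would
# need per-language lists for Basque, Spanish and English).
STOPWORDS = {"the", "and", "to", "of", "a", "in", "after", "again", "for", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on word characters, drop stopwords and very short tokens."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS and len(t) > 2]


def top_descriptors(docs: list[str], k: int = 3) -> list[list[str]]:
    """Return the k highest TF-IDF terms per document (ties broken alphabetically)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        total = sum(tf.values()) or 1
        scores = {term: (count / total) * math.log(n_docs / df[term]) for term, count in tf.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results
```

Terms that appear in every document receive an IDF of zero, which is why such a baseline struggles with short transcripts and motivates the model-based approach described above.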

Infrastructure

On-premises and cloud.

Technology

Machine learning or deep learning, text mining, voice recognition.

Data

Public and private datasets for automatic summarisation, entity extraction and thematic descriptor extraction.

Resources

Researchers specialising in natural language processing (NLP), particularly information extraction and large language models (LLMs). Server infrastructure to deploy the trained models.

Difficulties and learning

Thematic segmentation of transcripts is difficult to solve with a supervised approach. Unsupervised approaches are more robust, especially those that use LLMs in a zero-shot setting.
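The zero-shot approach mentioned above can be sketched as follows: number the transcript sentences, ask an LLM which sentences start a new topic, and cut the transcript at those positions. This is a minimal sketch under stated assumptions: the prompt wording, the comma-separated answer format and the helper names are hypothetical, and `llm` stands for any function that takes a prompt string and returns the model's text reply, since the case study does not name the model or its API.

```python
import re
from typing import Callable


def build_segmentation_prompt(sentences: list[str]) -> str:
    """Hypothetical zero-shot prompt: no labelled examples, only an instruction."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sentences))
    return (
        "The following numbered sentences come from a transcript.\n"
        "List the numbers of the sentences that START a new topic, "
        "separated by commas. Do not include sentence 0.\n\n"
        f"{numbered}\n\nAnswer:"
    )


def parse_boundaries(reply: str, n_sentences: int) -> list[int]:
    """Extract sentence indices from the reply, dropping duplicates and out-of-range values."""
    found = {int(m) for m in re.findall(r"\d+", reply)}
    return sorted(i for i in found if 0 < i < n_sentences)


def segment_zero_shot(sentences: list[str], llm: Callable[[str], str]) -> list[list[str]]:
    """Split sentences into topic segments at the boundaries proposed by the LLM."""
    prompt = build_segmentation_prompt(sentences)
    starts = [0] + parse_boundaries(llm(prompt), len(sentences))
    ends = starts[1:] + [len(sentences)]
    return [sentences[s:e] for s, e in zip(starts, ends)]
```

Parsing the reply defensively matters here: a generative model may return extra words or invalid indices, and the filter in `parse_boundaries` keeps only positions that can actually start a segment.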

KPIs (business impact and metrics of the model)

Success rates above 90% in entity and thematic descriptor extraction. Ability to process three languages: Basque, Spanish and English. Improved management of multimedia content.
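The case study reports "success rates" without stating the exact metric. One common way to score entity extraction is micro-averaged precision, recall and F1 over exact matches of `(text, type)` pairs, sketched below as an assumption rather than the project's evaluation protocol. Thematic descriptors can be scored the same way by treating each document's descriptors as a set of labels with a constant type.

```python
from __future__ import annotations


def micro_prf1(
    gold: list[set[tuple[str, str]]],
    predicted: list[set[tuple[str, str]]],
) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact (text, type) matches.

    Each list element holds the entities of one document.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same documents")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # correctly extracted entities
        fp += len(p - g)  # extracted but not in the reference
        fn += len(g - p)  # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with gold entities `{("Bilbao", "LOC"), ("Elhuyar", "ORG")}` and `{("Baleuko", "ORG")}` and predictions `{("Bilbao", "LOC")}` and `{("Baleuko", "ORG"), ("Euskadi", "LOC")}`, there are 2 true positives, 1 false positive and 1 false negative, so precision, recall and F1 are all 2/3.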

Funding

Applied Artificial Intelligence (SPRI) and private funding.

Collaborators, Partners

Baleuko.
