Francisco Cerna-Herrera


2024

Lessons from Deploying the First Bilingual Peruvian Sign Language - Spanish Online Dictionary
Joe Huamani-Malca | Miguel Rodriguez Mondoñedo | Francisco Cerna-Herrera | Gissella Bejarano | Carlos Vásquez Roque | Cesar Augusto Ramos Cantu | Sabina Oporto Pérez
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Bilingual dictionaries present several challenges, especially when they pair a sign language with an oral language, where multimodality plays a role. We deployed and tested the first bilingual Peruvian Sign Language (LSP) - Spanish Online Dictionary. The first feature lets the user enter a Spanish word or text and returns a list of videos whose glosses are related to the input. The second feature lets the user sign in front of the camera and shows the five most probable Spanish translations, based on the similarity between the input sign and the gloss-labeled sign videos used to train a machine learning model. These features are built on a design and architecture that distinguishes among matches on the Spanish text searched, the sign gloss, and the Spanish translation. We explain in depth how these concepts (or database columns) impact the search. We also share the challenges of deploying a real-world machine learning model for isolated sign language recognition through Amazon Web Services (AWS).

2022

PeruSIL: A Framework to Build a Continuous Peruvian Sign Language Interpretation Dataset
Gissella Bejarano | Joe Huamani-Malca | Francisco Cerna-Herrera | Fernando Alva-Manchego | Pablo Rivas
Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources

Video-based datasets for Continuous Sign Language are scarce because recording videos from native signers is challenging and few people can annotate sign language. COVID-19 has highlighted the key role of sign language interpreters in delivering nationwide health messages to deaf communities. In this paper, we present a framework for creating a multi-modal sign language interpretation dataset based on videos, and we use it to create the first dataset for Peruvian Sign Language (LSP) interpretation, annotated by hearing volunteers who have intermediate knowledge of LSP and are guided by the video audio. We rely on hearing people to produce a first version of the annotations, which should be reviewed by native signers in the future. Our contributions: i) we design a framework to annotate a sign language dataset; ii) we release the first annotated LSP multi-modal interpretation dataset (AEC); iii) we evaluate the annotation done by hearing people by training a sign language recognition model. Our model reaches up to 80.3% accuracy among a minimum of five classes (signs) on the AEC dataset, and 52.4% on a second dataset. Nevertheless, the analysis by subject in the second dataset shows variations worth discussing.