Spanish to Mexican Sign Language glosses corpus for natural language processing tasks
Abstract
This work shares a dataset of Spanish (SPA) sentences paired with Mexican Sign Language (MSL) glosses (a written transcription of MSL) for downstream natural language processing tasks. To prepare the dataset, the authors built a SPA-to-MSL corpus that uses a specific representation of Spanish for MSL interpretation. The proposed corpus is intended as a reference dataset for evaluating variants of neural machine translation (NMT) systems. Drawing on MSL grammar books and advice from MSL interpreters, this study developed a SPA-to-MSL dataset of 3,000 sentence pairs. The distribution of these 3,000 sentences follows the linguistic composition of the Spanish language.
To test the corpus as a data source for NMT, two neural transformers were evaluated on the proposed dataset. The first NMT model uses a Helsinki-NLP SPA-to-SPA transformer developed by the Language Technologies Research Group at the University of Helsinki. The second NMT model is a SPA-to-SPA pre-trained neural transformer presented as the BARTO approach. Both evaluations used a transfer learning strategy, which has proven effective for modeling low-resource languages. The NMT evaluation produced BLEU scores of 91.13 and 94.23, in line with state-of-the-art NMT results reported for other languages. Moreover, a professional MSL interpreter judged 94% of the SPA sentences to be effectively translated into MSL structures.
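To make the reported BLEU figures more concrete, here is a minimal sketch of standard corpus-level BLEU on a 0 to 100 scale. It assumes one reference gloss sequence per hypothesis, whitespace tokenization, uniform weights over 1- to 4-grams, clipped n-gram precision, the brevity penalty, and no smoothing. The abstract does not say which BLEU implementation produced 91.13 and 94.23; published scores are usually computed with tools such as sacreBLEU, which apply their own tokenization, so this sketch is not expected to reproduce those numbers exactly. The function names `ngram_counts` and `corpus_bleu` and the gloss string in the usage line are hypothetical and are not taken from the corpus.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis.

    Sketch only: whitespace tokenization, uniform n-gram weights,
    clipped precision, brevity penalty, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts = ngram_counts(h, n)
            r_counts = ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only penalize output shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["MAÑANA YO ESCUELA IR"]  # hypothetical gloss sequence
    print(corpus_bleu(refs, refs))  # 100.0
```

A perfect match scores 100. A hypothesis that shares no 4-gram with its reference scores 0 under this unsmoothed variant, which is why corpus-level aggregation (summing counts before taking precisions) is the usual choice over averaging sentence-level scores.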