AUTOMATIC CORPUS-BASED TRANSLATION OF A SPANISH FRAMENET MEDICAL GLOSSARY
ebook

AUTOMATIC CORPUS-BASED TRANSLATION OF A SPANISH FRAMENET MEDICAL GLOSSARY (ebook)

Editorial:
UNIVERSIDAD DE SEVILLA
Materia
Lingüística
ISBN:
978-84-472-3005-1
Formato:
HTML5 – Streaming
Derechos eBook:
Acceso perpetuo desde cualquier dispositivo
DRM
Si

Computational linguistics is the scientific study of language from a computational perspective. It aims is to provide computational models of natural language processing (NLP) and incorporate them into practical applications such as speech synthesis, speech recognition, automatic translation and many others where automatic processing of language is required. The use of good linguistic resources is crucial for the development of computational linguistics systems. Real world applications need resources which systematize the way linguistic information is structured in a certain language. There is a continuous effort to increase the number of linguistic resources available for the linguistic and NLP Community. Most of the existing linguistic resources have been created for English, mainly because most modern approaches to computational lexical semantics emerged in the United States. This situation is changing over time and some of these projects have been subsequently extended to other languages; however, in all cases, much time and effort need to be invested in creating such resources. Because of this, one of the main purposes of this work is to investigate the possibility of extending these resources to other languages such as Spanish. In this work, we introduce some of the most important resources devoted to lexical semantics, such as WordNet or FrameNet, and those focusing on Spanish such as 3LB-LEX or Adesse. Of these, this project focuses on FrameNet. The project aims to document the range of semantic and syntactic combinatory possibilities of words in English. Words are grouped according to the different frames or situations evoked by their meaning. If we focus on a particular topic domain like medicine and we try to describe it in terms of FrameNet, we probably would obtain frames representing it like CURE, formed by words like cure.v, heal.v or palliative.a or MEDICAL CONDITIONS with lexical units such as arthritis.n, asphyxia.n or asthma.n. The purpose of this work is to develop an automatic means of selecting frames from a particular domain and to translate them into Spanish. As we have stated, we will focus on medicine. The selection of the medical frames will be corpus-based, that is, we will extract all the frames that are statistically significant from a representative corpus. We will discuss why using a corpus-based approach is a reliable and unbiased way of dealing with this task. We will present an automatic method for the selection of FrameNet frames and, in order to make sure that the results obtained are coherent, we will contrast them with a previous manual selection or benchmark. Outcomes will be analysed by using the F-score, a measure widely used in this type of applications. We obtained a 0.87 F-score according to our benchmark, which demonstrates the applicability of this type of automatic approaches. The second part of the book is devoted to the translation of this selection into Spanish. The translation will be made using EuroWordNet, a extension of the Princeton WordNet for some European languages. We will explore different ways to link the different units of our medical FrameNet selection to a certain WordNet synset or set of words that have similar meanings. Matching the frame units to a specific synset in EuroWordNet allows us both to translate them into Spanish and to add new terms provided by WordNet into FrameNet. The results show how translation can be done quite accurately (95.6%). We hope this work can add new insight into the field of natural language processing.

Otros libros del autor