As I mentioned in my previous post, this summer I had the pleasure of attending the 33rd European Summer School in Logic, Language, and Information, a fascinating series of multidisciplinary courses at the intersection of computational linguistics, semantics, and logic. Today, I’d like to discuss another class I attended there, which focused on cross-linguistic semantics: how different languages package and express meaning. The area of study is relatively new and extremely interesting, and in this course, Professors de Swart and Le Bruyn discussed a state-of-the-art approach to cross-linguistic semantic research that they call translation mining. I express my sincere thanks to both professors for teaching such a fascinating course.
Past methods of researching cross-linguistic semantics go something like this: you start with a reference grammar for a source language (L1), usually English, and take it out into the field, searching for structures in your target language (L2) that have the same meaning as a structure you have already identified in L1. To find such corresponding structures, researchers carry out scripted question-and-answer sessions or ask study participants to describe a set of pictures that tells a story. While somewhat effective, these strategies miss a significant part of cross-linguistic semantics because they are too grounded in English (or, more rarely, another L1).
To illustrate, let’s say we have a sentence X that has a meaning Y in English. We now take meaning Y and look for structures in our L2 that share the same meaning. Remember that we are looking for these structures in a conversational or descriptive format that has been scripted in English. So, what we’re going to find is the set of all (or most) of the structures in L2 that can express meaning Y in the same contexts that English can. What we’re going to miss is the potentially very large set of contexts in which L2 can express meaning Y but English cannot. Thus, we’ve failed to fully accomplish the main purpose of our task: comparing meaning across languages.
The solution that Professors de Swart and Le Bruyn present works with two or more parallel corpora, that is, the same texts translated into different languages, and compares the forms each language uses in each context. By looking at contexts shared across all the translations instead of specific conversations or stories, researchers gain insight into when certain forms of a language can be used, which can then be used to draw conclusions about their meanings. For example, instead of starting from a meaning and searching for the ways a person would convey it (an onomasiological approach), we start from forms such as tense usages and compare them across languages (a semasiological approach). Using this new strategy, we can move away from English-focused studies and build distributions of contexts that show when each language employs its tools. From these distributions, we can draw probability-based conclusions that offer new insight into how the many languages of the world convey meaning. What I have described today is only a surface-level description of the approach, so if you are interested, I highly recommend you check out one of the professors’ papers, which illustrates and further describes their new approach.
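For readers who like to see ideas in code, here is a minimal sketch of the kind of counting that comparing forms across aligned contexts involves. It is my own illustration, not the professors' actual pipeline: the language codes, the tense labels, the data, and the function name `form_distribution` are all invented for the example.

```python
from collections import Counter, defaultdict

# Invented toy annotations, NOT real corpus data. Each item is one aligned
# context (the same sentence in every translation), mapped to the tense form
# an annotator would have recorded for each language.
aligned_contexts = [
    {"en": "simple_past", "fr": "passe_compose", "nl": "perfect"},
    {"en": "present_perfect", "fr": "passe_compose", "nl": "perfect"},
    {"en": "simple_past", "fr": "passe_compose", "nl": "simple_past"},
    {"en": "simple_past", "fr": "imparfait", "nl": "simple_past"},
    {"en": "present_perfect", "fr": "passe_compose", "nl": "perfect"},
]


def form_distribution(contexts, source, target):
    """For each form used in `source`, return the relative frequency of the
    forms that `target` uses in the same aligned contexts."""
    counts = defaultdict(Counter)
    for context in contexts:
        counts[context[source]][context[target]] += 1
    return {
        src_form: {
            tgt_form: n / sum(tgt_counts.values())
            for tgt_form, n in tgt_counts.items()
        }
        for src_form, tgt_counts in counts.items()
    }


# Start from the French forms and see which English forms share their contexts.
distribution = form_distribution(aligned_contexts, "fr", "en")
for form, targets in distribution.items():
    print(form, targets)
```

Notice that no language is treated as the reference point here: swapping the `source` and `target` arguments gives the view from the other language, which is the move away from English-centered comparison described above.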