Helping Students Succeed in Translation Technology

Ruben G. Tsui (徐嘉煜), fourth-year master’s student, Translation Track

Over the past four years, I’ve been the teaching assistant for “Translation Technology”, a required course for GPTI students in the Translation Track usually offered in the Spring (second) semester and taken by first-year students. Despite the fact that the vast majority of our students are in their early twenties (“digital natives”) and are savvy users of Internet and computer technology, there’s a significant hurdle to overcome in order to be successful in this course.

This course doesn’t teach you commercial Computer-Assisted Translation (CAT) tools, such as Translation Memory (TM) suites, which many translators use in their day-to-day translation tasks. These are covered in another GPTI course dedicated to CAT tools. What Translation Technology teaches, among other things, are the tools that help you collect “parallel texts” (or “parallel corpora”, typically bilingual text pairs in which one text is translated from the other) and analyze them. Quite a few master’s theses produced by GPTI graduates were conceived in this course.

In order to solve a problem dealing with the collection or analysis of textual data, you need to make a variety of software tools from many different sources work together. The perfect word to describe this is “eclectic”. Some might think it’s a euphemism for “chaotic”, but I’d prefer to think of it as “creative and resourceful”. This has been the source of many difficulties that our students have encountered (suffered?) in the course. Building skills in each of the tools and synthesizing their results can be quite daunting to many. But practice makes perfect. Spending the time to genuinely learn how to use the tools will be rewarding, I promise.

“Different sources of software tools”, as you probably suspect, often translates to the problem of competing computer platforms used by our students (“Windows” vs. “macOS”). As neither the professor nor I use a Mac, we’d ask the Mac users to use a Windows machine in the lab if a particular tool was available only on Windows. This has usually drawn a negative reaction from Mac users, which is quite understandable, as moving your data and analysis across computers can be a messy endeavor. One semester, as I recall, seven out of eight students were Mac users, and trying to persuade them to give up their beloved and stylish Macs and switch to the old clunky Windows desktops just for this class wasn’t easy.

Fortunately, there’s a way out of this conundrum. One principle that the professor adheres to in this class is the use of open-source software (i.e., free software, with “free” as in libre rather than gratis). Apart from usually also being free of charge, most, if not all, open-source software is available on the three major desktop/laptop operating systems: Windows, macOS and Linux, and there’s a surprising amount of free software available for translation studies and for text processing and analysis in general.

Over the past year, in addition to using free software, we’ve also been gradually migrating away from the personal computer platform and moving toward “Cloud Computing”. Take the “pre-processing” task of segmenting Chinese sentences into “word tokens” as an example (recall that words in Chinese text are not separated by spaces, as they are in English). Identifying word units in Chinese is important if you want to do things like word frequency analysis and “word alignment” (e.g., finding the most probable translation(s) of a Chinese word in English according to a bilingual text). Currently the best word segmentation tool for traditional Chinese is the open-source CkipTagger from Academia Sinica’s Institute of Information Science. As powerful and accurate as this piece of software is, installing it on a typical student’s laptop isn’t straightforward. It also requires the kind of computing power and memory that many students’ laptops don’t have, especially if they’re not in engineering or computer science.
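To make the idea of segmentation and word frequency analysis concrete, here is a minimal sketch in Python. It is not how CkipTagger works, and it is not one of the course scripts: it assumes a tiny made-up word list (LEXICON) and uses a simple “longest match first” rule, just to show what it means to turn an unspaced Chinese sentence into word tokens and then count them. The function names and the sample sentences are illustrative assumptions as well.

```python
from collections import Counter

# Hypothetical toy word list, for illustration only. A real segmenter such as
# CkipTagger relies on a trained model rather than a hand-written lexicon.
LEXICON = {"我們", "的", "學生", "喜歡", "翻譯", "科技"}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment(text, lexicon=LEXICON, max_len=MAX_WORD_LEN):
    """Greedy forward maximum matching.

    At each position, take the longest lexicon word that starts there;
    if nothing matches, fall back to a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def word_frequencies(sentences):
    """Segment every sentence and count how often each token occurs."""
    counts = Counter()
    for sentence in sentences:
        counts.update(segment(sentence))
    return counts


# Made-up sample sentences (assumption, not course data).
sentences = ["我們的學生喜歡翻譯科技", "學生喜歡翻譯"]
tokens = segment(sentences[0])
freqs = word_frequencies(sentences)

print(" ".join(tokens))
for word, count in freqs.most_common():
    print(word, count)
```

A dictionary-based rule like this breaks down quickly on real text (unknown words, ambiguous boundaries), which is one reason a model-based tool is preferable for actual analysis.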

The solution is Cloud Computing. For this course we’ve created scripts written in the Python language and deployed them on free cloud computing environments, such as Google Colab and Binder. To perform Chinese word segmentation, all our students need is a browser and the knowledge of how to transfer files to and from the cloud platform. Oh, they also need patience. Why? Free cloud computing resources are allocated on a first-come, first-served basis, and the resources are rationed or queued when there are too many concurrent users. In practice, we’ve found that apart from occasionally slow start-up times, once the environment has been launched and is ready for use, tasks can be run and completed smoothly in a reasonable amount of time.

Once the students have gotten a taste of text processing power on the cloud, there are several avenues they can pursue to ensure that they have access to the computing resources in a timely and reliable fashion. First, if they have a powerful laptop with lots of memory, we can help them set up an identical environment on that computer. No more waiting on slow launches. Second, if they don’t mind paying for professional cloud computing services, there are service providers that can guarantee a certain level of availability and performance. With open-source computing components (called “Jupyter notebooks”), scripts and data can be migrated from one platform to another (and back, if necessary) with minimal changes.

Succeeding in Translation Technology can be challenging, but many students have found it to be a useful and fulfilling pursuit. To quote from a thank-you note I recently received from a student in Translation Technology at the conclusion of the semester, “… thank[s] … for helping me find joy in learning about technology, which I’ve always been terrified of…”. I hope our first-year students who will be taking the course next Spring will find it equally enjoyable, or even instrumental, in their translation studies!

Direct any inquiries to the author at RubenTsui@gmail.com.