Fine-Tuning AI: Machine Translation Post-Editing

Jan 12, 2024

Breaking down language barriers has become increasingly efficient thanks to machine translation (MT). MT is like a supercharged, polyglot robot that can deliver pages of translation at lightning speed. But there’s a catch: it’s not quite perfect yet. The gap between that speed and a message that reads as native and engages its audience is where the human touch comes into play. In the field, this practice is known as machine translation post-editing (MTPE).

MTPE and the Gap Between Automation and Precision

Over the past year, ChatGPT and other Generative AI-driven large language models have ignited substantial discussions about their potential impact on various aspects of life and business. These conversations included predictions about the transformative influence of AI in fields such as translation. However, well before ChatGPT made its debut, AI had already established its presence in the language services industry through the evolution of Neural Machine Translation (NMT).

Until 2015, MT was software that translated text using language rules, algorithms, and statistical models to analyze the meaning of the source text and render it in the desired language. The leading MT models delivered moderate accuracy and performed best with simple, formal text. These early models required training datasets, regular upkeep and tuning, and ongoing dictionary maintenance, and even then they had significant accuracy issues.

Starting around 2015, NMT became the preferred model for most language services companies. NMT uses artificial neural networks trained on large parallel corpora, which consist of pairs of sentences in the source and target languages. Unlike previous approaches to MT, NMT learns the entire translation task end to end. This yields higher translation quality and better handling of complex content. It is faster, more accurate, easier to manage, and cheaper than earlier generations of MT.

The strength of NMT is its ability to quickly translate large volumes of simple content that doesn’t demand a high level of accuracy. However, both NMT and Gen AI have limitations. Each requires large training datasets as well as domain-specific training data. Both struggle to understand and convey nuanced or context-dependent information, and both can reproduce gender and social biases present in their training data.

Human Post-Editing

Machine translation post-editing (MTPE) pairs MT with a human editor who reviews and improves the machine-generated output. While raw MT output works well in a number of use cases, it often lacks the cultural sensitivity, tone, and readability needed to make a translation sound natural to a native speaker. For this reason, post-editing by a human translator is recommended in nearly all cases where MT is used. The editor refines the MT output by correcting errors, improving fluency, and ensuring the text faithfully reflects the source material. Overall, MTPE is designed to combine the speed and efficiency of machine translation with the human touch that is essential for accurate and effective communication.

MT vs. Human Translation

Content that uses well-defined terminology, follows a consistent structure, and is backed by large bilingual datasets yields the best results from MTPE. Content types that currently perform well include technical manuals, product descriptions, e-commerce content, and low-priority internal corporate communications.

Many types of content are still better handled by humans. Linguists are equipped to translate complex content that requires a high degree of creativity, accuracy, and precision, such as marketing copy, legal briefs, or medical documentation. Human translation is also needed for text containing humor or idiomatic expressions. Humans are best at conveying tone and emotion and at rendering the intended style. They also account for the cultural nuances of the target language, ensuring that the translation suits the intended audience.

Content that is best handled by human translators includes:

Marketing and advertising
High-profile website copy
Branded material
Public-facing content
Journalistic copy
Literary texts
Exhibition didactics
Unique or new subject areas
Content where accuracy is crucial
Language combinations with limited available training data
Specialized content with limited domain-specific training data
Sensitive or confidential information

Post-Editing and the Perfect Fit

Light post-editing is the fastest, most cost-effective form of MTPE. The editor corrects errors but makes minimal stylistic changes, and the end goal is legibility and accuracy. Full post-editing is a more thorough process in which the editor also improves style, tone, and flow. This further enhances readability and helps the text resonate with local audiences. A project-specific approach is also possible, prioritizing certain segments over others depending on business needs.

Effective MTPE: The Role of the Human Editor

The effectiveness of MTPE relies heavily on the skills and experience of the human editor. An ideal linguist must be fluent in both the source and target languages, well-versed in the subject matter, and familiar with the idiosyncrasies of machine-generated translation. A good MTPE linguist typically employs a range of tools, including translation memory, glossaries, and quality assurance features within Computer-Assisted Translation (CAT) tools, to ensure consistency and accuracy throughout the text.
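To illustrate the kind of consistency check that CAT-tool QA features perform, here is a minimal Python sketch of a glossary check over post-edited segments. It is not how any particular CAT tool works. The glossary, the segment data, and the `check_glossary` and `GlossaryIssue` names are all hypothetical. Matching is a naive case-insensitive substring search, whereas real tools also handle tokenization and inflected word forms.

```python
from dataclasses import dataclass


@dataclass
class GlossaryIssue:
    """One glossary violation found in a segment (hypothetical structure)."""
    segment_id: int
    source_term: str
    expected_target: str


def check_glossary(segments, glossary):
    """Flag segments where a glossary source term appears in the source
    but its approved translation is missing from the target.

    segments: list of (source_text, target_text) pairs
    glossary: dict mapping source term -> approved target term
    Matching is naive case-insensitive substring search (an assumption).
    """
    issues = []
    for seg_id, (source, target) in enumerate(segments):
        src_lower = source.lower()
        tgt_lower = target.lower()
        for src_term, tgt_term in glossary.items():
            # Term is present in the source, but the approved translation is absent
            if src_term.lower() in src_lower and tgt_term.lower() not in tgt_lower:
                issues.append(GlossaryIssue(seg_id, src_term, tgt_term))
    return issues


if __name__ == "__main__":
    # Hypothetical English -> German glossary and MT output segments
    glossary = {"power button": "Ein-/Aus-Taste", "battery": "Akku"}
    segments = [
        ("Press the power button.", "Drücken Sie die Ein-/Aus-Taste."),
        ("Charge the battery fully.", "Laden Sie die Batterie vollständig auf."),
    ]
    for issue in check_glossary(segments, glossary):
        print(f"Segment {issue.segment_id}: expected '{issue.expected_target}' "
              f"for '{issue.source_term}'")
```

With this sample data, the script flags only segment 1, where the MT output used "Batterie" instead of the approved term "Akku". The post-editor would then review and correct that segment.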

A Well-Defined Process

A successful MTPE process requires a clear roadmap, quality control measures, and a feedback loop for continuous improvement. Any post-delivery feedback must be reviewed, validated, and incorporated into future work. Establishing these steps raises the overall quality of translations, leading to better outcomes and greater efficiency over time.

MTPE: MT and Human Translation

By leveraging the strengths of both MTPE and human translation where appropriate, businesses can unlock the full potential of their translation strategy and effectively overcome language barriers.
