Quality Review and Post Editing with Ocelot 2.0

While editing and quality review have long had a place in the translation process, practical concerns have limited the ability of translation and localization organizations to assess the quality of their translations. The increased use of post-edited machine translation has made this issue more acute as organizations seek to refine and qualify huge amounts of machine-translated content.

The difficulty is not due to a lack of suitable methodology. Systems for counting and classifying error data are well established. However, the tools required to capture this data have been lacking, leaving many to depend on manually populated spreadsheets and other improvised solutions.

Vistatec, in partnership with Spartan Software, has sought to solve this problem with the release of Ocelot, an open-source editor and error-flagging application. Ocelot addresses many of the ergonomic issues associated with post-editing and review by taking advantage of existing markup standards.

XLIFF and ITS

When performing quality review, there are five essential pieces of information that must be captured: the source, the original target, the updated target, the error type and the severity. Without a specialized tool, reviewers are often required to copy and paste the source and original target into a spreadsheet and enter quality information manually. They may also need to populate additional data in the spreadsheet, such as the document name, word count and vendor. Once the manual work is done, the spreadsheet becomes a separate record which must be managed along with the translatable files.

Ocelot eliminates much of this manual work by adding ITS 2.0 markup to XLIFF files. ITS, which stands for Internationalization Tag Set, is a W3C standard for capturing localization metadata in XML documents. Most importantly, it can record Localization Quality Issue (LQI) data at the segment level. This ensures a logical organization of data for downstream management and reporting.
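
To make the segment-level record concrete, the following Python sketch reads ITS 2.0 Localization Quality Issue attributes from a small XLIFF 1.2 fragment. The sample file, the mtype="x-its" marker and the read_issues helper are assumptions made for illustration; files written by Ocelot may organize the same data differently, for example as standoff markup.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Hypothetical sample: one segment with a local ITS 2.0 quality annotation.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
  <file original="demo.html" source-language="en" target-language="fr" datatype="html">
    <body>
      <trans-unit id="1">
        <source>Click the Save button.</source>
        <target><mrk mtype="x-its"
                     its:locQualityIssueType="terminology"
                     its:locQualityIssueSeverity="60"
                     its:locQualityIssueComment="Use the approved glossary term.">Cliquez sur le bouton Sauver.</mrk></target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_issues(xliff_text):
    """Return one dict per ITS quality annotation found in the document."""
    root = ET.fromstring(xliff_text)
    issues = []
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        for elem in unit.iter():
            issue_type = elem.get(f"{{{ITS_NS}}}locQualityIssueType")
            if issue_type is None:
                continue  # element carries no quality annotation
            issues.append({
                "unit": unit.get("id"),
                "type": issue_type,
                # ITS 2.0 expresses severity as a number from 0 to 100.
                "severity": float(elem.get(f"{{{ITS_NS}}}locQualityIssueSeverity", "0")),
                "comment": elem.get(f"{{{ITS_NS}}}locQualityIssueComment", ""),
            })
    return issues


for issue in read_issues(SAMPLE):
    print(issue)
```

Because the error type and severity travel inside the XLIFF file, no separate spreadsheet row is needed to tie them to the segment.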

Ocelot also creates a record of the changes made to the translations. The method depends on the version of XLIFF being used. For XLIFF 1.2, the original translation is saved in the alt-trans element. In XLIFF 2.0, users can take advantage of new revision tracking features to display changes using colored markup, similar to tracked changes in Microsoft Word.
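
For the XLIFF 1.2 case, a short Python sketch shows how the source, original target and updated target could be recovered from one file. The sample content and the read_revisions helper are hypothetical, and the sketch assumes the alt-trans element holds only the superseded target; the attributes Ocelot actually writes on alt-trans are not shown here.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Hypothetical sample: the current target plus the original kept in alt-trans.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo.html" source-language="en" target-language="fr" datatype="html">
    <body>
      <trans-unit id="1">
        <source>Save your changes.</source>
        <target>Enregistrez vos modifications.</target>
        <alt-trans>
          <target>Sauvez vos changements.</target>
        </alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def tag(name):
    """Qualify an element name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{name}"


def text_of(elem):
    """Flatten an element to plain text; None if the element is absent."""
    return None if elem is None else "".join(elem.itertext())


def read_revisions(xliff_text):
    """Collect source, original target and updated target per trans-unit."""
    root = ET.fromstring(xliff_text)
    rows = []
    for unit in root.iter(tag("trans-unit")):
        rows.append({
            "unit": unit.get("id"),
            "source": text_of(unit.find(tag("source"))),
            "original_target": text_of(unit.find(f"{tag('alt-trans')}/{tag('target')}")),
            "updated_target": text_of(unit.find(tag("target"))),
        })
    return rows


for row in read_revisions(SAMPLE):
    print(row)
```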

Once the data is written into the XLIFF files, it can be imported into other programs for reporting. This can require some additional development effort. However, data can easily be aggregated across projects using Excel macros or relatively simple scripts. In this scenario, Excel is used solely for reporting, rather than as a data entry tool.
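
As an illustration of how simple such a script can be, the Python sketch below counts issue types across several XLIFF documents and produces CSV text that a spreadsheet can open. The in-memory sample documents, the sample helper and the column names are all assumptions; a real script would read files from disk and would need to match the markup actually present in them.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

ITS_TYPE = "{http://www.w3.org/2005/11/its}locQualityIssueType"

# Hypothetical templates used only to build small sample documents.
TEMPLATE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:its="http://www.w3.org/2005/11/its">
  <file original="{name}" source-language="en" target-language="fr" datatype="plaintext">
    <body>{units}</body>
  </file>
</xliff>"""
UNIT = ('<trans-unit id="{n}"><source>s</source><target>'
        '<mrk mtype="x-its" its:locQualityIssueType="{kind}">t</mrk>'
        '</target></trans-unit>')


def sample(name, kinds):
    """Build a sample XLIFF document with one annotated segment per issue type."""
    units = "".join(UNIT.format(n=i, kind=k) for i, k in enumerate(kinds, 1))
    return TEMPLATE.format(name=name, units=units)


def count_issues(documents):
    """Count quality annotations by issue type across all documents."""
    counts = Counter()
    for text in documents:
        for elem in ET.fromstring(text).iter():
            kind = elem.get(ITS_TYPE)
            if kind:
                counts[kind] += 1
    return counts


def to_csv(counts):
    """Render the counts as CSV text, one row per issue type."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["issue_type", "count"])
    for kind, n in sorted(counts.items()):
        writer.writerow([kind, n])
    return buffer.getvalue()


PROJECTS = [
    sample("a.txt", ["terminology", "grammar"]),
    sample("b.txt", ["terminology"]),
]
REPORT = to_csv(count_issues(PROJECTS))
print(REPORT)
```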

There are some caveats to this approach. Most importantly, the XLIFF standard has not always been respected by tool developers. As is noted on the Ocelot wiki, XLF files created by WorldServer do not support ITS markup, so information added by Ocelot will need to be stripped out of the file before it can be re-imported. SDLXLIFF is also not currently supported. As a result, a small amount of additional development work may be required to transform content from the CAT tool output to a format that complies with the XLIFF and ITS standards.
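
One possible shape for that stripping step is sketched below in Python. It removes every attribute and element in the ITS namespace and leaves the rest of the file alone. The sample markup and the strip_its helper are assumptions, and what a given CAT tool accepts on re-import should be verified against that tool; this is not a documented WorldServer or Ocelot procedure.

```python
import xml.etree.ElementTree as ET

ITS_PREFIX = "{http://www.w3.org/2005/11/its}"
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Serialize XLIFF elements without an "ns0:" prefix.
ET.register_namespace("", XLIFF_NS)

# Hypothetical sample using ITS standoff markup inside a trans-unit.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
  <file original="demo.html" source-language="en" target-language="fr" datatype="html">
    <body>
      <trans-unit id="1" its:locQualityIssuesRef="#lqi1">
        <source>Save your changes.</source>
        <target>Sauvez vos changements.</target>
        <its:locQualityIssues xml:id="lqi1">
          <its:locQualityIssue locQualityIssueType="terminology" locQualityIssueSeverity="60"/>
        </its:locQualityIssues>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def strip_its(xliff_text):
    """Return the document with ITS-namespaced attributes and elements removed."""
    root = ET.fromstring(xliff_text)
    # Snapshot the element list first so the tree can be modified safely.
    for parent in list(root.iter()):
        for name in [n for n in parent.attrib if n.startswith(ITS_PREFIX)]:
            del parent.attrib[name]
        for child in list(parent):
            if isinstance(child.tag, str) and child.tag.startswith(ITS_PREFIX):
                # Removes the element and its tail whitespace. Inline wrappers
                # such as mrk are not unwrapped by this sketch.
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


print(strip_its(SAMPLE))
```

Running the step on a copy of the file keeps the annotated version available for reporting.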

Customizable Filtering

One of the Ocelot interface’s greatest strengths is its support for filtering. Segments can be filtered on several different data dimensions supported by ITS and XLIFF, including segment origin, matching level, MT confidence level and linguistic quality issues. Filters are controlled via an external properties file that also allows the user to specify custom flag colors for different filters. With these features, users are able to tailor their editing interface to their own quality program.

The ability to create custom filters also has important implications for performing sample review. When performing a sample review, users may prefer to focus on segments that originated from a certain user or tool, or in the case of post-editing MT content, that have been assigned a low confidence score. Creating custom filters allows users to visually recognize segments that require their attention, and ignore segments that are not flagged by the filters.
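
The selection logic behind such filters can be sketched in a few lines of Python. The Segment and FilterRule classes, the rule names, the colors and the 0.5 threshold are all hypothetical, and the sketch does not reproduce the format of Ocelot's properties file; it only illustrates how rules over segment metadata decide which segments get flagged.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Segment:
    number: int
    origin: str                            # hypothetical labels such as "MT" or "TM"
    mt_confidence: Optional[float] = None  # 0.0 to 1.0 when the segment came from MT
    issues: List[str] = field(default_factory=list)


@dataclass
class FilterRule:
    name: str
    flag_color: str                        # color used to flag matching segments
    matches: Callable[[Segment], bool]


# Hypothetical rules for a sample review of post-edited MT content.
RULES = [
    FilterRule("low-confidence MT", "#FF8C00",
               lambda s: s.mt_confidence is not None and s.mt_confidence < 0.5),
    FilterRule("has quality issue", "#CC0000", lambda s: bool(s.issues)),
]


def flag_segments(segments, rules):
    """Map each flagged segment number to the names of the rules it matched."""
    flagged = {}
    for seg in segments:
        hits = [rule.name for rule in rules if rule.matches(seg)]
        if hits:
            flagged[seg.number] = hits
    return flagged


SEGMENTS = [
    Segment(1, "MT", mt_confidence=0.32),
    Segment(2, "MT", mt_confidence=0.91),
    Segment(3, "TM", issues=["terminology"]),
]
FLAGGED = flag_segments(SEGMENTS, RULES)
print(FLAGGED)
```

Segment 2 matches no rule, so a reviewer working from the flags alone would skip it.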

Integrated Translation Memory and Concordance Search

The release of Ocelot 2.0 in early 2016 added several impressive new features. Most noticeably, the interface was extended to include a translation memory and concordance search window. Users can now upload translation memories in TMX format and see matches for the active segment presented in the translation memory window, exactly as they would appear in a traditional CAT tool. Concordance searching is equally intuitive. These changes allow reviewers to save even more time when checking translations against legacy content.
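
For readers unfamiliar with TMX, the Python sketch below loads a tiny translation memory and runs a fuzzy lookup and a concordance search against it. The sample memory and the helper names are hypothetical, and the similarity score comes from Python's difflib, which is a stand-in chosen for brevity and not the matching algorithm Ocelot uses.

```python
import difflib
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-entry translation memory in TMX 1.4 form.
TMX = """<tmx version="1.4">
  <header creationtool="demo" creationtoolversion="1.0" segtype="sentence"
          o-tmf="demo" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save your changes.</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrez vos modifications.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Discard your changes.</seg></tuv>
      <tuv xml:lang="fr"><seg>Annulez vos modifications.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_tmx(text, src, tgt):
    """Return (source, target) pairs for the requested language codes."""
    pairs = []
    for tu in ET.fromstring(text).iter("tu"):
        segs = {tuv.get(XML_LANG): "".join(tuv.find("seg").itertext())
                for tuv in tu.iter("tuv")}
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


def best_matches(segment, memory, minimum=0.7):
    """Rank memory entries by character similarity to the active segment."""
    scored = [(difflib.SequenceMatcher(None, segment, s).ratio(), s, t)
              for s, t in memory]
    return sorted((m for m in scored if m[0] >= minimum), reverse=True)


def concordance(term, memory):
    """Return every entry whose source text contains the search term."""
    return [(s, t) for s, t in memory if term.lower() in s.lower()]


MEMORY = load_tmx(TMX, "en", "fr")
print(best_matches("Save your changes.", MEMORY))
print(concordance("discard", MEMORY))
```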

Custom Quality Models and Hotkeys

Ocelot 2.0 also introduced an improved interface for logging quality issues and assigning weights. This interface allows users to quickly assign errors using their mouse, or to assign custom hotkeys for logging issues and severities.
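
To show how logged issues and weights might combine into a single figure, here is a minimal Python sketch of a weighted penalty normalized per thousand words. The severity names, the weights and the formula are assumptions chosen for illustration; they are not Ocelot's built-in quality model, and each organization would substitute its own.

```python
# Hypothetical severity weights; a real quality model defines its own.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


def penalty_per_thousand_words(issues, word_count, weights=SEVERITY_WEIGHTS):
    """Sum the weight of each logged issue and normalize by text length."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    total = sum(weights[severity] for _issue_type, severity in issues)
    return total * 1000 / word_count


# Each logged issue is an (issue type, severity) pair.
LOGGED = [
    ("terminology", "minor"),
    ("accuracy", "major"),
    ("accuracy", "critical"),
]
print(penalty_per_thousand_words(LOGGED, word_count=800))
```

Normalizing by word count lets scores from documents of different lengths be compared on the same scale.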

Where to Learn More about Ocelot

Ocelot is available both as source code and as compiled binaries. For information on downloading and setup, visit the Ocelot wiki.

If you would like additional guidance on how to integrate Ocelot into your organization’s machine translation or quality review workflows, contact Spartan Software for a consultation.