Case Study: Translating Markdown Content at Autodesk

Background

Autodesk approached Spartan Software with a requirement to translate markdown content in WorldServer, their translation management system. They needed a filter that could extract the translatable text while leaving markdown syntax and whitespace intact.
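To make the requirement concrete, here is a minimal sketch of what "extract text, leave markdown and whitespace intact" means for a single line. It is illustrative Python, not the Okapi filter's code: the pattern, the function name translate_markdown, and the use of str.upper as a stand-in for translation are all assumptions, and it only recognizes heading, bullet, and block quote markers.

```python
import re

# Leading whitespace plus an optional block marker (heading hashes, list
# bullet, or block quote), then the text, then any trailing whitespace.
LINE_PATTERN = re.compile(r"^(\s*(?:#{1,6}\s+|[-*+]\s+|>\s*)?)(.*?)(\s*)$")


def translate_markdown(source, translate):
    """Apply `translate` to the text of each line, keeping markup and whitespace."""
    output = []
    for line in source.split("\n"):
        prefix, text, suffix = LINE_PATTERN.match(line).groups()
        # Only the middle group is sent for translation; the marker and the
        # surrounding whitespace are copied through unchanged.
        output.append(prefix + (translate(text) if text else "") + suffix)
    return "\n".join(output)


sample = "# Hello\n\n- first item  \n- second item\n"
result = translate_markdown(sample, str.upper)
print(repr(result))  # '# HELLO\n\n- FIRST ITEM  \n- SECOND ITEM\n'
```

Note that the two trailing spaces after "first item" survive. In markdown they mean a hard line break, which is why whitespace preservation was part of the requirement.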

Markdown is a lightweight plain-text markup that converts to HTML. It is easy for humans to read and write, and it can include embedded HTML.

Here’s a quick markdown example from the CommonMark standard that illustrates how simple characters, like hashes and dashes, are converted to HTML.

 

[Figure: the example shown in three panels: Markdown Text, HTML Output, and Rendered in Browser]
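As a rough illustration of the hashes-and-dashes idea, the toy converter below turns "#" headings and "-" list items into HTML. It is not the CommonMark example from the original figure and it is nowhere near CommonMark compliant. The function name markdown_to_html and the sample text are made up for this sketch.

```python
import html
import re


def markdown_to_html(source):
    """Convert '#' headings, '-' list items, and plain paragraphs to HTML."""
    out = []
    in_list = False
    for line in source.splitlines():
        stripped = line.strip()
        is_item = stripped.startswith("- ")
        # Any line that is not a list item ends the current list.
        if in_list and not is_item:
            out.append("</ul>")
            in_list = False
        if not stripped:
            continue
        heading = re.match(r"(#{1,6})\s+(.*)", stripped)
        if is_item:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append("<li>" + html.escape(stripped[2:]) + "</li>")
        elif heading:
            level = len(heading.group(1))
            text = html.escape(heading.group(2))
            out.append("<h%d>%s</h%d>" % (level, text, level))
        else:
            out.append("<p>" + html.escape(stripped) + "</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


converted = markdown_to_html("# Heading\n\n- one\n- two\n")
print(converted)
```

For the sample input this prints an h1 element followed by a two-item unordered list.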

Requirements

The Okapi open source project had a markdown filter that could be leveraged, but it needed to be “wrapped” for use in WorldServer. Fortunately, other Okapi filters had been wrapped for WorldServer previously and the development process was documented.

Autodesk wanted the markdown filter’s configuration to be exposed in the WorldServer user interface, including options for HTML sub-filtering such as link translation. This meant that the filter should be configurable to support different use cases without recompilation.

The solution should also make it possible to take advantage of future markdown filter bug fixes and enhancements.

Project

Based on Autodesk's configuration and needs, we pursued a solution in which both WorldServer and the markdown filter run on Java 7.

Spartan debugged compilation of the filter with Java 7, removing unnecessary dependencies and fixing code that no longer worked after the downgrade. The work was done on an M36 fork created for this project, which would allow us to merge the latest code changes in the future. This was the riskiest part of the project because the scope of required changes was unknown.

When the filter downgrade work was complete and tests were successful, we moved on to exposing the configuration options in the WorldServer user interface and found a simple but effective way to enable changes to HTML sub-filtering options.

Solution

After user acceptance testing, we delivered an Okapi markdown filter, with documentation, that could be installed and configured on WorldServer running under Java 7. The configuration options provided rich control over the filter behavior.

Users can add regular expressions that protect inline codes with placeholders, switch from sentence to paragraph segmentation, identify translatable URLs, and more.
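The idea behind protecting inline codes with regular expressions can be sketched as follows. This is an assumption-laden illustration in Python, not how the Okapi filter represents inline codes internally: the functions protect and restore, the "{n}" placeholder format, and both sample patterns are hypothetical.

```python
import re


def protect(text, patterns):
    """Replace every match of the given patterns with a numbered placeholder."""
    # Wrap each pattern in a non-capturing group so alternation is safe.
    combined = re.compile("|".join("(?:%s)" % p for p in patterns))
    codes = []

    def stash(match):
        codes.append(match.group(0))
        return "{%d}" % len(codes)

    return combined.sub(stash, text), codes


def restore(text, codes):
    """Put the original codes back in place of their placeholders."""
    return re.sub(r"\{(\d+)\}", lambda m: codes[int(m.group(1)) - 1], text)


# Hypothetical patterns: backtick code spans and {{...}} template variables.
patterns = [r"`[^`]+`", r"\{\{.*?\}\}"]
protected, codes = protect("Run `make install` for {{product}}.", patterns)
print(protected)  # Run {1} for {2}.
print(restore("Ejecute {1} para {2}.", codes))
# Ejecute `make install` para {{product}}.
```

The translator only ever sees the placeholders, so a command or variable name cannot be altered by accident. A production filter would also need to handle source text that already contains something that looks like a placeholder, which this sketch ignores.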

Okapi filter configurations are typically stored in text files (*.fprm) that can be easily edited using Rainbow, the desktop Okapi app. Our solution was to place these files in a special directory of WorldServer AIS and make them available to the filter. The desired file simply needs to be specified in the “HTML Subfilter…” field of the WorldServer UI.
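A lookup of this kind might work roughly like the sketch below. Everything in it is an assumption made for illustration rather than the wrapper's actual code: the function name resolve_subfilter_config, the behavior of appending the .fprm extension, and the check that keeps the resolved path inside the configuration directory.

```python
from pathlib import Path


def resolve_subfilter_config(config_dir, field_value):
    """Map the value typed into the UI field to a .fprm file in config_dir.

    Returns None when the field is empty, meaning "use the default settings".
    """
    name = field_value.strip()
    if not name:
        return None
    if not name.endswith(".fprm"):
        name += ".fprm"
    root = Path(config_dir).resolve()
    candidate = (root / name).resolve()
    # Reject values such as "../other/file" that point outside the directory.
    if root not in candidate.parents:
        raise ValueError("configuration must live in %s" % root)
    if not candidate.is_file():
        raise FileNotFoundError(str(candidate))
    return candidate
```

With a directory containing a file called okf_html@links.fprm (a made-up name), entering either "okf_html@links" or "okf_html@links.fprm" in the field would resolve to the same file, while a misspelled name fails loudly instead of silently falling back to defaults.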

 

Spartan Software provided a flexible, powerful solution for filtering markdown content with embedded HTML in WorldServer. When the markdown filter is enhanced in the future, it can be easily recompiled with the wrapper so that Autodesk can take advantage of the latest code.

If you are running WorldServer with Java 7, you can download the source code here and build the filter for your own production translation of markdown content.

Spartan Software has deep experience solving tough localization problems and contributing to open source projects like Okapi. Please reach out to me (trent@spartansoftwareinc.com) if you need help leveraging Okapi for your company’s localization workflows.