A few months ago, we released the first version of our Okapi Components for WorldServer library on GitHub. We developed this code along with Tableau Software in order to repackage portions of the Okapi Framework into components built on the SDL WorldServer SDK. The goal was to leverage the power of Okapi to more quickly develop custom filters, MT connectors, and other TMS functionality within WorldServer.
We’ve been happy with the feedback we’ve received so far, and we’ve continued to make improvements; we released version 1.1 just before the holidays. In this post I want to give some additional insight into how this integration works, and why what Okapi provides is so valuable.
Architecture and Complexity
Filter implementations can be complicated, even for reasonably simple formats. The WorldServer SDK provides very little support for new filter formats except for access to raw data streams and an expectation that the filter will transform these bytes to and from WorldServer segments.
Generally speaking, the implementation of most filters can be split into three parts:
- An extractor, capable of reading the native file and transforming it into a logical structure from which segments can be extracted.
- A merger, capable of updating a native file with translated target text.
- Segment logic, which converts the model that the extractor and merger understand into the segments that WorldServer expects.
The segment logic is usually the simplest component. A robust extractor, even for a simple format like JSON, can still require a few thousand lines of code. Additionally, the extractors used in filters often have unusual requirements, such as options for specific behaviors and the need to preserve formatting while updating text, which means that traditional off-the-shelf parsers aren’t always flexible enough for filter development.
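To make the extractor/merger split concrete, here is a minimal sketch in Java for a toy "key=value" format. Every name in it (TextBlock, Extractor, Merger, KeyValueFormat) is hypothetical and invented for this post; none of them come from the WorldServer SDK or from Okapi. The point is only to show why the merger has to preserve everything the extractor skipped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All names below are hypothetical and exist only for this sketch;
// none of them come from the WorldServer SDK or from Okapi.

/** One translatable unit in the extractor/merger's own model. */
final class TextBlock {
    final String id;
    final String text;

    TextBlock(String id, String text) {
        this.id = id;
        this.text = text;
    }
}

/** Reads native content into a logical structure. */
interface Extractor {
    List<TextBlock> extract(String nativeContent);
}

/** Writes translated text back into the native content. */
interface Merger {
    String merge(String nativeContent, List<TextBlock> translated);
}

/** Toy format: "key=value" lines are translatable, everything else is kept as-is. */
final class KeyValueFormat implements Extractor, Merger {

    @Override
    public List<TextBlock> extract(String nativeContent) {
        List<TextBlock> blocks = new ArrayList<>();
        for (String line : nativeContent.split("\n", -1)) {
            int eq = line.indexOf('=');
            if (eq > 0 && !line.startsWith("#")) {
                blocks.add(new TextBlock(line.substring(0, eq), line.substring(eq + 1)));
            }
        }
        return blocks;
    }

    @Override
    public String merge(String nativeContent, List<TextBlock> translated) {
        Map<String, String> targetById = new HashMap<>();
        for (TextBlock block : translated) {
            targetById.put(block.id, block.text);
        }
        List<String> out = new ArrayList<>();
        for (String line : nativeContent.split("\n", -1)) {
            int eq = line.indexOf('=');
            String key = eq > 0 ? line.substring(0, eq) : null;
            if (key != null && !line.startsWith("#") && targetById.containsKey(key)) {
                out.add(key + "=" + targetById.get(key));
            } else {
                out.add(line); // comments, blank lines, and untranslated entries pass through
            }
        }
        return String.join("\n", out);
    }
}
```

Even in this toy version, the merger needs the original content as input so that comments and blank lines survive the round trip. Real formats multiply that bookkeeping considerably.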
Developing with Okapi
When possible, we solve the problem of extractor complexity by leveraging components that already exist within the Okapi Framework. Okapi contains its own filter framework and implementations for a wide variety of common formats, each of which produces a common set of events to describe the translatable text it encounters.
The open source Okapi Components for WorldServer project is based on the idea of using an Okapi filter to provide the extractor and merger, and then writing common segment logic to convert between Okapi’s common event model and the WorldServer segment model. Because Okapi’s event model is fixed across all filters, our segment logic can be reused from one custom filter to another.
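The reuse argument can be sketched as follows. This is a simplified illustration, not the project's actual code: FilterEvent, EventKind, EventProducingFilter, and the two toy filters are hypothetical stand-ins, and Okapi's real event and filter types are considerably richer. The sketch assumes only that every filter emits the same kind of event stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration. Okapi's real event and filter
// types are richer, and WorldServer's segment types are not modeled here.

enum EventKind { START_DOCUMENT, TEXT, NON_TRANSLATABLE, END_DOCUMENT }

final class FilterEvent {
    final EventKind kind;
    final String content;

    FilterEvent(EventKind kind, String content) {
        this.kind = kind;
        this.content = content;
    }
}

/** Anything that can turn a document into the shared event stream. */
interface EventProducingFilter {
    List<FilterEvent> events(String document);
}

/** Shared segment logic: written once, used with every filter. */
final class SharedSegmentLogic {
    static List<String> toSegments(EventProducingFilter filter, String document) {
        List<String> segments = new ArrayList<>();
        for (FilterEvent event : filter.events(document)) {
            if (event.kind == EventKind.TEXT) {
                segments.add(event.content);
            }
        }
        return segments;
    }
}

/** Toy filter: every non-blank line is translatable. */
final class LineFilter implements EventProducingFilter {
    @Override
    public List<FilterEvent> events(String document) {
        List<FilterEvent> out = new ArrayList<>();
        out.add(new FilterEvent(EventKind.START_DOCUMENT, ""));
        for (String line : document.split("\n", -1)) {
            EventKind kind = line.trim().isEmpty() ? EventKind.NON_TRANSLATABLE : EventKind.TEXT;
            out.add(new FilterEvent(kind, line));
        }
        out.add(new FilterEvent(EventKind.END_DOCUMENT, ""));
        return out;
    }
}

/** Toy filter: only text between double quotes is translatable (assumes balanced quotes). */
final class QuotedFilter implements EventProducingFilter {
    @Override
    public List<FilterEvent> events(String document) {
        List<FilterEvent> out = new ArrayList<>();
        out.add(new FilterEvent(EventKind.START_DOCUMENT, ""));
        String[] parts = document.split("\"", -1);
        for (int i = 0; i < parts.length; i++) {
            // Odd-numbered parts sit between an opening and a closing quote.
            EventKind kind = (i % 2 == 1) ? EventKind.TEXT : EventKind.NON_TRANSLATABLE;
            out.add(new FilterEvent(kind, parts[i]));
        }
        out.add(new FilterEvent(EventKind.END_DOCUMENT, ""));
        return out;
    }
}
```

SharedSegmentLogic never needs to know which format it is handling. Supporting another format in this sketch means adding another EventProducingFilter and nothing else.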
This makes developing new custom WorldServer filters based on an existing Okapi filter very simple. Apart from writing a few small wrapper classes, the largest integration task is exposing the filter options via the WorldServer UI.
Exposing Filter Configuration
Okapi filters implement their own set of configuration options to affect the behavior of their extractor and merger. As we add support for them in WorldServer, we have been mapping some or all of these options to the WorldServer UI. For specific client implementations, we may also hard-code certain options internally in order to make the filter behave in a particular way.
The requests for specific filter behavior we encounter fall into three basic categories:
- Behavior for which a configuration option exists in Okapi. In this case, we can either map the option to the WorldServer UI (if the user needs control over the behavior) or hard-code it internally (if the behavior is fixed). Mapping the option to the UI is preferable when we intend to contribute the code to open source later.
- Behaviors which are specific to WorldServer. Examples of this include the option to apply WorldServer’s sentence-breaking logic to filter output, or an option to copy translated .pot files to an equivalent .po file. These are developed entirely within the custom filter, and the level of effort depends on the complexity of the feature.
- Behaviors which require modification of the underlying Okapi filter. This includes things like adding new options that affect how the extractor works or what content is exposed for translation. This requires developing against a fork of Okapi which we modify as needed. The level of effort for this varies, but in general these efforts are more complicated than other changes, sometimes significantly so. Ideally, our modifications will be merged back into the Okapi core for inclusion in the next release; this allows us to eventually discard our fork and build entirely on the open source codebase. If the changes are not merged, the client’s long-term maintenance cost for the custom filter increases.
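For the first category, the choice between mapping an option to the UI and hard-coding it can be pictured as a small precedence rule. The sketch below is an assumption about one reasonable way to structure it, not a description of our implementation: FilterOptionResolver and the option names are hypothetical, and neither Okapi's parameter classes nor the WorldServer UI API appear in it.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the class and the option names are invented for
// illustration and are not taken from Okapi or the WorldServer SDK.

final class FilterOptionResolver {
    private final Map<String, String> defaults = new LinkedHashMap<>();
    private final Map<String, String> hardCoded = new LinkedHashMap<>();
    private final Set<String> exposedInUi = new HashSet<>();

    /** Registers an option of the underlying filter with its default value. */
    FilterOptionResolver withDefault(String name, String value) {
        defaults.put(name, value);
        return this;
    }

    /** Marks an option as user-controllable through the UI. */
    FilterOptionResolver exposeInUi(String name) {
        exposedInUi.add(name);
        return this;
    }

    /** Pins an option to a fixed value for a specific implementation. */
    FilterOptionResolver hardCode(String name, String value) {
        hardCoded.put(name, value);
        return this;
    }

    /** Precedence: hard-coded values, then exposed UI values, then defaults. */
    Map<String, String> resolve(Map<String, String> uiValues) {
        Map<String, String> effective = new LinkedHashMap<>(defaults);
        for (Map.Entry<String, String> entry : uiValues.entrySet()) {
            if (exposedInUi.contains(entry.getKey())) {
                effective.put(entry.getKey(), entry.getValue());
            }
        }
        effective.putAll(hardCoded);
        return effective;
    }
}
```

In this arrangement a UI value for an option that was never exposed is ignored, and a hard-coded value always wins, so a client-specific build cannot be reconfigured by accident.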
One of our goals is to commit as much code as possible back to the open source community, both by improving the WorldServer components we make available and by contributing fixes and enhancements to Okapi. Filter development is tricky and not entirely pleasant, and there’s a benefit to not having to re-solve the same problems over and over again. In the long term, I’d prefer to see our team — and, more broadly, our industry — focus on more interesting challenges. For our clients, it simplifies maintenance to have a shared codebase where the cost of ownership is amortized.