Transpiler, a meaningless word

PhD Student fights the good fight

August 15, 2023

The usual claim is that a “transpiler” is different from a compiler, which often has a complex frontend, an optimizing middle end, and code generators for various backends. The big problem is that most of the arguments for distinguishing between compilers and “transpilers” focus on language syntax. However, anyone who wants one of these tools to actually work has to contend with the fact that different languages have different semantics, and translating between those is a complex task; a task that compilers already do.

Lie #1: Transpilers Don’t have Frontends

Let’s look at a simple Python to C transpiler. Nuitka and Mojo both target this exact problem but sanely call themselves compilers. Our transpiler turns Python code that looks like this:

def fact(n):
    x = 1
    for i in range(1, n):
        x *= i
    return x

Into some C code like this:

int fact(int n) {
    int x = 1;
    for (int i = 1; i < n; i++) {
        x *= i;
    }
    return x;
}

Wow, pretty simple! But of course, that piece of Python is not very idiomatic. We can make it a bit more terse using functools.reduce:

import functools as ft
def fact(n):
    lst = range(1, n)
    return ft.reduce(lambda acc, x: acc * x, lst, 1)

Now our “transpiler” is in a little bit of trouble. The standard library includes a pure-Python implementation of reduce, so maybe we can still transpile it, but range is implemented purely in C.

Looking into the implementation makes it clear that matching the semantics of this program is harder still: range is lazy, much like a Python generator, which means that instead of computing the numbers from 1 to n up front, it only produces them when asked. This allows our function to save memory because we don’t have to allocate n words and can work using just the memory for the lazy range object and the local variables.
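
To make the laziness concrete, here is a small sketch that compares a range object with the list it would expand to. The variable names are mine, and the exact byte counts depend on the CPython build, so treat the printed sizes as illustrative only.

```python
import sys

n = 1_000_000

lazy = range(1, n)          # stores only start, stop, and step
eager = list(range(1, n))   # allocates a slot for every element

# The size of the range object does not grow with n; the list's size does.
lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)
print(lazy_size, eager_size)

# Elements are produced only when something iterates over the range.
it = iter(lazy)
first_three = [next(it), next(it), next(it)]
print(first_three)
```

A C translation that wants the same memory behavior has to reproduce this on-demand protocol, not just the loop syntax.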

Another problem is that there are hundreds of built-in library functions that need to be compiled from Python to C. Even a moderately useful subset would be unwieldy to implement by hand in our simple “transpiler”. Maybe one strategy we can take is to build some sort of tool that would simplify these hundreds of definitions into a more uniform representation to work with.
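
For what it’s worth, Python already ships such a tool in its standard library: the ast module parses source text into a uniform tree. The sketch below is only an illustration of what that representation looks like for our fact function; it is not how Nuitka or any other project mentioned here is actually built.

```python
import ast

source = """
def fact(n):
    x = 1
    for i in range(1, n):
        x *= i
    return x
"""

# Parse the text into an abstract syntax tree.
tree = ast.parse(source)

# Every construct, however it was spelled, is now a node of a known kind.
node_kinds = sorted({type(node).__name__ for node in ast.walk(tree)})
print(node_kinds)

# The function definition is the first statement of the module.
func = tree.body[0]
print(func.name, [arg.arg for arg in func.args.args])
```

Anything that consumes this tree instead of raw text is, for all practical purposes, sitting behind a frontend.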

We’ll call it the transpiler-not-frontend to make sure people understand we’re not building a compiler here. It is not hard to find examples of things mislabelled as transpilers. However, I won’t name any specific projects because this is just a dumb diatribe about words; I actually think the projects themselves are cool.

Lie #2: Transpilers are Simple

BabelJS is arguably one of the first “transpilers”. It was developed so that people could experiment with new JavaScript (technically, ECMAScript) language features that did not yet have browser implementations.

For example, ES6 added support for generators (similar to those in Python), but a lot of browsers did not support them. Generators are pretty nice:

function *range(max) {
  for (var i = 0; i < max; i += 1) {
    yield i;
  }
}
// Force the evaluation of the generator
console.log([0, ...range(10)])

Facebook’s regenerator is a BabelJS-based “transpiler” to transform generators into language constructs that already existed in JavaScript. Shouldn’t be too hard, right?

var _marked = /*#__PURE__*/regeneratorRuntime.mark(range);
function range(max) {
  var i;
  return regeneratorRuntime.wrap(function range$(_context) {
    while (1) {
      switch (_context.prev = _context.next) {
        case 0:
          i = 0;
        case 1:
          if (!(i < max)) {
            _context.next = 7;
            break;
          }
          _context.next = 4;
          return i;
        case 4:
          i += 1;
          _context.next = 1;
          break;
        case 7:
        case "end":
          return _context.stop();
      }
    }
  }, _marked);
}
// Force the evaluation of the generator
console.log([0, ...range(10)]);

Guess what, it is. Implementing generators is a whole-program transformation: they fundamentally rely on the ability of the program to save its internal stack and pause its execution. In fact, making it fast requires enough tricks that we wrote a paper on it.
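
To see what “saving the stack” means, here is a hand-written Python sketch of the same idea: a generator next to an iterator class where the paused frame has been turned into explicit fields. The class and its state names are hypothetical and written by hand for this one generator; regenerator has to do the equivalent mechanically for arbitrary control flow, which is where the difficulty lies.

```python
def range_gen(max_value):
    """The generator we would like to write."""
    i = 0
    while i < max_value:
        yield i
        i += 1


class RangeStateMachine:
    """Hand-lowered version: the paused frame becomes explicit fields."""

    def __init__(self, max_value):
        self.max_value = max_value
        self.i = 0              # the generator's local variable
        self.state = "start"    # where to resume on the next call

    def __iter__(self):
        return self

    def __next__(self):
        if self.state == "start":
            self.state = "running"
        elif self.state == "running":
            self.i += 1         # resume just after the yield
        if self.state == "done" or self.i >= self.max_value:
            self.state = "done"
            raise StopIteration
        return self.i           # corresponds to `yield i`


print(list(range_gen(5)))
print(list(RangeStateMachine(5)))
```

Even in this tiny case, the local variable and the resume point had to be hoisted out of the function body, which is a change to the program’s structure and not a local syntax rewrite.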

The point here is that people call arbitrarily complex tools “transpilers”. Again, the problem is the misguided focus on language syntax and a lack of understanding of the semantic difference.

Lie #3: Transpilers Target the Same Level of Abstraction

This is pretty much the same as (2). The input and output languages have the syntax of JavaScript, but the fact that compiling one feature requires a whole-program transformation gives away the fact that these are not the same language. If we’re to get beyond the vagaries of syntax and actually talk about what the expressive power of languages is, we need to talk about semantics.

Lie #4: Transpilers Don’t have Backends

BabelJS has a list of “presets” which target different versions of JavaScript. This is not very different from LLVM having multiple different backends. If you’re going to argue that the backends all compile to the same language, see (3). People might argue that when Babel is compiling its operations, it can do it piecemeal: that is, the compilation of nullish coalescing operators has nothing to do with how classes are compiled.

This is exactly what compiler frontends do as well: they transform a large surface area of syntax into a smaller language, and a lot of operations are simple syntactic sugar which can be represented using other, more foundational primitives in the language. For example, in the Rust compiler, the mid-level representation (MIR) does away with features like if-let by compiling them into match statements. In fact, Clippy, a style suggestion tool for Rust, implements this as a source-to-source transformation: if you have simple match statements in your program, Clippy will suggest a rewrite to you.
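
The same kind of piecemeal desugaring can be sketched in a few lines of Python using the standard ast module. This is a toy pass of my own, not something taken from Babel or rustc: it rewrites augmented assignments such as x *= i into plain assignments. Note the assumption baked in: the rewrite only preserves meaning when the operands are immutable values like numbers, because for a list, *= mutates in place. That caveat is exactly the sort of semantic detail a purely syntactic view misses.

```python
import ast


class DesugarAugAssign(ast.NodeTransformer):
    """Rewrite `target op= value` into `target = target op value`.

    Assumption: the target is a plain name bound to an immutable value,
    so reading it again and rebinding it does not change behavior.
    """

    def visit_AugAssign(self, node):
        self.generic_visit(node)
        if not isinstance(node.target, ast.Name):
            return node  # leave attribute and subscript targets alone
        load_target = ast.Name(id=node.target.id, ctx=ast.Load())
        new_node = ast.Assign(
            targets=[node.target],
            value=ast.BinOp(left=load_target, op=node.op, right=node.value),
        )
        return ast.copy_location(new_node, node)


def desugar(source):
    tree = DesugarAugAssign().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


print(desugar("x *= i"))
```

Each sugar gets its own small pass like this one, and none of them needs to know about the others, which is the frontend design described above.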

Compilers already do things that “transpilers” are supposed to do. And they do it better because they are built on the foundation of language semantics instead of syntactic manipulation.

Lie #5: Compilers only Target Machine Code

This one is interesting because instead of defining the characteristics of a “transpiler”, it focuses on restricting the definition of a compiler. Unfortunately, this one too is wrong. The term is widely used in many contexts where we are not generating assembly code and are instead generating bytecode for some sort of virtual machine. For example, Java has an ahead-of-time compiler (javac) from Java source code to JVM bytecode, and the JVM has a just-in-time compiler from that bytecode to native instructions. These kinds of multi-tier compilation schemes are extremely common in dynamic languages like JavaScript as well.
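
CPython is another everyday example: its built-in compile function turns source text into bytecode for the CPython virtual machine, and nobody hesitates to call that compiling. The sketch below just inspects the result; the double function is a made-up example, and the opcode names it prints differ between Python versions.

```python
import dis

source = "def double(n):\n    return n * 2\n"

# compile() is CPython's built-in compiler: source text in, code object out.
module_code = compile(source, "<example>", "exec")

# Running the module-level code defines the function.
namespace = {}
exec(module_code, namespace)
double = namespace["double"]

# The function body is stored as bytecode for the virtual machine.
bytecode = double.__code__.co_code
print(type(bytecode), len(bytecode))

# Disassemble it; the exact instructions vary across Python versions.
for instruction in dis.get_instructions(double):
    print(instruction.opname, instruction.argrepr)
```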

Lie #6: Transpilers are not Compilers

People seem to be scared of compilers and resort to claims like “I don’t want something as complex”, or “string interpolation is good enough”. This is silly. Anyone who has built one of these “transpilers” knows that they inevitably get complex and poorly maintained, precisely because of the delusion that they aren’t doing something complex.

Programming languages are not just syntax; they have semantics too. Pretending that you can get away with just manipulating the former is delusional and results in bad tools.

Lindsey Kuper has a well-written article on the same topic.