skyline of Raleigh, NC

Nick Freeman

← blog index

S-expressions in WebAssembly Text Format

as if I didn't already use too many parentheses
September 27, 2018

Contents


This started as part of my previous post, From Zero to WebAssembly. It turned out to be quite a long tangential topic, so I decided to break it out here as a separate post so as to not distract from other material in the original article and to more freely explore the topic in its own space.

If you are completely new to WebAssembly, I recommend starting with the previous post before continuing here.

Motivation

WebAssembly Text Format (WAT) specifies that a function is simply “a list of instructions.” While we could certainly just write one long linear list of instructions to operate on the stack, it might not always be the most readable or easiest to grok format for us humans – especially those of us more used to writing in higher-level languages.

S-expressions are a notation that let us express linear sequences of instructions in a manner that may be nested or otherwise visually appear out-of-order, possibly in a way that makes it easier for a human reader to understand. For example, consider the difference in the following pieces of WAT which each add the number 42 with an existing local variable $x and store the result in another local $y:

i32.const 42 ;; push 42 onto stack
get_local $x ;; push value of $x onto stack (assume i32)
i32.add      ;; pop both previous values, add, push result onto stack
set_local $y ;; pop previous result and store in local $y

This makes sense, and for such a small example it is not too difficult to follow and keep in mind all of the various changes that happen to the stack implicitly. However, this ability to understand code at a glance may not scale with more complex operations, especially if we are often switching back and forth between WAT and higher-level, non-stack-based languages.

How might we write a similar operation in, say, JavaScript?

y = x + 42;

Quite the difference in appearance. In a linear sequence of WAT, the assignment target comes last, while the source comes first. This is reversed in many HLLs, where the target appears first on the left-hand side of the assignment operator.

S-expressions allow us to instead write:

(set_local $y (i32.add (i32.const 42) (get_local $x)))

Still not exactly the same as our JavaScript version, but easier to read from left to right: “set $y to be the sum of 42 and $x.”

Parsing S-expression Trees

S-expression notation was popularized by the programming language Lisp in the late 1950s, and its many descendants and derivatives since. While the use of such syntax may appear familiar to those who work in those languages, I personally found it quite difficult to wrap my head around in my initial tinkerings, having no prior experience myself.

The key to understanding this format – to my brain, at least – is in seeing the nested tree structure which arises when parsing S-expressions and “unfolding” the atoms contained within each set of parentheses. The one line above, for example, parses to a tree such as

                         [set_local $y]
                        ↙
               [i32.add]
              ↙        ↘︎
[get_local $x]           [i32.const 42]

If we then read the instructions in the tree in post-order traversal, we arrive back at the linear order of instructions we had earlier:

get_local $x
i32.const 42
i32.add
set_local $y

The Program

Let’s see how we might use S-expressions to visually simplify pieces of the lowercase function from the end of my previous post. The WAT source of the function body is initially 36 lines long, including whitespace added for readability:

(module
 (memory (export "mem") 1)

  (func $lowercase
    (local $idx i32)  ;; current memory index
    (local $char i32) ;; current character

    loop $processChar
      get_local $idx
      i32.load8_u
      tee_local $char
      i32.const 0
      i32.eq
      if
        return
      end ;; if $char == 0

      get_local $char
      i32.const 65
      i32.ge_u
      if
        get_local $char
        i32.const 90
        i32.le_u
        if
          get_local $idx
          get_local $char
          i32.const 32
          i32.add
          i32.store8
        end ;; if $char <= 90
      end ;; if $char >= 65

      get_local $idx
      i32.const 1
      i32.add
      set_local $idx

      br $processChar
    end ;; loop $processChar
  )

  (export "lowercase" (func $lowercase))
)

Control Flow Statements

Conditional branching instructions are another place where using S-expressions can make our WAT code more closely resemble the kind of structure we expect to see in a higher-level language.

The if statement takes one operand – that is, it consumes one value from the top of the stack. If that value is 0, the rest of the instructions are skipped up until its associated else if present, or end otherwise. If the value is non-zero, the next instructions execute in their normal order up until the else or end.

Consider one of the if statements from the lowercase function which compares a local value $char to a constant number 65:

get_local $char
i32.const 65
i32.ge_u
if ;; if $char >= 65
  ;; if block body
end

Using S-expressions, we can express the logic in a way that more closely resembles the pseudocode in the comment if $char >= 65.

(if (i32.ge_u (get_local $char) (i32.const 65))
  (then (; if block body ;))
)

Even though they visually appear after the opening if instruction, the comparison instructions will still occur first in the compiled bytecode.

The body of an if statement must be inside of a then and, optionally, an else block. The end pseudo-instructions are implicit at a block’s closing parenthesis ).

Let’s re-write the blocks in the lowercase function to use S-expressions.

(func $lowercase
  (local $idx i32)  ;; current memory index
  (local $char i32) ;; current character

  (loop $processChar
    get_local $idx
    i32.load8_u
    tee_local $char

    (if (i32.eq (i32.const 0)) ;; compares against teed $char
      (then return))

    (if (i32.ge_u (get_local $char) (i32.const 65))
      (then
        (if (i32.le_u (get_local $char) (i32.const 90))
          (then
            get_local $idx
            get_local $char
            i32.const 32
            i32.add
            i32.store8))))

    get_local $idx
    i32.const 1
    i32.add
    set_local $idx

    br $processChar)
)

The indentation is purely aesthetic; aside from whitespace required by the grammar to separate tokens, the wat2wasm compiler ignores extra whitespace. This lets us choose, for the most part, where we want to add indentation, new lines, and other whitespace in order to visually separate code to make it easier to read.


We can then go on to convert the remaining operations into S-expression form as well. One particularly interesting transformation is of the store command which saves a lowercased letter to memory. The section of code we started with appears as

get_local $idx
get_local $char
i32.const 32
i32.add
i32.store8

The local $idx contains the address of memory at which to store the value, and the value itself is the sum of $char and 32. The lexical distance between $idx and the store command is a bit odd, the result of an (instructional) optimization step we took in the previous article paired with the need for $idx to be pushed to the stack first because it is the first operand to the store command.

With S-expressions, this code may be re-written as

(i32.store8 (get_local $idx) (; value to store ;))

where the value to store is itself

(i32.add (get_local $char) (i32.const 32))

Altogether, adding some whitespace for readability, the code may appear as

(i32.store8
  (get_local $idx)
  (i32.add (get_local $char) (i32.const 32)))

This is much more clear; lexically, the first operand to the store instruction follows it immediately, and, read left-to-right, the entire phrase says, “store at $idx the sum of $char and 32.”

Final Form

Considering all of the types of transformations discussed, one possible representation of the function appears as:

(func $lowercase
  (local $idx i32)  ;; current memory index
  (local $char i32) ;; current character

  (loop $processChar
    (tee_local $char (i32.load8_u (get_local $idx)))

    (if (i32.eq (i32.const 0)) ;; compares against teed $char
      (then return))

    (if (i32.ge_u (get_local $char) (i32.const 65))
      (then
        (if (i32.le_u (get_local $char) (i32.const 90))
          (then
            (i32.store8
              (get_local $idx)
              (i32.add (get_local $char) (i32.const 32)))))))

    (set_local $idx (i32.add (get_local $idx) (i32.const 1)))

    br $processChar)
)

The function body now has only 20 lines, compared to the original source’s 36.

It is worth reiterating that all of the changes to the program are purely syntactical; the code above contains the exact same instructions in the same order as were present in the beginning without S-expressions. In fact, compiling the two sources will produce exactly the same bytecode bit for bit.

Thus the choice to write any given piece of WAT source with or without S-expressions is a decision to base on merits of human readability and comprehension, and is – to some degree, as always – a matter of personal taste.

to top