This started as part of my previous post, From Zero to WebAssembly. It turned out to be quite a long tangential topic, so I decided to break it out here as a separate post so as to not distract from other material in the original article and to more freely explore the topic in its own space.
If you are completely new to WebAssembly, I recommend starting with the previous post before continuing here.
WebAssembly Text Format (WAT) specifies that a function is simply “a list of instructions.” While we could certainly just write one long linear list of instructions to operate on the stack, it might not always be the most readable or easiest to grok format for us humans – especially those of us more used to writing in higher-level languages.
S-expressions are a notation that
let us express linear sequences of instructions in a manner that may be nested
or otherwise visually appear out-of-order, possibly in a way that makes it
easier for a human reader to understand. For example, consider the difference in
the following pieces of WAT which each add the number 42 with an existing local
$x and store the result in another local
i32.const 42 ;; push 42 onto stack get_local $x ;; push value of $x onto stack (assume i32) i32.add ;; pop both previous values, add, push result onto stack set_local $y ;; pop previous result and store in local $y
This makes sense, and for such a small example it is not too difficult to follow and keep in mind all of the various changes that happen to the stack implicitly. However, this ability to understand code at a glance may not scale with more complex operations, especially if we are often switching back and forth between WAT and higher-level, non-stack-based languages.
y = x + 42;
Quite the difference in appearance. In a linear sequence of WAT, the assignment target comes last, while the source comes first. This is reversed in many HLLs, where the target appears first on the left-hand side of the assignment operator.
S-expressions allow us to instead write:
(set_local $y (i32.add (i32.const 42) (get_local $x)))
left to right: “set
$y to be the sum of
S-expression notation was popularized by the programming language Lisp in the late 1950s, and its many descendants and derivatives since. While the use of such syntax may appear familiar to those who work in those languages, I personally found it quite difficult to wrap my head around in my initial tinkerings, having no prior experience myself.
The key to understanding this format – to my brain, at least – is in seeing the nested tree structure which arises when parsing S-expressions and “unfolding” the atoms contained within each set of parentheses. The one line above, for example, parses to a tree such as
[set_local $y] ↙ [i32.add] ↙ ↘︎ [get_local $x] [i32.const 42]
If we then read the instructions in the tree in post-order traversal, we arrive back at the linear order of instructions we had earlier:
get_local $x i32.const 42 i32.add set_local $y
Let’s see how we might use S-expressions to visually simplify pieces of the
lowercase function from the end of my previous post. The
WAT source of the function body is initially 36 lines long, including whitespace
added for readability:
(module (memory (export "mem") 1) (func $lowercase (local $idx i32) ;; current memory index (local $char i32) ;; current character loop $processChar get_local $idx i32.load8_u tee_local $char i32.const 0 i32.eq if return end ;; if $char == 0 get_local $char i32.const 65 i32.ge_u if get_local $char i32.const 90 i32.le_u if get_local $idx get_local $char i32.const 32 i32.add i32.store8 end ;; if $char <= 90 end ;; if $char >= 65 get_local $idx i32.const 1 i32.add set_local $idx br $processChar end ;; loop $processChar ) (export "lowercase" (func $lowercase)) )
Conditional branching instructions are another place where using S-expressions can make our WAT code more closely resemble the kind of structure we expect to see in a higher-level language.
if statement takes one operand – that is, it consumes one value from the
top of the stack. If that value is 0, the rest of the instructions are skipped
up until its associated
else if present, or
end otherwise. If the value is
non-zero, the next instructions execute in their normal order up until the
Consider one of the
if statements from the
lowercase function which compares
a local value
$char to a constant number
get_local $char i32.const 65 i32.ge_u if ;; if $char >= 65 ;; if block body end
Using S-expressions, we can express the logic in a way that more closely
resembles the pseudocode in the comment
if $char >= 65.
(if (i32.ge_u (get_local $char) (i32.const 65)) (then (; if block body ;)) )
Even though they visually appear after the opening
if instruction, the
comparison instructions will still occur first in the compiled bytecode.
The body of an
if statement must be inside of a
then and, optionally, an
else block. The
end pseudo-instructions are implicit at a block’s closing
Let’s re-write the blocks in the
lowercase function to use S-expressions.
(func $lowercase (local $idx i32) ;; current memory index (local $char i32) ;; current character (loop $processChar get_local $idx i32.load8_u tee_local $char (if (i32.eq (i32.const 0)) ;; compares against teed $char (then return)) (if (i32.ge_u (get_local $char) (i32.const 65)) (then (if (i32.le_u (get_local $char) (i32.const 90)) (then get_local $idx get_local $char i32.const 32 i32.add i32.store8)))) get_local $idx i32.const 1 i32.add set_local $idx br $processChar) )
The indentation is purely aesthetic; aside from whitespace required by the
grammar to separate tokens, the
wat2wasm compiler ignores extra
whitespace. This lets us choose, for the most part, where we want to add
indentation, new lines, and other whitespace in order to visually separate code
to make it easier to read.
We can then go on to convert the remaining operations into S-expression form as
well. One particularly interesting transformation is of the
which saves a lowercased letter to memory. The section of code we started with
get_local $idx get_local $char i32.const 32 i32.add i32.store8
$idx contains the address of memory at which to store the value, and
the value itself is the sum of
32. The lexical distance between
$idx and the
store command is a bit odd, the result of an (instructional)
optimization step we took in the previous article paired with the need for
$idx to be pushed to the stack first because it is the first operand to the
With S-expressions, this code may be re-written as
(i32.store8 (get_local $idx) (; value to store ;))
where the value to store is itself
(i32.add (get_local $char) (i32.const 32))
Altogether, adding some whitespace for readability, the code may appear as
(i32.store8 (get_local $idx) (i32.add (get_local $char) (i32.const 32)))
This is much more clear; lexically, the first operand to the
follows it immediately, and, read left-to-right, the entire phrase says, “store
$idx the sum of
Considering all of the types of transformations discussed, one possible representation of the function appears as:
(func $lowercase (local $idx i32) ;; current memory index (local $char i32) ;; current character (loop $processChar (tee_local $char (i32.load8_u (get_local $idx))) (if (i32.eq (i32.const 0)) ;; compares against teed $char (then return)) (if (i32.ge_u (get_local $char) (i32.const 65)) (then (if (i32.le_u (get_local $char) (i32.const 90)) (then (i32.store8 (get_local $idx) (i32.add (get_local $char) (i32.const 32))))))) (set_local $idx (i32.add (get_local $idx) (i32.const 1))) br $processChar) )
The function body now has only 20 lines, compared to the original source’s 36.
It is worth reiterating that all of the changes to the program are purely syntactical; the code above contains the exact same instructions in the same order as were present in the beginning without S-expressions. In fact, compiling the two sources will produce exactly the same bytecode bit for bit.
Thus the choice to write any given piece of WAT source with or without S-expressions is a decision to base on merits of human readability and comprehension, and is – to some degree, as always – a matter of personal taste.