skyline of Raleigh, NC

Nick Freeman


Background

In the summer semester of 2018, I served as a teaching assistant for the undergraduate course Computer Organization and Assembly Language. There were a few opportunities for the teaching staff to present on special topics of interest, and on one particular day I chose to discuss WebAssembly – though, as we are about to see, WebAssembly is quite different than MASM and its other assembler relatives which we used in most of the course.

The semester contained many projects, increasing in complexity as students became familiar with MASM and related programming concepts. Similarly, I decided to take a few of the first projects and adapt them to a WebAssembly context in order to familiarize myself with its mechanics.

The whole process took me through pleasantly unexpected topics in an effort to reason about and design for low-level interoperability between JavaScript and Wasm in the browser, given that each have quite different program grammars, execution mechanisms, and require different programming mindsets.

In the end it was all quite worthwhile, and in the Fall I incorporated my experience and some of the resulting material into a new workshop session that explored newer web application technologies and techniques.

Motivation for WebAssembly

WebAssembly is a safe, portable, low-level code format designed for efficient execution and compact representation. Its main goal is to enable high performance applications on the Web, but it does not make any Web-specific assumptions or provide Web-specific features, so it can be employed in other environments as well.

Introduction to the WebAssembly Specification

I sometimes call WebAssembly the “fourth true language of the web”, joining its older siblings HTML, CSS, and JavaScript. Its great potential, though, is not as a language, but as a compilation target. The bright future its champions envision – quickly becoming the bright present – is one in which developers can program in their languages of choice and still be able to run their programs quickly and efficiently in the browser and beyond after compiling to a common Wasm bytecode.

I am incredibly excited by the potential that WebAssembly brings. Rather than drone on about that myself here, I recommend these existing resources which already do a great job explaining the significance of WebAssembly to the web and to computing in general:

The Projects

The very first project LOWERCASE is a program which accepts arbitrary input, converts upper-case letters to lower-case, and passes through non-letter characters unchanged. The program must continue to process incoming characters until it receives and prints a period.

A second project TABS implements a simple form of tab expansion. Given a fixed column width, the program must read input and replace tab characters with one or more spaces in order to align to a new column. It must also accommodate input with multiple lines, and again halts after processing a period.

I thought these made excellent candidates for a first exercise into exploring WebAssembly, requiring minimal and straightforward logic while being one or two steps up from a trivial hello world echoing program.

Program I/O Foundation

While it is possible for WebAssembly modules to run in arbitrary host environments (such as Node.js), I thought it would be even more exciting to see WebAssembly running in the browser alongside JavaScript and with observable effects in a visible DOM.

Some project requirements had to be adapted in order to run in the browser, where the input/output mechanics differ from that of a DOS terminal. For our use here, we can just accept input from a textarea and print the result to the screen and/or console.

Minimal Host Page

The HTML document below will be used for both of our programs. It has one textarea to accept input, one div to display output, and a button for the user to click to run the program. It has just enough CSS to render text in monospace and prevent default whitespace collapse. It also includes one script, script.js, which will hold all of the necessary JavaScript.

<!DOCTYPE html>
<html>
  <head>
    <title>Hello, Wasm</title>
    <style>
      #input, #output {
        white-space: pre;
        font-family: monospace;
      }
    </style>
  </head>
  <body>
    <textarea id="input"></textarea>
    <button id="btn">Run</button>
    <div id="output"></div>

    <script src="script.js"></script>
  </body>
</html>

To start out the script, let’s just get references to the interface controls to use later.

// script.js

const gebID = id => document.getElementById(id); // shorthand

const input = gebID('input');
const btn = gebID('btn');
const output = gebID('output');

btn.addEventListener('click', runProgram);

/* TODO: import wasm */

function runProgram() {
  /* TODO */
}

Let’s also make one file each for our Wasm programs. They will be written in WebAssembly Text Format, which is typically given the extension .wat. So we start with 4 files in total:

├── index.html
├── lowercase.wat
├── script.js
└── tabs.wat

Note that the human-readable .wat files are not what will eventually be shipped to the browser. They must first be compiled into Wasm bytecode, which we will get to in just a bit.

Connecting WebAssembly and JavaScript

Currently, the only way to work with Wasm modules in the browser is to use the WebAssembly JavaScript API which is exposed on the global WebAssembly namespace.

For context and terminology, here is an excerpt from MDN’s article WebAssembly Concepts:

There are several key concepts needed to understand how WebAssembly runs in the browser. All of these concepts are reflected 1:1 in the WebAssembly JavaScript API.

  • Module: Represents a WebAssembly binary that has been compiled by the browser into executable machine code. A Module is stateless and thus, like a Blob, can be explicitly shared between windows and workers (via postMessage()). A Module declares imports and exports just like an ES2015 module.
  • Memory: A resizable ArrayBuffer that contains the linear array of bytes read and written by WebAssembly’s low-level memory access instructions.
  • Table: A resizable typed array of references (e.g. to functions) that could not otherwise be stored as raw bytes in Memory (for safety and portability reasons).
  • Instance: A Module paired with all the state it uses at runtime including a Memory, Table, and set of imported values. An Instance is like an ES2015 module that has been loaded into a particular global with a particular set of imports.

The JavaScript API provides developers with the ability to create modules, memories, tables, and instances. Given a WebAssembly instance, JavaScript code can synchronously call its exports, which are exposed as normal JavaScript functions. Arbitrary JavaScript functions can also be synchronously called by WebAssembly code by passing in those JavaScript functions as the imports to a WebAssembly instance.

There’s a lot to parse here, but we will take on most of these concepts one by one as we build the programs.

In order to get an instance of a Wasm module, we use the WebAssembly.instantiate() or, more often, WebAssembly.instantiateStreaming() functions. .instantiateStreaming() takes two parameters: a Response object (or promise for one) representing the source of the Wasm bytecode, and an optional importObject with values to be passed into the new Wasm module instance. Both methods return a promise, so loading Wasm modules will always be an asynchronous operation. Let’s update our script to accommodate for that and not let the user click the button until we have a module instance available.

// script.js

const gebID = id => document.getElementById(id); // shorthand

const input = gebID('input');
const btn = gebID('btn');
const output = gebID('output');

btn.setAttribute('disabled', true);(async function loadProgram() {  /* TODO: await getting module instance */  btn.addEventListener('click', runProgram);  btn.removeAttribute('disabled');  function runProgram() {    /* TODO */  }})();

Because .initiateStreaming() can be given a promise for a Response object, it works in tandem with the fetch API to facilitate streaming compilation, which means that the browser engine can start compiling the incoming Wasm bytecode as soon as the first byte arrives!

// script.js

const programURL = 'lowercase.wasm';
const gebID = id => document.getElementById(id); // shorthand

const input = gebID('input');
const btn = gebID('btn');
const output = gebID('output');

(async function loadProgram() {
  const response = await fetch(programURL);  const { module, instance } = await WebAssembly.instantiateStreaming(response);
  btn.addEventListener('click', runProgram);

  function runProgram() {
    /* TODO */
  }
})();

The instantiator function resolves with an object that has two properties: module is a WebAssembly.Module object representing the compiled Wasm module, and instance is a WebAssembly.Instance object (recall our definitions of Module and Instance from above).

The object instance.exports gives us access to all of the module’s exported definitions, including functions, memories, tables, and globals.

Speaking of which, let’s start templating out our first Wasm module file. For now, let’s just export one named function and have it return the value 1 so that we can check and make sure that we have the simplest possible program running end-to-end before going any further.

;; lowercase.wat

(module
  (func $lowercase (result i32)
    i32.const 1
  )
  (export "lowercase" (func $lowercase))
)

If you’re like me and haven’t had any prior exposure to WebAssembly Text, you might be thinking – wat’s with all the parentheses? We’ll get into the details of WAT in a bit; there are just a few more things I want to get out of the way first. For now, trust that the above Wasm module will export a function named lowercase that just returns the number 1. (Also note that a double-semicolon ;; is the demarcation for line comments.)

In order to use the module in the browser, we must compile the human-readable text format into bytecode. The official WebAssembly Binary Toolkit (WABT, delightfully pronounced “wabbit”) provides tools to do exactly that. If you are coding along with me and have not already installed at least wat2wasm, go ahead and do so now.

$ wat2wasm lowercase.wat -o lowercase.wasm

This will output the final bytecode into a file lowercase.wasm. If we want to see the bytes, we have to use a utility that can print binary files as hex.

$ hexdump lowercase.wasm
0000000 00 61 73 6d 01 00 00 00 01 05 01 60 00 01 7f 03
0000010 02 01 00 07 0d 01 09 6c 6f 77 65 72 63 61 73 65
0000020 00 00 0a 06 01 04 00 41 01 0b

Now that we have a valid Wasm module, we can test out its connection in JavaScript.

// script.js

const programURL = 'lowercase.wasm';

const gebID = id => document.getElementById(id); // shorthand

const input = gebID('input');
const btn = gebID('btn');
const output = gebID('output');

(async function loadProgram() {
  const response = await fetch(programURL);
  const { instance } = await WebAssembly.instantiateStreaming(response);

  const { lowercase } = instance.exports;  console.log('output: ', lowercase());  // "output: 1"  // it works!
  btn.addEventListener('click', runProgram);

  function runProgram() {
    /* TODO */
  }
})();

Notice that, as was stated earlier, calls to Wasm functions happen synchronously.


Now that we’ve successfully received data from a Wasm function, how do we pass data into it?

WebAssembly functions can take any number of parameters. The number of parameters, however, is fixed at compile time, while our program must handle an arbitrary amount of input.

We could write a Wasm function that accepts only one character of input at a time and returns the possibly transformed character, and rely on JavaScript to iterate over the input and invoke the Wasm function for each character. In the spirit of exploring Wasm functionality, though, let’s see if there is another way to pass an arbitrary amount of data into a Wasm function.

Sharing Memory

WebAssembly provides the ability for modules to define and access their own linear space of memory (currently, a maximum of one memory per module). As this memory can be shared with the host, it is the only way to share structured or otherwise complex data types between Wasm and the host environment.

The only native data types in Wasm (as of version 1.0) are 32- and 64-bit integers and floating-point (IEEE 754-2008) numbers. Any arguments passed to a Wasm function from JavaScript, and any return value from a Wasm function, must be one of these types. It is therefore not possible for JavaScript to pass in nor receive native JavaScript objects, arrays, or even strings. All values other than the four native Wasm values must somehow be serialized into the linear memory shared between Wasm and its host environment. The spec refers to the memory as “a vector of raw uninterpreted bytes.” This gives great flexibility in terms of how the memory can be used, but leaves more responsibility to module authors to coordinate its use both internally and externally when exposed to the host environment.

To set up shared memory between Wasm and JavaScript, we can either instantiate the memory in JavaScript and import it inside the Wasm module, or create it inside the module and export it to the JavaScript host. Let’s go with the second option for now.

;; lowercase.wat

(module

  (memory (export "mem") 1)
  (func $lowercase (result i32)
    i32.const 1
  )

  (export "lowercase" (func $lowercase))
)

Wasm memory is allocated in units of pages, each page 64 KiB in size. The added line above defines a memory section with an initial size of 1 page and an export name of "mem". For our purposes, let’s assume that all program input will fit in one page of memory.

We can now access the memory in the host environment on the Wasm module’s exports.

// script.js (partial)

async function loadProgram() {
  const response = await fetch(programURL);
  const { instance } = await WebAssembly.instantiateStreaming(response);

  const { lowercase, mem } = instance.exports;  mem instanceof WebAssembly.Memory; // true
  btn.addEventListener('click', runProgram);

  function runProgram() {
    /* TODO */
  }
}

Text Encoding

Now we must come up with a way to serialize input into the shared memory. We will get our input from the textarea on the page into which the user can type. In JavaScript, we have access to the typed text as a normal JavaScript string, which is UTF-8. For our program, let us greatly simplify the logic needed by assuming that all input will be valid ASCII. Luckily, all 7-bit ASCII characters (values 0-12710) need no transformation at the byte level to or from Unicode,2 which will simplify our work further.

WebAssembly.Memory objects expose access to their underlying bytes through an ArrayBuffer property .buffer. We must use a JavaScript typed array to meaningfully interpret the contents of the buffer.

Though we are assuming ASCII input which only requires 7 bits per character, let’s round up to one byte and we can use a Uint8Array to interact with the memory buffer.

// script.js (partial)

async function loadProgram() {
  const response = await fetch(programURL);
  const { instance } = await WebAssembly.instantiateStreaming(response);

  const { lowercase, mem } = instance.exports;
  const memArray = new Uint8Array(mem.buffer);
  btn.addEventListener('click', runProgram);

  function runProgram() {
    let inputStr = input.value; // get text from textarea    // TODO: put text into memArray    // TODO: run wasm function    // TODO: read output from memArray  }
}

Because memArray is an instance of a TypedArray, we cannot directly set its contents to be strings, even if only a single character.

We can use the TextEncoder API to convert a string into an Uint8Array of bytes. Once we have those bytes, we can copy them into the shared memory.

Though TypedArrays have similar instance methods to normal JavaScript Arrays, they do not have mutable methods which would change the length of the array, such as .push() and .pop(). There is thus not a .splice() method to copy values into the typed array; instead, we can use a .set() method unique to TypedArrays which accepts a source array and copies its values into the target, with an optional offset. If we choose to always copy the input to the beginning of the memory, we don’t need to supply an offset.

// script.js (partial)

function runProgram() {
  let inputStr = input.value; // get text from textarea
  let inputBytes = new TextEncoder().encode(inputStr);  memArray.set(inputBytes);  // TODO: run wasm function
  // TODO: read output from memArray
}

I/O Contract

Now that we’ve stored input into the memory, we can pass control to the Wasm function. How will it know what and how much input to process? We need to agree upon some assumptions to build a contract between the Wasm function and its host environment.

The original project description specified that the program should read and process input character-by-character as the user types until a period is typed (as periods are easy to type on a keyboard). In this scenario, however, I think it makes sense to stop on a null character which our host script can insert at the end of the user’s input.

This gives us the following shared assumptions about input:

  • The input will always start at the beginning of the shared memory.
  • The lowercase function should process input until it reaches a null character (0x00).

Now how should the function provide its output? It could start writing output into memory immediately after the input, possibly leaving one null byte as separation. This would have the benefit of leaving the input intact, but require twice as much memory – though again, this isn’t really a problem if we’re assuming relatively small input.

Another strategy could be to overwrite the input in-place. Because characters are processed one by one, and the input and output will always be the same length, overwriting will be relatively straightforward and not require any edge case re-shuffling of unprocessed input. Let’s go with this strategy for now:

  • The function will overwrite the input with the output in-place

With these assumptions set, we can now finish writing the logic in the host script to handle the output after its processing in Wasm. Just as we used a TextEncoder before to encode a string into numeric values, we can now use a TextDecoder to reverse the process and get back a new string.

// script.js (partial)

function runProgram() {
  let inputStr = input.value; // get text from textarea
  let inputBytes = new TextEncoder().encode(inputStr);
  memArray.set(inputBytes);
  lowercase(); // run Wasm function  let outputBytes = memArray.slice(0, inputBytes.length);  let outputStr = new TextDecoder().decode(outputBytes);  output.textContent = outputStr; // write text to output div}

Now, on to writing the lowercase function in WebAssembly land.

Learning WAT and the Wasm Execution Model

Although in most real-world projects Wasm will be the compilation target of some higher-level language, I still think it is worth taking the time to understand its lower-level primitives.

The official WebAssembly Text Format spec, while interesting, is written more for host environment implementers to follow rather than those seeking to learn how to write WAT programs.

MDN’s Understanding WebAssembly text format is a great resource which discusses both the syntax and higher-level concepts of the execution model, and I was helped immensely by the the article Writing WebAssembly By Hand, by Colin Eberhardt at Scott Logic.

Stack vs. Register Machines

In our undergraduate course, programs were written targeting a DOS emulator using an Intel 8086 instruction set. Those with any exposure to lower-level assembly code with such chip architectures will know that it deals with operating on values from any given number of registers which hold active operands and other significant working values. The architecture then requires a large number of operators to perform any given action with operands from combinations of registers, which can be accessed at random. This is generally referred to as a random access machine, a type of register machine.

WebAssembly code, on the other hand, implements a type of stack machine. Rather than writing instructions that explicitly operate on one or more given registers, instructions for stack machines implicitly operate on values that exist on a program stack. As the WebAssembly design semantics explain,

Each function body consists of a list of instructions which forms an implicit block. Execution of instructions proceeds by way of a traditional program counter that advances through the instructions. Instructions fall into two categories: control instructions that form control constructs and simple instructions. Control instructions pop their argument value(s) off the stack, may change the program counter, and push result value(s) onto the stack. Simple instructions pop their argument value(s) from the stack, apply an operator to the values, and then push the result value(s) onto the stack, followed by an implicit advancement of the program counter.

The WebAssembly Design Rationale document goes into further detail about why a specific stack machine model was chosen.

I have an advantage of some experience with stack-based computation as some of my favorite calculators throughout my own undergraduate years operated with reverse Polish notation (RPN). Learning to write entire programs with such a mindset, however, took a bit of adjustment.

Lowercase Logic

Let us map out the overall logic of our lowercase function. It should read each input character one by one as a single byte. If the character is a lowercase letter, it should transform it to the matching uppercase letter. Because we are assuming ASCII-encoded text, this means we are only dealing with the 26 english letters. In ASCII, english letters are represented by two in-order sections of numbers: A-Z are 65-90 (uppercase) and a-z are 97-122 (lowercase). This makes the transformation easy: simply subtract 32 from any uppercase letter to get its lowercase counterpart. The null character, which we are using to detect the end of input, simply has a value of 0.

In pseudocode, our overall logic might appear as

mem_index = 0

loop {
  char = read character from memory at mem_index

  if (char is 0) {
    return
  }

  if (next_char between 'A' and 'Z') {
    char = char + 32
    write char to memory at mem_index
  }

  mem_index = mem_index + 1
}

Let’s start writing some WAT!

We can create a local to maintain the current index of the character being processed. Locals are analogous to variables in JavaScript; they offer a way to store and access values arbitrarily by name rather than having to keep track of the position of a value in the stack and constantly shuffle it around.

We must declare locals at the beginning as part of the function’s signature. In the body of the function, we can refer to locals by their index (0, 1, etc.), or we can optionally give them human-readable labels starting with $. Locals are always initialized with the value 0, so we don’t have to explicitly set it at the beginning.

(func $lowercase
  (local $idx i32) ;; current memory index)

Blocks and Control Flow Instructions

The rest of the function will take place in a loop which will execute for every character read. The loop instruction in Wasm does not itself cause a section of code to be repeated, rather, it acts as a label to which other instructions can jump.

As the spec describes,

The block, loop and if instructions are structured instructions. They bracket nested sequences of instructions, called blocks, terminated with, or separated by, end or else pseudo-instructions. As the grammar prescribes, they must be well-nested.

Each structured control instruction introduces an implicit label. Labels are targets for branch instructions that reference them with label indices […] indexing of labels is relative by nesting depth, that is, label 0 refers to the innermost structured control instruction enclosing the referring branch instruction, while increasing indices refer to those farther out.

Like locals, we can optionally give blocks semantic, human-readable labels with a $, and the wat2wasm compiler will replace the reference with the proper numeric index in the produced bytecode.

Consequently, labels can only be referenced from within the associated structured control instruction. This also implies that branches can only be directed outwards, “breaking” from the block of the control construct they target.

Restricting structured control flow mechanisms in this way – rather than allowing arbitrary jumps or gotos – is an important part of keeping WebAssembly more safe, predictable, and fast to compile and verify.

(func $lowercase
  (local $idx i32) ;; current memory index

  loop $processChar  end ;; loop $processChar)

Although the smallest integer type in WebAssembly is i32, we can read in memory one byte at a time using the instruction i32.load8_u. The .load* instructions take one operand (that is, pop one value off of the stack) to use as the memory offset (or “address”) in bytes. For .load* instructions which read a number of bits narrower than the target data type, the _u unsigned variants zero-extend to fill in the upper bits, which is our desired behavior here dealing with unsigned integers.

Let’s create another local to store the value of the current character after reading it in from memory.

(func $lowercase
  (local $idx i32)  ;; current memory index
  (local $char i32) ;; current character
  loop $processChar
    get_local $idx  ;; push memory index to read    i32.load8_u     ;; read memory at $idx    set_local $char ;; save in local  end ;; loop $processChar
)

Order of Operands

WebAssembly instructions are said to “consume” operands off of the stack. For commutative instructions, such as addition or multiplication, the order of operands does not matter, so the below expressions produce the same result.

i32.const 40
i32.const 2
i32.add

;; equivalent to

i32.const 2
i32.const 40
i32.add

For non-commutative instructions, however, we must take the order of operands into account.

Consider the instruction i32.ge_u, the unsigned greater than or equal comparator for the 32-bit integer type. It consumes two i32 values and produces one i32 value. The spec3 defines its behavior as

Return 1 if i1 is greater than or equal to i2, 0 otherwise.

Which operand is i1? The one which was pushed onto the stack first, or the one which is currently on top of the stack (and thus was pushed last)?

The answer is the former: values are pushed onto the stack in the order in which they are to be consumed.

;; 40 >=? 2
i32.const 40
i32.const 2
i32.ge_u     ;; 1

;; 2 >=? 40
i32.const 2
i32.const 40
i32.ge_u     ;; 0

For just one more example, observe the behavior of the non-commutative subtraction operation:

i32.const 40
i32.const 2
i32.sub      ;; 38

i32.const 2
i32.const 40
i32.sub      ;; -38 (as signed int)

Back to our own program — we now need to check whether the current $char is the null character. If it is, that signals the end of the input, at which point our function should return execution back to the host environment.

In order to compare two values, we must push both onto the stack and then use a comparator instruction. The instruction i32.eq pops two values from the stack and pushes a single i32 boolean value (0x0000 or 0x0001) onto the stack indicating whether the previous two values were equal.

The if instruction is another mechanism for branching. It implicitly consumes one value off of the stack as an operand – our boolean comparison result, in this case. If the value is 0, execution skips to the end statement; otherwise, execution continues to the next statement (“into” the if block, if you will).

(func $lowercase
  (local $idx i32)  ;; current memory index
  (local $char i32) ;; current character

  loop $processChar
    get_local $idx  ;; push memory index to read
    i32.load8_u     ;; read memory at $idx
    set_local $char ;; save in local

    get_local $char ;; push $char    i32.const 0     ;; push 0    i32.eq          ;; compare $char == 0    if      return    end ;; if $char == 0  end ;; loop $processChar
)

Given that the current $char is not the null character, we must now check whether it is an uppercase letter – that is, whether its value falls within 65-90, inclusively. One way to do this is with nested if blocks, one for each end of the range. We can use the instructions i32.ge_u and i32.le_u to compare values for greater-than-or-equal and less-than-or-equal-to, respectively, interpreted as unsigned integers.

(func $lowercase
  (local $idx i32)  ;; current memory index
  (local $char i32) ;; current character

  loop $processChar
    get_local $idx  ;; push memory index to read
    i32.load8_u     ;; read memory at $idx
    set_local $char ;; save in local

    get_local $char ;; push $char
    i32.const 0     ;; push 0
    i32.eq          ;; compare $char == 0
    if
      return
    end ;; if $char == 0

    get_local $char    i32.const 65    i32.ge_u    if      get_local $char      i32.const 90      i32.le_u 90      if        ;; TODO: convert $char to lowercase      end ;; if $char <= 90    end ;; if $char >= 65  end ;; loop $processChar
)

As we figured out earlier, in order to convert an uppercase ASCII letter to lowercase, all we have to do is add 32. To write the new value back into memory, we can use the i32.store8 instruction to store the lowest 8 bits of a value. It will first pop a value to store, then pop another value to use as the memory offset.

(func $lowercase
  (local $idx i32)  ;; current memory index
  (local $char i32) ;; current character

  loop $processChar
    get_local $idx  ;; push memory index to read
    i32.load8_u     ;; read memory at $idx
    set_local $char ;; save in local

    get_local $char ;; push $char
    i32.const 0     ;; push 0
    i32.eq          ;; compare $char == 0
    if
      return
    end ;; if $char == 0

    get_local $char
    i32.const 65
    i32.ge_u
    if
      get_local $char
      i32.const 90
      i32.le_u
      if
        get_local $char        i32.const 32        i32.add        set_local $char        get_local $idx        get_local $char        i32.store8      end ;; if $char <= 90
    end ;; if $char >= 65
  end ;; loop $processChar
)

Finally, we need to increment the $idx index value and explicitly jump back to the beginning of the loop using the br branch instruction.

(func $lowercase
  (local $idx i32)  ;; current memory index
  (local $char i32) ;; current character

  loop $processChar
    get_local $idx  ;; push memory index to read
    i32.load8_u     ;; read memory at $idx
    set_local $char ;; save in local

    get_local $char ;; push $char
    i32.const 0     ;; push 0
    i32.eq          ;; compare $char == 0
    if
      return
    end ;; if $char == 0

    get_local $char
    i32.const 65
    i32.ge_u
    if
      get_local $char
      i32.const 90
      i32.le_u
      if
        get_local $char
        i32.const 32
        i32.add
        set_local $char

        get_local $idx
        get_local $char
        i32.store8
      end ;; if $char <= 90
    end ;; if $char >= 65

    get_local $idx    i32.const 1    i32.add    set_local $idx    br $processChar  end ;; loop $processChar
)

If you’re coding along yourself, compile using wat2wasm now. If you load the host page in a browser, type in some input, and click the Run button, it should now work! Try several combinations of input – upper- and lowercase letters, numbers, spaces, ASCII punctuations. All output should be the same as input except that all output letters a-z should be lowercase.

A few optimizations

Our solution so far has a total of 28 instructions. While we are using quite low-level operations, it still seems like a lot for a loop with just one read, a few comparisons, and one write. There are a few optimizations that we can make.

You may have noticed that there are a couple places where we want to save a value to a local and then immediately want to use that same value in a later operation. However, the set_local instruction consumes a value from the top of the stack, which means we have to get it back using get_local if we want to use it again.

There is another operation called tee_local which will effectively do this for us – it saves a value to a local while keeping that value on the stack for later use. An obvious use for this is near the top of our function:

loop $processChar
  get_local $idx  ;; push memory index to read
  i32.load8_u     ;; read memory at $idx
  ;;- set_local $char

  ;;- get_local $char
  tee_local $char
  i32.const 0     ;; push 0
  i32.eq          ;; compare $char == 0

We might also use tee_local inside of the nested if blocks when we want to store a transformed character. In the existing code below, I’ve included what the local stack looks like in comments next to each instruction, with rightmost values being those on the top of the stack.

if
  get_local $char ;; [$char]
  i32.const 32    ;; [$char, 32]
  i32.add         ;; [$char(+32)]
  set_local $char ;; []

  get_local $idx  ;; [$idx]
  get_local $char ;; [$idx, $char(+32)]
  i32.store8      ;; []
end

The replacement here is not as straightforward; the index and value ($idx and $char) must be pushed onto the stack in the existing order to be consumed by the last i32.store8 instruction. This is where we must embrace the stack – we can push the index onto the stack before even reading or operating on the character value so that it is in the proper place by the time the character operations are done. In fact, with values on the stack in the proper place, we don’t even have to store the transformed character back into the local $char since we don’t use it again after storing in memory.

if
  get_local $idx  ;; [$idx]
  get_local $char ;; [$idx, $char]
  i32.const 32    ;; [$idx, $char, 32]
  i32.add         ;; [$idx, $char(+32)]
  i32.store8      ;; []
end

These two small optimizations have brought our total instruction count down from 28 to 25.

(func $lowercase
  (local $idx i32)  ;; current memory index
  (local $char i32) ;; current character

  loop $processChar
    get_local $idx  ;; push memory index to read
    i32.load8_u     ;; read memory at $idx
    tee_local $char    i32.const 0     ;; push 0
    i32.eq          ;; compare $char == 0
    if
      return
    end ;; if $char == 0

    get_local $char
    i32.const 65
    i32.ge_u
    if
      get_local $char
      i32.const 90
      i32.le_u
      if
        get_local $idx        get_local $char        i32.const 32        i32.add        i32.store8      end ;; if $char <= 90
    end ;; if $char >= 65

    get_local $idx
    i32.const 1
    i32.add
    set_local $idx

    br $processChar
  end ;; loop $processChar
)

The difference might not seem significant in this very small program, but an ~11% reduction in number of instructions could very well indeed be significant depending on your use case. The performance impact would depend on several factors – how the WASM bytecode is compiled to native machine instructions (hardware architecture-dependent), whether the reduced number of instructions come out of hot code paths, etc. – and of course would need empirical measurements to gauge the impact of any changes made. If you are implementing functionality in WebAssembly for reasons of performance, it is worth considering exploring such optimizations.

Other Resources


Update Nov. 18, 2018: Added link to article “WebAssembly’s post-MVP future”

to top