WebAssembly will bring them all together


I was thinking about how to add a WebAssembly-based plug-in system to my pet project. This would potentially let me reuse existing code in Go, C++, or Rust (if, of course, such code exists). It would also get rid of .so/.dll files, which is convenient for distributing plugins when the project is a desktop application built for Windows, OSX, and GNU/Linux. So I went to see how Envoy does it.

Background of Envoy

As of early 2019, Envoy was a statically linked binary with all extensions compiled in at build time, so you had to maintain custom builds instead of using the official, unmodified Envoy binary. For projects that don't control their deployments, this is even more problematic, because updating an extension requires rebuilding and redeploying the entire Envoy binary.

Advantages

  • Flexibility. Extensions can be delivered and reloaded at runtime. Any changes or fixes can be tested at runtime, without the need to update and redeploy a new binary.

  • Reliability and isolation. Extensions run in a sandbox and can be limited in terms of CPU and memory consumption.

  • Safety. Extensions run in a sandbox with a well-defined API for communicating with the proxy (Envoy, NGINX, etc.), so they can access and modify only a limited set of properties.

  • Diversity. A large selection of programming languages can compile to WebAssembly (C, Go, Rust, Java, TypeScript, etc.), allowing developers of all skill levels to write extensions.

  • Portability. Since the interface between the host environment and extensions is proxy independent, extensions written using Proxy-Wasm can run in different proxy servers, for example Envoy, NGINX, ATS or even inside the gRPC library (assuming they all implement the standard).

Drawbacks

  • Higher memory consumption due to the need to run many virtual machines, each with its own block of memory.

  • Slower performance for extensions that transform payloads due to the need to copy significant amounts of data in and out of the sandbox.

  • Lower performance for CPU-bound tasks. The slowdown is expected to be less than 2x compared to native code.

  • Increased binary size due to the need to include the Wasm runtime. This is ~20 MB for WAVM and ~10 MB for V8.

  • The WebAssembly ecosystem is still young and development is currently focused on use in the browser, where JavaScript is considered the host environment.

General scheme

Envoy took its C++ extension API, bolted a Wasm VM onto it, and redirected the calls into the wasm module.

A single wasm module can contain several filters. Wasm VM instances are replicated (one per worker thread) and kept in thread-local storage.

Wasm VM instances communicate through shared data and message queue primitives. Services are singletons and run on the main Envoy thread. They run in parallel with the filters and perform auxiliary tasks: logging, statistics, etc.
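Shared data is, as far as I understand it, a key-value store in which every value carries a compare-and-swap (CAS) version. This keeps VM instances on different threads from blindly overwriting each other's updates. Below is a hypothetical in-process model of that idea. The SharedData type, its methods, and the rule that a write must present exactly the current version are illustrative choices of mine, not the SDK or Envoy API. A writer holding a stale version gets rejected and has to re-read.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrCASMismatch is returned when a writer presents a stale version.
var ErrCASMismatch = errors.New("cas mismatch")

type entry struct {
	value []byte
	cas   uint32 // bumped on every successful write; 0 means "absent"
}

// SharedData is a hypothetical model of a key-value store shared by VM instances.
type SharedData struct {
	mu   sync.Mutex
	data map[string]entry
}

func NewSharedData() *SharedData { return &SharedData{data: map[string]entry{}} }

// Get returns the value for key and its current version.
func (s *SharedData) Get(key string) ([]byte, uint32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e := s.data[key]
	return e.value, e.cas
}

// Set writes value only if cas equals the current version of key.
func (s *SharedData) Set(key string, value []byte, cas uint32) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	e := s.data[key]
	if cas != e.cas {
		return ErrCASMismatch
	}
	s.data[key] = entry{value: value, cas: e.cas + 1}
	return nil
}

func main() {
	store := NewSharedData()
	_, v := store.Get("counter")
	fmt.Println(store.Set("counter", []byte("1"), v)) // <nil>
	// A second writer still holding the old version loses the race.
	fmt.Println(store.Set("counter", []byte("2"), v)) // cas mismatch
	val, v2 := store.Get("counter")
	fmt.Println(string(val), v2) // 1 1
}
```

The losing writer would typically call Get again, recompute its value, and retry with the fresh version.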

Runtime

The Wasm VM can be one of several runtimes supported by Envoy, such as V8, WAVM, Wasmtime, or WAMR.

Specification

The ABI specification is divided into two large groups: functions implemented by the module and functions implemented by the host environment. I will single out two functions: the memory allocator proxy_on_memory_allocate and the entry point _start.

The specification describes each function by its arguments and return value. Here is proxy_log as an example:

arguments:

  • i32 (proxy_log_level_t) log_level

  • i32 (const char*) message_data

  • i32 (size_t) message_size

return value:

  • i32 (proxy_result_t) call_result

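Here i32 is a WebAssembly numeric type (a 32-bit integer), and the type in parentheses shows how the value is interpreted. This is how the function looks in the different SDKs: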

C++

extern "C" WasmResult proxy_log(LogLevel level,
                                const char *logMessage,
                                size_t messageSize);

Go

package internal

//export proxy_log
func ProxyLog(logLevel LogLevel,
              messageData *byte,
              messageSize int) Status

AssemblyScript

// @ts-ignore: decorator
@external("env", "proxy_log")
export declare function proxy_log(level: LogLevel,
                                  logMessage: ptr<char>,
                                  messageSize: size_t): WasmResult;
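The (const char*, size_t) pair works like this: the module's linear memory is just a byte array, and a "pointer" is an i32 offset into it. Below is a minimal host-side sketch that models linear memory as a Go []byte. hostProxyLog and the WasmResult values are hypothetical stand-ins loosely modeled on proxy_result_t, not Envoy code. The important part is that the host checks bounds before copying the message out of the sandbox.

```go
package main

import "fmt"

// WasmResult loosely models proxy_result_t; the numeric values are illustrative.
type WasmResult int32

const (
	ResultOk WasmResult = iota
	ResultInvalidMemoryAccess
)

// hostProxyLog reads messageSize bytes starting at offset messageData
// in the module's linear memory, checking bounds first.
func hostProxyLog(mem []byte, level, messageData, messageSize uint32) (string, WasmResult) {
	end := uint64(messageData) + uint64(messageSize)
	if end > uint64(len(mem)) {
		return "", ResultInvalidMemoryAccess
	}
	msg := string(mem[messageData:end]) // copies the bytes out of the sandbox
	return fmt.Sprintf("[level %d] %s", level, msg), ResultOk
}

func main() {
	mem := make([]byte, 64) // the module's linear memory
	ptr := uint32(16)       // where the guest placed its string
	n := uint32(copy(mem[ptr:], "hello from wasm"))

	line, res := hostProxyLog(mem, 2, ptr, n)
	fmt.Println(line, res == ResultOk) // [level 2] hello from wasm true

	_, res = hostProxyLog(mem, 2, 60, 10) // runs past the end of memory
	fmt.Println(res == ResultInvalidMemoryAccess) // true
}
```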

Memory Ownership

This is probably one of the most important topics when building such a system, because memory exists both on the host side and inside the wasm module. When the host calls handlers, it does not pass any memory it manages to the wasm module. Instead, the module requests the data itself. For example, proxy_on_http_request_body only tells the module how many bytes of the request body are available, and the module must then fetch the data with proxy_get_buffer. When that happens, the host allocates memory inside the wasm module by calling proxy_on_memory_allocate, copies the data there, and hands the pointer back to the module, expecting the module to free it.

proxy_on_memory_allocate

arguments:

  • i32 (size_t) memory_size

return value:

  • i32 (void*) pointer to the allocated memory

Implementation in AssemblyScript (malloc.ts):

import {
  __pin,
  __unpin,
} from "rt/itcms";

/// Allow host to allocate memory.
export function malloc(size: i32): usize {
  let buffer = new ArrayBuffer(size);
  let ptr = changetype<usize>(buffer);
  return __pin(ptr);
}

/// Allow host to free memory.
export function free(ptr: usize): void {
  __unpin(ptr);
}

Reverse transformation

class ArrayBufferReference {
  private buffer: usize;
  private size: usize;

  constructor() {
  }

  sizePtr(): usize {
    return changetype<usize>(this) + offsetof<ArrayBufferReference>("size");
  }
  bufferPtr(): usize {
    return changetype<usize>(this) + offsetof<ArrayBufferReference>("buffer");
  }

  // Before calling toArrayBuffer below, you must call out to the host to fill in the values.
  // toArrayBuffer below **must** be called once and only once.
  toArrayBuffer(): ArrayBuffer {
    if (this.size == 0) {
      return new ArrayBuffer(0);
    }

    let array = changetype<ArrayBuffer>(this.buffer);
    // host code used malloc to allocate this buffer.
    // release the allocated ptr. array variable will retain it, so it won't be actually free (as it is ref counted).
    free(this.buffer);
    // should we return a this sliced up to size?
    return array;
  }
}

AssemblyScript was designed with the expectation that objects may be handed over to the external environment (primarily JavaScript). That is what __pin/__unpin are for: they prevent the garbage collector from collecting objects that are no longer referenced from inside the module. In Go:

//nolint
//export proxy_on_memory_allocate
func proxyOnMemoryAllocate(size uint) *byte {
	buf := make([]byte, size)
	return &buf[0]
}
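The Go version above pins nothing. Once proxyOnMemoryAllocate returns, no Go reference to buf remains, so in principle a collector could reclaim it before the module takes the data back. It also panics for size 0, because &buf[0] indexes an empty slice. One workaround is to keep each allocation reachable in a map until it is explicitly released. The sketch below assumes that a matching free call happens later; allocate and release are hypothetical names, not SDK functions.

```go
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

var (
	pinnedMu sync.Mutex
	// pinned keeps every handed-out buffer reachable for the GC,
	// keyed by the address of its first byte.
	pinned = map[uintptr][]byte{}
)

// allocate returns a pointer to a fresh size-byte buffer and pins it.
func allocate(size uint) *byte {
	if size == 0 {
		return nil // avoid indexing an empty slice
	}
	buf := make([]byte, size)
	ptr := &buf[0]
	pinnedMu.Lock()
	pinned[uintptr(unsafe.Pointer(ptr))] = buf
	pinnedMu.Unlock()
	return ptr
}

// release unpins the buffer so the GC may reclaim it.
func release(ptr *byte) {
	pinnedMu.Lock()
	delete(pinned, uintptr(unsafe.Pointer(ptr)))
	pinnedMu.Unlock()
}

func main() {
	p := allocate(16)
	fmt.Println(len(pinned)) // 1
	release(p)
	fmt.Println(len(pinned)) // 0
}
```

This is essentially a hand-rolled __pin/__unpin. The cost is a map lookup per allocation, and a leak if the matching release is ever forgotten.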

Reverse transformation

import (
	"reflect"
	"unsafe"
)

func RawBytePtrToString(raw *byte, size int) string {
	//nolint
	return *(*string)(unsafe.Pointer(&reflect.SliceHeader{
		Data: uintptr(unsafe.Pointer(raw)),
		Len:  size,
		Cap:  size,
	}))
}

func RawBytePtrToByteSlice(raw *byte, size int) []byte {
	//nolint
	return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
		Data: uintptr(unsafe.Pointer(raw)),
		Len:  size,
		Cap:  size,
	}))
}
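Building a reflect.SliceHeader literal like this is discouraged. reflect.SliceHeader is deprecated as of Go 1.21, and its documentation warns that the Data field alone does not keep the memory alive. On Go 1.20 and later, the same zero-copy views can be written with unsafe.String and unsafe.Slice. This sketch assumes a toolchain that provides them, so check whether your TinyGo version does.

```go
package main

import (
	"fmt"
	"unsafe"
)

// RawBytePtrToString views size bytes at raw as a string without copying.
// The caller must not modify those bytes afterwards: Go strings are immutable.
func RawBytePtrToString(raw *byte, size int) string {
	if size == 0 {
		return ""
	}
	return unsafe.String(raw, size)
}

// RawBytePtrToByteSlice views size bytes at raw as a []byte without copying.
func RawBytePtrToByteSlice(raw *byte, size int) []byte {
	if size == 0 {
		return nil
	}
	return unsafe.Slice(raw, size)
}

func main() {
	buf := []byte("proxy-wasm")
	fmt.Println(RawBytePtrToString(&buf[0], 5))                // proxy
	fmt.Println(len(RawBytePtrToByteSlice(&buf[0], len(buf)))) // 10
}
```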

What can be said here? The Go specification says nothing about how the garbage collector works. The cgo pointer-passing rules of the standard Go compiler (https://go.dev/) forbid C code from keeping pointers to Go memory after a call returns. There is a package, go-pointer, which works a bit like pin/unpin in AssemblyScript:

C.pass_pointer(pointer.Save(&s))
v := *(pointer.Restore(C.get_from_pointer()).(*string))

Pretty simple on the inside.

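package pointer

// #include <stdlib.h>
import "C"
import (
	"sync"
	"unsafe"
)

var (
	mutex sync.RWMutex
	store = map[unsafe.Pointer]interface{}{}
)

func Save(v interface{}) unsafe.Pointer {
	if v == nil {
		return nil
	}

	// Generate a real "fake" C pointer.
	// This pointer does not store any data; it is only used as an index.
	// Since Go doesn't allow casting a dangling pointer to unsafe.Pointer, we really allocate one byte.
	// We need the index because Go doesn't allow C code to store pointers to Go data.
	var ptr unsafe.Pointer = C.malloc(C.size_t(1))
	if ptr == nil {
		panic("can't allocate 'cgo-pointer hack index pointer': ptr == nil")
	}

	mutex.Lock()
	store[ptr] = v
	mutex.Unlock()

	return ptr
}

// Restore looks up the Go value saved under ptr.
func Restore(ptr unsafe.Pointer) (v interface{}) {
	if ptr == nil {
		return nil
	}

	mutex.RLock()
	v = store[ptr]
	mutex.RUnlock()
	return
}

// Unref forgets the saved value and frees the index pointer.
func Unref(ptr unsafe.Pointer) {
	if ptr == nil {
		return
	}

	mutex.Lock()
	delete(store, ptr)
	mutex.Unlock()

	C.free(ptr)
}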

However, the SDK developers use TinyGo, whose garbage collector is simpler and only runs when the heap runs out of space. So if nothing is allocated between the call to proxy_on_memory_allocate and the moment Go code takes ownership of the buffer back, this is conditionally safe.

In the C++ SDK, however, you won't find proxy_on_memory_allocate at all. Instead, the host looks for malloc, which the compiler exports:

$ em++ --no-entry -s "EXPORTED_FUNCTIONS=['_malloc']" ...

On the host side, Envoy first looks for malloc and, if it is not found, falls back to proxy_on_memory_allocate:

void WasmBase::getFunctions() {
#define _GET(_fn) wasm_vm_->getFunction(#_fn, &_fn##_);
#define _GET_ALIAS(_fn, _alias) wasm_vm_->getFunction(#_alias, &_fn##_);
  _GET(_initialize);
  if (_initialize_) {
    _GET(main);
  } else {
    _GET(_start);
  }

  _GET(malloc);
  if (!malloc_) {
    _GET_ALIAS(malloc, proxy_on_memory_allocate);
  }
  if (!malloc_) {
    fail(FailState::MissingFunction, "Wasm module is missing malloc function.");
  }
#undef _GET_ALIAS
#undef _GET

  // Try to point the capability to one of the module exports, if the capability has been allowed.
#define _GET_PROXY(_fn)                                                                            \
  if (capabilityAllowed("proxy_" #_fn)) {                                                          \
    wasm_vm_->getFunction("proxy_" #_fn, &_fn##_);                                                 \
  } else {                                                                                         \
    _fn##_ = nullptr;                                                                              \
  }
#define _GET_PROXY_ABI(_fn, _abi)                                                                  \
  if (capabilityAllowed("proxy_" #_fn)) {                                                          \
    wasm_vm_->getFunction("proxy_" #_fn, &_fn##_abi##_);                                           \
  } else {                                                                                         \
    _fn##_abi##_ = nullptr;                                                                        \
  }

  FOR_ALL_MODULE_FUNCTIONS(_GET_PROXY);

  if (abiVersion() == AbiVersion::ProxyWasm_0_1_0) {
    _GET_PROXY_ABI(on_request_headers, _abi_01);
    _GET_PROXY_ABI(on_response_headers, _abi_01);
  } else if (abiVersion() == AbiVersion::ProxyWasm_0_2_0 ||
             abiVersion() == AbiVersion::ProxyWasm_0_2_1) {
    _GET_PROXY_ABI(on_request_headers, _abi_02);
    _GET_PROXY_ABI(on_response_headers, _abi_02);
    _GET_PROXY(on_foreign_function);
  }
#undef _GET_PROXY_ABI
#undef _GET_PROXY
}
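Without the macros, the allocator lookup above is a two-step fallback. Here it is as a small Go sketch, with the module's exports modeled as a map from name to function. resolveAllocator, AllocFn and Exports are hypothetical names, not part of any SDK.

```go
package main

import (
	"errors"
	"fmt"
)

// AllocFn is the shape of an exported allocator: size in, guest pointer out.
type AllocFn func(size uint32) uint32

// Exports models the functions a Wasm module exports, by name.
type Exports map[string]AllocFn

// resolveAllocator prefers malloc and falls back to proxy_on_memory_allocate,
// mirroring the order used in WasmBase::getFunctions.
func resolveAllocator(ex Exports) (string, AllocFn, error) {
	for _, name := range []string{"malloc", "proxy_on_memory_allocate"} {
		if fn, ok := ex[name]; ok {
			return name, fn, nil
		}
	}
	return "", nil, errors.New("wasm module is missing malloc function")
}

func main() {
	stub := func(size uint32) uint32 { return 1024 }
	name, _, err := resolveAllocator(Exports{"proxy_on_memory_allocate": stub})
	fmt.Println(name, err) // proxy_on_memory_allocate <nil>

	_, _, err = resolveAllocator(Exports{})
	fmt.Println(err) // wasm module is missing malloc function
}
```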

Entry point _start

In Go:

package main

import (
	"github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm"
	"github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm/types"
)

const tickMilliseconds uint32 = 1000

func main() {
	proxywasm.SetVMContext(&vmContext{})
}

type vmContext struct {
	// Embed the default VM context here,
	// so that we don't need to reimplement all the methods.
	types.DefaultVMContext
}

AssemblyScript

export * from "@solo-io/proxy-runtime/proxy"; // this exports the required functions for the proxy to interact with us.
import { RootContext, Context, registerRootContext, FilterHeadersStatusValues, stream_context } from "@solo-io/proxy-runtime";

class AddHeaderRoot extends RootContext {
  createContext(context_id: u32): Context {
    return new AddHeader(context_id, this);
  }
}

class AddHeader extends Context {
  constructor(context_id: u32, root_context: AddHeaderRoot) {
    super(context_id, root_context);
  }
  onResponseHeaders(a: u32, end_of_stream: bool): FilterHeadersStatusValues {
    const root_context = this.root_context;
    if (root_context.getConfiguration() == "") {
      stream_context.headers.response.add("hello", "world!");
    } else {
      stream_context.headers.response.add("hello", root_context.getConfiguration());
    }
    return FilterHeadersStatusValues.Continue;
  }
}

registerRootContext((context_id: u32) => { return new AddHeaderRoot(context_id); }, "add_header");

C++

#include <string>
#include <string_view>
#include <stdlib.h>

#include "proxy_wasm_intrinsics.h"

class ExampleContext : public Context {
public:
  explicit ExampleContext(uint32_t id, RootContext *root) : Context(id, root) {}

  FilterHeadersStatus onRequestHeaders(uint32_t headers, bool end_of_stream) override;
};

static RegisterContextFactory register_ExampleContext(CONTEXT_FACTORY(ExampleContext));


FilterHeadersStatus ExampleContext::onRequestHeaders(uint32_t, bool) {
  LOG_DEBUG(std::string("print from wasm, onRequestHeaders, context id: ") + std::to_string(id()));

  auto result = getRequestHeaderPairs();
  auto pairs = result->pairs();
  for (auto &p : pairs) {
    LOG_INFO(std::string("print from wasm, ") + std::string(p.first) + std::string(" -> ") + std::string(p.second));
  }

  return FilterHeadersStatus::Continue;
}

We need an entry point that initializes the C++/Go/AssemblyScript runtime and runs something like main. WASI provides _start and _initialize for this purpose. Although the spec only mentions _start, the host supports both:

class WasmBase : public std::enable_shared_from_this<WasmBase> {
 //...
protected:
  //...
  WasmCallVoid<0> _initialize_; /* WASI reactor (Emscripten v1.39.17+, Rust nightly) */
  WasmCallVoid<0> _start_;      /* WASI command (Emscripten v1.39.0+, TinyGo) */

  WasmCallWord<2> main_;
  WasmCallWord<1> malloc_;
  //...
};

Strings and Associative Containers

One of the drawbacks is memory copying, and sometimes conversion is needed on top of copying. In Go, a string is simply a byte sequence, usually UTF-8. In AssemblyScript, a string is a UCS-2 (UTF-16) sequence and has to be converted:

export function log(level: LogLevelValues, logMessage: string): void {
  // from the docs:
  // Like JavaScript, AssemblyScript stores strings in UTF-16 encoding represented by the API as UCS-2, 
  let buffer = String.UTF8.encode(logMessage);
  imports.proxy_log(level as imports.LogLevel, changetype<usize>(buffer), buffer.byteLength);
}
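The cost of that conversion is easy to see from the Go side, which already works in UTF-8. This small sketch uses the standard unicode/utf16 package. It encodes a string into UTF-16 code units, which is how AssemblyScript stores strings, and then back into the UTF-8 bytes the host expects. toUTF16 and utf16ToUTF8 are illustrative helper names. The byte length changes between the two forms, which is why the module passes buffer.byteLength to the host.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// toUTF16 returns the UTF-16 code units of s, as AssemblyScript stores strings.
func toUTF16(s string) []uint16 {
	return utf16.Encode([]rune(s))
}

// utf16ToUTF8 converts UTF-16 code units back into UTF-8 bytes for the host.
func utf16ToUTF8(units []uint16) []byte {
	return []byte(string(utf16.Decode(units)))
}

func main() {
	msg := "héllo, мир"
	units := toUTF16(msg)
	utf8Bytes := utf16ToUTF8(units)
	// Size in bytes as UTF-16 versus as UTF-8.
	fmt.Println(len(units)*2, len(utf8Bytes)) // 20 14
}
```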

Passing a familiar map container also requires additional packing and unpacking. In Go:

import (
	"encoding/binary"
	"unsafe"
)

func DeserializeMap(bs []byte) [][2]string {
	numHeaders := binary.LittleEndian.Uint32(bs[0:4])
	var sizeIndex = 4
	var dataIndex = 4 + 4*2*int(numHeaders)
	ret := make([][2]string, numHeaders)
	for i := 0; i < int(numHeaders); i++ {
		keySize := int(binary.LittleEndian.Uint32(bs[sizeIndex : sizeIndex+4]))
		sizeIndex += 4
		keyPtr := bs[dataIndex : dataIndex+keySize]
		key := *(*string)(unsafe.Pointer(&keyPtr))
		dataIndex += keySize + 1

		valueSize := int(binary.LittleEndian.Uint32(bs[sizeIndex : sizeIndex+4]))
		sizeIndex += 4
		valuePtr := bs[dataIndex : dataIndex+valueSize]
		value := *(*string)(unsafe.Pointer(&valuePtr))
		dataIndex += valueSize + 1
		ret[i] = [2]string{key, value}
	}
	return ret
}

func SerializeMap(ms [][2]string) []byte {
	size := 4
	for _, m := range ms {
		// key/value's bytes + len * 2 (8 bytes) + nil * 2 (2 bytes)
		size += len(m[0]) + len(m[1]) + 10
	}

	ret := make([]byte, size)
	binary.LittleEndian.PutUint32(ret[0:4], uint32(len(ms)))

	var base = 4
	for _, m := range ms {
		binary.LittleEndian.PutUint32(ret[base:base+4], uint32(len(m[0])))
		base += 4
		binary.LittleEndian.PutUint32(ret[base:base+4], uint32(len(m[1])))
		base += 4
	}

	for _, m := range ms {
		for i := 0; i < len(m[0]); i++ {
			ret[base] = m[0][i]
			base++
		}
		base++ // nil

		for i := 0; i < len(m[1]); i++ {
			ret[base] = m[1][i]
			base++
		}
		base++ // nil
	}
	return ret
}

And in AssemblyScript:

function serializeHeaders(headers: Headers): ArrayBuffer {
  let result = new ArrayBuffer(pairsSize(headers));
  let sizes = Uint32Array.wrap(result, 0, 1 + 2 * headers.length);
  sizes[0] = headers.length;

  // header sizes:
  let index = 1;

  // for in loop doesn't seem to be supported..
  for (let i = 0; i < headers.length; i++) {
    let header = headers[i];
    sizes[index] = header.key.byteLength;
    index++;
    sizes[index] = header.value.byteLength;
    index++;
  }

  let data = Uint8Array.wrap(result, sizes.byteLength);

  let currentOffset = 0;
  // for in loop doesn't seem to be supported..
  for (let i = 0; i < headers.length; i++) {
    let header = headers[i];
    // i'm sure there's a better way to copy, i just don't know what it is :/
    let wrappedKey = Uint8Array.wrap(header.key);
    let keyData = data.subarray(currentOffset, currentOffset + wrappedKey.byteLength);
    for (let i = 0; i < wrappedKey.byteLength; i++) {
      keyData[i] = wrappedKey[i];
    }
    currentOffset += wrappedKey.byteLength + 1; // + 1 for terminating nil

    let wrappedValue = Uint8Array.wrap(header.value);
    let valueData = data.subarray(currentOffset, currentOffset + wrappedValue.byteLength);
    for (let i = 0; i < wrappedValue.byteLength; i++) {
      valueData[i] = wrappedValue[i];
    }
    currentOffset += wrappedValue.byteLength + 1; // + 1 for terminating nil
  }
  return result;
}

function deserializeHeaders(headers: ArrayBuffer): Headers {
  if (headers.byteLength == 0) {
    return [];
  }
  let numheaders = Uint32Array.wrap(headers, 0, 1)[0];
  let sizes = Uint32Array.wrap(headers, sizeof<u32>(), 2 * numheaders);
  let data = headers.slice(sizeof<u32>() * (1 + 2 * numheaders));
  let result: Headers = [];
  let sizeIndex = 0;
  let dataIndex = 0;
  // for in loop doesn't seem to be supported..
  for (let i: u32 = 0; i < numheaders; i++) {
    let keySize = sizes[sizeIndex];
    sizeIndex++;
    let header_key_data = data.slice(dataIndex, dataIndex + keySize);
    dataIndex += keySize + 1; // +1 for nil termination.

    let valueSize = sizes[sizeIndex];
    sizeIndex++;
    let header_value_data = data.slice(dataIndex, dataIndex + valueSize);
    dataIndex += valueSize + 1; // +1 for nil termination.

    let pair = new HeaderPair(header_key_data, header_value_data);
    result.push(pair);
  }

  return result;
}
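Both implementations assume the host always sends a well-formed buffer. The layout is a little-endian u32 pair count, then one (key size, value size) u32 pair per entry, then the keys and values, each followed by a NUL byte. A truncated or corrupted buffer makes the Go version panic on an out-of-range slice. A bounds-checked variant needs only small changes. The sketch below uses a hypothetical function, deserializeMapChecked, which also checks each terminating NUL. It copies the strings instead of aliasing the buffer and returns an error on malformed input.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errMalformed = errors.New("malformed map buffer")

// deserializeMapChecked parses the count | sizes | NUL-terminated data layout,
// validating every offset before slicing.
func deserializeMapChecked(bs []byte) ([][2]string, error) {
	if len(bs) < 4 {
		return nil, errMalformed
	}
	n := uint64(binary.LittleEndian.Uint32(bs[0:4]))
	data := 4 + 8*n // the data section starts right after the size table
	if data > uint64(len(bs)) {
		return nil, errMalformed
	}
	ret := make([][2]string, 0, n)
	for i := uint64(0); i < n; i++ {
		var kv [2]string
		for j := uint64(0); j < 2; j++ { // j == 0: key, j == 1: value
			off := 4 + 8*i + 4*j
			size := uint64(binary.LittleEndian.Uint32(bs[off : off+4]))
			// size bytes of payload plus one terminating NUL must fit.
			if data+size+1 > uint64(len(bs)) || bs[data+size] != 0 {
				return nil, errMalformed
			}
			kv[j] = string(bs[data : data+size]) // copies the bytes
			data += size + 1
		}
		ret = append(ret, kv)
	}
	return ret, nil
}

func main() {
	buf := []byte{
		1, 0, 0, 0, // one pair
		1, 0, 0, 0, 2, 0, 0, 0, // key size 1, value size 2
		'a', 0, 'b', 'c', 0, // NUL-terminated key and value
	}
	pairs, err := deserializeMapChecked(buf)
	fmt.Println(pairs, err) // [[a bc]] <nil>

	_, err = deserializeMapChecked(buf[:len(buf)-2])
	fmt.Println(err) // malformed map buffer
}
```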

That's all.
