As of v1.1.0, numpy-ts ships Zig-compiled WebAssembly microkernels that transparently accelerate compute- and memory-bound operations. The library remains lightweight and tree-shakeable, and WASM acceleration is invisible to the end user: the same API, just faster.
## How it works
numpy-ts detects at runtime when a WASM kernel is available and beneficial for a given operation. If the input meets the dispatch criteria (sufficient size, supported dtype, contiguous memory layout), the operation runs through the WASM kernel. Otherwise, the pure TypeScript implementation is used. There is no API change: the same functions, the same signatures, the same results.

Each WASM kernel is:

- Compiled from Zig with `ReleaseFast` optimizations and WASM SIMD128 enabled
- Embedded as base64 in the JavaScript bundle: no extra network requests, no `.wasm` files to serve, works with any bundler
- Lazily initialized on first use, with zero startup cost if a kernel is never called
- Backed by a single shared `WebAssembly.Memory` instance across all kernels to minimize overhead
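The base64-embedding and lazy-initialization points above can be sketched as follows. This is a minimal illustration, not numpy-ts's actual wrapper code: `KERNEL_B64`, `getKernelModule`, and `initCount` are hypothetical names, and the embedded bytes here are just the smallest valid (empty) WASM module rather than a real kernel.

```typescript
// Illustrative sketch: a base64-embedded WASM module, compiled lazily on first use.
// "AGFzbQEAAAA=" encodes the 8-byte minimal WASM module (\0asm + version 1),
// standing in for a real base64-encoded kernel binary.
const KERNEL_B64 = "AGFzbQEAAAA=";

let cached: WebAssembly.Module | null = null;
let initCount = 0; // tracks how many times compilation actually ran

function getKernelModule(): WebAssembly.Module {
  if (cached === null) {
    // Decode and compile only on the first call: zero startup cost
    // if this kernel is never used.
    const bytes = Buffer.from(KERNEL_B64, "base64");
    cached = new WebAssembly.Module(bytes);
    initCount++;
  }
  return cached;
}
```

Because the bytes live in the JavaScript source as a string constant, no separate `.wasm` file needs to be fetched or served, which is what makes the approach bundler-agnostic.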
## What’s accelerated
WASM kernels cover 97 operations across these modules:

| Module | WASM speedup | vs NumPy (before) | vs NumPy (after) |
|---|---|---|---|
| Arithmetic | ~23x | 65x slower | 2.75x slower |
| Linear Algebra | ~19x | 61x slower | 3.2x slower |
| Logic | ~27x | 48x slower | 1.8x slower |
| Manipulation | ~9x | 15x slower | 1.6x slower |
| Gradient | ~60x | 30x slower | 2x faster |
| FFT | ~3x | 22x slower | 8x slower |
| Random | ~6x | 11x slower | 1.9x slower |
| Indexing | ~5.5x | 12x slower | 2.2x slower |
All benchmarks measure computation time within numpy-ts and within NumPy respectively — no FFI or serialization overhead is counted. This gives an apples-to-apples comparison of the numerical computation itself. For more info, see performance benchmarks.
## Architecture

### Zig kernels
The WASM kernels are implemented in Zig, chosen for its:

- First-class WASM target support with SIMD intrinsics
- Zero-overhead abstractions and comptime generics
- Lack of a runtime or GC, keeping binary sizes minimal

Each kernel family (matmul, reduction, unary, binary, sort, linalg, fft) is compiled to a standalone `.wasm` binary, then base64-encoded into a TypeScript wrapper.
### Memory model
All kernels share a single `WebAssembly.Memory` instance with a bump allocator. Before each kernel call:

- The allocator resets to `heapBase` (a zero-cost reset)
- Input data is written into WASM memory
- The kernel operates in-place or writes output to a separate region
- Results are read back into JavaScript typed arrays
### Dispatch logic
Each accelerated function checks:

- Size threshold: small arrays are faster in pure JS (no WASM call overhead)
- Dtype support: the kernel must support the input dtype (e.g., `float32`, `float64`)
- Contiguity: many kernels require C-contiguous input for optimal performance
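The three checks above amount to a simple predicate. The sketch below is illustrative only: `shouldUseWasm`, `NdArrayLike`, the `1024`-element threshold, and the exact dtype set are assumptions, not numpy-ts's actual dispatch values.

```typescript
// Illustrative dispatch predicate; threshold and dtype list are assumed values.
const WASM_SIZE_THRESHOLD = 1024; // elements; below this, pure JS tends to win
const SUPPORTED_DTYPES = new Set(["float32", "float64"]);

interface NdArrayLike {
  size: number;        // total element count
  dtype: string;       // e.g. "float64"
  cContiguous: boolean; // C-contiguous memory layout
}

function shouldUseWasm(a: NdArrayLike): boolean {
  return (
    a.size >= WASM_SIZE_THRESHOLD && // large enough to amortize call overhead
    SUPPORTED_DTYPES.has(a.dtype) && // kernel implements this dtype
    a.cContiguous                    // layout the kernel can consume directly
  );
}
```

When the predicate fails, the function simply falls through to the pure TypeScript path, which is why callers never see an API difference.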
### Tree-shaking
WASM kernels are individually tree-shakeable. If your application only uses `add` and `matmul`, only those kernel wrappers (and the shared memory instance) are included in your bundle. Unused kernels are eliminated by your bundler.