numpy-ts ships with a comprehensive benchmark suite that measures performance against Python NumPy. This page explains how the benchmarks work.

What’s tested

The suite contains 308 benchmark specifications across 18 categories: creation, arithmetic, math, trig, gradient, linalg, reductions, manipulation, io, indexing, bitwise, sorting, logic, statistics, sets, random, polynomials, and fft. Each specification is tested across multiple dtypes (float64, float32, float16, int8-int64, uint8-uint64, complex64, complex128, bool) where applicable, producing ~2,400 individual benchmarks in a full run. Array sizes are configurable:
| Scale | Array size | Matrix size |
| --- | --- | --- |
| Small | 100 elements | 32×32 |
| Medium (default) | 1,000 elements | 100×100 |
| Large | 10,000 elements | 1,000×1,000 |
All benchmark specifications are defined in benchmarks/src/specs.ts.
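The exact shape of a specification lives in benchmarks/src/specs.ts; as a rough illustration only (field names here are hypothetical, not the real interface), a spec pairs an operation with the categories and dtypes it runs against:

```typescript
// Hypothetical sketch of a benchmark spec entry — the real shape is
// defined in benchmarks/src/specs.ts and may differ.
type DType = "float64" | "float32" | "int32" | "bool"; // subset, for illustration

interface BenchSpec {
  name: string;                // e.g. "add"
  category: string;            // one of the 18 categories, e.g. "arithmetic"
  dtypes: DType[];             // dtypes this spec is run against
  run: (size: number) => void; // the operation being timed
}

const addSpec: BenchSpec = {
  name: "add",
  category: "arithmetic",
  dtypes: ["float64", "float32", "int32"],
  run: (size) => {
    // Placeholder workload standing in for the real elementwise add.
    const a = new Float64Array(size).fill(1);
    const b = new Float64Array(size).fill(2);
    const out = new Float64Array(size);
    for (let i = 0; i < size; i++) out[i] = a[i] + b[i];
  },
};
```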

How timing works

Both sides use high-resolution timers: performance.now() in the JS runner and time.perf_counter() in the Python runner. The benchmark measures computation time only, from the JS side for numpy-ts and from the Python side for NumPy. This gives an apples-to-apples comparison of the numerical computation itself, without being skewed by JS↔Python interop overhead.
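On the JS side, a measurement of this kind boils down to wrapping only the computation in `performance.now()` calls (a minimal sketch; the helper name is ours, not numpy-ts API):

```typescript
import { performance } from "node:perf_hooks";

// Time `ops` repetitions of `fn` and return the mean per-operation time
// in milliseconds. Only the computation sits inside the timed region;
// any setup happens before the first timestamp.
function timeSample(fn: () => void, ops: number): number {
  const start = performance.now();
  for (let i = 0; i < ops; i++) fn();
  const elapsed = performance.now() - start;
  return elapsed / ops;
}
```

The Python runner does the equivalent with `time.perf_counter()`, so each side pays only for its own computation.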

Auto-calibration

Each benchmark automatically calibrates how many operations to run per sample, targeting a minimum sample time of 100ms. This eliminates timer resolution noise: if an operation takes 0.001ms, the runner batches 100,000 of them into a single sample rather than measuring one at a time. The calibration uses exponential scaling (×10 → ×2 → exact) to converge quickly, with a cap of 10 calibration rounds.
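The ×10 → ×2 → exact scaling described above can be sketched as a small loop that grows the batch size until one sample meets the target time (an illustration of the idea, not the runner's actual code):

```typescript
// Grow the batch size until a single sample takes at least `targetMs`.
// Far from the target we jump ×10, closer we refine ×2, and near the
// target we extrapolate the exact batch size from the measured time.
function calibrate(
  measure: (ops: number) => number, // total time in ms for `ops` operations
  targetMs = 100,
  maxRounds = 10,
): number {
  let ops = 1;
  for (let round = 0; round < maxRounds; round++) {
    const elapsed = measure(ops);
    if (elapsed >= targetMs) return ops;
    if (elapsed < targetMs / 10) ops *= 10;
    else if (elapsed < targetMs / 2) ops *= 2;
    else ops = Math.ceil((ops * targetMs) / Math.max(elapsed, 1e-9));
  }
  return ops;
}
```

For the 0.001ms-per-operation example from the text, this converges on a batch of 100,000 operations per sample well within the 10-round cap.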

Warmup

Before measurement, each benchmark runs a configurable number of warmup iterations to stabilize JIT compilation and ensure WASM modules are compiled:
| Mode | Warmup iterations | Min sample time | Samples |
| --- | --- | --- | --- |
| Quick | 3 | 50ms | 1 |
| Standard | 10 | 100ms | 5 |
| Full | 20 | 100ms | 5 |
The published benchmarks on this site use full mode.
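The three modes amount to a small configuration object (a sketch mirroring the table above; the runner's actual option names may differ):

```typescript
interface BenchMode {
  warmupIterations: number;
  minSampleTimeMs: number;
  samples: number;
}

// Mirrors the mode table; field names are illustrative.
const modes: Record<"quick" | "standard" | "full", BenchMode> = {
  quick:    { warmupIterations: 3,  minSampleTimeMs: 50,  samples: 1 },
  standard: { warmupIterations: 10, minSampleTimeMs: 100, samples: 5 },
  full:     { warmupIterations: 20, minSampleTimeMs: 100, samples: 5 },
};
```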

Measurement

After warmup and calibration, the runner collects the configured number of independent samples (5 in standard and full modes, 1 in quick mode). Each sample runs the calibrated number of operations and records the per-operation time. The suite reports:
  • Mean and median time per operation
  • Min and max across samples
  • Standard deviation
  • Ops/second (derived from mean time)
The speedup ratio shown on the benchmark pages is numpy-ts ops/s ÷ NumPy ops/s. A ratio above 1.0x means numpy-ts was faster.
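Deriving those statistics from the per-operation sample times looks roughly like this (a sketch, not the runner's actual code):

```typescript
interface BenchStats {
  meanMs: number;
  medianMs: number;
  minMs: number;
  maxMs: number;
  stdDevMs: number;
  opsPerSec: number;
}

// Summarize per-operation sample times (in ms) into the reported stats.
function summarize(samples: number[]): BenchStats {
  const n = samples.length;
  const meanMs = samples.reduce((a, b) => a + b, 0) / n;
  const sorted = [...samples].sort((a, b) => a - b);
  const medianMs =
    n % 2 ? sorted[(n - 1) / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  const variance =
    samples.reduce((acc, x) => acc + (x - meanMs) ** 2, 0) / n;
  return {
    meanMs,
    medianMs,
    minMs: sorted[0],
    maxMs: sorted[n - 1],
    stdDevMs: Math.sqrt(variance),
    opsPerSec: 1000 / meanMs, // ops/second derived from the mean time
  };
}
```

The speedup ratio is then simply `tsStats.opsPerSec / numpyStats.opsPerSec` for the matching benchmark on each side.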

Fairness

A few design decisions to keep the comparison honest:
  • Same operations, same data. Both sides run the same algorithm on the same array shapes and dtypes. The Python runner (numpy_benchmark.py) mirrors the JS specifications exactly.
  • Computation only. Timing happens on each side of the boundary. numpy-ts is timed from JS; NumPy is timed from Python. Neither side pays for cross-language overhead.
  • No cherry-picking. Every benchmark in the spec file runs. Categories where NumPy is faster (trig, math, indexing) are reported alongside categories where numpy-ts wins.
  • Geometric mean for ratios. Category and overall averages use the geometric mean, which is the correct method for averaging ratios.
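To make the last point concrete: the geometric mean of n ratios is the n-th root of their product, usually computed in log space for numerical stability:

```typescript
// Geometric mean of positive ratios, computed in log space to avoid
// overflow/underflow when many ratios are multiplied together.
function geometricMean(ratios: number[]): number {
  const logSum = ratios.reduce((acc, r) => acc + Math.log(r), 0);
  return Math.exp(logSum / ratios.length);
}
```

For example, a 2× win and a 2× loss average to exactly 1.0× under the geometric mean, whereas the arithmetic mean of 2.0 and 0.5 would misleadingly report 1.25×.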

Running benchmarks yourself

```shell
# Standard run (~5-10 min)
npm run bench

# Full run with more warmup (~30-60 min, used for published results)
npm run bench:full

# Quick sanity check (~1-2 min)
npm run bench:quick

# Test different array sizes
npm run bench -- --size small
npm run bench -- --size large

# Compare across runtimes (Node, Deno, Bun)
npm run bench -- --runtimes
```
Results are saved to benchmarks/results/ as JSON files.

Caching

Benchmark results are cached for 24 hours, keyed by machine fingerprint. This prevents stale cross-comparisons when hardware or environment changes. Use --fresh to skip the cache and re-run Python benchmarks.
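The cache check amounts to comparing a machine fingerprint and a timestamp against the 24-hour TTL. A sketch of the idea (the fields the real runner hashes into the fingerprint are an assumption on our part):

```typescript
import { createHash } from "node:crypto";
import * as os from "node:os";

const TTL_MS = 24 * 60 * 60 * 1000; // 24-hour cache lifetime

// Hypothetical machine fingerprint — which fields the real runner hashes
// may differ; this just illustrates keying on the environment.
function machineFingerprint(): string {
  const raw = [
    os.platform(),
    os.arch(),
    os.cpus()[0]?.model ?? "",
    process.version,
  ].join("|");
  return createHash("sha256").update(raw).digest("hex").slice(0, 16);
}

// A cached result is reused only if the fingerprint matches and it is
// younger than the TTL — and never when the user passed --fresh.
function isCacheValid(
  entry: { fingerprint: string; timestampMs: number },
  nowMs: number,
  fresh = false,
): boolean {
  if (fresh) return false;
  return (
    entry.fingerprint === machineFingerprint() &&
    nowMs - entry.timestampMs < TTL_MS
  );
}
```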