What’s tested
The suite contains 308 benchmark specifications across 18 categories: creation, arithmetic, math, trig, gradient, linalg, reductions, manipulation, io, indexing, bitwise, sorting, logic, statistics, sets, random, polynomials, and fft. Each specification is tested across multiple dtypes (float64, float32, float16, int8-int64, uint8-uint64, complex64, complex128, bool) where applicable, producing ~2,400 individual benchmarks in a full run.
Array sizes are configurable:
| Scale | Array size | Matrix size |
|---|---|---|
| Small | 100 elements | 32×32 |
| Medium (default) | 1,000 elements | 100×100 |
| Large | 10,000 elements | 1,000×1,000 |
The sizes are defined in benchmarks/src/specs.ts.
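As an illustration, a spec file like this might shape its entries as follows. These type and field names are hypothetical, not the actual definitions in specs.ts:

```typescript
// Hypothetical shape of the scale table and a benchmark spec;
// illustrative only, not the real specs.ts definitions.
type Scale = "small" | "medium" | "large";

interface ScaleConfig {
  arraySize: number; // 1-D element count
  matrixDim: number; // square matrix side length
}

const SCALES: Record<Scale, ScaleConfig> = {
  small: { arraySize: 100, matrixDim: 32 },
  medium: { arraySize: 1_000, matrixDim: 100 },
  large: { arraySize: 10_000, matrixDim: 1_000 },
};

interface BenchmarkSpec {
  name: string;     // e.g. "add", "matmul"
  category: string; // one of the 18 categories
  dtypes: string[]; // dtypes the operation supports
  run: (scale: ScaleConfig) => void;
}
```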
How timing works
Both sides use high-resolution timers: performance.now() in the JS runner and time.perf_counter() in the Python runner. The benchmark measures computation time only, from the JS side for numpy-ts and from the Python side for NumPy. This gives an apples-to-apples comparison of the numerical computation itself, without being skewed by JS↔Python interop overhead.
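On the JS side, computation-only timing reduces to wrapping just the operation loop with performance.now(). A minimal sketch (the batching and any surrounding bookkeeping in the real runner are simplified away):

```typescript
// Time a batch of operations and return the mean ms per operation.
// performance.now() is the timer named above; everything else here
// is an illustrative simplification of the runner.
function timeSample(op: () => void, opsPerSample: number): number {
  const start = performance.now();
  for (let i = 0; i < opsPerSample; i++) op();
  const elapsed = performance.now() - start;
  return elapsed / opsPerSample; // ms per operation
}
```

The Python runner does the equivalent with time.perf_counter(), so neither measurement includes cross-language overhead.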
Auto-calibration
Each benchmark automatically calibrates how many operations to run per sample, targeting a minimum sample time of 100ms. This eliminates timer resolution noise: if an operation takes 0.001ms, the runner batches 100,000 of them into a single sample rather than measuring one at a time. The calibration uses exponential scaling (×10 → ×2 → exact) to converge quickly, with a cap of 10 calibration rounds.
Warmup
Before measurement, each benchmark runs a configurable number of warmup iterations to stabilize JIT compilation and ensure WASM modules are compiled:
| Mode | Warmup iterations | Min sample time | Samples |
|---|---|---|---|
| Quick | 3 | 50ms | 1 |
| Standard | 10 | 100ms | 5 |
| Full | 20 | 100ms | 5 |
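The calibration loop described above can be sketched as follows. The thresholds for switching between ×10, ×2, and exact extrapolation are illustrative assumptions; the real runner's cutoffs may differ:

```typescript
// Auto-calibration sketch: grow the batch size until one sample
// takes at least minSampleMs, scaling by x10 while far from the
// target, x2 when close, then extrapolating the exact count.
function calibrate(
  op: () => void,
  minSampleMs = 100,
  maxRounds = 10
): number {
  let ops = 1;
  for (let round = 0; round < maxRounds; round++) {
    const start = performance.now();
    for (let i = 0; i < ops; i++) op();
    const elapsed = performance.now() - start;
    if (elapsed >= minSampleMs) return ops; // sample long enough
    if (elapsed < minSampleMs / 20) {
      ops *= 10; // far from target: jump by x10
    } else if (elapsed < minSampleMs / 2) {
      ops *= 2; // getting close: double
    } else {
      // nearly there: extrapolate the exact count from the timing
      ops = Math.ceil((ops * minSampleMs) / Math.max(elapsed, 1e-6));
    }
  }
  return ops; // cap of maxRounds calibration rounds reached
}
```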
Measurement
After warmup and calibration, the runner collects 5 independent samples. Each sample runs the calibrated number of operations and records the per-operation time. The suite reports:
- Mean and median time per operation
- Min and max across samples
- Standard deviation
- Ops/second (derived from mean time)
- Performance ratio: numpy-ts ops/s ÷ NumPy ops/s. A ratio above 1.0x means numpy-ts was faster.
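The reported statistics are standard; a sketch of how they could be computed over the per-operation sample times (field and function names are illustrative, not the runner's actual API):

```typescript
// Summarize per-operation sample times (in ms) into the statistics
// the suite reports. Illustrative; not the runner's actual code.
interface Stats {
  mean: number;
  median: number;
  min: number;
  max: number;
  stdDev: number;
  opsPerSec: number;
}

function summarize(samplesMs: number[]): Stats {
  const n = samplesMs.length;
  const mean = samplesMs.reduce((a, b) => a + b, 0) / n;
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const median =
    n % 2 === 1
      ? sorted[(n - 1) / 2]
      : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  const variance =
    samplesMs.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  return {
    mean,
    median,
    min: sorted[0],
    max: sorted[n - 1],
    stdDev: Math.sqrt(variance),
    opsPerSec: 1000 / mean, // samples are ms per operation
  };
}
```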
Fairness
A few design decisions to keep the comparison honest:
- Same operations, same data. Both sides run the same algorithm on the same array shapes and dtypes. The Python runner (numpy_benchmark.py) mirrors the JS specifications exactly.
- Computation only. Timing happens on each side of the boundary: numpy-ts is timed from JS; NumPy is timed from Python. Neither side pays for cross-language overhead.
- No cherry-picking. Every benchmark in the spec file runs. Categories where NumPy is faster (trig, math, indexing) are reported alongside categories where numpy-ts wins.
- Geometric mean for ratios. Category and overall averages use the geometric mean, which is the correct method for averaging ratios.
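The geometric mean matters here because ratios multiply: averaging in log space keeps "2x faster" and "2x slower" symmetric, where an arithmetic mean would bias the result upward. A minimal sketch:

```typescript
// Geometric mean of performance ratios: the arithmetic mean of the
// logs, exponentiated. geometricMean([2, 0.5]) is exactly 1, while
// the arithmetic mean of the same ratios would be 1.25.
function geometricMean(ratios: number[]): number {
  const logSum = ratios.reduce((acc, r) => acc + Math.log(r), 0);
  return Math.exp(logSum / ratios.length);
}
```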
Running benchmarks yourself
Results are written to benchmarks/results/ as JSON files.
Caching
Benchmark results are cached for 24 hours, keyed by machine fingerprint. This prevents stale cross-comparisons when hardware or environment changes. Use --fresh to skip the cache and re-run Python benchmarks.
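The cache policy above amounts to a two-part freshness check. The fingerprint fields and function names below are assumptions for illustration, not the runner's actual implementation:

```typescript
// Sketch of the cache-validity rule: an entry is reusable only if
// the machine fingerprint matches, it is under 24 hours old, and
// --fresh was not passed. Field names are hypothetical.
interface CachedResult {
  fingerprint: string; // e.g. hash of CPU model + OS + runtime versions
  timestamp: number;   // ms since epoch when the run finished
  results: unknown;
}

const TTL_MS = 24 * 60 * 60 * 1000; // 24-hour cache lifetime

function isCacheValid(
  entry: CachedResult,
  currentFingerprint: string,
  now: number,
  fresh = false // --fresh bypasses the cache entirely
): boolean {
  if (fresh) return false;
  if (entry.fingerprint !== currentFingerprint) return false;
  return now - entry.timestamp < TTL_MS;
}
```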