# Benchmark Methodology

numpy-ts ships with a comprehensive benchmark suite that measures performance against Python NumPy. This page explains how the benchmarks work.

## What's tested

The suite contains **308 benchmark specifications** across **18 categories**: creation, arithmetic, math, trig, gradient, linalg, reductions, manipulation, io, indexing, bitwise, sorting, logic, statistics, sets, random, polynomials, and fft.

Each specification is tested across multiple dtypes (`float64`, `float32`, `float16`, `int8`-`int64`, `uint8`-`uint64`, `complex64`, `complex128`, `bool`) where applicable, producing **\~2,400 individual benchmarks** in a full run.

Array sizes are configurable:

| Scale            | Array size      | Matrix size |
| ---------------- | --------------- | ----------- |
| Small            | 100 elements    | 32×32       |
| Medium (default) | 1,000 elements  | 100×100     |
| Large            | 10,000 elements | 1,000×1,000 |

All benchmark specifications are defined in [`benchmarks/src/specs.ts`](https://github.com/dupontcyborg/numpy-js/blob/main/benchmarks/src/specs.ts).

## How timing works

Both sides use high-resolution timers: [`performance.now()`](https://github.com/dupontcyborg/numpy-js/blob/main/benchmarks/src/runtime-runner.ts#L131-L135) in the [JS runner](https://github.com/dupontcyborg/numpy-js/blob/main/benchmarks/src/runtime-runner.ts) and [`time.perf_counter()`](https://github.com/dupontcyborg/numpy-js/blob/main/benchmarks/scripts/numpy_benchmark.py#L971-L978) in the [Python runner](https://github.com/dupontcyborg/numpy-js/blob/main/benchmarks/scripts/numpy_benchmark.py). The benchmark measures **computation time only**, from the JS side for numpy-ts and from the Python side for NumPy. This gives an apples-to-apples comparison of the numerical computation itself, without being skewed by JS↔Python interop overhead.

### Auto-calibration

Each benchmark automatically calibrates how many operations to run per sample, targeting a minimum sample time of **100ms** (50ms in quick mode). Batching keeps timer-resolution noise negligible: if an operation takes 0.001ms, the runner batches 100,000 of them into a single 100ms sample rather than timing each one individually.

The calibration uses exponential scaling (×10 → ×2 → exact) to converge quickly, with a cap of 10 calibration rounds.
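As a rough illustration of that calibration loop, here is a minimal sketch. It is one possible reading of "×10 → ×2 → exact", not the runner's actual code: the function names `timeBatch` and `calibrate`, the thresholds for switching between steps, and the starting batch size of 1 are all assumptions made for this example.

```typescript
// Hypothetical helper: time `ops` back-to-back calls of `fn`, in milliseconds.
function timeBatch(fn: () => void, ops: number): number {
  const start = performance.now();
  for (let i = 0; i < ops; i++) fn();
  return performance.now() - start;
}

// Assumed interpretation of "x10 -> x2 -> exact"; the real runner may differ.
// `measure(ops)` returns the elapsed milliseconds for a batch of `ops` operations.
function calibrate(
  measure: (ops: number) => number,
  targetMs = 100,
  maxRounds = 10,
): number {
  let ops = 1;
  for (let round = 0; round < maxRounds; round++) {
    const elapsed = measure(ops);
    if (elapsed >= targetMs) break; // batch already meets the minimum sample time
    if (elapsed * 10 <= targetMs) ops *= 10; // far too short: coarse x10 step
    else if (elapsed * 2 <= targetMs) ops *= 2; // getting close: finer x2 step
    else ops = Math.ceil((ops * targetMs) / elapsed); // scale directly to the target
  }
  return ops;
}
```

In real use, `measure` would wrap the operation under test, for example `(ops) => timeBatch(myOperation, ops)`. With a synthetic `measure` of `(ops) => ops / 1000` (0.001ms per operation), the sketch settles on 100,000 operations per sample, matching the example above.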

### Warmup

Before measurement, each benchmark runs a configurable number of **warmup iterations** to stabilize JIT compilation and ensure WASM modules are compiled. The run mode sets the warmup count along with the minimum sample time and the number of samples:

| Mode     | Warmup iterations | Min sample time | Samples |
| -------- | ----------------- | --------------- | ------- |
| Quick    | 3                 | 50ms            | 1       |
| Standard | 10                | 100ms           | 5       |
| Full     | 20                | 100ms           | 5       |

The published benchmarks on this site use **full mode**.
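The mode table can be thought of as a small preset map. The sketch below encodes the table's values and shows a warmup loop that discards its results. The names `Mode`, `ModeConfig`, `MODES`, and `warmup` are hypothetical and chosen for illustration; the actual runner may structure its configuration differently.

```typescript
type Mode = "quick" | "standard" | "full";

interface ModeConfig {
  warmupIterations: number;
  minSampleTimeMs: number;
  samples: number;
}

// Values copied from the mode table; the object shape itself is an assumption.
const MODES: Record<Mode, ModeConfig> = {
  quick: { warmupIterations: 3, minSampleTimeMs: 50, samples: 1 },
  standard: { warmupIterations: 10, minSampleTimeMs: 100, samples: 5 },
  full: { warmupIterations: 20, minSampleTimeMs: 100, samples: 5 },
};

// Run the operation untimed so JIT and WASM compilation settle before measurement.
// Returns the number of warmup iterations that were executed.
function warmup(fn: () => void, mode: Mode): number {
  const { warmupIterations } = MODES[mode];
  for (let i = 0; i < warmupIterations; i++) fn(); // results are discarded
  return warmupIterations;
}
```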

### Measurement

After warmup and calibration, the runner collects **5 independent samples** (1 in quick mode). Each sample runs the calibrated number of operations and records the per-operation time. The suite reports:

* **Mean** and **median** time per operation
* **Min** and **max** across samples
* **Standard deviation**
* **Ops/second** (derived from mean time)

The **speedup ratio** shown on the benchmark pages is `numpy-ts ops/s ÷ NumPy ops/s`. A ratio above 1.0x means numpy-ts was faster; a ratio below 1.0x means NumPy was faster.
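The reported statistics can be sketched as follows, given an array of per-operation sample times in milliseconds. The names `summarize` and `speedup` are hypothetical, and the use of the population standard deviation (dividing by `n`) is an assumption; the suite may use the sample form instead.

```typescript
interface SampleStats {
  mean: number;
  median: number;
  min: number;
  max: number;
  stdDev: number;
  opsPerSecond: number;
}

// `perOpMs` holds one per-operation time (in ms) for each collected sample.
function summarize(perOpMs: number[]): SampleStats {
  if (perOpMs.length === 0) throw new Error("at least one sample is required");
  const sorted = [...perOpMs].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  // Assumption: population variance (divide by n).
  const variance = sorted.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;
  return {
    mean,
    median,
    min: sorted[0],
    max: sorted[n - 1],
    stdDev: Math.sqrt(variance),
    opsPerSecond: 1000 / mean, // derived from the mean time, as described above
  };
}

// Speedup as defined above: numpy-ts ops/s divided by NumPy ops/s.
function speedup(numpyTsOpsPerSec: number, numpyOpsPerSec: number): number {
  return numpyTsOpsPerSec / numpyOpsPerSec;
}
```

Because ops/second is derived from the mean time, the same ratio can be computed as NumPy's mean time divided by numpy-ts's mean time.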

## Fairness

A few design decisions keep the comparison honest:

* **Same operations, same data.** Both sides run the same algorithm on the same array shapes and dtypes. The Python runner ([`numpy_benchmark.py`](https://github.com/dupontcyborg/numpy-js/blob/main/benchmarks/scripts/numpy_benchmark.py)) mirrors the JS specifications exactly.
* **Computation only.** Timing happens on each side of the boundary. numpy-ts is timed from JS; NumPy is timed from Python. Neither side pays for cross-language overhead.
* **No cherry-picking.** Every benchmark in the spec file runs. Categories where NumPy is faster (trig, math, indexing) are reported alongside categories where numpy-ts wins.
* **Geometric mean for ratios.** Category and overall averages use the geometric mean, which is the [correct method for averaging ratios](https://en.wikipedia.org/wiki/Geometric_mean#Applications).
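To illustrate why the geometric mean suits speedup ratios, here is a minimal sketch. The function name `geometricMean` is hypothetical and not taken from the suite. It averages in log space, which is the standard way to compute a geometric mean without overflowing on long lists.

```typescript
// Geometric mean of positive ratios: exp of the arithmetic mean of the logs.
function geometricMean(ratios: number[]): number {
  if (ratios.length === 0) throw new Error("at least one ratio is required");
  if (ratios.some((r) => !(r > 0))) throw new Error("ratios must be positive");
  const logSum = ratios.reduce((sum, r) => sum + Math.log(r), 0);
  return Math.exp(logSum / ratios.length);
}
```

For example, one benchmark at 2.0x and another at 0.5x average to about 1.0x under the geometric mean, reflecting that the two results cancel out. An arithmetic mean would report 1.25x and overstate the faster side.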

## Running benchmarks yourself

```bash
# Standard run (~5-10 min)
npm run bench

# Full run with more warmup (~30-60 min, used for published results)
npm run bench:full

# Quick sanity check (~1-2 min)
npm run bench:quick

# Test different array sizes
npm run bench -- --size small
npm run bench -- --size large

# Compare across runtimes (Node, Deno, Bun)
npm run bench -- --runtimes
```

Results are saved to `benchmarks/results/` as JSON files.

### Caching

Benchmark results are cached for 24 hours, keyed by a machine fingerprint, so cached results are not reused for cross-comparison after the hardware or environment changes. Use `--fresh` to skip the cache and re-run the Python benchmarks.
