Benchmarks

Reproducible numbers, software-rendered, no GPU. Everything below is measured by the harness in tests/e2e/ and you can re-run it yourself.

Honest metric. We report unique_fps: genuinely distinct frames per second, counted with ffmpeg mpdecimate. A recording's nominal/container fps pads toward a minimum rate with duplicate frames, so it overstates motion and is not comparable across tools.
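The metric can be approximated by hand with stock ffmpeg. The sketch below assumes ffmpeg and ffprobe are on PATH; the helper names are hypothetical, and the harness may count frames differently (for example with non-default mpdecimate thresholds).

```shell
# Sketch only: these helpers are illustrative, not part of the harness.

# count_unique_frames FILE
# Runs the clip through mpdecimate (drops near-duplicate frames), then showinfo,
# which logs one line containing "pts_time:" for every frame that survives.
count_unique_frames() {
  ffmpeg -hide_banner -nostdin -i "$1" -an -vf mpdecimate,showinfo -f null - 2>&1 |
    grep -c 'pts_time:'
}

# clip_seconds FILE: container duration in seconds, read with ffprobe.
clip_seconds() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# unique_fps FRAMES SECONDS: distinct frames per second, one decimal place.
unique_fps() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", n / d }'
}

# Usage: unique_fps "$(count_unique_frames clip.mkv)" "$(clip_seconds clip.mkv)"
```

Dividing by wall-clock duration rather than by the container frame count is what keeps padded duplicate frames out of the result.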

Recording throughput across CI runners

The same harness (tests/e2e/ci-bench.sh) records a lossless FFV1 clip of an animated page on three machines. All values are unique fps:

config (FFV1, software)	GitLab EPYC 7B13, 2 vCPU	GitHub EPYC 7763, 4 vCPU	laptop Ryzen 5700U, 16T
Chromium 720p whole-desktop	10.0	10.1	9.9
Chromium 1080p whole-desktop	9.7	9.9	9.9
Chromium 1080p focused-window	15.5	36.3	43.2
KWrite 1080p whole-desktop	7.6	7.8	8.0

Whole-desktop capture stays flat at roughly 8 to 10 unique fps regardless of core count (2 → 4 → 16): it is bounded by synchronous frame readback on the compositor thread, not the CPU. Focused-window capture is faster on every machine and scales with it, from about 1.6x the whole-desktop rate on the 2-vCPU runner to about 4.4x on the laptop (Chromium 1080p). Practical guidance: record a single app in --mode focused-window for the smoothest clip; use whole-desktop to capture the broader session at a steady 8 to 10 fps.
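The focused-window gain per runner follows directly from the Chromium 1080p rows of the table. A small sketch of the arithmetic (the speedup helper is hypothetical, written only for this illustration):

```shell
# speedup FOCUSED WHOLE: ratio of two unique-fps figures, one decimal place.
speedup() {
  awk -v f="$1" -v w="$2" 'BEGIN { printf "%.1fx\n", f / w }'
}

speedup 15.5 9.7   # GitLab EPYC 7B13, 2 vCPU    -> 1.6x
speedup 36.3 9.9   # GitHub EPYC 7763, 4 vCPU    -> 3.7x
speedup 43.2 9.9   # laptop Ryzen 5700U, 16T     -> 4.4x
```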

Lossless recording: waymux vs the standard X11 approach

Same machine, same page (the demanding software-WebGL waymux logo), 1280x720, 8 seconds, no GPU. The comparison is waymux focused-window against the usual headless-CI recipe (Xvfb + ffmpeg x11grab + x264 -qp 0). Unique fps on a 4-vCPU GitHub runner, before and after an optimization pass on the recording pipeline:

method (lossless)	before	after
waymux focused-window (FFV1)	6.5	5.8
waymux focused-window (x264-lossless)	n/a	11.1
Xvfb + ffmpeg x11grab (x264)	10.6	10.8

The original FFV1 path lost to x11grab on a constrained runner (6.5 vs 10.6): FFV1 is a heavier CPU encoder, and the WebGL source already saturates the four cores, so multithreading FFV1 did not help here (5.8 after the pass). Adding a CPU x264-lossless codec (libx264rgb, bit-exact RGB, fed bgr0 to skip a pixel-format conversion, with a writer thread decoupling the pipe) closes the gap: waymux now matches x11grab (11.1 vs 10.8) at the same CPU cost and file size. The same parity holds on a 16-thread laptop, where x264-lossless and x11grab land within one frame per second of each other and both stay ahead of FFV1.
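For readers who want to see what such an encoder stage looks like with stock ffmpeg, here is a minimal sketch. It is not waymux's actual invocation, which the text above does not show: the function name, the 30 fps input rate, and the ultrafast preset are all assumptions. Only libx264rgb, bgr0, and lossless quantization (-qp 0) come from the description.

```shell
# encode_x264_lossless WIDTH HEIGHT FPS OUTFILE
# Reads raw bgr0 frames on stdin and encodes them losslessly in RGB.
# Set DRY_RUN=1 to print the command instead of running it.
encode_x264_lossless() {
  # -pix_fmt bgr0 matches a format libx264rgb accepts natively, so ffmpeg
  # does not insert a pixel-format conversion before the encoder.
  # -preset ultrafast and the frame rate are assumed values, not from the source.
  ${DRY_RUN:+echo} ffmpeg -hide_banner -f rawvideo -pix_fmt bgr0 \
    -video_size "${1}x${2}" -framerate "$3" -i - \
    -c:v libx264rgb -qp 0 -preset ultrafast "$4"
}

# Usage: some_frame_source | encode_x264_lossless 1280 720 30 clip.mkv
```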

So you get x11grab-class lossless throughput and waymux's integration: Wayland-native, any app per-window, no Xvfb/ffmpeg plumbing, and recording paired with spawn, input, and screenshots in one tool. FFV1 stays the default for exact-pixel determinism; reach for --codec x264-lossless on small runners.
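In a CI script, the codec choice can be keyed off the runner's core count. The sketch below is one way to do that; the codec_args helper and the 4-core cutoff are assumptions (the cutoff is borrowed from the 4-vCPU runner measured above), and only the --codec x264-lossless flag comes from this page.

```shell
# codec_args [CORES]
# Prints "--codec x264-lossless" on small runners and nothing otherwise,
# so the default codec (FFV1) applies on larger machines.
# The cutoff of 4 cores is an assumption; tune it for your own runners.
codec_args() {
  cores="${1:-$(nproc)}"
  if [ "$cores" -le 4 ]; then
    printf '%s\n' "--codec x264-lossless"
  fi
}

# Usage: append $(codec_args) unquoted to your recording command line.
```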

Reproduce it

No checkout needed; the published image carries the harness:

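# recording throughput sweep (table 1)
# create the artifact directory first so Docker does not create it as root
mkdir -p out
docker run --rm --shm-size=512m -e RECORD_SECS=10 -e ARTIFACT_DIR=/out \
  -v "$PWD/out:/out" --entrypoint dbus-run-session \
  ghcr.io/waymux/waymux-ci -- bash /app/tests/e2e/ci-bench.sh
# print the generated results table
cat out/benchmark.md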

The head-to-head additionally needs Xvfb (apt-get install -y xvfb inside the image); the GitHub workflow does exactly that and attaches both clips.
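The x11grab baseline can be sketched as follows. This is a generic version of the recipe, not the workflow's exact command: the function name, display number :99, 30 fps capture rate, and ultrafast preset are assumptions, and it presumes ffmpeg is available next to Xvfb.

```shell
# record_x11grab DISPLAY WxH SECONDS OUTFILE
# Grabs an X display with ffmpeg and encodes it losslessly with x264 (-qp 0).
# Set DRY_RUN=1 to print the command instead of running it.
record_x11grab() {
  # -framerate 30 and -preset ultrafast are assumed values, not from the source.
  ${DRY_RUN:+echo} ffmpeg -hide_banner -f x11grab -video_size "$2" -framerate 30 \
    -i "$1" -t "$3" -c:v libx264 -qp 0 -preset ultrafast "$4"
}

# Typical sequence after `apt-get install -y xvfb`:
#   Xvfb :99 -screen 0 1280x720x24 &
#   DISPLAY=:99 <launch the browser on the test page> &
#   record_x11grab :99 1280x720 8 x11grab.mkv
```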

The test content

The head-to-head page is a glowing 3D "w" rendered in software WebGL (SwiftShader, no GPU), demanding enough to stress the full render + capture pipeline, and a fun thing to watch.