Architecture

One wayland-server compositor per session, running in its own process against a virtual output. Never your real display.

Overview

waymux is a local-first headless Wayland session manager. A per-user daemon (waymuxd) supervises any number of isolated Wayland sessions, each backed by its own compositor running in a dedicated waymux-session process against a virtual output. A single waymux CLI drives the daemon over a msgpack-RPC Unix socket to create sessions, spawn applications into them, inject input, capture screenshots, record video, and serve a low-latency WebRTC viewer to a browser.

Sessions can also be attached to an outer compositor (for example niri) so the virtual output appears as a normal window on the host. The canonical, fully open surface is local; the same control surface is reachable remotely over HTTPS for a hosted deployment, but local is the default and the only mode documented here.

A waymux session is a test harness, not a security sandbox. Run untrusted or third-party code in a container or VM, not in a bare session. All clients in a session share the same inner Wayland display and can screen-capture each other.

Component diagram

                         +-------------------------------+
                         |          waymux CLI           |
                         |  (23 subcommands, clap)       |
                         +---------------+---------------+
                                         |
              Transport trait: LocalTransport | RemoteTransport
                                         |
         msgpack-RPC over Unix socket    |   HTTPS + Bearer (hosted)
         $XDG_RUNTIME_DIR/waymux.sock    |
                                         v
  +----------------------------------------------------------------------+
  |                          waymuxd (per-user daemon)                   |
  |                                                                      |
  |   Server (accept loop, SO_PEERCRED uid gate, Hello negotiation,      |
  |           reader/writer tasks, dispatch router, error mapping)       |
  |                              |                                       |
  |                              v                                       |
  |   Registry (core engine: session map, lifecycle, broadcast,          |
  |             supervisors, log history, session_control RPC)           |
  |                              |                                       |
  |   SessionBackend trait  --> LocalBackend (subprocess)                |
  |                                                                      |
  |   cgroup / tmpfs quota (best-effort)   usage_events (feature-gated)  |
  +--------------------------------+-------------------------------------+
                                   | spawn + per-session control socket
                                   v
  +----------------------------------------------------------------------+
  |             waymux-session (one process per session)                 |
  |                                                                      |
  |   Compositor thread          Control socket (tokio, msgpack-RPC)     |
  |   - inner Wayland server      - Info/ListWindows/Resize              |
  |   - xdg-shell, wl-shm,        - Inject{Key,Pointer,Touch,Batch}      |
  |     dmabuf, layer-shell       - Screenshot/ScreenshotDesktop         |
  |   - virtual output            - Record{Start,Stop} Viewer{Start..}   |
  |   - SurfaceData / windows     Events socket --> daemon broadcast     |
  |        |            |              |                                 |
  |        v            v              v                                 |
  |   Attach server   Recording    Viewer (neko-bridge child, Go)        |
  |   (waymux_attach  encoders:      - encoder thread -> Annex-B NALUs   |
  |    _v1, fd-pass)  ffv1/nvenc/    - Unix socket -> Pion WebRTC        |
  |        |          vaapi/vulkan)  - WS signaling + data channel       |
  |        v                              |                              |
  |   Outer compositor (niri)        Browser viewer (video + input)      |
  +----------------------------------------------------------------------+

Control plane

The control plane is the daemon plus its wire protocol. It is the path through which every CLI verb flows: session creation, spawning, input injection, capture, recording, and viewer control.

Transport

The daemon binds a per-user UnixListener (default $XDG_RUNTIME_DIR/waymux.sock, chmod 0600). The CLI connects through a Transport trait with two implementations: LocalTransport (Unix socket, msgpack-RPC) and RemoteTransport (HTTPS with a Bearer token). Eight of the 23 CLI verbs are transport-routable; the rest are local-only.

Every frame is a 4-byte big-endian length prefix followed by a msgpack payload, capped at a 20 MiB MAX_FRAME_SIZE. The first request on a connection must be Hello. The daemon accepts any client protocol from version 1 through the current version (4) and replies with its version and capabilities. A non-Hello first request, protocol version 0, or a version newer than the daemon's own returns E_PROTO_VERSION.
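
The framing rule above can be sketched with the standard library alone. This is a minimal illustration, not the waymux-protocol source: the function names encode_frame and decode_frame are borrowed from the crate description, but their signatures here are assumptions, and the payload is treated as opaque bytes rather than real msgpack.

```rust
use std::io::{self, Read, Write};

/// Frame cap quoted in the text: 20 MiB.
const MAX_FRAME_SIZE: usize = 20 * 1024 * 1024;

/// Write one frame: a 4-byte big-endian length, then the payload bytes.
/// The payload is opaque here; on the real wire it would be msgpack.
fn encode_frame<W: Write>(out: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_SIZE {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "frame exceeds MAX_FRAME_SIZE",
        ));
    }
    let len = payload.len() as u32;
    out.write_all(&len.to_be_bytes())?;
    out.write_all(payload)
}

/// Read one frame, rejecting an oversized length before allocating.
fn decode_frame<R: Read>(input: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    input.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_SIZE {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "frame exceeds MAX_FRAME_SIZE",
        ));
    }
    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    Ok(payload)
}
```

Checking the length before allocating the buffer is the point of the cap: a peer cannot make the reader reserve more than MAX_FRAME_SIZE bytes by sending a large prefix.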

Registry

The Registry is the core engine. It holds a HashMap<String, SessionEntry> of session metadata, supervisor kill channels, child-PID tracking, rolling per-session log history (1024 lines), and a broadcast channel that fans events to all subscribers. Its public methods (create, destroy, spawn_child, session_control, list_windows, resize, screenshot, inject_*, record_*, viewer_*, tag_window, wait_for_idle, attach, detach, shutdown_all) are protocol-agnostic.
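
The rolling log history can be pictured as a bounded queue. The sketch below is an assumption about the shape of such a buffer, not the Registry's actual type: LogRing, push, and replay are hypothetical names, and only the 1024-line cap comes from the text.

```rust
use std::collections::VecDeque;

/// Per-session log history cap quoted in the text.
const LOG_HISTORY_LINES: usize = 1024;

/// Hypothetical rolling log buffer: keeps only the newest `cap` lines.
struct LogRing {
    cap: usize,
    lines: VecDeque<String>,
}

impl LogRing {
    fn new(cap: usize) -> Self {
        LogRing { cap, lines: VecDeque::with_capacity(cap) }
    }

    /// Append a line, dropping the oldest one once the cap is reached.
    fn push(&mut self, line: String) {
        if self.cap == 0 {
            return;
        }
        if self.lines.len() == self.cap {
            self.lines.pop_front();
        }
        self.lines.push_back(line);
    }

    /// Snapshot in oldest-to-newest order, for example to replay to a new subscriber.
    fn replay(&self) -> Vec<String> {
        self.lines.iter().cloned().collect()
    }
}
```

Constructed as LogRing::new(LOG_HISTORY_LINES), the buffer holds lines 6 through 1029 after 1030 pushes numbered from 0.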

The Server's dispatch() is a match over RequestMethod (26 variants) that translates each wire request into a Registry call and maps typed engine errors into stable ErrorCode values (E_NOT_FOUND, E_ALREADY_EXISTS, E_NOT_IMPLEMENTED, E_BACKPRESSURE, E_INTERNAL, and others).
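
The error-mapping step might look roughly like the following. The ErrorCode strings are the ones named above; the EngineError enum and its variants are invented for illustration, since the real engine error type is not shown in this document.

```rust
/// Hypothetical typed engine errors; the real Registry error type is not
/// shown in this document.
#[derive(Debug)]
enum EngineError {
    SessionNotFound(String),
    SessionExists(String),
    NotImplemented(&'static str),
    QueueFull,
    Io(String),
}

/// Stable wire codes named in the text (the real set is larger).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorCode {
    NotFound,
    AlreadyExists,
    NotImplemented,
    Backpressure,
    Internal,
}

impl ErrorCode {
    /// The string that would travel on the wire.
    fn as_str(self) -> &'static str {
        match self {
            ErrorCode::NotFound => "E_NOT_FOUND",
            ErrorCode::AlreadyExists => "E_ALREADY_EXISTS",
            ErrorCode::NotImplemented => "E_NOT_IMPLEMENTED",
            ErrorCode::Backpressure => "E_BACKPRESSURE",
            ErrorCode::Internal => "E_INTERNAL",
        }
    }
}

/// Map an engine error to a wire code. The match is exhaustive, so adding
/// an engine variant forces a decision about its wire code.
fn map_error(err: &EngineError) -> ErrorCode {
    match err {
        EngineError::SessionNotFound(_) => ErrorCode::NotFound,
        EngineError::SessionExists(_) => ErrorCode::AlreadyExists,
        EngineError::NotImplemented(_) => ErrorCode::NotImplemented,
        EngineError::QueueFull => ErrorCode::Backpressure,
        EngineError::Io(_) => ErrorCode::Internal,
    }
}
```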

SessionBackend

A SessionBackend async trait (create / destroy / info) abstracts the session-lifecycle path. LocalBackend is the only shipped implementation: a thin wrapper over the Registry that manages subprocess sessions. The trait is the seam a future provisioning target would plug into (see Extension points).
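
To show the shape of the seam, here is a deliberately simplified sketch. It is synchronous for brevity (the real trait is async), and SessionInfo, BackendError, and InMemoryBackend are hypothetical types that do not come from the waymux source.

```rust
use std::collections::HashMap;

/// Hypothetical session metadata returned by a backend.
#[derive(Debug, Clone, PartialEq)]
struct SessionInfo {
    name: String,
    width: u32,
    height: u32,
}

#[derive(Debug, PartialEq)]
enum BackendError {
    AlreadyExists,
    NotFound,
}

/// Synchronous stand-in for the async create / destroy / info trait.
trait SessionBackend {
    fn create(&mut self, name: &str, width: u32, height: u32) -> Result<SessionInfo, BackendError>;
    fn destroy(&mut self, name: &str) -> Result<(), BackendError>;
    fn info(&self, name: &str) -> Result<SessionInfo, BackendError>;
}

/// Hypothetical in-memory backend; it only tracks metadata and spawns nothing.
struct InMemoryBackend {
    sessions: HashMap<String, SessionInfo>,
}

impl SessionBackend for InMemoryBackend {
    fn create(&mut self, name: &str, width: u32, height: u32) -> Result<SessionInfo, BackendError> {
        if self.sessions.contains_key(name) {
            return Err(BackendError::AlreadyExists);
        }
        let info = SessionInfo { name: name.to_string(), width, height };
        self.sessions.insert(name.to_string(), info.clone());
        Ok(info)
    }

    fn destroy(&mut self, name: &str) -> Result<(), BackendError> {
        self.sessions.remove(name).map(|_| ()).ok_or(BackendError::NotFound)
    }

    fn info(&self, name: &str) -> Result<SessionInfo, BackendError> {
        self.sessions.get(name).cloned().ok_or(BackendError::NotFound)
    }
}
```

A backend like this is only useful as a test double, but it illustrates why three methods are enough for the daemon to stay agnostic about where a session actually runs.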

Data plane

Each session is a separate waymux-session process: a full headless Wayland compositor for one virtual output. It holds the inner Wayland server, the capture logic, the recording thread, the attach server, and the viewer.

Inner compositor

The session advertises: xdg-shell, wl-shm, layer-shell, zwp_linux_dmabuf_v1 (GPU buffer import with modifier negotiation), viewporter, pointer/keyboard/touch, data-device (clipboard), presentation-time, pointer-constraints and relative-pointer, keyboard-shortcuts-inhibit, and KDE-specific protocols.

The compositor is observer-only: it tracks surfaces, subsurface trees, toplevels, and damage timestamps without rendering. Composition happens lazily at capture time via a recursive subsurface tree walk that blits into a single ARGB8888 buffer. This means no GPU is required for the compositor itself.
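
The lazy composition step can be illustrated with a small recursive blit. This is a sketch under stated assumptions: the Surface struct and composite function are hypothetical, pixels are copied without alpha blending, and subsurface offsets are taken relative to the parent.

```rust
/// Hypothetical surface node: ARGB8888 pixels plus an offset relative to its parent.
struct Surface {
    x: i32,
    y: i32,
    width: usize,
    height: usize,
    pixels: Vec<u32>,       // row-major, one u32 per ARGB8888 pixel
    children: Vec<Surface>, // subsurfaces, bottom-to-top
}

/// Recursively blit `surface` and its subsurfaces into `dst` (dst_w x dst_h).
/// Pixels that fall outside the destination are clipped.
fn composite(
    surface: &Surface,
    origin_x: i32,
    origin_y: i32,
    dst: &mut [u32],
    dst_w: usize,
    dst_h: usize,
) {
    let base_x = origin_x + surface.x;
    let base_y = origin_y + surface.y;
    for row in 0..surface.height {
        let dy = base_y + row as i32;
        if dy < 0 || (dy as usize) >= dst_h {
            continue;
        }
        for col in 0..surface.width {
            let dx = base_x + col as i32;
            if dx < 0 || (dx as usize) >= dst_w {
                continue;
            }
            dst[(dy as usize) * dst_w + (dx as usize)] =
                surface.pixels[row * surface.width + col];
        }
    }
    // Children are drawn after the parent, so they end up on top.
    for child in &surface.children {
        composite(child, base_x, base_y, dst, dst_w, dst_h);
    }
}
```

Because the walk runs only when a capture is requested, an idle session pays nothing for composition.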

Capture and screenshots

Screenshot RPCs run on the control thread, look up the surface by window id, composite the subsurface tree on the CPU, and encode PNG with the image crate. The protocol prefers an fd-passed Dmabuf format, with PNG-over-SHM as the fallback. A buffer-hold ref-count mechanism keeps GPU buffers pinned while a capture or encode is in flight.

Recording

Four backends are available. The choice of codec determines whether a GPU is needed:

Codec                      Backend            GPU needed    Notes
ffv1                       CPU readback       No            Lossless, default for CI
h264-nvenc                 NVENC subprocess   Yes (NVIDIA)  H.264 hardware encode
h264-vaapi                 In-process VAAPI   Yes           H.264 hardware encode
h264-vulkan / hevc-vulkan  In-process Vulkan  Yes           Zero-copy, fastest

A LatestTaskSlot lets newer frames evict older ones so a slow encoder never back-pressures the compositor. Dual recording (primary plus --secondary-codec) writes two output paths from the same frame tap. Output is always Matroska (.mkv); recording paths are validated to be absolute and free of ".." components.
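
The eviction behaviour can be sketched as a one-element mailbox. LatestSlot below is a hypothetical stand-in for LatestTaskSlot, whose real implementation is not shown here; it illustrates only the "newest frame wins" idea.

```rust
use std::sync::Mutex;

/// Hypothetical single-slot mailbox: a new value replaces any value that
/// has not been taken yet.
struct LatestSlot<T> {
    slot: Mutex<Option<T>>,
}

impl<T> LatestSlot<T> {
    fn new() -> Self {
        LatestSlot { slot: Mutex::new(None) }
    }

    /// Producer side: never waits for the consumer.
    /// Returns the evicted value, if one was still pending.
    fn publish(&self, value: T) -> Option<T> {
        self.slot.lock().unwrap().replace(value)
    }

    /// Consumer side: take the newest pending value, leaving the slot empty.
    fn take(&self) -> Option<T> {
        self.slot.lock().unwrap().take()
    }
}
```

The producer holds the lock only for the swap, so a stalled encoder costs the compositor a dropped frame rather than a blocked commit.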

WebRTC viewer

On ViewerStart the session spawns a Go waymux-neko-bridge child and an encoder thread that produces Annex-B NALUs tuned for low latency (baseline profile, no B-frames, periodic IDR at 60 fps; Vulkan emits every-frame IDR). NALUs cross a private per-session Unix socket using a typed 5-byte-header protocol (NALU, cursor image, cursor position, force-keyframe, inject-op, set-bitrate, shutdown). The bridge wraps frames into WebRTC with Pion (ICE, DTLS, RTP), signals over WebSocket, and exposes a data channel.
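
A typed 5-byte header could be laid out as one tag byte plus a 4-byte payload length. That layout, the big-endian byte order, and the numeric tag values below are all assumptions made for illustration; the document only states the header size and the message kinds.

```rust
/// Message kinds listed in the text; the numeric tags are invented here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MsgKind {
    Nalu = 1,
    CursorImage = 2,
    CursorPosition = 3,
    ForceKeyframe = 4,
    InjectOp = 5,
    SetBitrate = 6,
    Shutdown = 7,
}

impl MsgKind {
    fn from_tag(tag: u8) -> Option<MsgKind> {
        match tag {
            1 => Some(MsgKind::Nalu),
            2 => Some(MsgKind::CursorImage),
            3 => Some(MsgKind::CursorPosition),
            4 => Some(MsgKind::ForceKeyframe),
            5 => Some(MsgKind::InjectOp),
            6 => Some(MsgKind::SetBitrate),
            7 => Some(MsgKind::Shutdown),
            _ => None,
        }
    }
}

/// Assumed layout: 1 tag byte, then a 4-byte big-endian payload length.
fn encode_msg(kind: MsgKind, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(5 + payload.len());
    out.push(kind as u8);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Parse one message from the front of `buf`.
/// Returns the kind, the payload slice, and the number of bytes consumed,
/// or None if the tag is unknown or the buffer is incomplete.
fn decode_msg(buf: &[u8]) -> Option<(MsgKind, &[u8], usize)> {
    if buf.len() < 5 {
        return None;
    }
    let kind = MsgKind::from_tag(buf[0])?;
    let len = u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
    let end = 5usize.checked_add(len)?;
    if buf.len() < end {
        return None;
    }
    Some((kind, &buf[5..end], end))
}
```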

Browser input arrives as JSON, is translated to waymux InjectOp, and is written back over the socket into the session control loop. Multi-viewer is last-wins: only the primary viewer's input and GCC bandwidth estimate drive the shared encoder; other viewers receive video fan-out only.

Attach

A second Wayland server within the session advertises waymux_attach_v1. An attach client passes the outer compositor's display fd via SCM_RIGHTS; the session creates a proxy wl_surface / xdg_toplevel on the outer compositor and ferries the inner focused window's frames into an outer SHM buffer on each commit. The ferry path validates same-format ARGB8888 at 1:1 size and falls back to a placeholder otherwise.

Process and socket model

Every session gets its own set of Unix sockets. The daemon spawns the session process and communicates with it through these sockets throughout its lifetime.

Socket                 Direction                Purpose
inner Wayland display  clients -> session       The Wayland compositor socket Wayland clients connect to
control socket         daemon <-> session       Persistent msgpack-RPC for session control RPCs (screenshot, inject, record, etc.)
events socket          session -> daemon        Push stream: window events, damage events, log lines, forwarded to subscribers
attach socket          outer client -> session  waymux_attach_v1 protocol for embedding into an outer compositor
ready socket           session -> daemon        One-shot startup handshake (5 s timeout); closed after create completes

Same-uid gating. Both the daemon accept loop and the per-session control socket check SO_PEERCRED and reject any connection whose uid differs from the owner. The daemon socket is chmod 0600. This is the primary local trust boundary.

Lifecycle. create spawns the session subprocess, waits for the ready handshake, sets up best-effort cgroup and tmpfs quota handles, and starts a session_supervisor task that owns the Child, drains stdout/stderr into the log ring, and emits SessionCreated. destroy removes the session, SIGTERMs tracked child PIDs, signals the supervisor, lazy-unmounts the tmpfs, and cleans up the cgroup. The supervisor also handles natural exit, emitting SessionDestroyed.

Spawning clients. spawn_child requires an absolute argv[0], clears the environment and re-adds only safe variables, optionally applies an fd-limit rlimit, joins the cgroup, and tracks the PID for crash detection.
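
The environment handling described for spawn_child maps naturally onto std::process::Command. The sketch below is an assumption about how such a step could be written, not the daemon's code: build_child and the SAFE_ENV allow-list are hypothetical, and the document does not enumerate which variables count as safe.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical allow-list; the document does not say which variables are "safe".
const SAFE_ENV: [&str; 4] = ["PATH", "HOME", "LANG", "XDG_RUNTIME_DIR"];

/// Build (but do not spawn) a Command with a cleared environment, re-adding
/// only allow-listed variables plus WAYLAND_DISPLAY for the inner socket.
fn build_child(
    argv: &[String],
    parent_env: &[(String, String)],
    wayland_display: &str,
) -> Result<Command, String> {
    let program = argv.first().ok_or_else(|| "empty argv".to_string())?;
    if !Path::new(program).is_absolute() {
        return Err(format!("argv[0] must be absolute: {}", program));
    }
    let mut cmd = Command::new(program);
    cmd.args(&argv[1..]);
    cmd.env_clear();
    for (key, value) in parent_env {
        if SAFE_ENV.contains(&key.as_str()) {
            cmd.env(key, value);
        }
    }
    cmd.env("WAYLAND_DISPLAY", wayland_display);
    Ok(cmd)
}
```

Requiring an absolute argv[0] removes PATH lookup from the picture, and clearing the environment first means variables such as LD_PRELOAD never reach the child unless they are explicitly allow-listed.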

Resource capping. SessionCgroup (cgroup v2) and SessionTmpfs are best-effort: if CAP_SYS_ADMIN is absent or a write fails, the daemon logs a warning and the session runs uncapped rather than failing.

Headless no-GPU path

The compositor does no rendering of its own. Frames are captured from CPU memory (SHM or CPU-mapped Dmabuf), screenshots are encoded on the CPU, and FFV1 is a CPU encoder. The whole loop runs green on a stock shared CI runner, making waymux the Wayland equivalent of Xvfb.

# Force Mesa software rendering (llvmpipe) and disable the DRM syncobj path
# (no /dev/dri node = implicit sync only).
export LIBGL_ALWAYS_SOFTWARE=1
export GALLIUM_DRIVER=llvmpipe
export WAYMUX_DISABLE_SYNCOBJ=1
waymux serve &                        # or let the first command auto-spawn the daemon

waymux new app --size 1280x800
WAYLAND_DISPLAY="$XDG_RUNTIME_DIR/waymux/app/wayland.sock" kwrite notes.txt &
waymux wait app --timeout-ms 15000
waymux screenshot-desktop app -o shot.png
waymux record start app --codec ffv1   # CPU-encoded lossless
waymux record stop app

Capability                                  GPU needed?
Nested compositor + virtual output          No
Hosting Wayland + XWayland apps             No
Screenshot (PNG, from CPU memory)           No
FFV1 lossless recording (CPU codec)         No
Keyboard / pointer / touch injection        No
Live WebRTC viewer (default codecs)         Yes
Hardware video encode (NVENC/VAAPI/Vulkan)  Yes

The first five capabilities are everything needed for functional, layout, input, and visual-regression testing. CI does not need the viewer at all: it screenshots and records instead.

For Chromium, use --app=<url> mode: the page becomes the toplevel surface, which is both the most reliable thing to capture in software and the cleanest target for a pixel diff. In normal (non-app) mode, web page content renders in an uncaptured subsurface.

The repository ships a ready-made end-to-end test at tests/e2e/run-e2e-embedded.sh that launches Chromium and a KDE app as direct Wayland clients under llvmpipe, asserts the captured frames have real content (a center-contrast check), injects keystrokes and verifies they changed the capture, records FFV1, and counts genuinely unique frames (duplicate min-fps padding is stripped with mpdecimate first).

A GPU-free Docker image at tests/e2e/Dockerfile runs the same harness with no --gpus and no /dev/dri:

docker build -f tests/e2e/Dockerfile -t waymux-e2e .
docker run --rm waymux-e2e

Crate layout

waymux is a single Cargo workspace plus one Go module (the web viewer bridge).

Crate               Binary / lib          Responsibility
waymux-cli          waymux binary         23 subcommands (clap), dispatched through run_with_transport() (8 transport-routable) and run_local_only(). Holds the Transport trait and the credentials loader.
waymux-protocol     lib (published)       Wire contract: RequestMethod (26 variants), Response, EventBody (10 variants), SessionCtlMethod, supporting enums, and encode_frame / decode_frame. Serialization via rmp-serde with named fields and #[serde(default)] throughout.
waymux-daemon       waymuxd binary        Registry engine, SessionBackend trait with LocalBackend, Server (accept loop, per-connection handler, dispatch router, event forwarder, error mapping), cgroup/quota/usage-events modules, and main bootstrap.
waymux-session      session subprocess    The per-session compositor. Subsystems: compositor (Wayland protocol dispatch, surface and window tracking), state (thread-safe Arc<State>), control (session RPC server), recording and its encoder backends, the attach server, and the viewer (encoder thread + bridge supervision).
waymux-attach       waymux-attach binary  Attach client: connects to an outer compositor's display and to the session's attach socket to embed a session's surface as a native outer window.
waymux-mcp          waymux-mcp binary     MCP server that exposes every discrete request/response CLI verb to agents by execing the CLI through an argument vector (no shell, no injection). Streaming verbs and credential-writing login are intentionally excluded.
waymux-mux-mkv      lib (published)       Matroska muxer used by the recording subsystem to write .mkv output.
waymux-neko-bridge  Go binary             Slim-vendored Go WebRTC bridge (derived from neko, Apache-2.0). Handles Pion WebRTC, WebSocket signaling, Ed25519 viewer-token validation, multi-viewer fan-out, GCC bandwidth feedback, and input translation. Spawned as a child of waymux-session.

Key data flows

Create a session

  1. CLI new sends Hello then CreateSession over the local socket.
  2. Server gates on Hello, then dispatch() calls registry.create().
  3. Registry spawns waymux-session with the socket set and waits for the ready handshake (5 s timeout).
  4. Registry installs cgroup/tmpfs handles and the supervisor, then emits SessionCreated.
  5. CLI prints name (WxH).
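
The Hello gate in step 2 can be sketched as a single check on the first request. The Request enum here is a hypothetical two-variant simplification of RequestMethod, and gate_first_request is an invented name; the version range 1 through 4 and the E_PROTO_VERSION outcome come from the Transport section.

```rust
/// Current protocol version quoted in the text.
const PROTOCOL_VERSION: u32 = 4;

/// Hypothetical, simplified request type; the real RequestMethod has 26 variants.
#[derive(Debug)]
enum Request {
    Hello { version: u32 },
    CreateSession { name: String },
}

/// Stands in for the E_PROTO_VERSION wire code.
#[derive(Debug, PartialEq)]
enum GateError {
    ProtoVersion,
}

/// Check the first request on a connection. On success, return the version
/// the daemon would reply with (its own).
fn gate_first_request(first: &Request) -> Result<u32, GateError> {
    match first {
        Request::Hello { version } if (1..=PROTOCOL_VERSION).contains(version) => {
            Ok(PROTOCOL_VERSION)
        }
        // A non-Hello first request, version 0, or a too-new version all fail.
        _ => Err(GateError::ProtoVersion),
    }
}
```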

Spawn a client

  1. CLI spawn sends Spawn {argv, env, compositor}.
  2. dispatch() calls registry.spawn_child(), which validates argv[0], sanitizes the environment, joins the cgroup, and starts the process with WAYLAND_DISPLAY pointing at the session's inner socket.
  3. The Registry records the PID, drains its logs, and on exit emits ChildExited or SessionCrashed.
  4. CLI prints pid N.

Screenshot

  1. CLI screenshot sends Screenshot {window_id, format}.
  2. The daemon forwards via session_control() to the session control socket.
  3. The session composites the subsurface tree on the CPU, encodes PNG, and returns width/height plus the PNG bytes.
  4. CLI writes the raw PNG to a file or stdout; metadata goes to stderr.

Record

  1. CLI record start sends RecordStart {path, codec, secondary_codec, mode, min_fps}.
  2. The session selects an encoder, validates the output path (absolute, no ..), and starts a recording thread fed by the compositor frame tap.
  3. CLI prints the primary path (and the secondary path if a secondary codec was set).
  4. record stop finalizes the MKV container.
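
The path check in step 2 can be expressed with std::path alone. The function name validate_recording_path is hypothetical and the real validation may apply further rules; this sketch covers only the two conditions stated in the text.

```rust
use std::path::{Component, Path};

/// Accept only absolute paths that contain no `..` component.
fn validate_recording_path(path: &str) -> Result<(), String> {
    let p = Path::new(path);
    if !p.is_absolute() {
        return Err(format!("recording path must be absolute: {}", path));
    }
    if p.components().any(|c| matches!(c, Component::ParentDir)) {
        return Err(format!("recording path must not contain '..': {}", path));
    }
    Ok(())
}
```

Matching on path components rather than searching the string for ".." keeps a file name such as /tmp/a..b.mkv valid while still rejecting /tmp/../etc/out.mkv.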

Live view

  1. CLI viewer start sends ViewerStart {bind, port}.
  2. The session probes a viewer codec (NVENC, then Vulkan), spawns the neko-bridge child, and starts an encoder thread.
  3. The browser opens the bridge URL, completes WebSocket signaling, and receives the H.264 WebRTC stream; input flows back over the data channel as InjectOp.
  4. CLI prints the viewer URL.

Security model

waymux is designed to be safe to run locally without any network exposure or privileged configuration.

Property                  Mechanism
Local-first               The default path is a per-user Unix socket. No network listener is opened in the local configuration.
Same-uid only             Every connection (daemon socket and per-session control socket) is gated by SO_PEERCRED; foreign uids are rejected. The daemon socket is chmod 0600 and the credentials directory is enforced at 0700.
Process hardening         spawn_child requires an absolute argv[0], clears the environment and re-adds only safe variables, and can apply an fd-limit rlimit. Dmabuf imports are capped at 256 MiB.
Fail-closed viewer token  The WebRTC bridge verifies viewer JWTs with an Ed25519 public key only. A compromised VM can never forge a token because the private key stays on the control plane. On a non-loopback bind, a missing public key or an invalid token rejects all viewers.
Bridge DoS hardening      The bridge caps concurrent viewers (default 8) and in-flight handshakes per source IP (default 4), returning 503 when exceeded, with bounded per-connection send queues.
Recording path safety     Recording paths are validated to be absolute and free of .. before the recording thread starts.

Shared session trust model. Any client connected to a session's inner wayland.sock can screen-capture every other window in that session (the compositor advertises wlr-screencopy to all clients). Treat all apps sharing one session as mutually trusting. Isolate untrusted apps in separate sessions, containers, or VMs. Access control also depends on XDG_RUNTIME_DIR being 0700: on systemd, /run/user/<uid> is already 0700, but if you point XDG_RUNTIME_DIR elsewhere, keep the directory mode 0700.

What is validated vs experimental. The local control plane, session lifecycle, input injection (key/pointer/touch), screenshots, recording, and the WebRTC viewer are implemented and exercised by the test suite. InjectSelector is the one reserved protocol slot that currently returns E_NOT_IMPLEMENTED: resolve the target with windows / wait and inject with an explicit window_id instead. Coordinate scaling for non-1x outputs is not yet implemented.

Extension points

waymux is designed for evolution. The primary extension seams are:

  • SessionBackend trait. Implement create / destroy / info to target a new provisioning substrate. The local subprocess backend is the only shipped implementation, but the trait was designed from the start with a remote VM backend in mind.
  • Recording and viewer encoders. The codec backends are pluggable behind the recording task interface. New encoders slot in alongside ffv1/nvenc/vaapi/vulkan without touching the rest of the session.
  • Protocol evolution. New RequestMethod / SessionCtlMethod variants and struct fields are added with #[serde(default)] so older peers keep parsing. The version handshake accepts any client protocol from 1 through the daemon's current version.
  • Event subscribers. Clients subscribe to topic-filtered events (sessions, windows, damage, logs, with :name scoping and log replay on subscribe). This is the integration point for external monitoring and automation.
  • Attach protocol. waymux_attach_v1 is the seam for embedding a session's output into any outer Wayland compositor via display-fd passing.
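
As one illustration of the event-subscriber seam, topic filtering with :name scoping might reduce to a small matcher. The subscription syntax assumed here ("windows" for all sessions, "windows:app" for one session) is a guess based on the bullet above, and topic_matches is a hypothetical helper.

```rust
/// Does a subscription such as "windows" or "windows:app" match an event
/// with the given topic and session name? (Assumed syntax: topic[:session].)
fn topic_matches(subscription: &str, topic: &str, session: &str) -> bool {
    match subscription.split_once(':') {
        // Scoped subscription: both the topic and the session must match.
        Some((sub_topic, name)) => sub_topic == topic && name == session,
        // Unscoped subscription: any session on that topic matches.
        None => subscription == topic,
    }
}
```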

Compositor support matrix

waymux hosts direct Wayland clients (any WAYLAND_DISPLAY-aware app) in every session. It also supports running a nested inner compositor as a client.

Inner compositor                                      Status        Notes
Direct Wayland clients (Chromium, foot, Qt/GTK apps)  Validated     The primary use case. Works in software and hardware rendering.
KWin / KDE Plasma 6                                   Validated     Runs as a nested inner compositor. On AMD, AMD_DEBUG=nodcc or RADV_DEBUG=nodcc is required to avoid a DCC-tiled buffer capture stall.
niri (Smithay)                                        Validated     Hardware-rendered on AMD Renoir. Same AMD_DEBUG=nodcc requirement.
Hyprland (wlroots)                                    Experimental  Aquamarine binds xdg_wm_base v6 and wl_seat v9; the session currently advertises v5 / v7, so Hyprland fails to start. Unblocking is a Phase 3 item.

More broadly, the session advertises a limited set of dmabuf formats (ARGB/XRGB8888 only, no multiplanar/YUV) with LINEAR plus EGL-importable tiled modifiers, and a limited set of Wayland interface versions. A compositor that needs more globals or higher interface versions may fall back to software or fail to start. Broader validation and a published support matrix are planned for Phase 3.