Architecture
One wayland-server compositor per session, running in its own process
against a virtual output. Never your real display.
Overview
waymux is a local-first headless Wayland session manager. A per-user daemon
(waymuxd) supervises any number of isolated Wayland sessions, each
backed by its own compositor (one process per session) that exposes a virtual output. A single
waymux CLI drives the daemon over a msgpack-RPC Unix socket to create
sessions, spawn applications into them, inject input, capture screenshots, record
video, and serve a low-latency WebRTC viewer to a browser.
Sessions can also be attached to an outer compositor (for example niri) so the virtual output appears as a normal window on the host. The canonical, fully open surface is local; the same control surface is reachable remotely over HTTPS for a hosted deployment, but local is the default and the only mode documented here.
A waymux session is a test harness, not a security sandbox. Run untrusted or third-party code in a container or VM, not in a bare session. All clients in a session share the same inner Wayland display and can screen-capture each other.
Component diagram
                 +-------------------------------+
                 |          waymux CLI           |
                 |    (23 subcommands, clap)     |
                 +---------------+---------------+
                                 |
 Transport trait: LocalTransport | RemoteTransport
                                 |
  msgpack-RPC over Unix socket   |   HTTPS + Bearer (hosted)
  $XDG_RUNTIME_DIR/waymux.sock   |
                                 v
+----------------------------------------------------------------------+
|                      waymuxd (per-user daemon)                       |
|                                                                      |
|  Server (accept loop, SO_PEERCRED uid gate, Hello negotiation,       |
|          reader/writer tasks, dispatch router, error mapping)        |
|     |                                                                |
|     v                                                                |
|  Registry (core engine: session map, lifecycle, broadcast,           |
|            supervisors, log history, session_control RPC)            |
|     |                                                                |
|  SessionBackend trait --> LocalBackend (subprocess)                  |
|                                                                      |
|  cgroup / tmpfs quota (best-effort)   usage_events (feature-gated)   |
+--------------------------------+-------------------------------------+
                                 |  spawn + per-session control socket
                                 v
+----------------------------------------------------------------------+
|               waymux-session (one process per session)               |
|                                                                      |
|  Compositor thread            Control socket (tokio, msgpack-RPC)    |
|   - inner Wayland server       - Info/ListWindows/Resize             |
|   - xdg-shell, wl-shm,         - Inject{Key,Pointer,Touch,Batch}     |
|     dmabuf, layer-shell        - Screenshot/ScreenshotDesktop        |
|   - virtual output             - Record{Start,Stop} Viewer{Start..}  |
|   - SurfaceData / windows     Events socket --> daemon broadcast     |
|        |              |                  |                           |
|        v              v                  v                           |
|  Attach server    Recording      Viewer (neko-bridge child, Go)      |
|  (waymux_attach   encoders:       - encoder thread -> Annex-B NALUs  |
|   _v1, fd-pass)   ffv1/nvenc/     - Unix socket -> Pion WebRTC       |
|        |          vaapi/vulkan)   - WS signaling + data channel      |
|        v                                 |                           |
|  Outer compositor (niri)         Browser viewer (video + input)      |
+----------------------------------------------------------------------+
Control plane
The control plane is the daemon plus its wire protocol. It is the path through which every CLI verb flows: session creation, spawning, input injection, capture, recording, and viewer control.
Transport
The daemon binds a per-user UnixListener (default
$XDG_RUNTIME_DIR/waymux.sock, chmod 0600). The CLI connects through a
Transport trait with two implementations: LocalTransport
(Unix socket, msgpack-RPC) and RemoteTransport (HTTPS with a Bearer
token). Eight of the 23 CLI verbs are transport-routable; the rest are local-only.
Every frame is a 4-byte big-endian length prefix followed by a msgpack payload,
capped at a 20 MiB MAX_FRAME_SIZE. The first request on a connection
must be Hello. The daemon accepts any client protocol from version 1
through the current version (4) and replies with its version and capabilities. A
non-Hello first request, protocol version 0, or a version newer than the daemon's
own returns E_PROTO_VERSION.
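The framing and version rules above can be sketched with the standard library alone. This is an illustrative sketch, not the waymux-protocol implementation: write_frame, read_frame, and check_hello are hypothetical names, the payload is treated as opaque bytes rather than decoded msgpack, and applying the 20 MiB cap to the payload length (rather than the whole frame) is an assumption.

```rust
use std::io::{self, Read, Write};

/// Frame size cap stated in the text: 20 MiB.
/// Assumption: the cap applies to the payload length.
const MAX_FRAME_SIZE: usize = 20 * 1024 * 1024;

/// Current protocol version stated in the text.
const CURRENT_VERSION: u32 = 4;

/// Write one frame: 4-byte big-endian length prefix, then the payload.
/// On the real wire the payload is msgpack; here it is opaque bytes.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_SIZE {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "frame exceeds MAX_FRAME_SIZE",
        ));
    }
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload)
}

/// Read one frame, rejecting an oversized length before allocating.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_SIZE {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "frame exceeds MAX_FRAME_SIZE",
        ));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}

/// Hypothetical Hello check: accept client versions 1..=CURRENT_VERSION
/// and reply with the daemon version; anything else is E_PROTO_VERSION.
fn check_hello(client_version: u32) -> Result<u32, &'static str> {
    if (1..=CURRENT_VERSION).contains(&client_version) {
        Ok(CURRENT_VERSION)
    } else {
        Err("E_PROTO_VERSION")
    }
}
```

Checking the length before allocating means a hostile or corrupted prefix cannot force a large allocation, which is the usual reason for a frame cap of this kind.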
Registry
The Registry is the core engine. It holds a
HashMap<String, SessionEntry> of session metadata, supervisor
kill channels, child-PID tracking, rolling per-session log history (1024 lines),
and a broadcast channel that fans events to all subscribers. Its public methods
(create, destroy, spawn_child,
session_control, list_windows, resize,
screenshot, inject_*, record_*,
viewer_*, tag_window, wait_for_idle,
attach, detach, shutdown_all) are
protocol-agnostic.
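As an illustration of the rolling per-session log history, a bounded ring can be built on VecDeque. This is a minimal sketch under stated assumptions: LogRing and its methods are hypothetical names, and the document does not say how the Registry actually stores the 1024 lines.

```rust
use std::collections::VecDeque;

/// Rolling log history that keeps only the newest `cap` lines.
/// Hypothetical type; the Registry's real storage is not shown here.
struct LogRing {
    cap: usize,
    lines: VecDeque<String>,
}

impl LogRing {
    fn new(cap: usize) -> Self {
        LogRing { cap, lines: VecDeque::with_capacity(cap) }
    }

    /// Append a line, evicting the oldest one when the ring is full.
    fn push(&mut self, line: String) {
        if self.cap == 0 {
            return;
        }
        if self.lines.len() == self.cap {
            self.lines.pop_front();
        }
        self.lines.push_back(line);
    }

    /// Snapshot from oldest to newest, for example to replay history
    /// to a subscriber that has just connected.
    fn snapshot(&self) -> Vec<String> {
        self.lines.iter().cloned().collect()
    }
}
```

With a capacity of 1024, pushing 1030 lines leaves the last 1024 in order, so memory per session stays bounded regardless of how chatty a client is.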
The Server's dispatch() is a match over RequestMethod
(26 variants) that translates each wire request into a Registry call and maps
typed engine errors into stable ErrorCode values
(E_NOT_FOUND, E_ALREADY_EXISTS,
E_NOT_IMPLEMENTED, E_BACKPRESSURE,
E_INTERNAL, and others).
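The error-mapping step might look roughly like the following. The ErrorCode strings come from the text; everything else is an assumption for illustration, including the EngineError enum, its variants, and the map_error function, none of which are taken from the daemon's source.

```rust
/// Hypothetical typed engine errors; the real Registry error type
/// is not shown in this document.
#[derive(Debug)]
enum EngineError {
    SessionNotFound(String),
    SessionExists(String),
    Unsupported(&'static str),
    QueueFull,
    Other(String),
}

/// Stable wire codes named in the text (the text lists "and others").
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorCode {
    NotFound,
    AlreadyExists,
    NotImplemented,
    Backpressure,
    Internal,
}

impl ErrorCode {
    /// The string form that would appear on the wire.
    fn as_str(self) -> &'static str {
        match self {
            ErrorCode::NotFound => "E_NOT_FOUND",
            ErrorCode::AlreadyExists => "E_ALREADY_EXISTS",
            ErrorCode::NotImplemented => "E_NOT_IMPLEMENTED",
            ErrorCode::Backpressure => "E_BACKPRESSURE",
            ErrorCode::Internal => "E_INTERNAL",
        }
    }
}

/// Map a typed engine error to a stable code. Unknown failures
/// collapse into E_INTERNAL so new engine errors never leak to clients.
fn map_error(err: &EngineError) -> ErrorCode {
    match err {
        EngineError::SessionNotFound(_) => ErrorCode::NotFound,
        EngineError::SessionExists(_) => ErrorCode::AlreadyExists,
        EngineError::Unsupported(_) => ErrorCode::NotImplemented,
        EngineError::QueueFull => ErrorCode::Backpressure,
        EngineError::Other(_) => ErrorCode::Internal,
    }
}
```

Keeping the mapping in one function is what lets the engine's error types change freely while the codes clients match on stay stable.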
SessionBackend
A SessionBackend async trait (create /
destroy / info) abstracts the session-lifecycle path.
LocalBackend is the only shipped implementation: a thin wrapper over
the Registry that manages subprocess sessions. The trait is the seam a future
provisioning target would plug into (see Extension
points).
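To make the seam concrete, here is a simplified sketch of a trait with the same three operations. It is not the real trait: the actual SessionBackend is async and its signatures are not given in this document, so the synchronous methods, the SessionInfo fields, and the in-memory MemoryBackend are all assumptions chosen to keep the example free of dependencies.

```rust
use std::collections::HashMap;

/// Hypothetical session metadata; the real info payload may differ.
#[derive(Debug, Clone, PartialEq)]
struct SessionInfo {
    name: String,
    width: u32,
    height: u32,
}

/// Synchronous stand-in for the async trait described above.
trait SessionBackend {
    fn create(&mut self, name: &str, width: u32, height: u32)
        -> Result<SessionInfo, &'static str>;
    fn destroy(&mut self, name: &str) -> Result<(), &'static str>;
    fn info(&self, name: &str) -> Option<SessionInfo>;
}

/// In-memory backend used only to illustrate the seam; it spawns nothing.
struct MemoryBackend {
    sessions: HashMap<String, SessionInfo>,
}

impl MemoryBackend {
    fn new() -> Self {
        MemoryBackend { sessions: HashMap::new() }
    }
}

impl SessionBackend for MemoryBackend {
    fn create(&mut self, name: &str, width: u32, height: u32)
        -> Result<SessionInfo, &'static str>
    {
        if self.sessions.contains_key(name) {
            return Err("E_ALREADY_EXISTS");
        }
        let info = SessionInfo { name: name.to_string(), width, height };
        self.sessions.insert(name.to_string(), info.clone());
        Ok(info)
    }

    fn destroy(&mut self, name: &str) -> Result<(), &'static str> {
        self.sessions.remove(name).map(|_| ()).ok_or("E_NOT_FOUND")
    }

    fn info(&self, name: &str) -> Option<SessionInfo> {
        self.sessions.get(name).cloned()
    }
}
```

A different provisioning target would implement the same three methods and leave the callers unchanged, which is the point of keeping the trait this small.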
Data plane
Each session is a separate waymux-session process: a full headless
Wayland compositor for one virtual output. It holds the inner Wayland server, the
capture logic, the recording thread, the attach server, and the viewer.
Inner compositor
The session advertises: xdg-shell, wl-shm,
layer-shell, zwp_linux_dmabuf_v1 (GPU buffer import
with modifier negotiation), viewporter, pointer/keyboard/touch, data-device
(clipboard), presentation-time, pointer-constraints and relative-pointer,
keyboard-shortcuts-inhibit, and KDE-specific protocols.
The compositor is observer-only: it tracks surfaces, subsurface trees, toplevels, and damage timestamps without rendering. Composition happens lazily at capture time via a recursive subsurface tree walk that blits into a single ARGB8888 buffer. This means no GPU is required for the compositor itself.
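The lazy composition step can be pictured as a recursive walk that blits each surface, then its subsurfaces, into one packed ARGB8888 buffer. The sketch below is a simplified model, not the session's code: the Surface struct and composite function are hypothetical, children are assumed to stack above their parent in list order, and pixels are copied opaquely (fully transparent pixels are skipped, partial alpha is not blended).

```rust
/// One surface in a hypothetical subsurface tree. Pixels are packed
/// ARGB8888 (0xAARRGGBB), row-major, `width * height` entries.
struct Surface {
    x: i32, // offset relative to the parent surface
    y: i32,
    width: u32,
    height: u32,
    pixels: Vec<u32>,
    children: Vec<Surface>, // subsurfaces, bottom to top
}

/// Blit `s`, then its children, into `dst` with the parent origin at
/// (ox, oy). Pixels falling outside the destination are clipped.
fn composite(s: &Surface, ox: i32, oy: i32, dst: &mut [u32], dst_w: u32, dst_h: u32) {
    let (sx, sy) = (ox + s.x, oy + s.y);
    for row in 0..s.height as i32 {
        let dy = sy + row;
        if dy < 0 || dy >= dst_h as i32 {
            continue;
        }
        for col in 0..s.width as i32 {
            let dx = sx + col;
            if dx < 0 || dx >= dst_w as i32 {
                continue;
            }
            let src = s.pixels[(row as u32 * s.width + col as u32) as usize];
            // Simplification: copy any pixel with non-zero alpha as-is.
            if (src >> 24) != 0 {
                dst[(dy as u32 * dst_w + dx as u32) as usize] = src;
            }
        }
    }
    // Children are positioned relative to this surface's origin.
    for child in &s.children {
        composite(child, sx, sy, dst, dst_w, dst_h);
    }
}
```

Because nothing is drawn until a capture asks for it, an idle session costs no compositing work at all, which matches the observer-only design described above.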
Capture and screenshots
Screenshot RPCs run on the control thread, look up the surface by window id,
composite the subsurface tree on the CPU, and encode PNG with the
image crate. The protocol prefers a fd-passed Dmabuf format with
a PNG-over-SHM fallback. A buffer-hold ref-count mechanism keeps GPU buffers
pinned while a capture or encode is in flight.
Recording
Four backends are available. The choice of codec determines whether a GPU is needed:
| Codec | Backend | GPU needed | Notes |
|---|---|---|---|
| ffv1 | CPU readback | No | Lossless, default for CI |
| h264-nvenc | NVENC subprocess | Yes (NVIDIA) | H.264 hardware encode |
| h264-vaapi | In-process VAAPI | Yes | H.264 hardware encode |
| h264-vulkan / hevc-vulkan | In-process Vulkan | Yes | Zero-copy, fastest |
A LatestTaskSlot lets newer frames evict older ones so a slow encoder
never back-pressures the compositor. Dual recording (primary plus
--secondary-codec) writes two output paths from the same frame tap.
Output is always Matroska (.mkv); recording paths are validated to be
absolute and free of .. components.
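The latest-wins behaviour can be illustrated with a single-slot mailbox. This is a sketch of the general technique only: LatestSlot, publish, and take are hypothetical names, and the real LatestTaskSlot may be structured quite differently (for example around tasks rather than plain values).

```rust
use std::sync::Mutex;

/// Single-slot mailbox: a newly published value replaces any value
/// the consumer has not taken yet. Hypothetical stand-in for the
/// LatestTaskSlot mentioned above.
struct LatestSlot<T> {
    slot: Mutex<Option<T>>,
}

impl<T> LatestSlot<T> {
    fn new() -> Self {
        LatestSlot { slot: Mutex::new(None) }
    }

    /// Producer side: store the new value and return the evicted one,
    /// if any. The lock is held only for the swap, so the producer
    /// never waits for the consumer to finish encoding.
    fn publish(&self, value: T) -> Option<T> {
        self.slot.lock().unwrap().replace(value)
    }

    /// Consumer side: take the newest pending value, leaving the slot empty.
    fn take(&self) -> Option<T> {
        self.slot.lock().unwrap().take()
    }
}
```

If the compositor publishes frames 1 and 2 before the encoder wakes up, the encoder sees only frame 2. Dropping stale frames is the trade that keeps a slow encoder from stalling the frame tap.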
WebRTC viewer
On ViewerStart the session spawns a Go waymux-neko-bridge
child and an encoder thread that produces Annex-B NALUs tuned for low latency
(baseline profile, no B-frames, periodic IDR at 60 fps; Vulkan emits every-frame
IDR). NALUs cross a private per-session Unix socket using a typed 5-byte-header
protocol (NALU, cursor image, cursor position, force-keyframe, inject-op,
set-bitrate, shutdown). The bridge wraps frames into WebRTC with Pion (ICE, DTLS,
RTP), signals over WebSocket, and exposes a data channel.
Browser input arrives as JSON, is translated to waymux InjectOp, and
is written back over the socket into the session control loop.
Multi-viewer is last-wins: only the primary viewer's input and GCC bandwidth
estimate drive the shared encoder; other viewers receive video fan-out only.
Attach
A second Wayland server within the session advertises
waymux_attach_v1. An attach client passes the outer compositor's
display fd via SCM_RIGHTS; the session creates a proxy
wl_surface / xdg_toplevel on the outer compositor and
ferries the inner focused window's frames into an outer SHM buffer on each
commit. The ferry path validates same-format ARGB8888 at 1:1 size and falls back
to a placeholder otherwise.
Process and socket model
Every session gets its own set of Unix sockets. The daemon spawns the session process and communicates with it through these sockets throughout its lifetime.
| Socket | Direction | Purpose |
|---|---|---|
| inner Wayland display | clients -> session | The Wayland compositor socket Wayland clients connect to |
| control socket | daemon <-> session | Persistent msgpack-RPC for session control RPCs (screenshot, inject, record, etc.) |
| events socket | session -> daemon | Push stream: window events, damage events, log lines, forwarded to subscribers |
| attach socket | outer client -> session | waymux_attach_v1 protocol for embedding into an outer compositor |
| ready socket | session -> daemon | One-shot startup handshake (5 s timeout); closed after create completes |
Same-uid gating. Both the daemon accept loop and the per-session
control socket check SO_PEERCRED and reject any connection whose uid
differs from the owner. The daemon socket is chmod 0600. This is the primary local
trust boundary.
Lifecycle. create spawns the session subprocess,
waits for the ready handshake, sets up best-effort cgroup and tmpfs quota
handles, and starts a session_supervisor task that owns the
Child, drains stdout/stderr into the log ring, and emits
SessionCreated. destroy removes the session, SIGTERMs
tracked child PIDs, signals the supervisor, lazy-unmounts the tmpfs, and cleans
up the cgroup. The supervisor also handles natural exit, emitting
SessionDestroyed.
Spawning clients. spawn_child requires an absolute
argv[0], clears the environment and re-adds only safe variables,
optionally applies an fd-limit rlimit, joins the cgroup, and tracks the PID for
crash detection.
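The argv and environment checks might be sketched as follows using std::process::Command. The allowlist contents are an assumption (the document does not list which variables count as safe), and sanitize_env and build_command are hypothetical helpers; the rlimit, cgroup join, and PID tracking steps are omitted.

```rust
use std::collections::BTreeMap;
use std::path::Path;
use std::process::Command;

/// Hypothetical allowlist; the daemon's actual safe set is not
/// listed in this document.
const SAFE_VARS: &[&str] = &["PATH", "HOME", "LANG", "XDG_RUNTIME_DIR"];

/// Keep only allowlisted variables from the caller-supplied environment.
fn sanitize_env<'a, I>(env: I) -> BTreeMap<String, String>
where
    I: IntoIterator<Item = (&'a str, &'a str)>,
{
    env.into_iter()
        .filter(|(k, _)| SAFE_VARS.iter().any(|safe| *safe == *k))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// Build (but do not spawn) the child command: require an absolute
/// argv[0], start from an empty environment, re-add the sanitized
/// variables, and point WAYLAND_DISPLAY at the session's inner socket.
fn build_command(
    argv: &[String],
    env: &BTreeMap<String, String>,
    wayland_display: &str,
) -> Result<Command, String> {
    let program = argv.first().ok_or_else(|| "empty argv".to_string())?;
    if !Path::new(program).is_absolute() {
        return Err(format!("argv[0] must be absolute: {}", program));
    }
    let mut cmd = Command::new(program);
    cmd.args(&argv[1..])
        .env_clear()
        .envs(env)
        .env("WAYLAND_DISPLAY", wayland_display);
    Ok(cmd)
}
```

Requiring an absolute argv[0] avoids a PATH lookup in the daemon's context, and clearing the environment first means variables such as LD_PRELOAD never reach the child unless they are explicitly allowed.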
Resource capping. SessionCgroup (cgroup v2) and
SessionTmpfs are best-effort: if CAP_SYS_ADMIN is
absent or a write fails, the daemon logs a warning and the session runs uncapped
rather than failing.
Headless no-GPU path
The compositor does no rendering of its own. Frames are captured from CPU memory (SHM or CPU-mapped Dmabuf), screenshots are encoded on the CPU, and FFV1 is a CPU encoder. The whole loop runs green on a stock shared CI runner, making waymux the Wayland equivalent of Xvfb.
# Force Mesa software rendering (llvmpipe) and disable the DRM syncobj path
# (no /dev/dri node = implicit sync only).
export LIBGL_ALWAYS_SOFTWARE=1
export GALLIUM_DRIVER=llvmpipe
export WAYMUX_DISABLE_SYNCOBJ=1
waymux serve & # or let the first command auto-spawn the daemon
waymux new app --size 1280x800
WAYLAND_DISPLAY="$XDG_RUNTIME_DIR/waymux/app/wayland.sock" kwrite notes.txt &
waymux wait app --timeout-ms 15000
waymux screenshot-desktop app -o shot.png
waymux record start app --codec ffv1 # CPU-encoded lossless
waymux record stop app
| Capability | GPU needed? |
|---|---|
| Nested compositor + virtual output | No |
| Hosting Wayland + XWayland apps | No |
| Screenshot (PNG, from CPU memory) | No |
| FFV1 lossless recording (CPU codec) | No |
| Keyboard / pointer / touch injection | No |
| Live WebRTC viewer (default codecs) | Yes |
| Hardware video encode (NVENC/VAAPI/Vulkan) | Yes |
The first five capabilities are everything needed for functional, layout, input, and visual-regression testing. CI does not need the viewer at all: it screenshots and records instead.
For Chromium, use --app=<url> mode: the page becomes the
toplevel surface, which is both the most reliable thing to capture in software
and the cleanest target for a pixel diff. In normal (non-app) mode, web page
content renders in an uncaptured subsurface.
The repository ships a ready-made end-to-end test at
tests/e2e/run-e2e-embedded.sh that launches Chromium and a KDE app
as direct Wayland clients under llvmpipe, asserts the captured frames have real
content (a center-contrast check), injects keystrokes and verifies they changed
the capture, records FFV1, and counts genuinely unique frames (duplicate
min-fps padding is stripped with mpdecimate first).
A GPU-free Docker image at tests/e2e/Dockerfile runs the same
harness with no --gpus and no /dev/dri:
docker build -f tests/e2e/Dockerfile -t waymux-e2e .
docker run --rm waymux-e2e
Crate layout
waymux is a single Cargo workspace plus one Go module (the web viewer bridge).
| Crate | Binary / lib | Responsibility |
|---|---|---|
| waymux-cli | waymux binary | 23 subcommands (clap), dispatched through run_with_transport() (8 transport-routable) and run_local_only(). Holds the Transport trait and the credentials loader. |
| waymux-protocol | lib (published) | Wire contract: RequestMethod (26 variants), Response, EventBody (10 variants), SessionCtlMethod, supporting enums, and encode_frame / decode_frame. Serialization via rmp-serde with named fields and #[serde(default)] throughout. |
| waymux-daemon | waymuxd binary | Registry engine, SessionBackend trait with LocalBackend, Server (accept loop, per-connection handler, dispatch router, event forwarder, error mapping), cgroup/quota/usage-events modules, and main bootstrap. |
| waymux-session | session subprocess | The per-session compositor. Subsystems: compositor (Wayland protocol dispatch, surface and window tracking), state (thread-safe Arc<State>), control (session RPC server), recording and its encoder backends, the attach server, and the viewer (encoder thread + bridge supervision). |
| waymux-attach | waymux-attach binary | Attach client: connects to an outer compositor's display and to the session's attach socket to embed a session's surface as a native outer window. |
| waymux-mcp | waymux-mcp binary | MCP server that exposes every discrete request/response CLI verb to agents by execing the CLI through an argument vector (no shell, no injection). Streaming verbs and credential-writing login are intentionally excluded. |
| waymux-mux-mkv | lib (published) | Matroska muxer used by the recording subsystem to write .mkv output. |
| waymux-neko-bridge | Go binary | Slim-vendored Go WebRTC bridge (derived from neko, Apache-2.0). Handles Pion WebRTC, WebSocket signaling, Ed25519 viewer-token validation, multi-viewer fan-out, GCC bandwidth feedback, and input translation. Spawned as a child of waymux-session. |
Key data flows
Create a session
- CLI new sends Hello then CreateSession over the local socket.
- Server gates on Hello, then dispatch() calls registry.create().
- Registry spawns waymux-session with the socket set and waits for the ready handshake (5 s timeout).
- Registry installs cgroup/tmpfs handles and the supervisor, then emits SessionCreated.
- CLI prints name (WxH).
Spawn a client
- CLI spawn sends Spawn {argv, env, compositor}.
- dispatch() calls registry.spawn_child(), which validates argv[0], sanitizes the environment, joins the cgroup, and starts the process with WAYLAND_DISPLAY pointing at the session's inner socket.
- The Registry records the PID, drains its logs, and on exit emits ChildExited or SessionCrashed.
- CLI prints pid N.
Screenshot
- CLI screenshot sends Screenshot {window_id, format}.
- The daemon forwards via session_control() to the session control socket.
- The session composites the subsurface tree on the CPU, encodes PNG, and returns width/height plus the PNG bytes.
- CLI writes the raw PNG to a file or stdout; metadata goes to stderr.
Record
- CLI record start sends RecordStart {path, codec, secondary_codec, mode, min_fps}.
- The session selects an encoder, validates the output path (absolute, no ..), and starts a recording thread fed by the compositor frame tap.
- CLI prints the primary path (and the secondary path if a secondary codec was set).
- record stop finalizes the MKV container.
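The output-path check in the record flow can be expressed with std::path. This is a minimal sketch of the two rules the text states (absolute, no .. components); validate_recording_path is a hypothetical name and the session may enforce additional constraints not described here.

```rust
use std::path::{Component, Path};

/// Accept only absolute paths that contain no `..` component.
/// Hypothetical helper mirroring the rule stated in the text.
fn validate_recording_path(p: &str) -> Result<(), &'static str> {
    let path = Path::new(p);
    if !path.is_absolute() {
        return Err("recording path must be absolute");
    }
    // Compare parsed components rather than searching for the
    // substring "..", so a file name like "a..b.mkv" stays valid.
    if path.components().any(|c| c == Component::ParentDir) {
        return Err("recording path must not contain ..");
    }
    Ok(())
}
```

Checking parsed components instead of raw text avoids rejecting harmless names while still catching traversal such as /tmp/../etc/out.mkv.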
Live view
- CLI viewer start sends ViewerStart {bind, port}.
- The session probes a viewer codec (NVENC, then Vulkan), spawns the neko-bridge child, and starts an encoder thread.
- The browser opens the bridge URL, completes WebSocket signaling, and receives the H.264 WebRTC stream; input flows back over the data channel as InjectOp.
- CLI prints the viewer URL.
Security model
waymux is designed to be safe to run locally without any network exposure or privileged configuration.
| Property | Mechanism |
|---|---|
| Local-first | The default path is a per-user Unix socket. No network listener is opened in the local configuration. |
| Same-uid only | Every connection (daemon socket and per-session control socket) is gated by SO_PEERCRED; foreign uids are rejected. The daemon socket is chmod 0600 and the credentials directory is enforced at 0700. |
| Process hardening | spawn_child requires an absolute argv[0], clears the environment and re-adds only safe variables, and can apply an fd-limit rlimit. Dmabuf imports are capped at 256 MiB. |
| Fail-closed viewer token | The WebRTC bridge verifies viewer JWTs with an Ed25519 public key only. A compromised VM can never forge a token because the private key stays on the control plane. On a non-loopback bind, a missing public key or an invalid token rejects all viewers. |
| Bridge DoS hardening | The bridge caps concurrent viewers (default 8) and in-flight handshakes per source IP (default 4), returning 503 when exceeded, with bounded per-connection send queues. |
| Recording path safety | Recording paths are validated to be absolute and free of .. before the recording thread starts. |
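The bridge's admission caps from the table above can be modelled as a small counter structure. The real bridge is written in Go and its internals are not shown in this document, so this Rust sketch is purely illustrative: Admission and its methods are hypothetical, and counting only established viewers (not in-flight handshakes) against the viewer cap is an assumption.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical admission state: a global viewer cap plus a
/// per-source-IP cap on handshakes that are still in flight.
struct Admission {
    max_viewers: usize,
    max_handshakes_per_ip: usize,
    viewers: usize,
    handshakes: HashMap<IpAddr, usize>,
}

impl Admission {
    fn new(max_viewers: usize, max_handshakes_per_ip: usize) -> Self {
        Admission {
            max_viewers,
            max_handshakes_per_ip,
            viewers: 0,
            handshakes: HashMap::new(),
        }
    }

    /// Admit a new handshake, or return HTTP status 503 when either
    /// cap would be exceeded.
    fn begin_handshake(&mut self, ip: IpAddr) -> Result<(), u16> {
        if self.viewers >= self.max_viewers {
            return Err(503);
        }
        let in_flight = self.handshakes.entry(ip).or_insert(0);
        if *in_flight >= self.max_handshakes_per_ip {
            return Err(503);
        }
        *in_flight += 1;
        Ok(())
    }

    /// Release the in-flight slot; count a viewer if the handshake succeeded.
    fn finish_handshake(&mut self, ip: IpAddr, connected: bool) {
        let now_idle = match self.handshakes.get_mut(&ip) {
            Some(n) => {
                *n -= 1;
                *n == 0
            }
            None => false,
        };
        if now_idle {
            self.handshakes.remove(&ip);
        }
        if connected {
            self.viewers += 1;
        }
    }

    /// Free a viewer slot when an established viewer disconnects.
    fn viewer_disconnected(&mut self) {
        self.viewers = self.viewers.saturating_sub(1);
    }
}
```

With the documented defaults this would be constructed as Admission::new(8, 4). Removing idle per-IP entries keeps the map from growing with every address that has ever connected.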
Shared session trust model. Any client connected to a session's
inner wayland.sock can screen-capture every other window in that
session (the compositor advertises wlr-screencopy to all clients).
Treat all apps sharing one session as mutually trusting. Isolate untrusted apps
in separate sessions, containers, or VMs. Access control also depends on
XDG_RUNTIME_DIR being 0700: on systemd, /run/user/<uid>
is already 0700, but if you point XDG_RUNTIME_DIR elsewhere, keep
the directory mode 0700.
What is validated vs experimental. The local control plane,
session lifecycle, input injection (key/pointer/touch), screenshots, recording,
and the WebRTC viewer are implemented and exercised by the test suite.
InjectSelector is the one reserved protocol slot that currently
returns E_NOT_IMPLEMENTED: resolve the target with
windows / wait and inject with an explicit
window_id instead. Coordinate scaling for non-1x outputs is not yet
implemented.
Extension points
waymux is designed for evolution. The primary extension seams are:
- SessionBackend trait. Implement create / destroy / info to target a new provisioning substrate. The local subprocess backend is the only shipped implementation, but the trait was designed from the start with a remote VM backend in mind.
- Recording and viewer encoders. The codec backends are pluggable behind the recording task interface. New encoders slot in alongside ffv1/nvenc/vaapi/vulkan without touching the rest of the session.
- Protocol evolution. New RequestMethod / SessionCtlMethod variants and struct fields are added with #[serde(default)] so older peers keep parsing. The version handshake accepts any client protocol from 1 through the daemon's current version.
- Event subscribers. Clients subscribe to topic-filtered events (sessions, windows, damage, logs, with :name scoping and log replay on subscribe). This is the integration point for external monitoring and automation.
- Attach protocol. waymux_attach_v1 is the seam for embedding a session's output into any outer Wayland compositor via display-fd passing.
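For the event-subscriber seam, topic filtering with :name scoping might work along these lines. The four topic names come from the text, but the exact filter syntax is an assumption: this sketch treats "windows" as matching every session and "windows:app" as matching only the session named app. TopicFilter, parse_filter, and filter_matches are hypothetical names.

```rust
/// Topics named in the text.
const TOPICS: &[&str] = &["sessions", "windows", "damage", "logs"];

/// Parsed subscription filter. `session` is None when the filter
/// applies to every session.
#[derive(Debug, PartialEq)]
struct TopicFilter {
    topic: String,
    session: Option<String>,
}

/// Parse "topic" or "topic:name" (assumed syntax). Unknown topics
/// and empty names are rejected.
fn parse_filter(s: &str) -> Option<TopicFilter> {
    let (topic, session) = match s.split_once(':') {
        Some((t, n)) if !n.is_empty() => (t, Some(n.to_string())),
        Some(_) => return None,
        None => (s, None),
    };
    if !TOPICS.iter().any(|t| *t == topic) {
        return None;
    }
    Some(TopicFilter { topic: topic.to_string(), session })
}

/// Decide whether an event on `topic` from `session` passes the filter.
fn filter_matches(f: &TopicFilter, topic: &str, session: &str) -> bool {
    f.topic == topic && f.session.as_deref().map_or(true, |s| s == session)
}
```

A monitoring client would hold one filter per subscription and apply filter_matches to each incoming event before acting on it.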
Compositor support matrix
waymux hosts direct Wayland clients (any WAYLAND_DISPLAY-aware app)
in every session. It also supports running a nested inner compositor as a client.
| Inner compositor | Status | Notes |
|---|---|---|
| Direct Wayland clients (Chromium, foot, Qt/GTK apps) | Validated | The primary use case. Works in software and hardware rendering. |
| KWin / KDE Plasma 6 | Validated | Runs as a nested inner compositor. On AMD, AMD_DEBUG=nodcc or RADV_DEBUG=nodcc is required to avoid a DCC-tiled buffer capture stall. |
| niri (Smithay) | Validated | Hardware-rendered on AMD Renoir. Same AMD_DEBUG=nodcc requirement. |
| Hyprland (Aquamarine) | Experimental | Aquamarine binds xdg_wm_base v6 and wl_seat v9; the session currently advertises v5 / v7, so Hyprland fails to start. Unblocking is a Phase 3 item. |
More broadly, the session advertises a limited set of dmabuf formats (ARGB/XRGB8888 only, no multiplanar/YUV) with LINEAR plus EGL-importable tiled modifiers, and a limited set of Wayland interface versions. A compositor that needs more globals or higher interface versions may fall back to software or fail to start. Broader validation and a published support matrix are planned for Phase 3.