rtl8196e-uart-bridge: design notes
This document explains what the driver does, why it exists, and the key
choices that shaped the stabilised code in
rtl8196e_uart_bridge_main.c. The companion operator reference is
README.md; the SSH-tunnel hardening recipe lives in
SECURITY.md.
Scope
The bridge carries bytes between /dev/ttyS1 (UART1 on the RTL8196E)
and a single TCP client on port 8888 of the gateway. It is the single
shared transport for every host ↔ radio conversation:
| Gateway-side firmware | Host-side client | Protocol over the bridge |
|---|---|---|
| NCP-UART-HW (EmberZNet 7.5.1) | Z2M / ZHA | EZSP v13 |
| RCP-UART-HW (802.15.4) | cpcd → zigbeed | CPC (Zigbee or Thread) |
| OT-RCP (OpenThread) | otbr-agent | Spinel-over-CPC |
Anything that talked to the radio over the former userspace
serialgateway now talks to the bridge on the same TCP endpoint. The
host-side stack does not care which kernel-side path ships the bytes;
the transport is the same.
Why the old userspace path hit a wall
Before v3.0, the same shovel role was filled by serialgateway, a
~520 LOC userspace daemon looping read(tty) / write(tcp) and
vice-versa. On a 400 MHz Lexra RLX4181 (single core, no SMP, no
hardware integer divide), each byte batch crossed the user/kernel
boundary four times and relied on the scheduler waking the daemon
within a tight window set by the 16-byte UART RX FIFO.
At 460 800 baud the wire delivers a byte every 21.7 µs (10 bits per
8N1 character), leaving roughly 174 µs (8 × 21.7 µs) to drain eight
freshly filled FIFO slots before an overrun is latched. In practice the
daemon started missing that
window intermittently under load — /proc/tty/driver/serial showed
non-zero oe: counts, and the ASH / CPC stack above it reacted with
retransmits or outright disconnects. Above 460 800 the problem became
reliably reproducible.
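The timing figures above are plain line-rate arithmetic. The following is a minimal sketch, assuming 8N1 framing (10 bits per character) and the eight-slot margin used in the text; the helper names are hypothetical and do not come from the driver.

```c
#include <assert.h>

/* Time one character occupies the wire, in microseconds.
 * bits_per_frame is 10 for 8N1 (start + 8 data + stop); that framing
 * is an assumption of this sketch. */
static double byte_time_us(unsigned baud, unsigned bits_per_frame)
{
    assert(baud > 0);
    return (double)bits_per_frame * 1e6 / (double)baud;
}

/* Time for `slots` FIFO slots to fill at line rate, i.e. the window
 * left for software to drain the FIFO before an overrun. */
static double drain_window_us(unsigned baud, unsigned bits_per_frame,
                              unsigned slots)
{
    return byte_time_us(baud, bits_per_frame) * (double)slots;
}
```

Calling `drain_window_us(460800, 10, 8)` gives the window discussed above; doubling the baud halves it, which is consistent with the problem becoming easier to reproduce at higher rates.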
The userspace design could be micro-tuned (wake priorities, pinned memory, larger reads) but any serious fix had to remove the context switches, not make them faster.
The kernel-side shovel
The driver forwards bytes entirely in kernel context:
    UART1 RX -> 8250 ISR -> tty flip buffer -> bridge receive_buf()
             -> kernel_sendmsg(TCP)
    TCP RX   -> bridge worker kthread
             -> tty->ops->write(UART1 TX)
Hook point: tty_port.client_ops override, not a line discipline
An earlier prototype used a line discipline. It worked, but two things made us move the hook one level down:
- receive_room ate bytes. The default ldisc path consults a per-ldisc receive_room budget before delivering data from the flip buffer. When the TCP sendmsg was slow for any reason (client side slow-reader, short burst of congestion) the ldisc would silently drop characters that our code would have happily swallowed into the drops_err counter.
- Extra ldisc reference-counting round-trip on every flip-buffer flush with no real ldisc behaviour to justify it.
The bridge instead overrides tty_port->client_ops and installs its
own receive_buf, bypassing tty_port_default_receive_buf and the
ldisc layer entirely. The received bytes are forwarded to the TCP
socket with MSG_DONTWAIT; the return value back to the tty core is
always the full count (bytes are accounted for in the driver's drop
counters, not fed back as flow-control pressure to the flip buffer).
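The "always report the full count" contract can be modelled in userspace. This is a sketch under stated assumptions, not the driver's code: `struct bridge_model`, `model_receive_buf()` and the single `dropped` counter are hypothetical, and the function pointer merely stands in for a non-blocking socket send.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-blocking send: returns the number of
 * bytes accepted (0..len), or -1 on error. */
typedef long (*send_nonblock_fn)(void *ctx, const unsigned char *buf,
                                 size_t len);

struct bridge_model {
    send_nonblock_fn send;   /* models a send with MSG_DONTWAIT */
    void *send_ctx;
    unsigned long forwarded; /* bytes the socket accepted */
    unsigned long dropped;   /* bytes the socket did not take */
};

/* Always reports the full count as consumed, so the caller never sees
 * back-pressure; whatever the socket refuses is counted and discarded. */
static size_t model_receive_buf(struct bridge_model *b,
                                const unsigned char *buf, size_t count)
{
    long sent;

    assert(b != NULL && b->send != NULL);
    sent = b->send(b->send_ctx, buf, count);
    if (sent < 0)
        sent = 0;                     /* error: nothing was forwarded */
    if ((size_t)sent > count)
        sent = (long)count;           /* defensive clamp */
    b->forwarded += (unsigned long)sent;
    b->dropped += (unsigned long)(count - (size_t)sent);
    return count;
}

/* Test double: accepts at most *(size_t *)ctx bytes per call. */
static long capped_send(void *ctx, const unsigned char *buf, size_t len)
{
    size_t cap = *(size_t *)ctx;

    (void)buf;
    return (long)(len < cap ? len : cap);
}
```

The design trade-off is visible in the return value: a slow TCP client costs bytes, which the upper-layer protocol must recover, but it can never stall the UART side.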
Single kthread for TCP accept/recv
The TX direction (TCP → UART) runs in one kernel thread that blocks in
kernel_accept() then kernel_recvmsg(), and writes the received
bytes into the UART via the existing tty->ops->write path. This is
the minimal amount of concurrency the driver needs: one thread per
listen socket, one client at a time.
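As a rough userspace analogue of one worker iteration, the sketch below blocks for data on one descriptor and pushes all of it to another. It uses plain POSIX `read()`/`write()` rather than the kernel calls named above, and `write_all()` and `shovel_once()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* One iteration of the shovel: block for data on src_fd, push it to
 * dst_fd. Returns bytes moved, 0 on EOF (peer gone), -1 on error. */
static ssize_t shovel_once(int src_fd, int dst_fd)
{
    unsigned char buf[256];
    ssize_t n;

    assert(src_fd >= 0 && dst_fd >= 0);
    do {
        n = read(src_fd, buf, sizeof(buf));
    } while (n < 0 && errno == EINTR);
    if (n <= 0)
        return n;
    return write_all(dst_fd, buf, (size_t)n) == 0 ? n : -1;
}
```

A return value of 0 is the point at which a single-client design goes back to waiting for the next connection.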
Single-client listener
The radio has exactly one consumer at any given moment — Z2M, cpcd,
or otbr-agent. Supporting multiple simultaneous TCP clients would
require duplicating the byte stream and tracking per-client state in a
path that already runs on a tight CPU budget. The bridge accepts one,
refuses additional connects until the first closes, and stops worrying
about it. Clients that reconnect immediately after a disconnect simply
win the next kernel_accept().
Sysfs-only control interface
All knobs (tty, baud, port, bind_addr, flow_control,
enable, nrst_pulse, nrst_gpio, status_led_brightness) are
exposed as module parameters under
/sys/module/rtl8196e_uart_bridge/parameters/. Writes are applied
live: the set callbacks tear down and rebuild only the subsystem they
touch (e.g. changing baud reconfigures ktermios without dropping
the TCP client; changing bind_addr rebuilds the listen socket but
keeps the connected client). This avoids shipping a userspace tool
just to control the bridge and makes every setting scriptable from
/etc/init.d.
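Because every knob is an ordinary file, a caller needs nothing more than a file write. The helper below is a minimal sketch of that idea; `bridge_param_write()` and the `BRIDGE_PARAM_DIR` macro are hypothetical conveniences, while the directory path is the one given above.

```c
#include <assert.h>
#include <stdio.h>

#define BRIDGE_PARAM_DIR "/sys/module/rtl8196e_uart_bridge/parameters"

/* Write `value` to <dir>/<name>. Returns 0 on success, -1 on failure. */
static int bridge_param_write(const char *dir, const char *name,
                              const char *value)
{
    char path[256];
    FILE *f;
    int n, ok;

    assert(dir != NULL && name != NULL && value != NULL);
    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                 /* path would not fit */
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = (fputs(value, f) >= 0);
    /* stdio buffers the data, so a write error only shows up when the
     * stream is flushed on close. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A call such as `bridge_param_write(BRIDGE_PARAM_DIR, "baud", "460800")` would be the C equivalent of an `echo` from an init script; taking the directory as a parameter keeps the helper testable off-target.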
enable = 0 at load, armed by init script
The module loads unconditionally with the kernel but does nothing
until an init script — S50uart_bridge in the userdata overlay —
writes enable=1 once /dev/ttyS1 is known to exist. This avoids
the auto-arm race where the bridge would try to open the tty before
the 8250 driver had created the device node. The init script also
pulls FIRMWARE_BAUD and BRIDGE_BIND from /userdata/etc/radio.conf
before arming, so the operator's persistent choices land on every
boot without a second tool.
STATUS LED fired from the worker, not from userspace
Pre-v3.0, the userspace serialgateway daemon wrote
/sys/class/leds/status/brightness directly from its TCP accept and
disconnect paths. Moving the shovel into the kernel lost that hook,
so operators reported the STATUS LED staying off in Zigbee mode even
when Z2M was connected.
The bridge restores it via the Linux LED-trigger subsystem:
- At module init the driver registers a trigger named uart-bridge-client (led_trigger_register_simple()).
- On kernel_accept() success the worker fires the trigger at the brightness stored in status_led_brightness (clamped to 0-255).
- On disconnect and on disarm the worker fires the trigger at 0.
Userspace binds the trigger to the actual LED with
echo uart-bridge-client > /sys/class/leds/status/trigger; the init
script S50uart_bridge does this at boot and also maps the eth0
led_mode (bright/dim/off) to 255/60/0 for the brightness. The
coupling between the bridge and the LED class goes only through the
well-defined trigger API — no direct sysfs access from kernel, no
hard-coded device names beyond the trigger label.
Changing status_led_brightness while a client is already connected
does not update the LED live; the new value takes effect on the next
connect. That's intentional: an operator changing the brightness is
usually tuning things up, not trying to flicker the LED on a running
session.
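The brightness rules in this section can be condensed into a small model: the led_mode mapping, the 0-255 clamp, and the fact that the parameter is only sampled at connect time. All names below are hypothetical; this is an illustration of the described behaviour, not the driver's LED code.

```c
#include <assert.h>
#include <string.h>

/* Map the eth0 led_mode label to a brightness; -1 for an unknown label. */
static int led_mode_to_brightness(const char *mode)
{
    assert(mode != NULL);
    if (strcmp(mode, "bright") == 0)
        return 255;
    if (strcmp(mode, "dim") == 0)
        return 60;
    if (strcmp(mode, "off") == 0)
        return 0;
    return -1;
}

static int clamp_brightness(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

struct led_model {
    int param; /* models status_led_brightness */
    int led;   /* what the LED currently shows */
};

/* The parameter is sampled only here, at connect time. */
static void led_on_connect(struct led_model *m)
{
    m->led = clamp_brightness(m->param);
}

static void led_on_disconnect(struct led_model *m)
{
    m->led = 0;
}

/* Changing the parameter deliberately leaves the LED untouched. */
static void led_set_param(struct led_model *m, int v)
{
    m->param = v;
}
```

In this model, a `led_set_param()` call during a session only becomes visible after the next `led_on_connect()`, matching the behaviour described above.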
Runtime flow-control flip for flash mode
The Gecko Bootloader's Xmodem upload uses 115200 baud with hardware
flow control off. A flash ends up temporarily re-purposing the
same UART/TCP path, so the bridge exposes flow_control as a writable
sysfs knob. flash_efr32.sh flips it to 0 for the transfer and
back to 1 afterwards; the bridge stays armed throughout and the TCP
listen socket never drops, so nothing on the host side has to
reconnect.
Software flow control (v1.2): XON/XOFF lives in the bridge
Boards without RTS/CTS wiring between the SoC and the radio (e.g. the
Sengled G4 port, discussions #119/#123) pair a software-flow-control
radio firmware (NCP-UART-SW) with flow_control=sw. Such firmware
escapes data-plane 0x11/0x13 bytes, so a bare XON/XOFF on the wire is
genuine flow control.
Two design points:
- The bridge handles XON/XOFF itself, not termios. The hot path bypasses the line discipline (see above), and termios IXON/IXOFF are implemented by the ldisc, so setting them would do nothing. In sw mode bridge_receive_sw_locked() scans each UART chunk, strips bare XON/XOFF from the TCP-bound stream and gates the TCP→UART worker via sw_tx_paused. A welcome side effect of consuming the control bytes locally: the remote host (Z2M over TCP) sees a clean ASH stream and needs no software-flow-control support of its own.
- The flow control is asymmetric by design. The bridge honors the radio's XOFF (pauses TX up to a bounded 1 s, then fails open and counts tx_pause_timeouts) but never emits XOFF toward the radio: the UART→TCP direction keeps its historical drop-and-count semantics (MSG_DONTWAIT), and at ≤1 Mbaud the 400 MHz SoC never falls behind a 16-byte UART FIFO. The bounded wait also keeps the disarm path's synchronous kthread_stop() safe: the worker can never be parked indefinitely on a lost XON.
hw and none modes are untouched: the RX path branches once on the mode and takes the exact v1.1 single-send code otherwise.
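The two design points can be illustrated with a userspace sketch: one function that strips bare XON/XOFF from a chunk while tracking the pause state, and one that models the bounded, fail-open TX gate. The function names are hypothetical, and clearing the pause flag on timeout is an assumption about what "fails open" means in practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define XON  0x11
#define XOFF 0x13
#define TX_PAUSE_LIMIT_MS 1000 /* the bounded 1 s wait from the text */

/* Copy `in` to `out` minus bare XON/XOFF bytes, updating *tx_paused as
 * each control byte is seen. Returns the number of bytes written to
 * `out`, which must have room for `len` bytes. */
static size_t sw_filter_chunk(const unsigned char *in, size_t len,
                              unsigned char *out, bool *tx_paused)
{
    size_t i, n = 0;

    assert(in != NULL && out != NULL && tx_paused != NULL);
    for (i = 0; i < len; i++) {
        if (in[i] == XOFF)
            *tx_paused = true;
        else if (in[i] == XON)
            *tx_paused = false;
        else
            out[n++] = in[i];
    }
    return n;
}

/* TX gate: send when not paused, or fail open once the pause has
 * lasted TX_PAUSE_LIMIT_MS. *timeouts counts the fail-open events. */
static bool sw_tx_may_send(bool *tx_paused, unsigned paused_ms,
                           unsigned long *timeouts)
{
    assert(tx_paused != NULL && timeouts != NULL);
    if (!*tx_paused)
        return true;
    if (paused_ms >= TX_PAUSE_LIMIT_MS) {
        *tx_paused = false; /* assumption: timeout clears the pause */
        (*timeouts)++;
        return true;
    }
    return false;
}
```

Stripping is only safe because the firmware escapes 0x11/0x13 in the data plane, as noted above; on an unescaped binary stream the same filter would corrupt payload bytes.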
DT-seeded defaults (v1.2), module params still in charge
Board ports kept patching the two genuinely board-specific knobs (nRST
line, flow-control wiring), so v1.2 lets an optional /radio-bridge DT
node seed their boot defaults (see README "Device tree configuration").
The bridge stays a non-platform driver — converting a field-stable
driver to platform binding for two values wasn't worth the churn; init
just looks the node up by compatible. Explicit kernel-cmdline or sysfs
writes always win over the DT (the param setters record an explicit
write; built-in modules apply cmdline params before late_initcall).
nRST pulse: one open-drain GPIO, not a pin-mux trick
nrst_pulse (driver v1.1, discussion #121) resets the EFR32 by claiming
the one GPIO line its nRST pad is wired to (nrst_gpio, default 12 =
pad B4 on the Lidl board — isolated per-pad on the bench) through the
gpiod consumer API with GPIO_OPEN_DRAIN: assert drives the pad low,
release floats it back to input and the EFR32's internal RESETn pull-up
does the rest. RESETn is never driven high — Silabs wires it as an
open-drain input, so pushing it high would fight the chip's own reset
sources. The pad mux to GPIO mode comes for free from the gpio-rtl819x
request() hook, and the line is claimed per pulse, so it stays free
for other consumers between pulses and nrst_gpio changes take effect
on the next pulse without driver state.
Driver v1.0 instead set PIN_MUX_SEL_2 bits {7,10,13} — three separate mux fields copied wholesale from the chip's reset-default value — which re-routed two unrelated pads (B5, B6) for the duration of every pulse. Only the B4 field ever mattered, and the hard-coded mask made the knob useless on RTL8196E boards with different nRST routing.
Options considered and dropped
- Placing the hot path in IRAM. Early scoping assumed we would need the 16 KB on-chip instruction SRAM that the Lexra kernel already uses for the Ethernet RX path. Once the plain-text kernel build shipped, hardware measurements at 892 857 baud under a multi-hour soak showed zero framing/overrun errors and no packet drops. The IRAM work was shelved as unnecessary complexity.
- Multi-client fan-out. A version that duplicated RX to every connected client was prototyped on paper. It would require per-client send queues, back-pressure accounting per client, and a policy for what happens when a slow client causes a fast one to starve. No workflow on the gateway actually benefits from fan-out; one supervisor per radio is the norm. Dropped.
- Netlink control plane. Considered as an alternative to sysfs. Rejected because it would require a userspace client library and provide no functionality the sysfs knobs don't already cover. The init script is a three-line shell fragment, not a daemon.
- Keeping the line discipline. See above: receive_room behaviour and the extra ldisc round-trip were both net negatives for our use case, and we have no consumer of ldisc features (canonical mode, signals, echo) since both sides of the wire speak framed binary protocols.
Stability properties
- Throughput: 892 857 baud (the N+1 divisor maximum on this SoC) sustained with the bridge armed and a Z2M EZSP client loaded with a representative Zigbee network. Kernel 8250 stats show fe=0 oe=0 over multi-hour soaks; bridge drops_* counters stay at 0.
- Lower bauds: 115 200 (baseline NCP/RCP/Router/OT-RCP), 230 400, 460 800, 691 200 validated on the same setup.
- Live reconfiguration: baud, bind address, flow control and listen port can all be changed on the running bridge without dropping the TCP client. Changing bind_addr or port rebuilds the listen socket, but an already-connected client is not affected.
- Boot ordering: the module is built-in and loads with enable=0; the init script defers arming until after the 8250 probe has published /dev/ttyS1. The serialgateway-era race between daemon startup and tty availability is structurally gone.
- Hot path cost (UART -> TCP): bridge_port_receive_buf() takes bridge_lock and calls kernel_sendmsg() while holding it. An early audit flagged this mutex as a potential bottleneck at high baud; measured cost is negligible. An EZSP echo flood at 892 857 baud drove the path ~3 500 calls/s for 30 s with 0 drops_err, 0 drops_tx, 0 UART framing/overrun errors. The worker kthread consumed ~2.5 % CPU, total system load ~19 %, ~81 % idle. Average per-call mutex cost ≈ 8 µs, which leaves plenty of headroom even if the line ever approached its ~89 kB/s ceiling. The mutex is kept as-is: no refactor to a spinlock or a lock-free queue is warranted.
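The headroom claim in the last item can be checked with two lines of arithmetic. This sketch assumes 8N1 framing (10 bits per character) for the line ceiling and uses the measured call rate and per-call cost quoted above; the helper names are hypothetical.

```c
#include <assert.h>

/* Payload ceiling in bytes per second for a given baud and frame size. */
static unsigned long line_bytes_per_sec(unsigned long baud,
                                        unsigned bits_per_frame)
{
    assert(bits_per_frame > 0);
    return baud / bits_per_frame;
}

/* Fraction of one CPU spent in a section that costs cost_us per entry
 * and is entered calls_per_sec times per second. */
static double busy_fraction(double calls_per_sec, double cost_us)
{
    assert(calls_per_sec >= 0.0 && cost_us >= 0.0);
    return calls_per_sec * cost_us / 1e6;
}
```

`line_bytes_per_sec(892857, 10)` reproduces the ~89 kB/s ceiling, and `busy_fraction(3500.0, 8.0)` comes to about 0.028, i.e. under 3 % of the single core spent holding the lock at the measured rate.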
What this driver does not do
- No TLS, no auth. A reachable TCP port is a direct EZSP/CPC/Spinel session. For untrusted networks, bind to loopback and tunnel over SSH; see SECURITY.md.
- No buffering beyond what the UART FIFO, tty flip buffer, and TCP socket provide. There is no in-bridge replay/queue if the client disconnects mid-frame; the upper-layer protocol (ASH for EZSP, CPC for RCP/OT-RCP) is responsible for recovery.
- No multi-PAN / multiplexing. One radio, one UART, one TCP client at a time. Multi-PAN concurrency is handled host-side by zigbeed + otbr-agent sharing the same RCP via cpcd, which in turn is the single TCP client on our end.