Remote Monitoring Architecture Spec

Status: draft v0

Last updated: 2026-03-19

1) Purpose

Define a concrete architecture for:

mission-critical runtime on the robot SBC
optional operator-controlled preview streaming to a laptop
off-robot monitoring, visualization, recording, and analysis

The key constraint is that the SBC should spend nearly all steady-state compute on robot-critical work. Preview and diagnostics should be optional, explicitly budgeted, and easy to disable.

2) Goals

Keep autonomy and perception functional even when diagnostics are disabled, disconnected, or broken.
Make preview streaming opt-in and remotely controllable from the laptop.
Keep ROS 2 internal to the robot runtime and use explicit external protocols for operator connectivity.
Run RViz2, overlays, plots, bagging, and higher-cost analysis on the host.
Preserve a path to a more isolated split deployment later without reworking the whole system.

3) Non-Goals (v0)

full cloud telemetry platform
browser-first remote operations stack
exact pixel-perfect overlay sync in the always-on path
replacing the mission-critical local consumers of detections

4) Current Hardware / Runtime Facts

Observed on the current ROCK 5B Plus target:

/dev/video11 is rkisp_mainpath
/dev/video12 is rkisp_selfpath
both paths can stream concurrently at 1280x720 NV12 60 fps
neither path currently advertises BGR24/RGB24
selfpath exposes YUV-family formats plus GREY and RGB565

Implications:

onboard inference should continue using selfpath
preview should preferentially use mainpath
color conversion should happen in the preview stack only when preview is enabled
raw bgr8 should not be the default transport off the robot

Also measured locally on this board:

current inference preprocess NV12 1280x720 -> RGB888 640x640 costs about 0.96 ms/frame
an additional full-resolution NV12 1280x720 -> BGR888 1280x720 RGA conversion costs about 0.64 ms/frame

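At roughly 0.64 ms/frame on top of the existing preprocess, a same-frame in-pipeline preview branch is cheap enough for targeted debug. It still adds per-frame work to the inference pipeline, so it should not be the default always-on remote viewing path.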
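To put the two measurements in context, the sketch below expresses them as a share of one frame period. It assumes both conversions run on every frame at the 60 fps capture rate; the spec does not state the actual inference rate, so treat the percentages as an upper-bound illustration.

```python
# Figures quoted above, measured on the ROCK 5B Plus target.
FPS = 60                 # assumption: conversions run on every captured frame
PREPROCESS_MS = 0.96     # NV12 1280x720 -> RGB888 640x640
EXTRA_BGR_MS = 0.64      # NV12 1280x720 -> BGR888 1280x720 (RGA)

frame_period_ms = 1000.0 / FPS


def share_of_frame(cost_ms: float) -> float:
    """Return a per-frame cost as a percentage of one frame period."""
    return 100.0 * cost_ms / frame_period_ms


baseline_pct = share_of_frame(PREPROCESS_MS)
with_preview_pct = share_of_frame(PREPROCESS_MS + EXTRA_BGR_MS)

print(f"frame period:            {frame_period_ms:.2f} ms")
print(f"preprocess only:         {baseline_pct:.2f}% of the frame period")
print(f"preprocess + BGR branch: {with_preview_pct:.2f}% of the frame period")
```

Under these assumptions the extra branch raises the conversion share from about 5.8% to about 9.6% of each frame period: small, but paid on every frame whether or not anyone is watching.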

5) Top-Level Architecture

The system is split into two planes:

5.1 Mission-Critical Plane

Runs on the SBC and stays up regardless of diagnostics state.

Responsibilities:

camera capture for inference
preprocess
inference
detection publication
local behavior/autonomy consumers
local health and perf reporting

5.2 Diagnostic Plane

Spans robot and host, and is explicitly optional.

Responsibilities:

preview stream enable/disable
off-robot visualization
recording and evaluation
operator metrics panels
optional host-side tracking / overlays / RViz2

6) Recommended Initial Deployment

6.1 Robot Side

Run one container initially:

robot-core

Inside robot-core:

mission-critical ROS 2 nodes
robot-diag-control
an on-demand preview streamer subprocess

Do not start with two robot containers by default.

Reason:

the compute overhead of a second container is usually small
the operational complexity is not small
device passthrough, lifecycle, logs, startup ordering, and debugging all get harder
the preview stack is still evolving, so start with one deployment boundary and split later only if it pays for itself

6.2 Host Side

Run native applications first, not containers.

Recommended host components:

native operator application or helper process
RViz2 native
optional local host-side decode/analysis helpers

Reason:

RViz2 and desktop video decode are simplest natively
running GUI applications in containers on the host adds display, audio, and video plumbing friction
the laptop is not compute-constrained the way the SBC is

7) Robot Container vs Two-Container Split

7.1 Initial Recommendation: One Container, Two Process Groups

Use:

robot-core container
robot-diag-control node inside it
preview streamer launched on demand as a child process or supervised sibling process

Advantages:

simplest boot and deployment story
one ROS environment
one filesystem namespace
easy access to /dev/video11 and /dev/video12
no extra container orchestration needed
lowest integration risk for early slices

Disadvantages:

preview faults are less isolated than with a fully separate container
logs and lifecycle management need discipline

7.2 Later Option: Two Robot Containers

Possible later split:

robot-core
robot-video

Use this only after the preview stack is stable and useful enough to justify stronger isolation.

Advantages:

independent restart policy for the streamer
tighter resource accounting and policy
easier security/network hardening if preview becomes externally reachable

Disadvantages:

more startup and lifecycle complexity
device passthrough needs explicit management
more networking/configuration surface
more operational overhead for development and debugging

7.3 Overhead Assessment

For containers themselves:

CPU overhead of an extra container is usually negligible
memory overhead is mostly one more process tree plus duplicated runtime libs and buffers
the bigger cost is operational complexity, not raw compute

For the preview stack:

the meaningful budget item is the preview pipeline itself: capture, optional conversion, encode, socket I/O
that cost exists whether the streamer runs as a process in robot-core or in a separate robot-video container

Conclusion:

start as one container with a managed preview subprocess
split later only for fault isolation or deployment hygiene, not because containers themselves are too expensive

8) Robot-Side Components

8.1 `robot-core`

Responsibilities:

inference path on /dev/video12
/yolo/detections
/vision/perf
local consumers of detections
robot-diag-control

8.2 `robot-diag-control`

Lives inside robot-core initially.

Responsibilities:

expose preview control over ROS 2
own preview state machine
start/stop preview streamer subprocess
publish diagnostic status to the rest of the graph
keep preview failure isolated from mission-critical nodes

Recommended outputs:

/diag/status
/diag/preview_status

Recommended services/actions:

/diag/set_preview_mode
/diag/get_diag_status

8.3 Preview Streamer Subprocess

Responsibilities:

own /dev/video11
capture NV12
hardware-encode H.265
serve stream to the laptop

Recommended default behavior:

off by default
started only on operator request
bounded profiles only, not arbitrary encoder flags from the UI

9) Host-Side Components

9.1 Operator App / Host Helper

Native host application or helper process.

Responsibilities:

talk to the robot over gRPC
open/close the preview stream
subscribe to state/events from the robot gateway
optionally decode the stream into local ROS topics for RViz2 or host-side overlay nodes

9.2 UI / Monitoring App

Keep the UI separate from ROS internals where possible.

Responsibilities:

preview toggle
preview state display
metrics dashboards
event/fault display
record/export controls

The UI can talk directly to the robot gateway over gRPC or sit on top of a thin local helper process.

9.3 RViz2 / Analysis Tools

Run natively on the host.

Examples:

RViz2
rqt_plot
rosbag2
host-side tracking_node
host-side debug_node

10) Control and Data Boundaries

10.1 ROS 2 Is Used For

Inside the robot, ROS 2 is used for:

status
metrics
detections
event/state publication

Examples:

/yolo/detections
/vision/perf
/diag/*

10.2 Video Transport Is Separate

Use a dedicated video transport for preview.

Recommended first transport:

SRT over local Wi-Fi / LAN

Later options:

RTSP if a camera-like operator UX becomes more important
WebRTC only if browser/native remote ops requirements justify the added complexity

Reason:

DDS is a poor default transport for continuous compressed operator video
dedicated video transport gives tighter control over bitrate, latency, and startup behavior

10.3 Preview Transport Options

The preview path should keep the robot-side data plane simple:

/dev/video11 mainpath captures NV12
encoder consumes NV12 directly where possible
robot sends compressed H.265 over the network
host decodes and performs any color conversion, scaling, or overlay work locally

This avoids paying for a robot-side NV12 -> BGR conversion in the default preview path.
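The case against shipping uncompressed frames off the robot can be checked with simple arithmetic. The sketch below computes the raw data rate for 1280x720 at 60 fps in NV12 and in BGR888; the 60 fps figure is the capture rate observed above, and a real preview profile may run slower.

```python
WIDTH, HEIGHT, FPS = 1280, 720, 60

# Average bytes per pixel for each 8-bit format.
BYTES_PER_PIXEL = {
    "NV12": 1.5,    # full-resolution luma plane + 2x2-subsampled chroma plane
    "BGR888": 3.0,  # three full-resolution channels
}


def raw_mbit_per_s(fmt: str) -> float:
    """Uncompressed data rate in Mbit/s for the given pixel format."""
    bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL[fmt]
    return bytes_per_frame * FPS * 8 / 1e6


for fmt in BYTES_PER_PIXEL:
    print(f"{fmt}: {raw_mbit_per_s(fmt):.0f} Mbit/s uncompressed")
```

NV12 comes to roughly 664 Mbit/s and BGR888 to roughly 1327 Mbit/s, both far beyond what a typical Wi-Fi link sustains, which is why the robot keeps NV12 local and sends only the encoded stream.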

Recommended interpretation of the main transport options:

RTSP: session/control protocol commonly used to expose a camera-like stream
SRT (Secure Reliable Transport): UDP-based transport with packet recovery and encryption, designed for lossy or variable networks
WebRTC: interactive media stack with NAT traversal and adaptive control, but much higher implementation complexity

For this project:

use SRT first for the initial operator preview path
consider RTSP later if a camera-like session model or off-the-shelf tooling becomes more important than transport resilience
defer WebRTC unless a browser-facing or internet-routable operator UI becomes a real requirement

10.4 Transport Tradeoff Matrix

RTSP

Strengths:

easy mental model and broad tooling support
simple to open from VLC, ffplay, GStreamer, and many desktop apps
good default fit for a local operator preview stream

Weaknesses:

RTSP itself is only the control layer; media is typically carried separately over RTP
behavior over weak Wi-Fi depends heavily on whether media is carried over UDP or TCP

Typical modes:

RTSP/RTP/UDP: lower latency, but packet loss shows up as artifacts or drops
RTSP interleaved over TCP: simpler and more firewall-friendly, but TCP retransmissions can add jitter and stalls on bad Wi-Fi

SRT

Strengths:

designed for unreliable links
supports packet recovery, encryption, and configurable latency budget
often behaves better than plain RTP/UDP on noisy Wi-Fi

Weaknesses:

fewer "camera app" style clients than RTSP
slightly more specialized tooling and operational knowledge
usually trades a bit more latency for better resilience

Typical use:

best when "keep the stream usable" matters more than shaving every millisecond

WebRTC

Strengths:

very good for browser delivery and interactive remote control surfaces
adaptive and NAT-friendly

Weaknesses:

much more signaling and integration complexity
overkill for the first monitoring slices

11) Preview Modes

Define a small set of stable profiles instead of exposing raw encoder knobs:

off
low_bw
balanced
high_quality
exact_sync_debug (later, optional, not default)

Suggested semantics:

off: no preview capture or encode running
low_bw: 720p, low fps, low bitrate
balanced: 720p, moderate fps/bitrate
high_quality: higher bitrate and possibly higher resolution if budget allows
exact_sync_debug: special mode that may derive preview from the same frames used for inference instead of the normal separate mainpath capture
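One way to keep the profile set bounded is a fixed lookup table that rejects anything else. The sketch below is illustrative only: `PreviewProfile`, `PROFILES`, and `resolve_profile` are hypothetical names, every fps and bitrate value is a placeholder rather than a figure from this spec, and `exact_sync_debug` is left out because it is a later option.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreviewProfile:
    width: int
    height: int
    fps: int
    bitrate_kbps: int
    transport: str = "srt"


# Placeholder numbers for illustration; real values need tuning on the target.
PROFILES = {
    "low_bw": PreviewProfile(1280, 720, 10, 1000),
    "balanced": PreviewProfile(1280, 720, 30, 3000),
    "high_quality": PreviewProfile(1280, 720, 30, 8000),
}


def resolve_profile(name: str) -> Optional[PreviewProfile]:
    """Return the settings for a named profile, or None for "off".

    Names outside the fixed set raise ValueError, so the UI cannot
    smuggle arbitrary encoder settings through the profile field.
    """
    if name == "off":
        return None
    try:
        return PROFILES[name]
    except KeyError:
        raise ValueError(f"unknown preview profile: {name!r}") from None
```

For example, `resolve_profile("balanced")` returns the 720p entry from the table, while `resolve_profile("off")` returns `None` so the caller knows no capture or encode should run.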

12) Sync Model

Two useful operating modes exist:

12.1 Default Diagnostic Mode

inference on selfpath
preview on mainpath
detections and video are close in time but not guaranteed to be the same exact frame

Use for:

operator monitoring
general evaluation
performance-safe remote diagnosis
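Because detections and preview frames come from different ISP paths in this mode, a host-side overlay has to pair them approximately. A minimal sketch of nearest-timestamp matching follows. It assumes both streams carry timestamps on a comparable clock, which this spec does not yet define, and `nearest_frame_index` is a hypothetical helper name.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def nearest_frame_index(
    frame_ts: Sequence[float], det_ts: float, tolerance_s: float
) -> Optional[int]:
    """Return the index of the frame closest in time to a detection.

    frame_ts must be sorted ascending. Returns None when there are no
    frames or the closest frame is further away than tolerance_s.
    """
    if not frame_ts:
        return None
    i = bisect_left(frame_ts, det_ts)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(frame_ts)]
    best = min(candidates, key=lambda j: abs(frame_ts[j] - det_ts))
    if abs(frame_ts[best] - det_ts) > tolerance_s:
        return None
    return best
```

Returning `None` outside the tolerance lets the overlay skip drawing a stale box rather than attach it to an unrelated frame.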

12.2 Exact-Sync Debug Mode

preview derived from the same frame path used for inference
more expensive
used only for targeted debugging when exact overlay alignment matters

Use for:

validating tracking edge cases
detailed regression capture

13) Boot / Lifecycle Model

13.1 Robot Boot

At boot:

launch robot-core
do not launch preview streamer
robot-diag-control reports preview as disabled

13.2 Preview Enable Flow

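Host connects to the robot gateway over gRPC (ROS 2 stays internal to the robot).
Operator requests preview.
Host sends the request to the gateway, which calls /diag/set_preview_mode inside the robot.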
robot-diag-control validates request and starts preview streamer.
Robot publishes preview status and endpoint metadata.
Host opens stream.

13.3 Preview Disable Flow

Host requests preview off.
robot-diag-control stops the preview streamer.
Robot publishes preview disabled state.
Host tears down local decode/overlay path.

13.4 On-Demand Subprocess Model

The preview streamer should initially be a managed child process of robot-diag-control.

Responsibilities of robot-diag-control:

maintain a single preview state machine
translate preview profiles into a fixed command line or config blob
start the child process
monitor child liveness
expose status and faults over ROS 2
stop the child cleanly on operator request or node shutdown

Suggested state machine:

disabled
starting
running
stopping
faulted
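The five states can be guarded by an explicit transition table so that rapid operator toggling cannot push the node into an undefined state. The spec lists the states but not the edges, so the table below is an assumption, as are the names `PreviewStateMachine` and `ALLOWED`.

```python
from enum import Enum


class PreviewState(Enum):
    DISABLED = "disabled"
    STARTING = "starting"
    RUNNING = "running"
    STOPPING = "stopping"
    FAULTED = "faulted"


S = PreviewState

# Assumed edges: the spec names the states only.
ALLOWED = {
    S.DISABLED: {S.STARTING},
    S.STARTING: {S.RUNNING, S.STOPPING, S.FAULTED},
    S.RUNNING: {S.STOPPING, S.FAULTED},
    S.STOPPING: {S.DISABLED, S.FAULTED},
    S.FAULTED: {S.STARTING, S.DISABLED},
}


class PreviewStateMachine:
    """Single owner of preview state; rejects transitions not in ALLOWED."""

    def __init__(self) -> None:
        self.state = S.DISABLED
        self.fault_count = 0

    def transition(self, new_state: PreviewState) -> bool:
        """Apply the transition if allowed; return whether it was applied."""
        if new_state not in ALLOWED[self.state]:
            return False
        if new_state is S.FAULTED:
            self.fault_count += 1  # feeds the restart/error counter
        self.state = new_state
        return True
```

A rejected transition returns `False` and leaves the state untouched, so a duplicate "start" while already `starting` is a no-op rather than a second child process.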

Suggested start behavior:

Validate no preview child is already active.
Resolve the requested profile into width, height, fps, bitrate, and transport.
Spawn the preview child process with stdout/stderr captured to logs.
Wait for a bounded startup window.
Publish running with the resolved stream URI if startup succeeds.
Publish faulted if the child exits early or fails health checks.
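The spawn and bounded-startup steps above might look roughly like the following. This is a sketch using only the Python standard library: `start_preview_child` is a hypothetical helper, the default window is a placeholder, and the only health check shown is process liveness, where a real one might also probe the stream endpoint.

```python
import subprocess
import time
from typing import List, Optional


def start_preview_child(
    argv: List[str], log_path: str, startup_window_s: float = 2.0
) -> Optional[subprocess.Popen]:
    """Spawn the preview child and watch it for a bounded startup window.

    Returns the Popen handle if the child is still alive when the window
    ends (caller may publish running), or None if it could not be spawned
    or exited early (caller should publish faulted).
    """
    # stdout and stderr both go to the log file; the child keeps its own
    # copy of the descriptor after the parent closes it.
    with open(log_path, "ab") as log:
        try:
            child = subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT)
        except OSError:
            return None

    deadline = time.monotonic() + startup_window_s
    while time.monotonic() < deadline:
        if child.poll() is not None:
            return None  # exited inside the startup window
        time.sleep(0.05)
    return child
```

In a ROS 2 node this wait would normally run off the executor thread (for example in a worker thread or timer-driven poll) so that the startup window never blocks other callbacks.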

Suggested stop behavior:

Mark state as stopping.
Send SIGTERM to the child process.
Wait a short timeout for clean shutdown.
Escalate to SIGKILL only if needed.
Publish disabled after cleanup completes.
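The SIGTERM-then-SIGKILL escalation maps directly onto Python's `subprocess` API. A minimal sketch, with `stop_preview_child` as a hypothetical helper name and the grace period as a placeholder value:

```python
import subprocess


def stop_preview_child(child: subprocess.Popen, grace_s: float = 3.0) -> int:
    """Stop the child, escalating to SIGKILL only if SIGTERM is ignored.

    Returns the child's exit code. Safe to call on a child that has
    already exited.
    """
    if child.poll() is None:
        child.terminate()  # SIGTERM on POSIX
        try:
            child.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            child.kill()   # SIGKILL on POSIX
            child.wait()   # reap the process so no zombie is left behind
    return child.returncode
```

The caller would mark the state as stopping before invoking this and publish disabled only after it returns, which matches the ordering in the list above.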

Failure containment requirements:

preview child exit must not restart or block mission-critical ROS nodes
repeated preview faults should increment a restart/error counter
the system must tolerate the operator rapidly toggling preview on/off

13.5 Preview Subprocess Inputs and Outputs

Initial subprocess inputs:

capture device path
selected preview profile
transport kind
bind address / port

Initial subprocess outputs:

process exit code
stream URI
negotiated resolution and fps
encoder/transport error text
optional periodic bitrate/frame counters

14) Recommended Initial Interfaces

14.1 ROS 2 Service: `SetPreviewMode`

Initial request fields:

bool enabled
string profile

Initial response fields:

bool accepted
string state
string message
string stream_uri
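To illustrate how these fields could be used, the sketch below mirrors the request and response as plain dataclasses and implements the accept/reject decision as a pure function. It deliberately avoids any ROS 2 API; the class names, `KNOWN_PROFILES`, the message strings, and the choice to leave `stream_uri` empty until the stream is actually running are all assumptions.

```python
from dataclasses import dataclass

KNOWN_PROFILES = {"low_bw", "balanced", "high_quality"}


@dataclass
class SetPreviewModeRequest:
    enabled: bool
    profile: str


@dataclass
class SetPreviewModeResponse:
    accepted: bool
    state: str
    message: str
    stream_uri: str


def handle_set_preview_mode(
    req: SetPreviewModeRequest, current_state: str
) -> SetPreviewModeResponse:
    """Decide how to answer a request given the current preview state."""
    if not req.enabled:
        if current_state in ("disabled", "stopping"):
            return SetPreviewModeResponse(True, current_state, "preview already off", "")
        if current_state == "faulted":
            return SetPreviewModeResponse(True, "disabled", "fault cleared", "")
        return SetPreviewModeResponse(True, "stopping", "stopping preview", "")
    if req.profile not in KNOWN_PROFILES:
        return SetPreviewModeResponse(
            False, current_state, f"unknown profile: {req.profile}", ""
        )
    if current_state in ("starting", "running", "stopping"):
        return SetPreviewModeResponse(
            False, current_state, "preview busy or already active", ""
        )
    return SetPreviewModeResponse(True, "starting", "starting preview", "")
```

Keeping the decision logic free of transport code makes it easy to unit-test and to reuse behind both the ROS 2 service and a gRPC gateway.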

14.2 ROS 2 Topic: `PreviewStatus`

Suggested fields:

current state
current profile
stream URI
width
height
fps
bitrate
error text
restart count

15) Important Decisions Answered

Q1. Should the robot run two containers from the start?

Recommended answer:

no

Start with one robot-core container and launch the preview streamer as a managed subprocess from robot-diag-control.

Q2. Does splitting into two robot containers materially reduce compute load?

Recommended answer:

not in any meaningful way

The preview pipeline cost is in capture/encode/network I/O. The container boundary itself is not the expensive part.

Q3. Why still keep the two-container option?

Recommended answer:

for later fault isolation and operational hygiene

Use it only if the preview stack becomes stable enough that independent restart, policy, or packaging actually matters.

Q4. Should the host side use containers initially?

Recommended answer:

no

Use native applications first. Keep RViz2, decode, and monitoring native on the laptop.

Q5. How should the host app connect to the robot?

Recommended answer:

gRPC for control/state
SRT for preview video

Q6. Where should tracking/overlays run?

Recommended answer:

on the host by default

That keeps the SBC focused on mission-critical work.

Q7. Should the robot convert preview frames to BGR before sending them?

Recommended answer:

no, not in the default preview path

Keep the preview path in NV12 into the encoder and send compressed H.265. Perform decode, colorspace conversion, scaling, and overlay work on the host.

Q8. Should the host be expected to have Rockchip RGA?

Recommended answer:

no

The host laptop will usually not have Rockchip RGA. It usually has ample CPU/GPU headroom and often hardware video decode support. The architecture should assume only that the host can decode H.265 and render overlays locally.

Q9. What does "on-demand subprocess" mean concretely?

Recommended answer:

robot-diag-control launches and supervises the preview pipeline only while preview is enabled

This means preview has a strict lifecycle and no steady-state cost when disabled.

16) Recommended First Implementation Slices

Add robot-diag-control inside robot-core with preview state reporting and a stubbed preview state machine.
Add a narrow gRPC surface for preview status/control that can later grow into the robot gateway.
Add a managed preview streamer subprocess controlled by robot-diag-control, using /dev/video11.
Add host-side RViz2 / analysis integration.
Only after that, decide whether a separate robot-video container or an exact-sync debug mode is worth the added complexity.

17) Open Questions

Which hardware H.265 userspace path should be installed or enabled on the SBC image?
Should the first preview slice use software x264 (H.264) as a bring-up path, or wait for hardware H.265 integration?
Should the first host-side overlay path decode preview into ROS Image locally, or keep video display separate from ROS overlays until later?
