
Concepts

Mental models for building on the Menlo Platform — architecture layers, robots, simulator, agent, and sessions.

The mental model you need before writing code. For exact message shapes and field lists, see the Wire format reference.

Architecture layers

Robot software is organized into three layers with different timing contracts. System 2 can take seconds to reason; System 0 must respond in microseconds. A well-designed robot moves work down the stack only as fast as the timing budget allows.

Figure: robot architecture layers. Three stacked layers: System 2 (deliberative planner: slow, goal-directed reasoning, task planning, LLM / symbolic AI), System 1 (reactive controller: fast, reflex-like responses, skill policies, behavior trees), System 0 (hardware abstraction: low-level drivers, real-time OS, motor/sensor firmware).
| Layer | Timing | Responsibility |
| --- | --- | --- |
| System 2 | 100 ms – seconds | Goal reasoning, task planning, LLM queries |
| System 1 | 10 – 100 ms | Skill execution, behavior trees, reactive policies |
| System 0 | < 1 ms | Motor drivers, sensor reads, real-time control loop |
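
To make these timing budgets concrete, here is a purely illustrative sketch (not a platform API): it assigns a piece of work to the lowest layer whose budget covers its worst-case latency. The function and layer names are assumptions for illustration only.

```ts
// Illustrative only: map a task's worst-case response time to the layer that can host it.
type Layer = 'system0' | 'system1' | 'system2';

function layerFor(worstCaseMs: number): Layer {
  if (worstCaseMs < 1) return 'system0';    // real-time control loop, < 1 ms
  if (worstCaseMs <= 100) return 'system1'; // reactive skills, 10–100 ms
  return 'system2';                         // deliberative planning, 100 ms to seconds
}

layerFor(0.2); // 'system0': a motor-driver tick
layerFor(30);  // 'system1': a behavior-tree step
layerFor(800); // 'system2': an LLM-backed planning query
```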

Robots

A robot is the core resource in the platform. Every robot — virtual or physical — exposes the same API surface, so code written against one deploys to the other without changes.

| Type | Description |
| --- | --- |
| virtual | Simulated robot running in the browser via the Uranus simulator. No hardware required. Create one instantly from the Platform UI. |
| physical | An Asimov humanoid robot. Controlled today via the Asimov API. Platform UI integration is coming soon. |

Physical robots additionally expose firmware modes (damp, stand, move) controlled through the Asimov API. Those modes aren't surfaced in the Platform UI today.
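
To illustrate the "same API surface" point, here is a hedged sketch that starts a session against either robot type using the endpoint described under Sessions & channels below. The base URL, bearer-token auth, and robot ids are assumptions; only the endpoint path comes from this page.

```ts
// Hypothetical sketch: api.menlo.example and the Authorization scheme are placeholders.
async function startSession(robotId: string, apiKey: string) {
  const res = await fetch(`https://api.menlo.example/v1/robots/${robotId}/session`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`Failed to start session: ${res.status}`);
  return res.json(); // session credentials; exact shape is in the Wire format reference
}

// The call is identical whether the id belongs to a virtual or a physical robot.
await startSession('my-virtual-robot-id', 'YOUR_API_KEY');
await startSession('my-physical-robot-id', 'YOUR_API_KEY');
```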


Simulator

The Uranus simulator is a browser-based MuJoCo digital twin of the Asimov robot.

  • Runs inside the Platform UI — no local install required.
  • Uses actuator models measured from real hardware, not idealized physics.
  • Streams telemetry over the same protobuf wire format as physical robots.
  • Provides FPV and third-person camera views, captured as a video track over WebRTC.

Agents validated in the simulator deploy directly to physical hardware without code changes.


Agent

Every session launches with an agent — a voice-driven layer that sits above the direct motion API. Hold Shift in the cockpit and speak a command; the agent transcribes your voice, reasons about the intent, issues the matching robot command, and replies verbally.

The pipeline is three stages: STT (speech-to-text) → LLM (intent + tool use) → TTS (text-to-speech). The LLM has tools for semantic commands and can see the robot's camera feed, so it can answer questions about what the robot is looking at and act on natural language instead of button presses.
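
As a conceptual sketch only (not the platform's implementation), the loop looks roughly like this; every function name and type below is a hypothetical stand-in for the STT, LLM, and TTS stages and the robot command API.

```ts
// Hypothetical stand-ins for the three pipeline stages plus command dispatch.
type AgentAction = { command?: string; reply: string };

declare function transcribe(audio: Blob): Promise<string>;                             // STT
declare function decideAction(text: string, frame: ImageBitmap): Promise<AgentAction>; // LLM: intent + tools
declare function sendCommand(command: string): Promise<void>;                          // robot command API
declare function speak(reply: string): Promise<void>;                                  // TTS

async function handleUtterance(audio: Blob, cameraFrame: ImageBitmap): Promise<void> {
  const text = await transcribe(audio);                   // 1. speech-to-text
  const action = await decideAction(text, cameraFrame);   // 2. reason about intent; the LLM can see the camera
  if (action.command) await sendCommand(action.command);  // 3. issue the matching robot command
  await speak(action.reply);                              // 4. reply verbally
}
```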

The agent is dispatched automatically when you start a session — no separate connection needed. See the Manual control guide for how to use it.


Sessions & channels

A session is a live, bidirectional WebRTC connection to a running robot. Under the hood it's a LiveKit room — the browser, the agent, and the robot all join the same room and exchange messages over a fixed set of channels.

The channels have different reliability guarantees tuned to what they carry:

  • Commands go out on a reliable, ordered channel — you don't want a "stop" to get dropped.
  • Telemetry comes back on a lossy channel at ~10 Hz — if a packet drops, the next one is 100 ms away anyway.
  • System events (boot progress, errors, mode changes) flow on a separate reliable channel so you can react to state transitions without polling telemetry.
  • Video and audio are standard WebRTC media tracks.

Start a session with POST /v1/robots/{id}/session. For the exact channel names, message shapes, and field lists, see the Wire format reference.
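
As a minimal sketch of what the channel guarantees look like from the browser, assuming a livekit-client connection and placeholder topic names ('commands', 'telemetry'); the real channel names and protobuf message shapes are in the Wire format reference:

```ts
import { Room, RoomEvent } from 'livekit-client';

// Assumptions: `url` and `token` stand in for whatever credentials the session endpoint
// returns, and the topic names below are placeholders, not the documented channel names.
declare const url: string;
declare const token: string;

const room = new Room();
await room.connect(url, token);

// Commands: reliable, ordered delivery, so a "stop" is never dropped.
// (JSON payload shown for readability; the actual wire format is protobuf.)
const stop = new TextEncoder().encode(JSON.stringify({ type: 'stop' }));
await room.localParticipant.publishData(stop, { reliable: true, topic: 'commands' });

// Telemetry: lossy, ~10 Hz; if a packet drops, the next one arrives about 100 ms later.
room.on(RoomEvent.DataReceived, (payload, _participant, _kind, topic) => {
  if (topic === 'telemetry') {
    // decode the protobuf telemetry message from `payload` here
  }
});
```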


For API authentication, see API reference → Authentication.
