Domain Randomization

This chapter describes the domain randomization strategy used for Asimov locomotion.

1. Targeted, not broad

The randomization strategy is intentionally selective. The goal is not to randomize every quantity in the simulator. The goal is to randomize the quantities that are known to vary between simulation and hardware.

This chapter should therefore be read with the following principle in mind:

Randomize what is known to vary. Do not randomize what has already been measured with sufficient accuracy.

2. Quantities that are randomized

Representative randomized terms include:

Parameter                    Range                                            Reason
encoder zero offset (qpos0)  +/-0.02 rad                                      Calibration error
PD gains                     x0.9 - x1.1                                      Motor response variation
toe stiffness                3.5 - 5.5 Nm/rad                                 Spring variation
foot friction                1.0 - 1.5                                        Surface variation
observation delay            0-2 steps                                        CAN timing jitter
action delay                 0-1 steps                                        Command latency
push disturbance             +/-0.5 m/s class disturbances                    External perturbations
reset base orientation       yaw +/-180°, pitch +/-0.15 rad, roll +/-0.1 rad  Initial orientation variation
joint velocity noise         +/-0.1 rad/s                                     Encoder velocity noise
IMU angular velocity noise   +/-0.01 rad/s                                    Gyro measurement noise

These randomizations are tied directly to known sources of mismatch.
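
As an illustration of how the ranges in the table might be drawn in practice, the following is a minimal sketch rather than the actual training code. It assumes uniform sampling within each range, per-joint values for the encoder offset and PD gain scale, and per-episode sampling at reset; none of these choices is stated in this chapter. The names `EpisodeParams`, `sample_episode_params`, and `add_sensor_noise` are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class EpisodeParams:
    """Values drawn once per episode (hypothetical container)."""
    qpos0_offset: np.ndarray   # rad, one entry per joint (assumption)
    pd_gain_scale: np.ndarray  # unitless multiplier, one per joint (assumption)
    toe_stiffness: float       # Nm/rad
    foot_friction: float       # friction coefficient
    obs_delay_steps: int       # control steps
    act_delay_steps: int       # control steps
    base_yaw: float            # rad
    base_pitch: float          # rad
    base_roll: float           # rad


def sample_episode_params(rng: np.random.Generator, num_joints: int) -> EpisodeParams:
    """Draw one parameter set, uniformly within the ranges from the table."""
    return EpisodeParams(
        qpos0_offset=rng.uniform(-0.02, 0.02, size=num_joints),
        pd_gain_scale=rng.uniform(0.9, 1.1, size=num_joints),
        toe_stiffness=float(rng.uniform(3.5, 5.5)),
        foot_friction=float(rng.uniform(1.0, 1.5)),
        obs_delay_steps=int(rng.integers(0, 3)),  # upper bound exclusive: 0, 1, or 2
        act_delay_steps=int(rng.integers(0, 2)),  # 0 or 1
        base_yaw=float(rng.uniform(-np.pi, np.pi)),  # +/-180 degrees
        base_pitch=float(rng.uniform(-0.15, 0.15)),
        base_roll=float(rng.uniform(-0.1, 0.1)),
    )


def add_sensor_noise(rng: np.random.Generator, joint_vel: np.ndarray, imu_gyro: np.ndarray):
    """Add per-step uniform noise to joint velocity and gyro readings."""
    noisy_vel = joint_vel + rng.uniform(-0.1, 0.1, size=joint_vel.shape)
    noisy_gyro = imu_gyro + rng.uniform(-0.01, 0.01, size=imu_gyro.shape)
    return noisy_vel, noisy_gyro
```

Push disturbances are left out of this sketch because the table gives only their magnitude class, not their timing or direction.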

3. Quantities intentionally not randomized

Some quantities are intentionally left fixed during initial training.

Parameter     Reason
body mass     broad randomization reduced learning stability during initial walking
link lengths  CAD and URDF geometry were already close to hardware
gravity       deployment environment does not vary meaningfully

Leaving these quantities fixed keeps training from spending policy capacity on variability that is unlikely or unnecessary.

4. Delay randomization is not generic noise

Observation and action delay randomization are especially important in this stack. These delays are not abstract robustness noise. They reflect the real CAN polling structure and firmware timing described in Deep Dive: System Identification. The delay ranges in the table above (0-2 steps for observations, 0-1 steps for actions) correspond directly to the measured variation in those timing paths.
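
One way whole-step delays like these could be applied in simulation is a short history buffer per signal. The sketch below assumes the delay is an integer number of control steps held constant within an episode; whether the real stack resamples per episode or per step is not stated here. `DelayLine` is a hypothetical helper, not part of the described training code.

```python
from collections import deque

import numpy as np


class DelayLine:
    """Returns the value pushed `delay` steps earlier (hypothetical helper)."""

    def __init__(self, max_delay: int):
        self.max_delay = max_delay
        self.delay = 0
        self._buf = deque(maxlen=max_delay + 1)

    def reset(self, delay: int, initial: np.ndarray) -> None:
        """Set the delay for this episode and pre-fill the history."""
        if not 0 <= delay <= self.max_delay:
            raise ValueError("delay out of range")
        self.delay = delay
        self._buf.clear()
        # Pre-fill so early steps return the initial value instead of failing.
        for _ in range(self.max_delay + 1):
            self._buf.append(initial.copy())

    def step(self, value: np.ndarray) -> np.ndarray:
        """Push the newest value and return the one from `delay` steps ago."""
        self._buf.append(value.copy())  # newest sits at the right end
        return self._buf[-1 - self.delay]


# Usage: one line per signal, sized to the ranges in the table.
rng = np.random.default_rng(0)
obs_line = DelayLine(max_delay=2)  # observation delay: 0-2 steps
act_line = DelayLine(max_delay=1)  # action delay: 0-1 steps
obs_line.reset(int(rng.integers(0, 3)), np.zeros(3))
act_line.reset(int(rng.integers(0, 2)), np.zeros(3))
```

With a delay of 0 the line passes the current value through unchanged, so the same code path covers both the delayed and undelayed cases.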

5. Contact-side randomization

Foot friction and contact-dependent terms are also randomized because walking quality depends strongly on floor condition and contact consistency.

These terms help the policy remain usable across:

  • slightly different surfaces
  • moderate contact-model mismatch
  • unit-to-unit variation in toe and foot response

6. Randomization still depends on an accurate base model

Domain randomization is not a substitute for system identification.

The stack first requires:

  • correct hardware mapping
  • realistic actuator parameters
  • stable contact geometry
  • deployable observation design

Only after those are in place does targeted randomization improve robustness in a meaningful way.
