Deep Dive: System Identification

This chapter documents the hardware-to-simulation quantities that were identified or constrained for stable locomotion transfer.

1. Hardware mapping comes first

Before tuning rewards or training parameters, the joint-level hardware mapping must be correct.

For Asimov legs, the following items were especially important:

- the ankle is not directly driven
- ankle pitch and roll are produced through a parallel mechanism
- toe behavior is passive and spring-driven
- leg joints use different actuator families with different reflected inertia and torque-speed limits

Errors in this mapping produced unstable or seemingly random policy behavior.

2. Armature as reflected rotor inertia

In this stack, joint armature should not be interpreted as the literal motor armature. It is used as a reflected inertia term that captures how the motor and gearbox appear at the joint.

This distinction matters because armature strongly affects closed-loop behavior and stability.

Representative identified values are:

Joint family	Example value	Notes
hip pitch	`0.095625`	From motor datasheet and transmission mapping
knee	`0.0339552`	From motor datasheet and transmission mapping
ankle	`0.0565056`	Doubled to reflect two motors driving the parallel ankle

The ankle value required special treatment because pitch and roll are driven by two motors through the RSU ankle mechanism.
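As a minimal sketch of the reflected-inertia interpretation, the standard relation is that rotor inertia appears at the joint multiplied by the square of the gear ratio. The rotor inertia and gear ratio below are hypothetical placeholders, not datasheet values for this robot; only the ankle doubling follows the table above. The function names are illustrative.

```python
def reflected_inertia(rotor_inertia: float, gear_ratio: float) -> float:
    """Rotor inertia as seen at the joint output: J_rotor * N**2."""
    return rotor_inertia * gear_ratio ** 2


def parallel_ankle_armature(single_motor_armature: float, num_motors: int = 2) -> float:
    """Sum the reflected inertia of the motors driving the parallel ankle."""
    return num_motors * single_motor_armature


# Hypothetical motor: both numbers are placeholders for illustration only.
example_armature = reflected_inertia(rotor_inertia=1.0e-4, gear_ratio=10.0)

# Half of the ankle value from the table, doubled for the two ankle motors.
ankle_armature = parallel_ankle_armature(0.0282528)
```

With these inputs, `ankle_armature` reproduces the tabulated ankle value, which is simply twice the single-motor term.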

3. KP/KD consistency between sim and hardware

Even with calculated KP/KD values, the robot vibrated on startup. Analysis showed that the root cause was not a hardware limitation: the real motor controllers worked fine. The problem was that the simulation KP/KD values produced an underdamped system, and the policy learned to behave accordingly.

The policy is simply an MLP: it fits a non-linear function to the data it is trained on. Trained on data from an underdamped simulated system, it learned to behave like an underdamped controller. And how does an underdamped control system respond to an impulse? It oscillates.

This was verified mathematically: the policy's output behavior matched the impulse response of an underdamped second-order system.
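The underdamped behavior can be illustrated with the textbook second-order model of a PD-controlled joint, `J*q'' + KD*q' + KP*q = tau`, whose damping ratio is `KD / (2*sqrt(KP*J))`. The sketch below is not the analysis performed on the robot. The gains and the effective inertia `J` are hypothetical values chosen only to produce a damping ratio below 1.

```python
import math


def damping_ratio(kp: float, kd: float, inertia: float) -> float:
    """Damping ratio of J*q'' + kd*q' + kp*q = tau."""
    return kd / (2.0 * math.sqrt(kp * inertia))


def impulse_response(kp: float, kd: float, inertia: float, t: float) -> float:
    """Unit-impulse response at time t for the underdamped case (zeta < 1)."""
    zeta = damping_ratio(kp, kd, inertia)
    if zeta >= 1.0:
        raise ValueError("this closed form only covers the underdamped case")
    wn = math.sqrt(kp / inertia)            # natural frequency, rad/s
    wd = wn * math.sqrt(1.0 - zeta ** 2)    # damped frequency, rad/s
    return math.exp(-zeta * wn * t) * math.sin(wd * t) / (inertia * wd)


# Hypothetical gains and effective joint inertia (assumptions, not measured).
KP, KD, J = 65.0, 1.0, 0.1
zeta = damping_ratio(KP, KD, J)

# Sample two seconds of the response and count how often it changes sign.
samples = [impulse_response(KP, KD, J, 0.01 * k) for k in range(200)]
sign_changes = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
```

A damping ratio below 1 gives a decaying sinusoid, so `sign_changes` is well above zero; at or above 1 the response would return to rest without ringing.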

The failure chain is:

- simulation KP/KD values produce underdamped dynamics
- the policy trains on this data and learns to behave like an underdamped controller
- on real hardware, the gains are fine, but the policy's learned behavior is already underdamped
- the policy's corrections overshoot, and each overshoot triggers a larger correction on the next cycle, exciting sustained oscillation

This realization was critical because it reframed the locomotion problem:

How do I make the domain of data between sim and real match as closely as possible?

The practical lesson is not just about constraining gains to hardware limits. It is about ensuring that the simulated dynamics produce training data that matches the real system's response characteristics. If the sim data domain diverges from the real data domain, the policy will learn behavior that does not transfer, regardless of whether the individual parameter values are physically plausible.

4. Motor model details that mattered

The actuator model includes more than simple PD control. The simulation stack models:

- per-joint stiffness and damping
- effort limits
- speed-torque saturation
- reflected inertia through armature
- static and dynamic friction
- explicit action delay

This richer actuator model was a significant part of the sim2real improvement.

Representative actuator parameters for the legs stack include:

Parameter	Example value	Note
stiffness	`65.0`	chosen as a safe deployable value
damping	`5.0`	tuned to match real system response
effort limit	`39.40`	peak torque for the modeled joint family
saturation effort	`120.0`	speed-torque saturation behavior
velocity limit	`12.57 rad/s`	from motor specification
friction static	`1.30`	static friction term
friction dynamic	`0.100`	Coulomb-like dynamic friction term
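To make the interaction between these parameters concrete, here is a minimal sketch of a PD torque command clipped by a linear speed-torque curve and then by the effort limit. The clipping form is a common DC-motor approximation and is an assumption here, not a confirmed description of the stack's actuator model. Friction and delay are left out, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ActuatorParams:
    stiffness: float = 65.0           # KP
    damping: float = 5.0              # KD
    effort_limit: float = 39.40       # peak torque
    saturation_effort: float = 120.0  # zero-speed torque of the speed-torque line
    velocity_limit: float = 12.57     # rad/s


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def motor_torque(p: ActuatorParams, q_target: float, q: float, qd: float) -> float:
    """PD torque limited by an assumed linear speed-torque curve."""
    tau = p.stiffness * (q_target - q) - p.damping * qd
    # Available torque shrinks linearly as speed rises in the torque direction.
    tau_max = p.saturation_effort * (1.0 - qd / p.velocity_limit)
    tau_min = p.saturation_effort * (-1.0 - qd / p.velocity_limit)
    tau_max = clamp(tau_max, 0.0, p.effort_limit)
    tau_min = clamp(tau_min, -p.effort_limit, 0.0)
    return clamp(tau, tau_min, tau_max)


params = ActuatorParams()
stall_torque = motor_torque(params, q_target=1.0, q=0.0, qd=0.0)
fast_torque = motor_torque(params, q_target=2.0, q=0.0, qd=10.0)
```

At rest the command is capped by the effort limit. At 10 rad/s the speed-torque line becomes the tighter bound, which is the regime where a plain PD model would overestimate the available torque.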

The simulated actuator path is then wrapped in an explicit delay model with delay_min_lag=0 and delay_max_lag=1.
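One way to picture a lag range of 0 to 1 is a small buffer that returns either the current action or the one from the previous control step. The sketch below assumes the lag is resampled uniformly on every step. That sampling rule and the class itself are assumptions for illustration; only the parameter names `delay_min_lag` and `delay_max_lag` come from the text above.

```python
import random
from collections import deque


class ActionDelay:
    """Return the action issued `lag` steps ago, with lag in [min, max]."""

    def __init__(self, delay_min_lag: int = 0, delay_max_lag: int = 1, seed=None):
        if not 0 <= delay_min_lag <= delay_max_lag:
            raise ValueError("require 0 <= delay_min_lag <= delay_max_lag")
        self.delay_min_lag = delay_min_lag
        self.delay_max_lag = delay_max_lag
        self.rng = random.Random(seed)
        # Index 0 holds the newest action; older actions fall off the end.
        self.history = deque(maxlen=delay_max_lag + 1)

    def reset(self) -> None:
        self.history.clear()

    def step(self, action):
        self.history.appendleft(action)
        lag = self.rng.randint(self.delay_min_lag, self.delay_max_lag)
        # Early in an episode there may not be enough history yet.
        lag = min(lag, len(self.history) - 1)
        return self.history[lag]


fixed = ActionDelay(delay_min_lag=1, delay_max_lag=1)
fixed_outputs = [fixed.step(a) for a in (10, 20, 30)]
```

With a fixed lag of one step, each output is the previous input, except on the first step where no history exists yet.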

5. Delay is part of identification

Actuator delay was not treated as a generic nuisance term. It was modeled from the observed timing behavior of the real firmware and communication path.

The training model therefore includes:

- action delay on the actuator path
- grouped observation delay on the sensing path
- real CAN timing structure rather than perfectly synchronized joint state

These delays are part of the identified system, not just regularization noise.

The same reasoning motivated the move away from an overly pristine interpretation of the built-in actuator toward a control path that better reflects what the policy actually sees at the IO rate.

6. Toe model identification

The toe joint is passive, but it still affects whole-body stability through contact and push-off.

The simulator therefore needs:

- toe stiffness
- toe damping
- toe limits
- toe collision geometry
- toe-ground contact behavior

In practice, toe resistance had to be increased relative to early assumptions because insufficient toe support caused the policy to ignore the toe during learning.

Toe state was exposed to the critic, not the actor. This allowed training to capture the stabilizing effect of the toe without introducing a deploy-time dependency on unmeasured joint state.
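The actor/critic split described here can be sketched as two observation vectors built from the same simulator state. The observation terms and the function name below are hypothetical; the only point taken from the text is that toe state appears in the critic input and not in the actor input.

```python
def build_observations(joint_pos, joint_vel, toe_pos, toe_vel):
    """Return (actor_obs, critic_obs); only the critic sees toe state."""
    # Actor input: quantities assumed to be measurable on the real robot.
    actor_obs = list(joint_pos) + list(joint_vel)
    # Critic input: actor terms plus simulator-only toe state.
    critic_obs = actor_obs + list(toe_pos) + list(toe_vel)
    return actor_obs, critic_obs


# Hypothetical sizes: 12 actuated leg joints and 2 passive toes.
actor_obs, critic_obs = build_observations(
    joint_pos=[0.0] * 12,
    joint_vel=[0.0] * 12,
    toe_pos=[0.1, 0.1],
    toe_vel=[0.0, 0.0],
)
```

Because the critic is only used during training, the deployed actor never needs a toe measurement.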

7. Collision geometry is also system identification

Contact behavior is highly sensitive to geometry. The locomotion environment therefore replaced detailed mesh collision with simpler capsule-based foot and toe geometry.

This choice improved determinism and reduced the risk of learning artifacts from unstable mesh contact.

The identified contact model includes:

- multiple foot and toe capsules
- explicit foot-ground contact sensing
- toe contact sensing
- tuned friction and contact dimensions on foot and toe geoms

The final contact configuration emphasized repeatability:

Contact setting	Value / choice
contact primitive	capsules instead of mesh collision
foot / toe friction	`0.6`
contact dimension	`condim=3` on foot and toe geometry
capsule radius	approximately `12 mm`

In practice, multiple heel, midfoot, and toe capsules were used so the support polygon was more stable than a single coarse collision shape.
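As an illustration of this contact configuration, the sketch below generates capsule geoms with the tabulated friction, contact dimension, and radius. It assumes an MJCF-style model, which the `condim` setting suggests but the text does not state outright. The capsule names and endpoints are hypothetical placeholders, not the robot's actual foot geometry.

```python
import xml.etree.ElementTree as ET

FRICTION = 0.6     # foot / toe friction from the table above
CONDIM = 3         # contact dimension from the table above
RADIUS_M = 0.012   # approximately 12 mm, expressed in metres

# Hypothetical capsule endpoints "x1 y1 z1 x2 y2 z2" in the parent body frame.
FOOT_CAPSULES = {
    "heel": "-0.06 -0.03 0 -0.06 0.03 0",
    "midfoot": "0.00 -0.03 0 0.00 0.03 0",
    "forefoot": "0.06 -0.03 0 0.06 0.03 0",
}
TOE_CAPSULES = {"tip": "0.03 -0.03 0 0.03 0.03 0"}


def capsule_geom(name: str, fromto: str) -> ET.Element:
    """One capsule collision geom with the shared contact settings."""
    return ET.Element("geom", {
        "name": name,
        "type": "capsule",
        "size": str(RADIUS_M),
        "fromto": fromto,
        "friction": str(FRICTION),
        "condim": str(CONDIM),
    })


def body_with_capsules(body_name: str, capsules: dict) -> ET.Element:
    """Wrap a set of capsules in a body element (fragment only)."""
    body = ET.Element("body", {"name": body_name})
    for name, fromto in capsules.items():
        body.append(capsule_geom(f"{body_name}_{name}", fromto))
    return body


foot = body_with_capsules("left_foot", FOOT_CAPSULES)
toe = body_with_capsules("left_toe", TOE_CAPSULES)
foot_xml = ET.tostring(foot, encoding="unicode")
```

The result is a fragment rather than a complete model: joints, inertials, and the kinematic nesting of the toe under the foot are omitted.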

8. Soft limits and deployable ranges

The policy is not trained to use the full hard-stop hardware range. Instead, training uses soft joint limits, typically at 0.9 of the hardware range.

This reduces:

- hard-stop impacts
- unrealistic exploitation of boundary states
- deployment-time shock loads near limit boundaries

The soft-limit factor used in training was approximately 0.9 of the hardware range.
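A minimal sketch of how a 0.9 soft-limit factor can be applied, assuming the range is shrunk symmetrically about its midpoint. The text does not say how the factor is applied, so the symmetric scaling is an assumption; the function name and example limits are hypothetical.

```python
def soft_limits(lower: float, upper: float, factor: float = 0.9):
    """Shrink [lower, upper] about its midpoint by `factor`."""
    center = 0.5 * (lower + upper)
    half_range = 0.5 * (upper - lower) * factor
    return center - half_range, center + half_range


# Hypothetical hardware range in radians.
soft_lo, soft_hi = soft_limits(-1.0, 1.0)
```

For an asymmetric range such as 0.0 to 2.0 rad, the same call gives roughly 0.1 to 1.9 rad, so both hard stops keep an equal margin.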

9. Geometry errors can invalidate learning

System identification also includes checking the geometry itself. One important example was toe alignment: when the toes were accidentally tilted relative to the intended flat contact pose, the policy stopped learning effective forward-balance recovery.

This is a useful reminder that a locomotion policy can fail even when gains and rewards are reasonable, simply because the physical model is not internally consistent.
