Deep Dive: System Identification
This chapter documents the hardware-to-simulation quantities that were identified or constrained for stable locomotion transfer.
1. Hardware mapping comes first
Before tuning rewards or training parameters, the joint-level hardware mapping must be correct.
For Asimov legs, the following items were especially important:
- the ankle is not directly driven
- ankle pitch and roll are produced through a parallel mechanism
- toe behavior is passive and spring-driven
- leg joints use different actuator families with different reflected inertia and torque-speed limits
Errors in this mapping produced unstable or seemingly random policy behavior.
2. Armature as reflected rotor inertia
In this stack, joint armature should not be interpreted as the literal motor armature. It is used as a reflected inertia term that captures how the motor and gearbox appear at the joint.
This distinction matters because armature strongly affects closed-loop behavior and stability.
Representative identified values are:
| Joint family | Example value | Notes |
|---|---|---|
| hip pitch | 0.095625 | From motor datasheet and transmission mapping |
| knee | 0.0339552 | From motor datasheet and transmission mapping |
| ankle | 0.0565056 | Doubled to reflect two motors driving the parallel ankle |
The ankle value required special treatment because pitch and roll are driven by two motors through the RSU ankle mechanism.
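As a rough illustration of where such values can come from, reflected inertia is commonly computed as the rotor inertia multiplied by the square of the gear ratio. The sketch below assumes that formula. The rotor inertias and gear ratios are hypothetical inputs, chosen only so the arithmetic reproduces the table; the actual datasheet figures are not given here. Summing the two ankle motors is likewise a simplification of the parallel mechanism.

```python
def reflected_inertia(rotor_inertia: float, gear_ratio: float, n_motors: int = 1) -> float:
    """Rotor inertia as seen at the joint: n_motors * J_rotor * N^2."""
    return n_motors * rotor_inertia * gear_ratio ** 2


# ASSUMPTION: (rotor inertia, gear ratio, motor count) are illustrative values
# picked to reproduce the table entries. They are not taken from a datasheet.
ASSUMED_INPUTS = {
    "hip_pitch": (1.53e-4, 25.0, 1),
    "knee": (2.62e-5, 36.0, 1),
    "ankle": (2.18e-5, 36.0, 2),  # two motors drive the parallel ankle
}

armature = {name: reflected_inertia(*args) for name, args in ASSUMED_INPUTS.items()}

for name, value in armature.items():
    print(f"{name}: {value:.7f}")
```

Because the gear ratio enters squared, a small error in the transmission mapping produces a large error in armature, which is one reason the hardware mapping has to be settled first.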
3. KP/KD consistency between sim and hardware
Even with calculated KP/KD values, the robot vibrated on startup. Analysis showed that the root cause was not a hardware limitation: the real motor controllers worked fine. The problem was that the simulation KP/KD values produced an underdamped system, and the policy learned to behave accordingly.
The policy is simply an MLP: it fits a non-linear function to the data it is trained on. When trained on data from an underdamped simulated system, the policy learned to behave like an underdamped controller. And how does an underdamped control system respond to an impulse? It oscillates.
This was verified mathematically: the policy's output behavior matched the impulse response of an underdamped second-order system.
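One way to make this kind of check concrete is to treat a PD-controlled joint as the second-order system I*q'' + KD*q' + KP*q = tau and compute its damping ratio, zeta = KD / (2*sqrt(KP*I)). The sketch below is a minimal version of that calculation, not the analysis actually performed. It assumes I is the joint armature alone (link inertia is ignored, so the damping ratio on the full robot would be lower), and the gain values in the example are illustrative.

```python
import math


def natural_frequency(kp: float, inertia: float) -> float:
    """Undamped natural frequency in rad/s."""
    return math.sqrt(kp / inertia)


def damping_ratio(kp: float, kd: float, inertia: float) -> float:
    """zeta < 1 is underdamped, zeta = 1 critically damped, zeta > 1 overdamped."""
    return kd / (2.0 * math.sqrt(kp * inertia))


def impulse_response(kp: float, kd: float, inertia: float, t: float) -> float:
    """Position response to a unit torque impulse (underdamped case only)."""
    zeta = damping_ratio(kp, kd, inertia)
    if zeta >= 1.0:
        raise ValueError("closed form below only covers zeta < 1")
    wn = natural_frequency(kp, inertia)
    wd = wn * math.sqrt(1.0 - zeta ** 2)  # damped oscillation frequency
    return math.exp(-zeta * wn * t) * math.sin(wd * t) / (inertia * wd)


# ASSUMPTION: hip armature used as the whole joint inertia; kd=1.0 is a
# hypothetical low-damping setting included only for contrast.
INERTIA = 0.095625
for kd in (5.0, 1.0):
    print(f"kp=65.0 kd={kd}: zeta={damping_ratio(65.0, kd, INERTIA):.3f}")
```

An underdamped response changes sign repeatedly as it decays, which is the signature to look for in the policy's output after a disturbance.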
The failure chain is:
- simulation KP/KD values produce underdamped dynamics
- the policy trains on this data and learns to behave like an underdamped controller
- on real hardware, the gains are fine, but the policy's learned behavior is already underdamped
- the policy's corrections overshoot, and each overshoot triggers a larger correction on the next cycle, exciting sustained oscillation
This realization was critical because it reframed the locomotion problem:
How can the sim and real data domains be made to match as closely as possible?
The practical lesson is not just about constraining gains to hardware limits. It is about ensuring that the simulated dynamics produce training data that matches the real system's response characteristics. If the sim data domain diverges from the real data domain, the policy will learn behavior that does not transfer, regardless of whether the individual parameter values are physically plausible.
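A simple response characteristic to compare between sim and hardware is step overshoot. The sketch below simulates a single PD-controlled joint and measures how far it overshoots a position step. The single-joint model, the integration step, and the low-damping gain are assumptions made for illustration; in practice the same metric would be computed from a logged hardware step response and compared with the simulated one.

```python
def step_response(kp, kd, inertia, target=1.0, dt=1e-3, duration=2.0):
    """Simulate q for a position step using semi-implicit Euler integration."""
    pos, vel = 0.0, 0.0
    trace = []
    for _ in range(int(round(duration / dt))):
        torque = kp * (target - pos) - kd * vel  # PD law
        vel += (torque / inertia) * dt           # update velocity first
        pos += vel * dt                          # then position
        trace.append(pos)
    return trace


def overshoot_fraction(trace, target=1.0):
    """Peak excursion beyond the target, as a fraction of the step size."""
    return max(0.0, (max(trace) - target) / target)


# ASSUMPTION: armature-only inertia; kd=1.0 is a hypothetical underdamped case.
INERTIA = 0.095625
well_damped = overshoot_fraction(step_response(65.0, 5.0, INERTIA))
underdamped = overshoot_fraction(step_response(65.0, 1.0, INERTIA))
print(f"kd=5.0 overshoot: {well_damped:.3f}")
print(f"kd=1.0 overshoot: {underdamped:.3f}")
```

If the simulated overshoot differs clearly from the measured one, the policy is being trained on a response the hardware will not reproduce.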
4. Motor model details that mattered
The actuator model includes more than simple PD control. The simulation stack models:
- per-joint stiffness and damping
- effort limits
- speed-torque saturation
- reflected inertia through armature
- static and dynamic friction
- explicit action delay
This richer actuator model was a significant part of the sim2real improvement.
Representative actuator parameters for the legs stack include:
| Parameter | Example value | Note |
|---|---|---|
| stiffness | 65.0 | chosen as a safe deployable value |
| damping | 5.0 | tuned to match real system response |
| effort limit | 39.40 | peak torque for the modeled joint family |
| saturation effort | 120.0 | speed-torque saturation behavior |
| velocity limit | 12.57 rad/s | from motor specification |
| friction static | 1.30 | static friction term |
| friction dynamic | 0.100 | Coulomb-like dynamic friction term |
The simulated actuator path is then wrapped in an explicit delay model with delay_min_lag=0 and delay_max_lag=1.
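The pieces listed above can be combined into a minimal torque model. This is a sketch under stated assumptions, not the stack's actual actuator class: it assumes a linear speed-torque curve that falls from the saturation effort at standstill to zero at the velocity limit, a Coulomb-style friction term, and a hypothetical `stiction_velocity` threshold for deciding when the joint counts as stationary.

```python
import math
from dataclasses import dataclass


@dataclass
class ActuatorParams:
    stiffness: float = 65.0
    damping: float = 5.0
    effort_limit: float = 39.40
    saturation_effort: float = 120.0
    velocity_limit: float = 12.57  # rad/s
    friction_static: float = 1.30
    friction_dynamic: float = 0.100


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def actuator_torque(p: ActuatorParams, q_target: float, q: float, qd: float) -> float:
    """PD torque limited by an assumed linear speed-torque curve."""
    tau = p.stiffness * (q_target - q) - p.damping * qd
    # Available torque shrinks as the joint moves faster in the torque direction.
    tau_max = clip(p.saturation_effort * (1.0 - qd / p.velocity_limit), 0.0, p.effort_limit)
    tau_min = clip(p.saturation_effort * (-1.0 - qd / p.velocity_limit), -p.effort_limit, 0.0)
    return clip(tau, tau_min, tau_max)


def friction_torque(p: ActuatorParams, applied: float, qd: float,
                    stiction_velocity: float = 1e-2) -> float:
    """Coulomb-style friction; stiction_velocity is a hypothetical threshold."""
    if abs(qd) < stiction_velocity:
        # Stationary: friction cancels the applied torque up to the static limit.
        return -clip(applied, -p.friction_static, p.friction_static)
    # Moving: constant-magnitude torque opposing the direction of motion.
    return -math.copysign(p.friction_dynamic, qd)


params = ActuatorParams()
print(actuator_torque(params, q_target=1.0, q=0.0, qd=0.0))
```

Under this assumed curve, the 39.40 effort limit is the binding constraint up to roughly 8.4 rad/s, after which the available torque falls off toward zero at the velocity limit.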
5. Delay is part of identification
Actuator delay was not treated as a generic nuisance term. It was modeled from the observed timing behavior of the real firmware and communication path.
The training model therefore includes:
- action delay on the actuator path
- grouped observation delay on the sensing path
- real CAN timing structure rather than perfectly synchronized joint state
These delays are part of the identified system, not just regularization noise.
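The action-delay part of this can be sketched as a small buffer that returns a command issued a few control steps earlier. The class below is an illustration only: the class name is hypothetical, and it assumes the lag is sampled once per reset from the inclusive range [min_lag, max_lag], measured in control steps. Grouped observation delay and CAN timing structure are not covered.

```python
import random
from collections import deque


class DelayedActionBuffer:
    """Returns the action issued `lag` control steps ago."""

    def __init__(self, min_lag: int = 0, max_lag: int = 1, seed=None):
        if not 0 <= min_lag <= max_lag:
            raise ValueError("require 0 <= min_lag <= max_lag")
        self.min_lag = min_lag
        self.max_lag = max_lag
        self._rng = random.Random(seed)
        self._history = deque(maxlen=max_lag + 1)
        self.lag = min_lag

    def reset(self) -> None:
        """Clear history and resample the lag (ASSUMPTION: once per episode)."""
        self._history.clear()
        self.lag = self._rng.randint(self.min_lag, self.max_lag)

    def step(self, action):
        self._history.append(action)
        # Early in an episode there may be less history than the lag requires,
        # so fall back to the oldest stored action.
        index = min(self.lag, len(self._history) - 1)
        return self._history[-1 - index]


buffer = DelayedActionBuffer(min_lag=0, max_lag=1, seed=0)
buffer.reset()
applied = [buffer.step(a) for a in ("a0", "a1", "a2")]
print(buffer.lag, applied)
```

With a lag of 1, the actuator sees each command one control step late, which is the case the policy has to tolerate on hardware.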
The same reasoning motivated moving away from an idealized built-in actuator model toward a control path that better reflects what the policy actually sees at the IO rate.
6. Toe model identification
The toe joint is passive, but it still affects whole-body stability through contact and push-off.
The simulator therefore needs:
- toe stiffness
- toe damping
- toe limits
- toe collision geometry
- toe-ground contact behavior
In practice, toe resistance had to be increased relative to early assumptions because insufficient toe support caused the policy to ignore the toe during learning.
Toe state was exposed to the critic, not the actor. This allowed training to capture the stabilizing effect of the toe without introducing a deploy-time dependency on unmeasured joint state.
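The actor/critic split described here can be sketched as two observation vectors built from the same simulator state. The function and argument names are hypothetical, and the real observation layout contains more terms; the point is only that toe state is appended to the critic input and never to the actor input.

```python
def build_observations(joint_pos, joint_vel, toe_pos, toe_vel):
    """Return (actor_obs, critic_obs); only the critic sees toe state."""
    # Actor input: quantities that are measured on the real robot.
    actor_obs = list(joint_pos) + list(joint_vel)
    # Critic input: actor input plus privileged, simulator-only toe state.
    critic_obs = actor_obs + list(toe_pos) + list(toe_vel)
    return actor_obs, critic_obs


# ASSUMPTION: illustrative sizes, two actuated joints and one toe per foot.
actor_obs, critic_obs = build_observations(
    joint_pos=[0.10, -0.20],
    joint_vel=[0.00, 0.50],
    toe_pos=[0.05, 0.04],
    toe_vel=[0.00, -0.10],
)
print(len(actor_obs), len(critic_obs))
```

Since only the actor is deployed, the critic's extra inputs cost nothing at run time.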
7. Collision geometry is also system identification
Contact behavior is highly sensitive to geometry. The locomotion environment therefore replaced detailed mesh collision with simpler capsule-based foot and toe geometry.
This choice improved determinism and reduced the risk of learning artifacts from unstable mesh contact.
The identified contact model includes:
- multiple foot and toe capsules
- explicit foot-ground contact sensing
- toe contact sensing
- tuned friction and contact dimensions on foot and toe geoms
The final contact configuration emphasized repeatability:
| Contact setting | Value / choice |
|---|---|
| contact primitive | capsules instead of mesh collision |
| foot / toe friction | 0.6 |
| contact dimension | condim=3 on foot and toe geometry |
| capsule radius | approximately 12 mm |
In practice, multiple heel, midfoot, and toe capsules were used so the support polygon was more stable than a single coarse collision shape.
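For illustration, the settings in the table can be written out as MJCF-style geom elements. The sketch below assumes an MJCF model and generates the XML with Python's standard library. The geom names and capsule endpoints are hypothetical placeholders, and the second and third friction coefficients are MuJoCo's default torsional and rolling values, which do not take part in a condim=3 contact.

```python
import xml.etree.ElementTree as ET


def foot_capsule(name, fromto, radius=0.012, friction=0.6):
    """Build one capsule collision geom with the contact settings from the table."""
    return ET.Element("geom", {
        "name": name,
        "type": "capsule",
        "fromto": " ".join(f"{v:g}" for v in fromto),  # capsule axis endpoints
        "size": f"{radius:g}",                          # radius in metres
        "friction": f"{friction:g} 0.005 0.0001",       # sliding, torsional, rolling
        "condim": "3",                                  # normal + tangential friction
    })


# ASSUMPTION: placeholder layout in the foot frame, not the real foot dimensions.
LAYOUT = {
    "heel_capsule": (-0.06, -0.03, 0.0, -0.06, 0.03, 0.0),
    "midfoot_capsule": (0.00, -0.03, 0.0, 0.00, 0.03, 0.0),
    "toe_capsule": (0.08, -0.03, 0.0, 0.08, 0.03, 0.0),
}

foot = ET.Element("body", {"name": "foot"})
for geom_name, endpoints in LAYOUT.items():
    foot.append(foot_capsule(geom_name, endpoints))

print(ET.tostring(foot, encoding="unicode"))
```

Generating the geoms from one function keeps friction, radius, and contact dimension identical across all capsules, which helps with the repeatability goal.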
8. Soft limits and deployable ranges
The policy is not trained to use the full hard-stop hardware range. Instead, training uses soft joint limits set to approximately 0.9 of the hardware range.
This reduces:
- hard-stop impacts
- unrealistic exploitation of boundary states
- deployment-time shock loads near limit boundaries
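A minimal sketch of the soft-limit computation follows. It assumes the factor shrinks the range symmetrically about its midpoint, which is a common convention but is not confirmed here, and the joint range in the example is hypothetical.

```python
def soft_limits(lower: float, upper: float, factor: float = 0.9):
    """Shrink [lower, upper] about its midpoint to `factor` of its width."""
    mid = 0.5 * (lower + upper)
    half_width = 0.5 * (upper - lower) * factor
    return mid - half_width, mid + half_width


def clamp_to_soft_limits(target: float, lower: float, upper: float,
                         factor: float = 0.9) -> float:
    """Keep a commanded joint target inside the soft range."""
    soft_lower, soft_upper = soft_limits(lower, upper, factor)
    return max(soft_lower, min(soft_upper, target))


# ASSUMPTION: hypothetical joint range of 0.0 to 2.0 rad.
print(soft_limits(0.0, 2.0))
print(clamp_to_soft_limits(1.95, 0.0, 2.0))
```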
9. Geometry errors can invalidate learning
System identification also includes checking the geometry itself. One important example was toe alignment: when the toes were accidentally tilted relative to the intended flat contact pose, the policy stopped learning effective forward-balance recovery.
This is a useful reminder that a locomotion policy can fail even when gains and rewards are reasonable, simply because the physical model is not internally consistent.
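A cheap guard against this class of error is to check contact geometry in the neutral pose before training. The sketch below computes how far a capsule axis is tilted from horizontal. It assumes a z-up world frame and that the capsule endpoints are already expressed in that frame; the endpoint values and the 1 degree tolerance are hypothetical.

```python
import math


def tilt_from_horizontal_deg(p0, p1) -> float:
    """Angle in degrees between the segment p0 -> p1 and the horizontal plane."""
    dx, dy, dz = (b - a for a, b in zip(p0, p1))
    horizontal = math.hypot(dx, dy)
    return math.degrees(math.atan2(abs(dz), horizontal))


def is_flat(p0, p1, tolerance_deg: float = 1.0) -> bool:
    """True when the capsule axis lies within the tolerance of horizontal."""
    return tilt_from_horizontal_deg(p0, p1) <= tolerance_deg


# ASSUMPTION: illustrative toe capsule endpoints in metres, world frame.
flat_toe = ((0.08, -0.03, 0.012), (0.08, 0.03, 0.012))
tilted_toe = ((0.08, 0.0, 0.012), (0.13, 0.0, 0.022))

print(is_flat(*flat_toe), round(tilt_from_horizontal_deg(*tilted_toe), 2))
```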