HAIC: A Dynamics-Aware World Model for Humanoid Object Interaction
Bob Jiang
February 24, 2026
Introduction
Humanoid robots are finally getting good at whole-body, contact-rich tasks: carrying boxes, opening doors, and manipulating tools while staying balanced. But there's a category of tasks that remains deceptively hard:
- objects that aren't rigidly attached to the robot,
- objects that are underactuated (the robot can't directly command their full state), and
- objects that have non-holonomic constraints (e.g., wheels that roll forward but don't slide sideways).
Think: pushing a heavy cart, pulling a suitcase, or even skateboarding. In these cases, the object's dynamics fight you. Contacts are intermittent. The object can drift into blind spots. And the robot's control policy has to reason about what the object is doing now and what it will do next, often without clean external state estimation.
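To make the non-holonomic constraint concrete, here is a minimal sketch of a planar cart that can roll along its heading and turn, but cannot slide sideways. The unicycle-style model, the function names (`world_velocity`, `to_cart_frame`, `step_cart`), and all numbers are illustrative assumptions, not something taken from the paper.

```python
import math

def world_velocity(heading, forward_speed):
    """World-frame velocity of a cart that can only roll along its heading."""
    return forward_speed * math.cos(heading), forward_speed * math.sin(heading)

def to_cart_frame(vx, vy, heading):
    """Split a world-frame vector into (forward, sideways) parts in the cart frame."""
    forward = vx * math.cos(heading) + vy * math.sin(heading)
    sideways = -vx * math.sin(heading) + vy * math.cos(heading)
    return forward, sideways

def step_cart(x, y, heading, forward_speed, turn_rate, dt):
    """Advance the cart pose by one Euler step (assumed unicycle model)."""
    vx, vy = world_velocity(heading, forward_speed)
    return x + vx * dt, y + vy * dt, heading + turn_rate * dt

# A unit push aimed 30 degrees off the cart's heading (heading = 0 here).
# Only the forward part can move the cart; the wheels resist the sideways part.
push_angle = math.radians(30)
forward, sideways = to_cart_frame(math.cos(push_angle), math.sin(push_angle), 0.0)
print(f"forward share: {forward:.3f}, sideways share: {sideways:.3f}")
```

The sideways share is the part of the push that cannot turn into object motion. In a real interaction it ends up as reaction force on the robot, which is one way to read "the object's dynamics fight you."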
A February 2026 paper proposes a compelling approach: HAIC (Humanoid Agile Object Interaction Control via Dynamics-Aware World Model), a framework designed to make these "humanoid + underactuated object" interactions robust without external state estimation, using primarily proprioceptive history and a clever representation called a spatially grounded dynamic occupancy map.
- Paper: "HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model" (arXiv:2602.11758)
- Project page: https://haic-humanoid.github.io/
- arXiv: https://arxiv.org/abs/2602.11758
This post breaks down the core idea, why it matters, and how to think about it in the broader trend toward world-model-driven physical AI.
The real problem: "HOI" is often easier than it looks
The paper points out an uncomfortable truth: a lot of humanoid "human-object interaction" (HOI) research implicitly assumes the object is:
- fully actuated (the robot can effectively command the important degrees of freedom), or
- rigidly coupled to the robot (e.g., grasped firmly), or
- easy to estimate via external sensing (motion capture, markers, overhead cameras).
But with carts, skateboards, and other rolling objects, you get extra failure modes:
- Coupling forces: a small error in pushing direction can torque the robot or whip the object.
- Occlusions / blind spots: the object drifts out of view (or never was in view).
- Non-holonomic constraints: the object responds in constrained ways (rolling vs sliding), so a naive control signal causes instability.
- Distribution shift: the moment the policy explores a new behavior, your estimator/model can become wrong.
HAIC's pitch: build a dynamics-aware world model that adapts online to the policy's behavior and provides the policy with a grounded representation it can use to infer collision boundaries and contact affordances, even when the object is partly "invisible."
HAIC in one sentence
HAIC predicts high-order object motion (velocity, acceleration) from proprioceptive history, projects that prediction into a geometric prior to create a dynamic occupancy map, and trains a policy with asymmetric fine-tuning so the world model continuously adapts to the policy's exploration.
That sentence hides three key design choices. Let's unpack them.
1) Predicting object dynamics from proprioception (no external estimator)
The first ingredient is a dynamics predictor that estimates high-order object state, specifically including velocity and acceleration, using only the robot's proprioceptive history.
Proprioception usually includes signals like:
- joint positions/velocities,
- IMU measurements,
- contact forces/torques (depending on the platform),
- foot pressure/contact state,
- sometimes motor currents.
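A predictor that consumes "proprioceptive history" needs a fixed-size window over signals like those listed above. Here is a minimal sketch of such a window. The class name `ProprioHistory`, the frame layout, and the zero-padding choice are assumptions made for illustration, not details from the paper.

```python
from collections import deque

class ProprioHistory:
    """Fixed-length window of proprioceptive frames (hypothetical helper)."""

    def __init__(self, horizon, frame_size):
        self.frame_size = frame_size
        # Pre-fill with zeros so the flattened input always has the same length.
        self.frames = deque(
            ([0.0] * frame_size for _ in range(horizon)), maxlen=horizon
        )

    def push(self, frame):
        """Append the newest frame; the oldest one is dropped automatically."""
        if len(frame) != self.frame_size:
            raise ValueError("frame has the wrong number of signals")
        self.frames.append(list(frame))

    def flat(self):
        """Concatenate frames oldest-to-newest into one input vector."""
        return [value for frame in self.frames for value in frame]

# Example: each frame holds [joint_velocity, measured_torque] for one joint.
history = ProprioHistory(horizon=3, frame_size=2)
history.push([0.1, 2.0])
history.push([0.2, 2.5])
print(history.flat())
```

Zero-padding keeps the input shape constant from the first control step. A real system might instead repeat the first frame or mask the missing entries.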
Why might this work?
Because when a humanoid pushes or pulls something, the object "talks back" through the robot:
- the push resistance shows up as torques,
- rolling friction changes the required effort,
- inertial loads show up when the object accelerates,
- micro-slips and impacts show up as high-frequency signatures.
In other words, the robot can treat the object as something it can feel, not only something it must see.
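As a toy illustration of "feeling" an object, suppose the robot logs the push force it applies and the acceleration that results. Under an assumed one-dimensional model, force = mass * acceleration + friction, a least-squares fit over that history recovers the object's effective mass and rolling resistance. HAIC uses a learned predictor; this hand-written fit, the function name `fit_mass_and_friction`, and the numbers are assumptions for illustration only.

```python
def fit_mass_and_friction(accelerations, forces):
    """Least-squares fit of force = mass * acceleration + friction (assumed 1-D model)."""
    n = len(accelerations)
    if n < 2 or n != len(forces):
        raise ValueError("need at least two matching samples")
    mean_a = sum(accelerations) / n
    mean_f = sum(forces) / n
    var_a = sum((a - mean_a) ** 2 for a in accelerations)
    if var_a == 0.0:
        raise ValueError("accelerations must vary to separate mass from friction")
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(accelerations, forces))
    mass = cov / var_a                   # slope: inertial load per unit acceleration
    friction = mean_f - mass * mean_a    # intercept: effort needed at zero acceleration
    return mass, friction

# Synthetic, noise-free log for a 20 kg cart with 5 N of rolling resistance.
accelerations = [0.0, 0.5, 1.0, 1.5]     # m/s^2
forces = [5.0, 15.0, 25.0, 35.0]         # N
mass, friction = fit_mass_and_friction(accelerations, forces)
print(f"estimated mass: {mass:.1f} kg, friction: {friction:.1f} N")
```

The fit only works when acceleration varies across the window, which hints at why a history of signals is more informative than a single snapshot.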
This is a strong bet: it's basically saying "state estimation through contact is viable at scale." And if it's true, it's a big deal for real-world deployments where clean tracking is rare.
2) Turning predictions into a "spatially grounded dynamic occupancy map"
Predicted velocities and accelerations are useful, but policies usually benefit from spatial representations:
- Where is the object relative to the robot?
- What space might it sweep through next?
- Where are collision boundaries and safe margins?
HAIC's solution is to project predicted dynamics onto static geometric priors to form a dynamic occupancy map.
Intuition:
- You start with some knowledge of the object's shape (or at least a geometric prior).
- You combine it with predicted motion (including high-order terms).
- You get a representation of where the object is and where it is going, grounded in real space.
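The three steps above can be sketched in a few lines. This is not the paper's construction: it assumes a 2-D grid, an axis-aligned rectangular footprint as the geometric prior, and constant-acceleration extrapolation, and the names `predict_center` and `dynamic_occupancy` are hypothetical.

```python
def predict_center(position, velocity, acceleration, t):
    """Constant-acceleration extrapolation of the object's centre."""
    return tuple(
        p + v * t + 0.5 * a * t * t
        for p, v, a in zip(position, velocity, acceleration)
    )

def dynamic_occupancy(position, velocity, acceleration, half_extents,
                      grid_size=10, cell=0.2, horizon=1.0, steps=5):
    """Return a grid where each cell holds the earliest sampled time (seconds)
    at which the footprint covers the cell centre, or None if it never does."""
    grid = [[None] * grid_size for _ in range(grid_size)]
    for k in range(steps + 1):
        t = horizon * k / steps
        cx, cy = predict_center(position, velocity, acceleration, t)
        for row in range(grid_size):
            for col in range(grid_size):
                # Cell centres, measured from the grid's lower-left corner.
                x = (col + 0.5) * cell
                y = (row + 0.5) * cell
                inside = (abs(x - cx) <= half_extents[0]
                          and abs(y - cy) <= half_extents[1])
                if inside and grid[row][col] is None:
                    grid[row][col] = t
    return grid

# A 0.5 m x 0.5 m object at (0.5, 1.0) m, rolling in +x at 1 m/s.
grid = dynamic_occupancy((0.5, 1.0), (1.0, 0.0), (0.0, 0.0), (0.25, 0.25))
for row in reversed(grid):  # print with +y pointing up
    print(" ".join("." if t is None else f"{t:.1f}" for t in row))
```

Cells marked 0.0 are occupied now; larger values show space the object is predicted to sweep through later in the horizon. A policy could read those values as a rough time budget before a cell becomes unsafe.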
The abstract emphasizes that this map helps the policy infer:
- collision boundaries,
- contact affordances,
- and crucially, what's happening in blind spots.
If youâve worked with robotics stacks, this should feel like a cousin of:
- occupancy grids in navigation,
- signed distance fields in manipulation,
- reachability envelopes in planning.
But the twist here is that it's not only "what is occupied right now" but also "what will be occupied soon, given predicted dynamics."
That's what makes it potentially useful for agile interactions like skateboarding, where a slight delay in reacting can be catastrophic.
3) Asymmetric fine-tuning: let the world model chase the policy
One of the hardest problems in learning-based control is non-stationarity:
- When you update the policy, the distribution of states changes.
- Estimators trained on "old behavior" can degrade.
- Models become wrong in exactly the situations where the policy is exploring.
HAIC addresses this using asymmetric fine-tuning, described as:
"a world model continuously adapts to the student policy's exploration, ensuring robust state estimation under distribution shifts." (paraphrased from the abstract)
You can think of this like a teacher-student setup where:
- the student is the policy that actually acts,
- the world model is being fine-tuned to track the student's evolving behavior.
This is an underappreciated idea: in real robots, your perception/estimation stack is often forced to be robust to new behaviors. If your model can't keep up, you end up either freezing exploration (bad) or crashing robots (worse).
Asymmetric fine-tuning is essentially saying: don't treat the model as fixed infrastructure; treat it as a component that must adapt alongside the policy.
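The toy below does not reproduce the paper's asymmetric fine-tuning procedure. It only illustrates the underlying problem: a model fit on the states an old policy visited degrades when the policy moves elsewhere, while a model that keeps training on the current policy's data stays accurate. The quadratic `true_response`, the single-weight model, and the state lists are all assumptions.

```python
def true_response(x):
    """Stand-in for the real object dynamics (assumed nonlinear)."""
    return x * x

def sgd_fit(weight, states, lr=0.01, epochs=200):
    """Fit prediction = weight * x on the given states by gradient descent."""
    for _ in range(epochs):
        for x in states:
            error = weight * x - true_response(x)
            weight -= lr * error * x
    return weight

def mean_abs_error(weight, states):
    return sum(abs(weight * x - true_response(x)) for x in states) / len(states)

old_states = [0.9, 1.0, 1.1]   # states visited by the earlier policy
new_states = [2.9, 3.0, 3.1]   # states visited after the policy explores

frozen_weight = sgd_fit(0.0, old_states)               # trained once, then frozen
adapted_weight = sgd_fit(frozen_weight, new_states)    # keeps training on new data

print("frozen model error on new states: ", mean_abs_error(frozen_weight, new_states))
print("adapted model error on new states:", mean_abs_error(adapted_weight, new_states))
```

The frozen model is a good local approximation only near the old states. The adapted model trades that away for accuracy where the policy actually operates now.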
What HAIC demonstrates (from the abstract)
The paper reports experiments on a humanoid robot across tasks including:
- Skateboarding (agile, dynamics-sensitive, non-holonomic object)
- Cart pushing/pulling under different loads (classic underactuated interaction)
- Long-horizon box carrying across varied terrain while predicting dynamics of multiple objects
The significance isn't only "cool demos." The broader point is that these tasks force a system to handle:
- inertial perturbations,
- variable mass and friction,
- intermittent contact and occlusion,
- and long-horizon planning/control.
If HAIC's approach generalizes, it's a strong template for "humanoid + real-world objects" beyond lab-perfect conditions.
Why this matters: the next bottleneck is interaction with the world, not just locomotion
Humanoid robotics has had a visible recent arc:
- Locomotion improved (learned gaits, robust balance, better hardware).
- Manipulation improved (better hands, better policies, more data).
- Now the bottleneck is interaction with underactuated, messy objects in unstructured environments.
Underactuated object interaction is where "robotics meets physics" in the most annoying way:
- A door swings on a hinge, so it can only move along a fixed arc.
- A suitcase has wheels.
- A shopping trolley has caster dynamics.
- A pallet jack is basically a dynamics puzzle.
If you want humanoids in homes and warehouses, this is unavoidable.
HAIC's core philosophy (predict dynamics through proprioception, ground it spatially, and adapt the model online) is a plausible path through that mess.
A mental model: HAIC as a contact-centric world model
Here's a practical way to categorize HAIC among world-model approaches.
Many world models in robotics prioritize vision:
- encode images into latent states,
- predict future latents,
- plan in latent space.
HAIC (at least from the abstract) is more contact-centric:
- the key signal is proprioception over time,
- the predictions are high-order motion of the object,
- the output is something the policy can use to avoid collisions and choose contacts.
That's an important complement to vision-heavy methods, because a lot of real contact-rich interaction is better sensed through forces than pixels.
In manipulation, there's a saying: "vision gets you to the object; touch lets you finish the job." HAIC is effectively trying to formalize that for whole-body humanoid interaction.
Limitations and open questions (what to watch next)
Even if HAIC works well in the paper, a few practical questions matter for real deployment:
1) How strong must the "static geometric priors" be?
If you need accurate CAD models of every object, it won't scale. If coarse priors work (simple bounding shapes, approximate geometry), that's much more deployable.
2) How does it handle changes in contact mode?
Underactuated objects often switch regimes:
- rolling to slipping,
- one wheel stuck,
- caster flip,
- object hits an obstacle.
A dynamics predictor needs to adapt quickly when physics changes. The asymmetric fine-tuning may help, but the stability and safety properties become crucial.
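One generic way to notice a regime switch is to monitor how far the predictor's output drifts from what is actually observed. The sketch below flags a change when the windowed mean residual exceeds a threshold. The paper does not describe such a detector; the class name `ModeChangeDetector`, the window length, and the threshold are assumptions.

```python
from collections import deque

class ModeChangeDetector:
    """Flags a likely contact-mode change from prediction residuals (hypothetical)."""

    def __init__(self, window=5, threshold=0.5):
        self.residuals = deque(maxlen=window)
        self.threshold = threshold

    def update(self, predicted, observed):
        """Record one residual; return True once the window is full and its
        mean absolute residual exceeds the threshold."""
        self.residuals.append(abs(predicted - observed))
        window_full = len(self.residuals) == self.residuals.maxlen
        mean_residual = sum(self.residuals) / len(self.residuals)
        return window_full and mean_residual > self.threshold

# Predicted vs. observed object speed (m/s): the wheel jams halfway through.
detector = ModeChangeDetector(window=3, threshold=0.5)
predicted = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
observed = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
flags = [detector.update(p, o) for p, o in zip(predicted, observed)]
print(flags)
```

Averaging over a window trades detection delay for fewer false alarms from single noisy samples. The right balance depends on how fast the task can go wrong.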
3) Sample efficiency and data requirements
Online adaptation is great, but robots are expensive. How much data is required to get robust predictions across loads, surfaces, and terrains?
4) Multi-object scaling
The abstract mentions long-horizon tasks predicting dynamics of multiple objects. That's huge if true, but multi-object interaction is where combinatorics explode. The representation and training pipeline need to stay stable.
Takeaways
- Underactuated object interaction is one of the most important "last-mile" problems for real humanoids.
- HAIC proposes a unified approach that does not rely on external state estimation.
- The key ingredients are:
  - a proprioception-based dynamics predictor (including velocity/acceleration),
  - a spatially grounded dynamic occupancy map built from motion + geometry priors,
  - and asymmetric fine-tuning so the world model adapts to the policy's evolving behavior.
- The reported tasks (skateboarding, cart pushing/pulling, long-horizon carrying) are exactly the kind of physics-heavy challenges that separate "great demos" from "deployable robots."
If you want a future where humanoids do useful work in human spaces, systems like HAIC, where contact, dynamics, and adaptation are first-class, are the direction to bet on.
Sources
- Li et al., "HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model" (2026), arXiv:2602.11758. https://arxiv.org/abs/2602.11758
- Project page: https://haic-humanoid.github.io/