digest

Awesome Robots Digest Issue 24: World-Action Models, Open VLA Brains, and Contact-Rich Manipulation

Bob Jiang

March 27, 2026

5 min readFeatured

TL;DR

  • The stack is getting a new layer: “world-action models” (not just world models) are showing up as a serious attempt to unify prediction + control in one scalable interface.
  • Open-source VLA is maturing: we’re moving from “cool demo weights” to more complete packages: datasets, evals, deployment recipes, and robot-specific adapters.
  • Manipulation is going force-first: tactile + diffusion policies are a strong hint that the next step isn’t more pixels — it’s better contact.

Introduction

The robotics conversation is quietly tightening around one uncomfortable truth: we’ve been over-indexing on perception. The harder problems are (1) turning perception into actions that survive contact, (2) making that pipeline repeatable across robots, and (3) shipping it in a stack you can debug.

This week’s stories all rhyme with that theme: NVIDIA is pushing a more integrated “Physical AI” stack narrative, open VLA frameworks keep getting closer to something you can actually build on, and tactile/force-aware learning is showing up as the missing leg of the stool for real manipulation.

Here’s what mattered from the last week.

Top News & Breakthroughs 📰

1) NVIDIA expands its Physical AI stack: Cosmos 3 + GR00T updates + more open model families

NVIDIA’s latest announcements continue the “Android layer for robotics” push: more open models, more simulation scaffolding, and tighter integration across data → sim → policy → deployment.

Why it matters:

  • The winners in humanoids will be the teams that can iterate fast without breaking safety. That requires a stack, not just a model.
  • The framing is shifting from “robot foundation models” to world + action (predicting what happens and what to do next).

Read (NVIDIA Newsroom): https://nvidianews.nvidia.com/news/nvidia-expands-open-model-families-to-power-the-next-wave-of-agentic-physical-and-healthcare-ai

2) World-action models: why “predicting the world” is no longer enough

If you’ve been tracking the “world model” hype cycle, the next natural step is: can the model propose actions that change the world, not just forecast it? That’s the core premise behind the world-action framing.

Why it matters:

  • Action-conditioned prediction is a cleaner abstraction for robotics than passive next-frame prediction.
  • It pushes the industry toward policy interfaces that are evaluable (what actions did you choose, under what uncertainty, with what failure mode?).

(We covered this shift in more detail this week.) https://www.awesomerobots.xyz/blog/nvidia-isaac-groot-n2-world-action-model-physical-ai

3) Open-source VLA keeps getting sharper: Unitree’s UnifoLM-VLA-0 as an “embodied brain” pattern

Unitree’s UnifoLM-VLA-0 is one more signal that the open ecosystem is converging on a pragmatic recipe: VLM backbone + robotics continued pretraining + robot datasets + a deployment path.

Why it matters:

  • Open VLA is becoming a real forcing function on closed stacks: better baselines, faster iteration, and shared evaluation.
  • The practical bottleneck is still data quality + action representations — but the ecosystem is building the plumbing.

Project page: https://unigen-x.github.io/unifolm-vla.github.io/

Code: https://github.com/unitreerobotics/unifolm-vla

(Our write-up this week.) https://www.awesomerobots.xyz/blog/unitree-unifolm-vla-0-open-source-embodied-brain-g1

4) Contact is the frontier: tactile diffusion policies for force-aware manipulation

Vision-only manipulation works until it doesn’t — and it usually doesn’t right at the moment of contact.

A growing line of work on tactile-conditioned diffusion policies is pushing toward policies that don’t just “see and move,” but explicitly learn force-aware behaviors for contact-rich tasks.

Why it matters:

  • Force/torque and tactile signals are not “aux sensors”; they’re the difference between grasping and holding, between pushing and jamming.
  • Diffusion-style policy learning is a natural fit for multi-modal, multi-solution action spaces.

Paper (arXiv): https://arxiv.org/abs/2510.13324

(Our explainer this week.) https://www.awesomerobots.xyz/blog/tactile-diffusion-policies-force-aware-manipulation

5) The boring hardware story that decides deployments: power conversion + power integrity for humanoids

Humanoid demos don’t fail because the transformer wasn’t big enough; they fail because the power system browns out, overheats, or can’t survive peak loads during contact.

This week’s hardware deep-dive looks at pSemi power modules as a proxy for the bigger theme: power is a system-level limiter on real humanoid capability and uptime.

Read: https://www.awesomerobots.xyz/blog/psemi-pe26100-pe25304-power-modules-humanoid-robots

6) Field robotics reminder: “small” perception improvements compound in agriculture

A nice contrast to humanoid hype: practical perception + grasp selection in agriculture (e.g., tomato harvesting) is still one of the highest-signal examples of robotics delivering measurable ROI—because the evaluation loop is brutally honest.

Read: https://www.awesomerobots.xyz/blog/harvest-ease-estimation-tomato-picking-robots

Research Spotlight 🔬

Force-aware manipulation is becoming a pipeline, not a trick

If you zoom out, tactile + force-aware learning is starting to look like the next “standard stack,” similar to how VLA pipelines became the default story for generalist policies:

  • cheap/robust tactile sensing,
  • demonstrations with meaningful contact variation,
  • diffusion policies that model multiple valid trajectories,
  • evaluation that penalizes the right failure modes (slip, crush, jam, stall).

The big opportunity: build datasets and benchmarks where contact quality is measurable and repeatable.

Event Horizon 📅

ICRA 2026 (June 1–5, 2026 • Vienna)

ICRA is still the best “what’s reproducible” conference. If you’re building a real stack, it’s worth watching for:

  • simulation → real transfer tooling,
  • manipulation benchmarks with contact-rich tasks,
  • humanoid locomotion papers that include real failure analysis.

Conference: https://2026.ieee-icra.org/about/

Tool / Resource of the Week 🛠️

Featured: UnifoLM-VLA (Unitree) — a good reference for open VLA packaging

Even if you don’t use this exact stack, it’s a useful reference for how open VLA projects should ship in 2026:

  • clear training/eval story,
  • explicit backbone choice,
  • robotics-specific continued pretraining,
  • and a deployment narrative.

Repo: https://github.com/unitreerobotics/unifolm-vla

Community Corner 👥

Open VLA is forcing better hygiene

As more “robot brains” go open-source, the community pressure shifts in a healthy direction:

  • publish evals (not just hero demos),
  • document action spaces and robot adapters,
  • share failure modes and contact edge cases.

That’s how we get from impressive clips to systems people can trust.

Conclusion 🎯

This week’s digest is basically one message: the robotics stack is getting more honest.

World-action framing pushes us to evaluate policies as decision-makers, not video predictors. Open VLA pushes faster iteration and better baselines. And tactile/force-aware policies remind us that the real world is mostly contact.

See you next week.

Stay Connected

Share this article:

Tags:

#robotics#AI#humanoid robots#VLA#world models#robot learning#weekly digest

About Bob Jiang

Robotics engineer and AI researcher with 10+ years experience in agile software management, AI, and machine learning.

Related Articles