What VLA Models Still Cannot Do Safely in Robotics

Vision-language-action models are one of the most important ideas in Physical AI.

They connect perception, language, and robot actions in one learned policy. That is a real step forward. A robot that can look at a scene, understand an instruction, and produce an action sequence is a different class of system from a scripted state machine with a few perception nodes bolted on.

But a VLA model is not a safety architecture.

The useful question for robotics engineers, AI engineers, founders, and CTOs is not “will VLA models work?”

They already work in constrained demonstrations and are improving quickly.

The useful question is:

What should a VLA model still not be allowed to do by itself on a real robot?

My answer: a VLA model should not own command authority, safety limits, actuator timing, workspace admission, recovery behavior, or final responsibility for physical motion. It can propose skills, select goals, interpret scenes, and help generalize behavior, but a separate control and safety stack must validate, supervise, bound, and sometimes reject its outputs before anything reaches actuators.

This article extends the safety boundary from why LLMs should not control motors and robots into the VLA era. It also connects to the broader Physical AI framing in what Physical AI really means, the older VLA overview in how VLA models are changing robotics, and the authority model in splitting responsibility between an LLM, ROS 2, and a microcontroller.

Key takeaways

VLA models are promising robot policies, but they should be treated as proposal engines unless the deployment has strict safety envelopes, validation gates, and runtime supervision.
The main safety gap is not language understanding. It is command authority under uncertainty: partial observability, distribution shift, contact dynamics, timing, tool use, and recovery.
A VLA should not directly own E-stops, safety-rated limits, actuator control loops, collision envelopes, or degraded-mode transitions.
Safe deployment needs a layered architecture: VLA inference, skill registry, command validator, planner/controller, runtime supervisor, watchdogs, and hardware-level failsafes.
Benchmarking a VLA for robotics should include negative tests: unseen objects, occlusion, ambiguous instructions, stale perception, forbidden zones, moving humans, tool misuse, and recovery from failed grasps.
The durable artifact is a VLA safety gate: a matrix that defines what the model may propose, what deterministic software must validate, and what hardware or real-time control must own.

Citation-ready answer

Vision-language-action models should not be trusted as complete robot safety systems. A VLA can map visual observations and language instructions into robot actions, but safe robotics deployment still requires deterministic command validation, workspace limits, collision checking, runtime supervision, watchdogs, degraded modes, real-time control, and hardware failsafes outside the model. The practical rule is simple: let the VLA propose goals or skills; let the robot safety and control stack decide what is physically admissible.

What a VLA actually changes

A VLA model tries to unify three things:

1
2
3

vision observation
  -> language/task context
  -> action representation

The key research insight is that robot actions can be represented in a format that a large model can learn alongside vision and language. The RT-2 paper describes this direction clearly: actions are expressed as tokens so a vision-language model can be co-trained on web-scale vision-language data and robot trajectory data.

That matters because it gives the robot policy more semantic grounding. The model may generalize better to new objects, instructions, and scene concepts than a narrow imitation policy trained only on one task.

The Open X-Embodiment work pushes the same direction from the data side: more robots, more embodiments, more tasks, more demonstrations. OpenVLA then makes the open-source question more practical by training a 7B-parameter VLA on large real-world robot demonstrations and releasing code and checkpoints.

This is important progress.

It still does not make the learned policy the safety boundary.

The safety problem is authority under uncertainty

Robots fail differently from chatbots.

A bad text answer can be corrected. A bad robot command can hit a fixture, break a gripper, pinch a cable, drop a tool, damage a part, or enter a human workspace.

The hard part is not only “does the VLA understand the instruction?”

The hard part is:

1
2
3

Does this command remain safe under the current physical state,
sensor uncertainty, timing delay, contact condition, actuator limit,
workspace boundary, and recovery path?

A VLA is not naturally good at owning that entire question. It may infer intent from vision and language, but physical admissibility depends on state estimation, calibration, collision geometry, control-loop timing, force limits, tool constraints, safety zones, and hardware health.

Those are system responsibilities.

The VLA safety boundary

Use this as the minimum architecture for a robot that uses a VLA near real hardware:

camera / sensors
  -> perception preprocessor
  -> VLA model
      -> proposed goal / skill / action tokens
  -> command validator
      -> schema check
      -> workspace check
      -> collision check
      -> speed / force / torque limits
      -> tool and object constraints
      -> freshness and confidence checks
  -> planner / controller
  -> runtime supervisor
      -> watchdogs
      -> safety envelope
      -> degraded mode manager
      -> stop / hold / retreat policy
  -> robot hardware

The VLA is in the proposal path.

It is not the final authority.

That distinction is the same pattern I use in evaluating a local LLM for robotics tool use: the model may select or parameterize a tool, but a deterministic layer must decide whether the request is admissible.

What VLA models should not own

Responsibility	Why the VLA should not own it	Safer owner
Emergency stop	Must work independently of model behavior and compute health	Hardware safety circuit or safety PLC
Joint-level control loop	Requires deterministic timing and stability	Real-time controller or microcontroller
Collision envelope	Needs geometry, limits, and conservative checks	Motion planner and safety supervisor
Workspace admission	Must block forbidden zones even if language is persuasive	Command validator
Tool authority	Tool misuse can create physical risk	Skill registry plus policy gate
Degraded-mode transition	Depends on sensor confidence, health, and fault state	Runtime supervisor
Recovery after contact	Requires force, timing, and local state	Controller and supervisor
Human proximity rule	Must be conservative and sensor-driven	Safety-rated or supervised perception layer
Audit and incident trace	Needs replayable system events	Observability and safety log stack

This is not anti-VLA.

It is pro-robot.

A VLA becomes more useful when it is surrounded by systems that let it be creative where creativity helps and conservative where physics matters.

Failure modes that still matter

A VLA safety review should start with failure modes, not demo videos.

Failure mode	Example	Required control
Ambiguous instruction	“Put that over there” with two possible targets	Ask for clarification or restrict to low-risk motion
Object confusion	Similar tools, labels, colors, or partially occluded objects	Perception confidence threshold and target verification
Distribution shift	New lighting, camera angle, gripper, surface, or fixture	Scenario evals and fallback to manual/known skill
Unsafe affordance	Model sees a handle but the object is hot, sharp, powered, or fragile	Object risk metadata and tool constraints
Workspace violation	Proposed path crosses a human zone or keep-out volume	Motion planning and geofence rejection
Timing staleness	Camera frame or TF transform is too old	Freshness gate before command execution
Contact surprise	Grasp slips, object moves, tool catches	Force/torque monitoring and abort policy
Overconfident recovery	Model keeps trying after repeated failure	Retry budget and degraded-mode transition
Hidden state	Door latched, cable attached, fixture locked	State-check skill or human confirmation
Instruction injection	Visual marker or user text requests unsafe behavior	Instruction authority separation

The important pattern is that many failures are not solved by a larger model alone. They require external state, physical constraints, conservative admission rules, and recovery policies.

Command granularity decides risk

Not all VLA outputs have the same risk.

There is a large safety difference between these three outputs:

1
2
3

high-level goal: "sort the blue blocks into the left tray"
skill call:      pick_and_place(object=blue_block, target=left_tray)
actuator stream: joint velocities at 100 Hz

The closer the VLA gets to actuator-level output, the more it owns timing, stability, and contact behavior. That is the dangerous direction for most teams.

For practical robotics projects, I prefer this boundary:

VLA output level	Production recommendation
Scene description	Usually safe if not used directly as authority
Goal proposal	Useful with validation and human/operator override
Skill selection	Useful if skills are typed, bounded, and logged
Skill parameterization	Acceptable only with strict schema and physical checks
Trajectory proposal	Requires planner validation, collision checking, and simulation
Direct actuator command	Avoid outside controlled research or very narrow certified stacks

ROS 2 actions are often a good boundary for long-running robot skills because they support goals, feedback, results, and cancellation. The ROS 2 action design explicitly models an action server that may accept or reject goals, provide feedback, and handle cancel requests. That is the kind of interface a VLA should face, not a raw motor bus.

The ROS 2 actions design is useful here because it separates the requester from the executor. Let the VLA request a bounded action. Let the action server, planner, and supervisor decide whether that goal can execute.

Lifecycle state belongs outside the model

A robot that uses a VLA needs explicit operating modes:

disabled
calibration
manual
supervised autonomy
autonomous bounded task
degraded mode
fault hold
emergency stop

The model should not silently move the system between these states.

The ROS 2 managed node lifecycle gives a useful software pattern: nodes can be unconfigured, inactive, active, or finalized, with transitions exposed to a supervisory process. The same idea applies at the robot architecture level. VLA inference can be active while robot motion remains inactive. A perception stack can be healthy while an actuator path is fault-held. A planner can accept goals while the supervisor blocks execution.

That separation is not bureaucracy. It is how you avoid one fluent model output becoming system-wide authority.

A practical VLA deployment matrix

Before deploying a VLA near a real robot, fill this matrix.

Layer	Allowed VLA influence	Hard boundary
Perception	Interpret scene, describe objects, suggest target candidates	Cannot override sensor validity or calibration state
Task planning	Propose task sequence or skill choice	Cannot bypass skill registry
Skill parameters	Suggest object, pose, tray, speed class, tool	Must pass schema, workspace, object, and freshness checks
Motion planning	Provide goal constraints, not raw motion authority	Planner owns collision and kinematic feasibility
Control	No direct control-loop authority	Controller owns timing, stability, limits
Safety	No ownership of E-stop, stop category, or safety-rated boundary	Safety system owns stop and interlock
Recovery	Suggest next step after failure	Supervisor owns retry budget and degraded mode
Audit	Provide explanation	Logs own evidence and replay

If a row says “the VLA owns this alone,” the design is probably too optimistic.

The benchmark plan that matters

Most VLA demos test success on intended tasks.

Production readiness needs tests for wrongness.

Run at least these test classes:

Test class	What to measure	Pass condition
Known task success	Can the model complete the intended skill?	Meets task-specific success rate
Ambiguity handling	Does it ask or choose safely when the request is unclear?	No unsafe default action
Forbidden-zone rejection	Does the system reject goals outside the workspace?	100% rejection
Stale perception	Does it execute with old frames or transforms?	No execution past freshness limit
Occlusion and clutter	Does it confuse targets under partial visibility?	Conservative failure or clarification
Human intrusion	Does motion stop or hold when a person enters the zone?	Stop/hold within defined limit
Repeated failure	Does it keep trying after bad grasps or blocked motion?	Retry budget enforced
Tool misuse	Does it call the wrong skill or unsafe tool?	Skill registry blocks invalid calls
Recovery	Does it retreat, hold, or degrade safely?	Supervisor selects safe state
Audit replay	Can you reconstruct the decision chain?	Logs include observation, proposal, validation, execution result

The benchmark should include adversarial and boring failures. The boring failures are the ones that happen in real deployments: lighting changes, loose calibration, an operator moving a part, a cable in the way, a gripper pad wearing out, or a fixture moved by 20 millimeters.

Runtime supervision is not optional

The runtime supervisor is the layer that asks:

1	Given the robot's current state, is this command still safe now?

It should check:

command age
sensor freshness
TF transform freshness
robot mode
actuator health
collision state
speed and force limits
workspace boundary
retry count
human proximity
stop input state

This is close to the architecture in designing degraded modes for AI-enabled robots. A robot should not have only “working” and “failed.” It should have explicit reduced-capability modes where risky functions are disabled while safe diagnostic or manual operations remain available.

For VLA systems, degraded mode is especially important because the AI layer can fail softly. It may still produce plausible outputs when perception is stale, confidence is low, or the task has drifted outside training distribution.

Plausible is not safe.

What good looks like

A VLA-enabled robot is much closer to production when these statements are true:

The VLA can only propose goals, skills, or bounded parameters.
Every skill has a declared schema, owner, risk tier, and allowed mode.
The command validator can reject unsafe outputs without asking the model.
The planner checks kinematics, collision, and workspace boundaries.
The controller owns real-time timing and actuator limits.
The supervisor owns operating mode, degraded mode, retries, stop, and recovery.
The hardware safety path still works if the model, GPU, network, ROS graph, or host computer fails.
Logs can reconstruct perception input, VLA proposal, validation decision, action execution, and stop reason.
Benchmarks include unsafe, ambiguous, stale, occluded, and out-of-distribution scenarios.

That is the difference between a VLA demo and a VLA robot architecture.

FAQ

Are VLA models unsafe by definition?

No. They are not unsafe by definition. They are unsafe when treated as complete control and safety systems. A VLA can be valuable when it is constrained to propose goals or skills and when deterministic systems validate physical admissibility.

Can a VLA directly output robot actions?

In research, yes. In production, direct actuator authority is usually the wrong boundary. A safer architecture converts VLA outputs into bounded goals or skill requests, then uses planners, controllers, supervisors, and hardware safety systems to decide what can actually move.

What is the biggest gap between VLA demos and deployed robots?

The biggest gap is not object recognition. It is robust behavior under physical uncertainty: partial observability, calibration drift, contact dynamics, humans entering the workspace, stale sensor data, tool misuse, and recovery from failed actions.

Should VLA systems use ROS 2 actions?

Often, yes. ROS 2 actions are a good interface for long-running robot skills because they support goal submission, feedback, result handling, and cancellation. They are a better boundary for AI-generated skill requests than raw topics carrying motor-level commands.

How should a team start testing a VLA safely?

Start offline with logged scenes and simulated commands. Then use a shadow mode where the VLA proposes actions but the robot does not execute them. Compare proposals against human/operator decisions and deterministic validators. Only then allow execution of low-risk, bounded skills with hard workspace limits and immediate stop paths.

What should stay outside the model forever?

Emergency stop, safety-rated interlocks, actuator control loops, workspace hard limits, watchdogs, and final stop authority should stay outside the VLA. These are robot safety responsibilities, not language-model responsibilities.