Agentic AI promises extraordinary possibilities for individuals, organisations, and the systems they depend on. But as we’re beginning to learn, real-world autonomy isn’t just about building smarter systems. It’s about designing for messy, unpredictable, often irrational environments where logic breaks, edge cases multiply, and delegation without context can quickly turn into disaster.
A recent experiment by Anthropic made this painfully clear. In a controlled study, their AI agent was given a seemingly simple job: operate a vending machine. Instead of succeeding, it had what researchers described as a cognitive breakdown. The agent couldn’t resolve internal logic conflicts, failed to adapt to unexpected inputs, and spiralled into increasingly alarming actions. The more autonomy it had, the less reliable it became under pressure.
It’s an amusing story, akin to the Sorcerer’s Apprentice. It’s also a warning.
And it echoes what the autonomous vehicle industry has known for years: digital autonomy might feel impressive in the lab, but it’s meaningless until tested in the wild. Elon Musk claimed autonomy was easy in 2015, and ten years later we’re still waiting. It turns out the first 80 per cent of progress comes relatively quickly, even easily. It’s the last 20 per cent, full of edge cases, trade-offs, and real-world weirdness, where 80 per cent of the difficulty lies.
So, how do we design agentic AI to handle that complexity? We take a page from the self-driving car playbook, and build a maturity model for digital autonomy.
Autonomy Breaks When Reality Gets Messy
We like to think of digital agents as clean systems: if we give them clear goals and enough data, they’ll figure out the rest. But as Anthropic’s “Project Vend” showed, even tightly scoped tasks can fall apart when autonomy meets uncertainty.
This isn’t a failure of AI ambition. It’s a failure of agent design maturity.
That’s why the autonomous vehicle industry uses a structured scale: the SAE’s six levels, from Level 0 (no automation) to Level 5 (full self-driving in all conditions). It’s a progressive roadmap that makes space for complexity, reflecting not just capability but trust, control, and governance at each stage.
We need the same for AI agents.
So our team at DataSapien has built a version of that framework, adapted for digital autonomy, to explore where value emerges, where risk multiplies, and why edge-native AI is essential along the way.
The Agentic AI Autonomy Ladder (A0–A5)
| Level | Agent Capability | Example | Edge-Native AI Required? |
| --- | --- | --- | --- |
| A0 | Manual | Calculator, search bar | No: No agency |
| A1 | Reactive Suggestions | Recommender engine | Optional: Low sensitivity |
| A2 | Execution Autonomy | “Send this email,” “Book my calendar” | Yes: Task execution with personal data |
| A3 | Contextual Autonomy | Adjusts to user intent, context, time | Critical: Privacy-sensitive inference |
| A4 | Goal Autonomy | Plans and completes multi-step goals | Essential: High trust, full loop control |
| A5 | General Autonomy | Acts across domains, self-initiating | Not viable without it: Extreme risk |
The value of AI autonomy starts to show at Level A2. This is when agents stop just suggesting and start acting. But this is also the moment where privacy, explainability, and personalisation become non-negotiable. From there, complexity spikes:
- A3 agents operate with contextual nuance
- A4 agents act independently in bounded domains
- A5 agents attempt open-domain decision-making (and almost everyone familiar with the space agrees we’re far from ready for this)
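To make the ladder concrete in engineering terms, here’s a minimal TypeScript sketch of how a product team might encode these levels as a gate before granting an agent execution rights. The level names and edge requirements come straight from the table above; the `edgeRequirement` map and the `canActAutonomously` helper are illustrative assumptions, not part of any shipped SDK.

```typescript
// Autonomy levels from the A0–A5 ladder above.
enum AutonomyLevel {
  A0_Manual = 0,
  A1_ReactiveSuggestions = 1,
  A2_ExecutionAutonomy = 2,
  A3_ContextualAutonomy = 3,
  A4_GoalAutonomy = 4,
  A5_GeneralAutonomy = 5,
}

// Hypothetical mapping of each level to its edge-native AI requirement,
// mirroring the right-hand column of the table.
type EdgeRequirement =
  | "none" | "optional" | "required" | "critical" | "essential" | "not-viable";

const edgeRequirement: Record<AutonomyLevel, EdgeRequirement> = {
  [AutonomyLevel.A0_Manual]: "none",
  [AutonomyLevel.A1_ReactiveSuggestions]: "optional",
  [AutonomyLevel.A2_ExecutionAutonomy]: "required",
  [AutonomyLevel.A3_ContextualAutonomy]: "critical",
  [AutonomyLevel.A4_GoalAutonomy]: "essential",
  [AutonomyLevel.A5_GeneralAutonomy]: "not-viable",
};

// Illustrative gate: an agent may act (not just suggest) only from A2 up,
// and only if its edge requirement is actually satisfied by the runtime.
function canActAutonomously(level: AutonomyLevel, runsOnDevice: boolean): boolean {
  const req = edgeRequirement[level];
  if (req === "not-viable") return false;                       // A5: too risky today
  if (level < AutonomyLevel.A2_ExecutionAutonomy) return false; // A0–A1 only suggest
  return runsOnDevice;                                          // A2–A4 must stay on-device
}

console.log(canActAutonomously(AutonomyLevel.A3_ContextualAutonomy, true));  // true
console.log(canActAutonomously(AutonomyLevel.A4_GoalAutonomy, false));       // false
```

The point of the sketch is the shape, not the specifics: autonomy level is an explicit, checkable property of the agent, and the right to act is granted per level rather than assumed.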
Autonomy isn’t a feature you launch. It’s a level you earn, progressively, by building dependable, trustworthy use cases.
(For a deeper dive into the near future of AI agents and autonomy, check out Andrej Karpathy’s Y Combinator presentation, given a couple of weeks before this article was written.)
Why Edge-Native AI Is Essential at Levels A2–A4
When agents begin making decisions on your behalf, where those decisions are made matters.
- Level A2: Executing a calendar booking or sending an email involves sensitive personal data. This data shouldn’t leave the device; with personal edge AI, it doesn’t need to.
- Level A3: Context-aware agents process your routines, health states, personality traits, mood, location, and habits. These signals must be inferred locally, not exposed to the cloud.
- Level A4: End-to-end goal automation means chaining multiple actions across domains. Trust requires transparency, traceability, and full human control, which is only achievable through local, personal, on-device intelligence.
Cloud-based inference at these levels is not only fragile, it’s highly unsafe. It creates black-box decisions, unpredictable latency, and uncontainable risk for people and organisations.
Edge-native private and personal AI isn’t a performance enhancement. It’s a trust requirement. We’d go further: agentic AI doesn’t work at scale without it.
That’s why DataSapien is built from the ground up to power agentic workflows locally, through our Personal AI SDK, enabling safe, contextual autonomy with zero personal data shared off-device.
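As a concrete illustration of that principle, here’s a hedged TypeScript sketch of a dispatch layer that refuses to send privacy-sensitive inference off-device. The `runLocalModel` and `callCloudModel` functions and the task shape are hypothetical stand-ins for illustration, not the actual Personal AI SDK interface.

```typescript
// Numeric autonomy level, 0–5, indexing the A0–A5 ladder above.
type Level = 0 | 1 | 2 | 3 | 4 | 5;

// Hypothetical task descriptor: what the agent wants to infer or execute.
interface AgentTask {
  level: Level;                 // position on the A0–A5 ladder
  touchesPersonalData: boolean; // calendars, location, health, habits...
  prompt: string;
}

// Stand-in model runners; a real system would wire these to an
// on-device model and a cloud endpoint respectively.
async function runLocalModel(prompt: string): Promise<string> {
  return `local-inference: ${prompt}`;
}
async function callCloudModel(prompt: string): Promise<string> {
  return `cloud-inference: ${prompt}`;
}

// Dispatch rule: anything at A2 or above, or anything touching personal
// data, must stay on-device. Cloud inference is only allowed for
// low-sensitivity, suggestion-level work.
async function dispatch(task: AgentTask): Promise<string> {
  const needsEdge = task.level >= 2 || task.touchesPersonalData;
  if (needsEdge) {
    return runLocalModel(task.prompt); // personal data never leaves the device
  }
  return callCloudModel(task.prompt);
}

// Example: an A2 calendar booking is routed to the local model.
dispatch({
  level: 2,
  touchesPersonalData: true,
  prompt: "book 30 minutes with Sam on Thursday",
}).then(console.log);
```

The design choice worth noting is that locality is enforced at the dispatch boundary, not left to each agent’s discretion: by the time a prompt could reach the cloud, the sensitive cases have already been routed away.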
What We Can Learn from the Autonomous “Road Ahead”
Autonomous cars taught us that scaling autonomy isn’t a product sprint, it’s an ecosystem challenge. It requires:
- Clear capability thresholds
- Real-world guardrails
- Evolving governance models
- And most importantly: deep respect for evolving edge cases
The same applies to AI agents. We’re only just entering the valuable middle band (Levels A2 to A4), where delegation becomes useful and trust becomes essential.
That’s where the future is being built right now. And that’s where organisations need to design for resilience, not just intelligence.
Coming Next: The Governance Layer for Autonomy
As autonomy rises, responsibility must rise with it. In Part 2 of this series, we’ll introduce the concept of Net Fiduciaries: organisations that commit not just to building capable agents, but to acting in the user’s best interest when autonomy is delegated.
Because the future of AI isn’t just about what agents can do.
It’s about what happens when they fail, and who’s accountable when they do.
Drop us a line if you’d like to learn more; we’d love to hear from you.


