Most legal AI still assumes a simple shape: document in, prompt in, answer out.
It works, up to a point. You get something coherent, often convincing, sometimes genuinely useful. That surface quality has been enough to drive adoption.
But it also hides where legal work actually succeeds or fails.
The real work is not answering a question about a document. It is deciding what needs to happen next, who is responsible, what can be relied on, and what must be checked before anything moves forward.
That is a planning problem.
The illusion of completeness
A well-formed answer creates a sense that the task is done.
In practice, it rarely is.
A clause summary does not tell you:
- whether the clause matters in the current phase of the deal
- whether it conflicts with another obligation
- whether it has already been addressed elsewhere
- whether it needs escalation before being relied on
The answer feels complete because it is self-contained. Legal work is not.
Most problems in legal matters are not caused by misunderstanding a clause. They come from missed steps, unclear ownership, or decisions made without full context.
Optimising for better answers does not fix that.
Where things actually break
Take a typical workflow:
A document is reviewed. Risks are identified. A summary is produced. Someone reads it and moves on.
What is missing is not more detail in the summary.
It is everything around it:
- Who owns the risk that was flagged?
- Has it been resolved, accepted, or deferred?
- Does it affect other parts of the matter?
- Should it block the next step?
- Has anything changed since the summary was generated?
None of that sits inside the answer.
It sits in the coordination of the work.
This is where most systems are silent.
Documents are inputs, not the system
Legal tech has historically treated documents as the centre of gravity.
That made sense when documents were the primary artefact being worked on.
Once AI is introduced, that assumption becomes limiting.
Documents become inputs into a broader system:
- tasks are created from them
- obligations are tracked beyond them
- decisions are made in response to them
- state evolves independently of them
If the system only understands documents, it cannot understand the matter.
That gap is where risk accumulates.
The missing layer
What is missing is a system of record for the work itself.
Not just:
- what documents exist
- what answers have been generated
But:
- what stage the matter is in
- what decisions are pending
- what has been agreed
- what is still at risk
- what is allowed to happen next
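To make that concrete, here is a minimal sketch of what such a record might hold. It is illustrative only; every name in it (MatterState, Decision, Risk, canAdvance) is an assumption made for the example, not a reference to any existing product or schema.

```typescript
// A minimal sketch of a matter-level system of record.
// Every name and field is an illustrative assumption, not a
// reference to any existing product or schema.

type Stage = "intake" | "diligence" | "negotiation" | "signing" | "closed";

interface Decision {
  id: string;
  question: string;          // what is being decided
  owner: string;             // who is responsible for deciding
  status: "pending" | "approved" | "rejected" | "deferred";
}

interface Risk {
  id: string;
  description: string;
  owner: string;
  status: "open" | "accepted" | "resolved" | "deferred";
  blocksNextStage: boolean;  // should this gate progress?
}

interface MatterState {
  stage: Stage;
  documents: string[];       // inputs, not the system itself
  decisions: Decision[];
  agreedPoints: string[];    // what has been agreed
  risks: Risk[];
}

// "What is allowed to happen next" becomes a question the state
// can answer, rather than something held in someone's head.
function canAdvance(matter: MatterState): boolean {
  const openBlockers = matter.risks.filter(
    (r) => r.status === "open" && r.blocksNextStage
  );
  const pending = matter.decisions.filter((d) => d.status === "pending");
  return openBlockers.length === 0 && pending.length === 0;
}
```

The specific fields matter less than the shape: state that outlives any single document or answer.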
Without that, every interaction with AI starts from a partial view.
The model may be accurate within that slice, but the slice itself is incomplete.
Orchestration as a first-class concern
Orchestration is not a technical detail. It is the structure of the work.
It answers questions like:
- what needs to happen before this output can be used?
- who is allowed to approve it?
- what context must be present for it to be valid?
- what changes once it is accepted?
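As a sketch, and assuming the MatterState record above, those questions can become explicit checks rather than conventions. Everything here (AiOutput, APPROVERS, the role strings) is hypothetical, not a real API.

```typescript
// Illustrative only. Assumes the MatterState sketch above.
// AiOutput, APPROVERS, and the role strings are hypothetical.

interface AiOutput {
  id: string;
  requiredContext: string[]; // ids of documents it was generated from
  approvedBy?: string;
}

const APPROVERS = new Set(["partner", "supervising-associate"]);

// Returns the reasons an output cannot yet be used.
// An empty list means it is usable.
function gateOutput(output: AiOutput, matter: MatterState, role: string): string[] {
  const problems: string[] = [];

  // What context must be present for it to be valid?
  for (const ctx of output.requiredContext) {
    if (!matter.documents.includes(ctx)) {
      problems.push(`missing context: ${ctx}`);
    }
  }

  // Who is allowed to approve it?
  if (!APPROVERS.has(role)) {
    problems.push(`role "${role}" cannot approve this output`);
  }

  // What needs to happen before this output can be used?
  if (!canAdvance(matter)) {
    problems.push("matter has open blocking risks or pending decisions");
  }

  return problems;
}

// What changes once it is accepted? Acceptance is a recorded
// state transition, not an informal sign-off.
function acceptOutput(output: AiOutput, role: string): void {
  output.approvedBy = role;
}
```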
This sits alongside model choice, not beneath it.
A more capable model does not solve a coordination problem. It often makes it harder to spot, because the outputs look better.
Why this matters now
As AI becomes embedded in workflows, the failure modes change.
You move from incorrect answers to:
- correct answers used in the wrong way
- incomplete outputs relied on too early
- decisions made without visibility of the full state
These are harder to detect and harder to unwind.
They are also where liability sits.
Firms that continue to optimise for answer quality alone will see diminishing returns.
The gains are real, but they plateau.
The risks continue to compound.
A different way to think about it
If you model legal work as a planning problem, a different set of priorities emerges.
You start to focus on:
- how tasks are defined and sequenced
- how state is captured and updated
- how decisions are gated and recorded
- how context is carried across steps
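A hedged sketch of what the first and last of those might look like, again with assumed names (Task, runnable, propagateContext):

```typescript
// A sketch of task sequencing and context carry-over, with
// assumed names; a real system would persist all of this.

interface Task {
  id: string;
  dependsOn: string[];             // tasks that must complete first
  status: "todo" | "done";
  context: Record<string, string>; // carried forward to dependents
}

// How tasks are sequenced: only tasks whose dependencies are
// done are allowed to run.
function runnable(tasks: Task[]): Task[] {
  const done = new Set(
    tasks.filter((t) => t.status === "done").map((t) => t.id)
  );
  return tasks.filter(
    (t) => t.status === "todo" && t.dependsOn.every((d) => done.has(d))
  );
}

// How context is carried across steps: a completed task's context
// is merged into every task that depends on it.
function propagateContext(tasks: Task[], completed: Task): void {
  for (const t of tasks) {
    if (t.dependsOn.includes(completed.id)) {
      Object.assign(t.context, completed.context);
    }
  }
}
```

None of this is novel engineering. That is the point: it is the structure most systems skip.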
AI still plays a role, but as part of a system rather than the centre of it.
The question shifts from: "Can the model answer this?"
to: "Should this answer be used, by whom, and what happens next?"
That is a harder question.
It is also the one that determines whether the system can be trusted.
Where this leads
Once planning becomes the focus, several things follow naturally:
- matter state becomes a first-class concept, not an afterthought
- evaluation moves from prompt testing to scenario testing
- governance shifts from policy documents to enforced controls
- routing decisions consider context, not just cost or capability
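Routing is the easiest of these to sketch. Building on the MatterState example earlier, and with routes and thresholds that are pure assumptions:

```typescript
// Context-aware routing, building on the MatterState sketch
// earlier. The routes and thresholds are pure assumptions.

type Route = "auto" | "model-with-review" | "human-only";

function routeRequest(matter: MatterState, risk?: Risk): Route {
  // High-stakes state forces a human, however capable the model.
  if (matter.stage === "signing" || (risk && risk.blocksNextStage)) {
    return "human-only";
  }
  // Open blockers or pending decisions mean review before use.
  if (!canAdvance(matter)) {
    return "model-with-review";
  }
  return "auto";
}
```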
At that point, the system starts to resemble other mature engineering disciplines.
Not because legal work becomes software.
But because coordination, sequencing, and control start to matter more than individual outputs.
Closing thought
Better answers will continue to improve legal workflows.
They are no longer the limiting factor.
The constraint is how those answers are integrated into the work.
Until that is addressed, most systems will remain impressive in isolation and fragile in practice.