Who it is for
Teams responsible for how AI is actually used, not just approved.
If you’re dealing with multiple tools, multiple models, or competing pressures from cost, speed and risk, this is the layer you’re already operating in, whether you’ve named it or not.
The real problem
Most firms don’t have a routing layer. They have:
- Default tools that get overused
- Workarounds that become standard practice
- “Trusted” individuals making judgement calls no one else can reproduce
That works until it doesn’t.
When something goes wrong, the question is simple: why did this task go through that path?
If the answer is “that’s what we usually do”, you don’t have a system. You have drift.
What a routing layer actually is
A routing layer is not model selection.
It is a policy system that decides:
- which tasks are allowed to run
- where they are allowed to run
- under what constraints
- with what level of oversight
- and who is accountable for the outcome
It sits between user intent and execution.
Done properly, it becomes the control point for cost, risk and consistency.
Start with task shape, not tools
Most implementations start with tools. That’s the mistake.
You need to define tasks in a way that reflects legal work as it actually happens:
- Clause extraction from known document types
- Open-ended research across uncertain sources
- Drafting with direct client reliance
- Internal summarisation with no downstream exposure
These are not the same problem. Treating them as interchangeable is how routing breaks down.
A usable taxonomy is:
- Deterministic vs interpretive
- Closed corpus vs open world
- Internal vs external consumption
- Low vs high reliance
If your taxonomy can’t distinguish these cleanly, your routing decisions will collapse under edge cases.
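One way to keep the taxonomy clean is to write it down as a data structure rather than a wiki page. A minimal sketch, with two of the task types above mapped onto the four axes; all names and the specific axis values chosen for each task are illustrative:

```python
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    DETERMINISTIC = "deterministic"
    INTERPRETIVE = "interpretive"

class Corpus(Enum):
    CLOSED = "closed"
    OPEN = "open"

class Consumption(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"

class Reliance(Enum):
    LOW = "low"
    HIGH = "high"

@dataclass(frozen=True)
class TaskShape:
    mode: Mode
    corpus: Corpus
    consumption: Consumption
    reliance: Reliance

# "Clause extraction from known document types": deterministic,
# closed corpus, internal consumption, low reliance.
clause_extraction = TaskShape(Mode.DETERMINISTIC, Corpus.CLOSED,
                              Consumption.INTERNAL, Reliance.LOW)

# "Drafting with direct client reliance": interpretive, external,
# high reliance (closed corpus assumed here for illustration).
client_drafting = TaskShape(Mode.INTERPRETIVE, Corpus.CLOSED,
                            Consumption.EXTERNAL, Reliance.HIGH)
```

Two tasks that land on different corners of this structure are, by construction, not the same routing problem.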
Add policy dimensions that force trade-offs
Routing only becomes meaningful when it forces explicit trade-offs.
At a minimum:
- Sensitivity: What is the data exposure risk? Client confidential, restricted, public?
- Destination: Where does the output go? Internal note, client deliverable, system of record?
- Reliance: Will someone act on this without re-checking?
- Urgency: Is speed critical, or is there time for layered review?
These are not labels. They are constraints.
For example:
- High sensitivity + external destination + high reliance → eliminates most cloud models, introduces mandatory review, may require private inference
- Low sensitivity + internal + low reliance → allows cheaper models, faster paths, minimal oversight
This is where routing starts to control cost and risk in a real way.
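The two trade-off examples above can be expressed as a constraint function. A sketch, assuming illustrative string labels and a default-deny fallback for anything the policy has not explicitly decided:

```python
def constraints(sensitivity: str, destination: str, reliance: str) -> dict:
    """Map policy dimensions to execution constraints.

    Labels ("high", "external", ...) and field names are illustrative,
    not a prescribed schema.
    """
    if sensitivity == "high" and destination == "external" and reliance == "high":
        return {
            "allow_public_cloud": False,  # eliminates most cloud models
            "mandatory_review": True,     # review before anything leaves the firm
            "private_inference": True,    # may require private inference
        }
    if sensitivity == "low" and destination == "internal" and reliance == "low":
        return {
            "allow_public_cloud": True,   # cheaper models, faster paths
            "mandatory_review": False,    # minimal oversight
            "private_inference": False,
        }
    # Any combination without an explicit rule is blocked, not defaulted:
    # a routing layer that never says "no" is not a routing layer.
    return {"blocked": True}
```

The point of the fallback is the trade-off itself: every new combination forces someone to make a policy decision instead of inheriting a default.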
Define allowed paths, not preferred tools
Once tasks and constraints are clear, define paths, not tools.
A path is a combination of:
- Model tier (small local, mid-tier hosted, frontier)
- Execution environment (on-device, private cloud, public API)
- Retrieval approach (none, constrained RAG, open search)
- Oversight (none, sampling, mandatory human review)
- Block conditions (when the task should not proceed)
Example:
“Clause extraction, low sensitivity, internal use” → Small model, local or low-cost hosted, no retrieval, no review
“Drafting client-facing advice, high sensitivity, high reliance” → Restricted model set, private environment, structured inputs, mandatory human sign-off
This framing removes ambiguity. It also makes it auditable.
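The two example paths can be written down as explicit, comparable records keyed by a routing matrix. A sketch with hypothetical field names and keys; the retrieval setting on the second path is an assumption for illustration, not stated in the examples:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    model_tier: str    # "small-local" | "mid-hosted" | "frontier"
    environment: str   # "on-device" | "private-cloud" | "public-api"
    retrieval: str     # "none" | "constrained-rag" | "open-search"
    oversight: str     # "none" | "sampling" | "mandatory-review"
    blocked: bool = False  # block condition: the task should not proceed

# Routing matrix keyed on (task, sensitivity, destination) -- illustrative keys.
MATRIX = {
    ("clause-extraction", "low", "internal"):
        Path("small-local", "on-device", "none", "none"),
    ("client-drafting", "high", "external"):
        Path("frontier", "private-cloud", "constrained-rag", "mandatory-review"),
}

BLOCKED = Path("", "", "", "", blocked=True)

def route(task: str, sensitivity: str, destination: str) -> Path:
    # Anything outside the matrix does not proceed until a path is defined.
    return MATRIX.get((task, sensitivity, destination), BLOCKED)
```

Because every route is a lookup against a named entry, "why did this task go through that path?" has a checkable answer.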
Decision rights are where most systems fail
Even well-designed routing matrices fail because no one defines who can change them.
You need explicit answers to:
- Who owns the routing policy?
- Who can approve a new path?
- Who can grant an exception?
- Who signs off on high-risk categories?
Without this, exceptions become the default.
In practice:
- Policy ownership should sit across legal, risk and engineering
- Exceptions should be time-bound and named
- High-risk changes should require dual approval
If a partner can override routing without traceability, your routing layer is cosmetic.
Versioning is not optional
Routing decisions change over time. Models improve, costs shift, regulations tighten.
If you don’t version your routing policy:
- You can’t explain past decisions
- You can’t demonstrate improvement
- You can’t isolate where something went wrong
Each version should capture:
- Task definitions
- Policy dimensions
- Allowed paths
- Decision rights
And critically:
- What changed, and why
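A version record covering those items can be as simple as one structure per release, with the changelog carried alongside the policy content. A sketch; the field names, version numbers and changelog text are all illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class PolicyVersion:
    version: str
    changelog: str  # what changed, and why
    task_definitions: dict = field(default_factory=dict)
    policy_dimensions: dict = field(default_factory=dict)
    allowed_paths: dict = field(default_factory=dict)
    decision_rights: dict = field(default_factory=dict)

history = [
    PolicyVersion("1.0", "Initial routing policy."),
    PolicyVersion("1.1", "Added mandatory review to client-facing drafting "
                         "after a model pricing change."),
]

def explain(version: str) -> str:
    # "Why did v1.1 differ from v1.0?" should be answerable from the log alone.
    for v in history:
        if v.version == version:
            return v.changelog
    raise KeyError(version)
```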
Minimum artefacts
If you don’t have these, you don’t have a routing layer:
- Routing matrix: Task × sensitivity × destination × reliance → allowed paths
- Exception register: Owner, scope, justification, expiry
- Policy change log: Versioned updates with rationale
- Execution logs tied to decisions: Not just prompts and outputs, but which route was taken and why
30-day rollout that actually works
Week 1: Map the top five task types that generate real volume or risk. Ignore edge cases.
Week 2: Draft a routing matrix with legal, risk and engineering in the same room. Force decisions.
Week 3: Run scenarios. Break it deliberately. Find where the policy is vague or contradictory.
Week 4: Publish v1 with named owners, explicit constraints and an exception process.
Do not aim for completeness. Aim for clarity on the highest-impact paths.
Where most teams go wrong
- They optimise for flexibility instead of control
- They treat routing as a UX feature rather than a governance layer
- They log outputs but not decisions
- They assume cheaper models will stay cheap
- They avoid defining “blocked” states
A routing layer that never says “no” is not a routing layer.
Checklist
- Task taxonomy reflects real legal work, not tool categories
- High-risk combinations are explicitly constrained or blocked
- Private inference conditions are defined, not implied
- Decision rights are named and enforced
- Exceptions are time-bound and auditable
- Routing decisions are logged alongside outputs
Related
Use the Routing Simulator to pressure-test policy choices before they are adopted. Treat it as a safe environment to explore failure modes, not just validate happy paths.