April 6, 2026 · 8 min read

Why Every AI Agent Needs a Governance Layer

AI agents are making consequential decisions — approving loans, triaging patients, executing trades — with no oversight, no audit trail, and no proof of what happened. Governance can't be an afterthought bolted on later. It needs to be infrastructure, built into every decision from the start.

The Problem: Autonomous Decisions, Zero Accountability

A lending agent approves a €200,000 loan. A support agent escalates a case and shares customer PII with a third-party vendor. A trading agent executes a position based on a model prediction. All of these happen in milliseconds, with no human in the loop and no record beyond a mutable log file.

Six months later, the regulator asks three questions:

  1. Who authorized this decision? Was there a policy? Did a human review it?
  2. What was the reasoning? Which models were involved? Did they agree?
  3. Can you prove it? Not a database entry you control — tamper-proof, independently verifiable proof.

Most companies can't answer any of these. The agent acted autonomously. Nobody reviewed it. The only "evidence" is a log file the company can edit at will.

The EU AI Act (Article 14, applicable to high-risk systems from August 2026), FINRA's 2026 guidance, and the Federal Reserve's SR 11-7 all require answers to these questions. Maximum fines under the AI Act alone range from €7.5M to €35M, or up to 7% of global annual turnover.

Multi-Model Consensus: Don't Trust One Model

Single-model architectures have a fundamental flaw: you're trusting one black box to be right. If the model hallucinates, confabulates, or drifts, there's no check. No second opinion. No signal that something went wrong.

Aira's consensus engine fans out every decision to 2-5 independent models. Each model evaluates the same action against the same policy. Aira scores agreement across responses and flags disagreement automatically.

# 3 models evaluate a loan decision against the same policy:
#
# Claude Sonnet:  "DENY — loan amount exceeds 2x annual income"
# GPT-5.2:        "DENY — credit score 742 is borderline for €200K"
# Gemma 4 31B:    "REVIEW — needs more context on existing debt"
#
# Agreement score: 0.67 (2/3 DENY)
# Disagreement flag: true (mixed verdict)
# Action: held for human review
#
# Receipt: cryptographic proof that 3 models evaluated,
#          2 denied, 1 requested review

When models agree, you have corroborated confidence. When they disagree, you have an early warning system. Either way, the evaluation is logged with a cryptographic receipt — not a log line, but a signed, timestamped proof of what each model said.
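
How an agreement score like the 0.67 above might be derived is easy to sketch. The helper below is illustrative and assumes verdicts arrive as plain strings; it is not Aira's internal scoring code:

from collections import Counter

def score_agreement(verdicts: list[str]) -> tuple[float, bool]:
    """Illustrative scoring: share of models backing the majority verdict."""
    counts = Counter(verdicts)
    _, top_count = counts.most_common(1)[0]
    agreement = top_count / len(verdicts)
    disagreement = len(counts) > 1  # any mixed verdict raises the flag
    return agreement, disagreement

# The loan example above: two DENY, one REVIEW
score, flagged = score_agreement(["DENY", "DENY", "REVIEW"])
# score ≈ 0.67, flagged == True → hold for human review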

Policy Engine: Four Modes of Governance

Consensus tells you whether models agree. Policies tell you what's allowed. Aira's policy engine evaluates every agent action before it takes effect, in one of four modes:

Rules Mode

Deterministic conditions. No LLM involved. Instant evaluation. Configure in the dashboard — no code changes needed.

# Policy: "All loan decisions over €100,000 require human approval"
# Conditions: action_type == "loan_decision" AND amount > 100000
# Decision: require_approval
#
# Evaluated in <1ms. No model call. No ambiguity.
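
For intuition, the rule above reduces to a single boolean check. A minimal sketch, assuming the action arrives as a plain dict; this is illustrative, not the engine's implementation:

def evaluate_rule(action: dict) -> str:
    # "All loan decisions over €100,000 require human approval"
    if action["action_type"] == "loan_decision" and action["amount"] > 100_000:
        return "require_approval"
    return "approve"

evaluate_rule({"action_type": "loan_decision", "amount": 200_000})
# -> "require_approval"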

AI Mode

A single LLM evaluates the action against a natural-language policy. Useful for nuanced decisions where rigid rules fall short.

# Policy (plain English):
# "Any action involving customer PII or financial data
#  exceeding €5,000 must be reviewed by compliance."
#
# Aira sends the action + policy to the evaluating model.
# Model returns: APPROVE, DENY, or REVIEW — with reasoning.
# The evaluation itself gets its own cryptographic receipt.
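
Conceptually, AI mode is one structured model call: policy plus action in, a constrained verdict out. A rough sketch, with a hypothetical call_model callable standing in for whatever LLM client is configured; the prompt and parsing are illustrative, not Aira's actual evaluator:

POLICY = ("Any action involving customer PII or financial data "
          "exceeding €5,000 must be reviewed by compliance.")

def evaluate_with_llm(action_details: str, call_model) -> str:
    """call_model is any LLM client callable (prompt -> text). Illustrative."""
    prompt = (
        f"Policy: {POLICY}\n"
        f"Action: {action_details}\n"
        "Answer with exactly one of APPROVE, DENY, or REVIEW, then your reasoning."
    )
    reply = call_model(prompt)
    verdict = reply.split()[0].upper() if reply.split() else ""
    # Fail closed: anything unparseable gets routed to human review
    return verdict if verdict in {"APPROVE", "DENY", "REVIEW"} else "REVIEW"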

Content Scan Mode

Automatically scans action payloads for sensitive content — PII, credentials, toxic language — before the action proceeds. No policy authoring required; the scanner applies built-in detection rules and flags violations.

# Action payload: "Please send the SSN 123-45-6789 to vendor@example.com"
#
# Content scan detects: SSN pattern, external email address
# Decision: DENY — sensitive data exfiltration blocked
# No model call needed. Deterministic detection.
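
The detections in this example are the kind a few regular expressions can cover. A minimal sketch of deterministic scanning, with illustrative patterns rather than Aira's built-in rule set:

import re

PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan_payload(payload: str) -> list[str]:
    """Return the names of sensitive patterns found in the payload."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(payload)]

scan_payload("Please send the SSN 123-45-6789 to vendor@example.com")
# -> ["ssn", "email"]  → DENY before the action proceeds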

Consensus Mode

Multiple models evaluate the same policy independently. If they disagree, the action is automatically held for human review. This is the highest-assurance mode — no single model can unilaterally approve a consequential action.

# 3 models evaluate a PII-sharing action:
#
# Claude Sonnet:  "DENY — sharing PII with external vendor violates policy"
# GPT-5.2:        "DENY — data minimization principle not met"
# Gemma 4 31B:    "APPROVE — vendor has DPA in place"
#
# Disagreement detected → action held for human review
# All three evaluations recorded with individual receipts

Policies are configured in the Aira dashboard by compliance teams. Developers don't need to change code when policies change. The governance layer is decoupled from application logic.

Human-in-the-Loop: Approval When It Matters

When a policy triggers require_approval, the action is held. It doesn't execute. It doesn't proceed. It waits.

Designated approvers receive an email with the full action context and secure single-use Approve/Deny links. They can also review and act from the Aira dashboard. An Ed25519 receipt is minted for every outcome — approved, denied, or failed.

from aira import Aira

aira = Aira(api_key="aira_live_xxx")

# Step 1: Authorize the action before it executes
auth = aira.authorize(
    action_type="loan_decision",
    details="Approved €200,000 loan for customer C-4521. Credit: 742, income: €85K.",
    agent_id="lending-agent",
    model_id="claude-sonnet-4-6",
)

# Policy "High-value loan review" matches (amount > €100K)
# auth.status == "pending_approval"
# auth.policy_evaluation == {
#     "policy_name": "High-value loan review",
#     "decision": "require_approval",
#     "reasoning": "Loan amount €200K exceeds 2x income threshold"
# }
#
# Compliance team receives email:
#   Subject: "Action requires approval — loan_decision"
#   Body: full action context + reasoning
#   [Approve] [Deny]
#
# Officer clicks Approve → action proceeds

# Step 2: Notarize the outcome after execution
receipt = aira.notarize(
    action_uuid=auth.action_uuid,
    outcome="completed",
    outcome_details="Loan disbursed to customer C-4521.",
)

This is EU AI Act Article 14 compliance built directly into the decision pipeline. Human oversight isn't a checkbox on a form — it's a cryptographically proven gate that blocks consequential actions until a qualified human signs off.

Cryptographic Receipts: Proof, Not Logs

Every governed action produces a cryptographic receipt. Not a database row. Not a log entry. A receipt — signed, timestamped, and independently verifiable without Aira's involvement.

Each receipt contains:

  • Ed25519 digital signature — proves the receipt hasn't been tampered with
  • RFC 3161 trusted timestamp — issued by an independent timestamp authority, proves when the action occurred
  • SHA-256 payload hash — fingerprint of the entire action context
  • Full decision lineage — which policies were evaluated, which models responded, who approved

# A single governed action produces a chain of receipts:
#
# 1. Policy evaluation receipt
#    → proves "High-value loan review" policy was evaluated
#    → Ed25519 signature + RFC 3161 timestamp
#
# 2. Consensus receipt
#    → proves 3 models evaluated, records each verdict
#    → agreement score: 0.67, disagreement flagged
#
# 3. Human approval receipt
#    → proves compliance@acme.com approved at 2026-04-06T14:32:00Z
#    → secure single-use approval link
#
# 4. Action receipt
#    → proves the final action was governed end-to-end
#    → links to all upstream receipts
#
# Chain: policy_eval → consensus → human_approval → action_governed
# Every receipt independently verifiable. Court-admissible in EU, US, CH.

These receipts are not stored in a database you control. They are cryptographic artifacts. A regulator, auditor, or court can verify them independently at /verify/action/{id} — no authentication, no vendor API call, no trust in your infrastructure required.
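
Independent verification needs nothing beyond the receipt, the public key, and a standard Ed25519 library. A sketch using Python's cryptography package; the receipt field names and signing convention here are assumptions for illustration, not Aira's documented schema:

import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_receipt(receipt: dict, public_key_bytes: bytes) -> bool:
    """Check the Ed25519 signature on a receipt, offline, with no vendor call."""
    public_key = Ed25519PublicKey.from_public_bytes(public_key_bytes)
    # Assumption: the signature covers the canonical JSON of the receipt payload
    payload = json.dumps(receipt["payload"], sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(receipt["signature"]), payload)
        return True
    except InvalidSignature:
        return False

Anyone holding the receipt and the public key can run a check like this; no authentication against Aira is involved.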

EU AI Act Article 14: Human Oversight, Built In

Article 14 of the EU AI Act requires that high-risk AI systems are designed to be "effectively overseen by natural persons." This means:

  • Humans must be able to understand the AI system's capabilities and limitations
  • Humans must be able to monitor operation and intervene in real time
  • Humans must be able to override or reverse automated decisions

Aira implements all three. Every agent is identified via a W3C DID (Decentralized Identifier), so its actions are attributable to a specific, verifiable identity. The policy engine surfaces what the agent is doing and why. The approval flow lets humans intervene before actions execute. The dashboard provides real-time monitoring. And every intervention is cryptographically recorded.
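
For concreteness, a W3C DID is a URI made of a method and a method-specific identifier. The example below is purely illustrative; the post does not specify which DID method Aira uses:

# Illustrative agent DID (format per the W3C DID specification):
#   did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK
#
# Presumably the agent_id passed to authorize() (e.g. "lending-agent")
# resolves to an identity like this, so actions are attributable to a
# verifiable agent rather than a free-form label.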

This isn't "we have a human review process" written in a compliance document. It's a technical enforcement layer with cryptographic proof that humans were in the loop — exactly what auditors need.

Two API Calls: authorize() then notarize()

All of this — multi-model consensus, policy evaluation, human approval, cryptographic receipts — is triggered by two function calls: one before the action executes, one after.

from aira import Aira

aira = Aira(api_key="aira_live_xxx")

# Before execution: authorize the action
auth = aira.authorize(
    action_type="loan_decision",
    details="Approved €200,000 loan for customer C-4521. Credit: 742, income: €85K.",
    agent_id="lending-agent",
    model_id="claude-sonnet-4-6",
)

# 1. Policy engine evaluates all matching policies
# 2. If consensus mode → fans out to multiple models
# 3. If approval required → holds action, emails approvers
# 4. auth.status tells you whether to proceed

# ... execute the action ...

# After execution: notarize the outcome
receipt = aira.notarize(
    action_uuid=auth.action_uuid,
    outcome="completed",
    outcome_details="Loan disbursed to customer C-4521.",
)

# Mints Ed25519 + RFC 3161 receipt with full audit trail.
# Governance becomes infrastructure, not overhead.
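
In practice the application branches on auth.status before executing. A sketch of that guard; status values other than "pending_approval" (which appears above) are assumptions about the API, not documented here:

def execute_if_authorized(auth, execute_action):
    """Gate execution on the authorization result (illustrative wrapper)."""
    if auth.status == "pending_approval":
        # Action is held and approvers have been notified; wait for the
        # approval outcome before executing.
        return None
    if auth.status == "approved":  # assumed status value
        return execute_action()
    # Denied or unexpected status: do not execute
    return None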

The developer doesn't implement governance logic. The compliance team doesn't need to understand the codebase. Policies are configured in the dashboard. The API enforces them automatically. When policies change, no code is redeployed.

Full Audit Trail: Complete Decision Lineage

Every decision that passes through Aira is logged with its complete lineage: which agent acted, which models were consulted, which policies matched, what the verdicts were, whether a human approved, and when it all happened — down to the millisecond, with cryptographic proof at every step.

Capability               Logging tools             Policy engines             Aira
Multi-model consensus    No                        No                         Yes — 2-5 models, scored
Policy evaluation        No                        Static rules only          Rules + AI + content scan + consensus
Human-in-the-loop        No                        Manual workflow            Email + dashboard approval
Cryptographic proof      No — self-attested logs   No — hash chain at best    Ed25519 + RFC 3161
EU AI Act Art. 14        Not addressed             Partial                    Full compliance

The audit trail isn't just for regulators. It's for your team. When an agent makes a bad decision, you can trace exactly what happened: which policy should have caught it, which models missed it, whether the approval flow was configured correctly. Governance becomes a debugging tool, not just a compliance cost.

Getting Started

# Install the SDK
pip install aira-sdk

# Govern your agent's actions
from aira import Aira

aira = Aira(api_key="aira_live_xxx")

# Authorize before execution
auth = aira.authorize(
    action_type="loan_decision",
    details="Approved €200,000 loan. Credit: 742, income: €85K.",
    agent_id="lending-agent",
    model_id="claude-sonnet-4-6",
)

# ... execute the action ...

# Notarize after execution
receipt = aira.notarize(
    action_uuid=auth.action_uuid,
    outcome="completed",
    outcome_details="Loan disbursed.",
)

# Configure policies in the dashboard — no code changes needed
# https://app.airaproof.com/dashboard/policies

Two API calls. Multi-model consensus. Policy enforcement. Human approval. Cryptographic proof. Full audit trail. Governance becomes infrastructure — invisible to developers, configurable by compliance teams, and verifiable by anyone.