
An enterprise AI agent is only as safe as the tools it can reach.
A model that writes a summary is one risk class. A model that can query customer records, open tickets, update CRM fields, trigger payments, modify IAM groups, deploy code, or send emails is a different system. At that point, the AI application is no longer “just chat.” It is an execution surface connected to enterprise authority.
The wrong pattern is to give agents a folder of callable functions and hope prompt instructions will keep them disciplined.
The useful pattern is a safe tool registry.
A safe tool registry is the control plane that defines which AI agents may call which tools, under which identity, with which schema, risk tier, policy checks, approval requirements, rate limits, audit fields, and rollback behavior.
This article extends the control-plane model from AI governance architecture, the abuse-case lens from threat modeling enterprise AI agents, and the approval-state design from human-in-the-loop approval patterns. It also complements RAG governance: retrieval controls decide what an agent may know; tool registry controls decide what an agent may do.
Key takeaways
- Tool use is the boundary where enterprise AI moves from answer generation to production action.
- A safe tool registry should store more than function names. It needs identity bindings, permission scopes, input schemas, output schemas, risk tiers, policy checks, approval rules, audit requirements, and owner metadata.
- RBAC is useful for coarse access, but ABAC is usually required for production AI agents because context matters: data class, device posture, tenant, region, workflow state, user assurance, and transaction amount.
- High-risk tools should not be callable directly from model output. They need deterministic policy gates, human approval, or a two-step proposal and execution pattern.
- Audit logs must capture the model context, caller identity, tool contract, policy decision, arguments, result class, and correlation ID. “Agent called tool” is not enough.
- The durable artifact is a tool registry schema and permission matrix that can be reviewed by IT, security, platform engineering, and business owners.
Citation-ready answer
A safe tool registry for enterprise AI agents is the governed catalog of tools an agent is allowed to use. It maps each tool to an owner, identity binding, permission scope, input and output schema, risk tier, policy checks, approval requirement, rate limit, audit fields, and rollback behavior. The registry should be enforced at runtime by a policy gate, not only described in prompts, so agents can propose actions but cannot exceed least privilege, data boundaries, or approval rules.
Why tool registries matter
Agent security is not mainly about whether a model can produce malicious text.
The serious risk starts when text becomes action:
user request
The tool call is where prompt injection, identity abuse, excessive agency, data exfiltration, and business-process errors become operational incidents. The OWASP Top 10 for LLM Applications lists Excessive Agency as a named risk, treating agentic behavior and excessive capability as a concrete application security concern, not a speculative AI ethics issue: OWASP Top 10 for LLM Applications.
NIST’s AI Risk Management Framework gives the broader operating model: organizations need governance, mapping, measurement, and management of AI risks across the lifecycle, not informal trust in a single model response: NIST AI Risk Management Framework.
For enterprise agents, the registry is where that governance becomes executable.
The core design mistake
Many teams start with a tool list like this:
[
That is not a registry. It is a menu.
A safe registry must answer questions that the model should not be trusted to answer by itself:
- Who owns this tool?
- Which agent identities may call it?
- Which human user can delegate authority to the agent?
- Which data classes may the tool access?
- Which arguments are allowed?
- Which workflow state must exist before execution?
- Which calls require approval?
- Which calls are never allowed?
- What must be logged?
- How is the action reversed or compensated?
If these answers live only in prompt text, they are not controls. They are suggestions.
Reference architecture
A practical enterprise AI tool registry has five layers.
1. Identity layer
The model should not call enterprise systems directly. It should call a broker or orchestration layer that resolves the registry entry, validates arguments, asks the policy engine for a decision, and only then executes the tool with the correct identity.
What belongs in a tool registry entry
The registry entry should be explicit enough for security review and runtime enforcement.
| Field | Purpose | Example |
|---|---|---|
| tool_id | Stable identifier | crm.update_contact_status |
| owner | Accountable team or person | Sales Operations |
| business_capability | Why the tool exists | Customer lifecycle management |
| risk_tier | Execution risk | low, medium, high, restricted |
| data_classes | Data it may read or write | customer_pii, commercial_confidential |
| allowed_agents | Agent identities that may request it | support_triage_agent |
| allowed_roles | Human roles that may delegate it | support_manager |
| input_schema | JSON contract for arguments | contact_id, status, reason |
| output_schema | Expected result contract | status, change_id, timestamp |
| policy_checks | Deterministic gates | tenant match, data class, MFA, amount limit |
| approval_rule | Human review requirement | approval required for external customer impact |
| rate_limit | Abuse and error containment | 30 calls per hour per agent |
| audit_fields | Evidence to store | prompt hash, tool args, decision, result class |
| rollback | Compensation plan | restore previous CRM status |
| secrets_boundary | Credential handling | tool broker only, no secret exposure to model |
This is the difference between a product feature and an enterprise control.
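As a rough illustration, the fields in the table can be expressed as a typed record that rejects incomplete entries at load time. This is a minimal sketch, not a reference implementation: the class name `ToolRegistryEntry`, the validation rules, the schema shapes, and the `high` tier assigned to the example are assumptions made for illustration. Only the field names and example values come from the table.

```python
from __future__ import annotations

from dataclasses import dataclass

# Tier labels follow the example column in the table (assumed to be the full set).
RISK_TIERS = ("low", "medium", "high", "restricted")


@dataclass(frozen=True)
class ToolRegistryEntry:
    """One registry row; field names mirror the table (hypothetical class)."""

    tool_id: str
    owner: str
    business_capability: str
    risk_tier: str
    data_classes: tuple[str, ...]
    allowed_agents: tuple[str, ...]
    allowed_roles: tuple[str, ...]
    input_schema: dict
    output_schema: dict
    policy_checks: tuple[str, ...]
    approval_rule: str
    rate_limit_per_hour: int
    audit_fields: tuple[str, ...]
    rollback: str
    secrets_boundary: str

    def __post_init__(self) -> None:
        # Reject entries that a reviewer could not sign off on.
        if self.risk_tier not in RISK_TIERS:
            raise ValueError(f"unknown risk tier: {self.risk_tier!r}")
        if not self.owner:
            raise ValueError("every tool needs an accountable owner")
        if not self.allowed_agents:
            raise ValueError("at least one agent identity must be listed")


# Example entry built from the table's example column.
entry = ToolRegistryEntry(
    tool_id="crm.update_contact_status",
    owner="Sales Operations",
    business_capability="Customer lifecycle management",
    risk_tier="high",  # assumed tier, chosen only for illustration
    data_classes=("customer_pii", "commercial_confidential"),
    allowed_agents=("support_triage_agent",),
    allowed_roles=("support_manager",),
    input_schema={"contact_id": "string", "status": "string", "reason": "string"},
    output_schema={"status": "string", "change_id": "string", "timestamp": "string"},
    policy_checks=("tenant_match", "data_class", "mfa", "amount_limit"),
    approval_rule="approval required for external customer impact",
    rate_limit_per_hour=30,
    audit_fields=("prompt_hash", "tool_args", "decision", "result_class"),
    rollback="restore previous CRM status",
    secrets_boundary="tool broker only, no secret exposure to model",
)
```

Validating at construction time means a malformed entry fails when the registry loads, not when an agent first tries to call the tool.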
Permission matrix for AI agents
The registry should produce a matrix that security and business owners can actually review.
| Tool | Agent may propose | Agent may execute | Required identity | Approval | Audit level |
|---|---|---|---|---|---|
| Search internal knowledge base | Yes | Yes | agent service account + user context | No | standard |
| Read CRM record | Yes | Yes, scoped | user-delegated identity | No for same account scope | standard |
| Update CRM status | Yes | Conditional | user-delegated identity | Yes for customer-visible changes | enhanced |
| Create support ticket | Yes | Yes | agent service account | No | standard |
| Send external email | Yes | Conditional | user-delegated identity | Yes above risk threshold | enhanced |
| Run SQL query | Yes | Read-only | brokered service account | Approval for sensitive datasets | enhanced |
| Issue refund | Yes | Conditional | finance-approved workflow identity | Always above threshold | high-assurance |
| Modify IAM group | Yes | No by default | privileged workflow identity | Always, separation of duties | high-assurance |
| Deploy production change | Yes | No direct execution | CI/CD identity | Change approval required | high-assurance |
Notice the distinction between “may propose” and “may execute.”
That distinction is critical. It allows the AI system to be useful without giving it uncontrolled production authority. The agent can draft an action, gather evidence, create a request, or prepare a command. A deterministic policy gate, approval workflow, or existing enterprise system decides whether execution is allowed.
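One way to picture the propose/execute split is a small routing function that turns every request into one of three outcomes. The sketch below is a simplification under stated assumptions: the tool IDs, the `Execute` enum, and the rule that a conditional tool executes only when an approval ID is present are all hypothetical. In the matrix, the real conditions vary per tool.

```python
from enum import Enum
from typing import Optional


class Execute(Enum):
    YES = "yes"
    CONDITIONAL = "conditional"
    NO = "no"


# Hypothetical tool IDs standing in for three rows of the matrix.
PERMISSION_MATRIX = {
    "kb.search": Execute.YES,
    "crm.update_status": Execute.CONDITIONAL,
    "iam.modify_group": Execute.NO,
}


def route_tool_request(tool_id: str, approval_id: Optional[str] = None) -> str:
    """Return 'execute', 'proposal', or 'deny' for a requested tool call."""
    mode = PERMISSION_MATRIX.get(tool_id)
    if mode is None:
        return "deny"  # tools missing from the registry are denied by default
    if mode is Execute.YES:
        return "execute"
    if mode is Execute.CONDITIONAL and approval_id is not None:
        return "execute"
    # Everything else stays a proposal for a human or an existing workflow.
    return "proposal"
```

Note that an approval ID does not unlock a tool marked `NO`: the agent can still draft the request, but execution belongs to another system.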
RBAC is not enough
Role-based access control is a useful start.
It can answer:
- Is this user a support manager?
- Is this agent part of the finance assistant group?
- Is this tool limited to IT operations?
But production AI agents need context-sensitive decisions. That is where attribute-based access control becomes important.
ABAC can include:
- data classification
- customer region
- tenant boundary
- record ownership
- device posture
- user authentication strength
- time and location
- workflow state
- transaction amount
- model confidence
- tool risk tier
- active incident state
An enterprise agent should not be allowed to update a customer record simply because the user has a broad role. The policy decision should depend on the specific record, the action, the data class, the workflow state, and the evidence.
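To make the contrast with a pure role check concrete, here is a minimal sketch of an attribute-based decision for a record update. The `RequestContext` fields, the `case_open` workflow state, and the specific rules are assumptions chosen for illustration. A production policy engine would load such rules from policy, not hard-code them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RequestContext:
    """Attributes gathered by the broker, never supplied by the model (hypothetical)."""

    user_role: str
    user_tenant: str
    record_tenant: str
    data_class: str
    mfa_verified: bool
    workflow_state: str


def authorize_update(ctx: RequestContext) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons); every failed check is reported for the audit log."""
    reasons = []
    if ctx.user_role != "support_manager":  # coarse RBAC check
        reasons.append("role not allowed")
    if ctx.user_tenant != ctx.record_tenant:  # tenant boundary
        reasons.append("tenant mismatch")
    if ctx.data_class == "customer_pii" and not ctx.mfa_verified:  # assurance
        reasons.append("MFA required for customer_pii")
    if ctx.workflow_state != "case_open":  # assumed workflow precondition
        reasons.append("workflow state does not permit update")
    return (not reasons, reasons)


ok_ctx = RequestContext("support_manager", "t1", "t1", "customer_pii", True, "case_open")
bad_ctx = RequestContext("support_manager", "t1", "t2", "customer_pii", False, "case_open")
```

In this sketch `bad_ctx` carries the right role and is still denied, because the record belongs to another tenant and the user has not completed MFA.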
Tool risk tiers
Risk tiering prevents teams from treating every function call the same.
| Risk tier | Example tools | Runtime pattern |
|---|---|---|
| Tier 0: read-only public or low-sensitivity | public docs search, status page lookup | direct execution with standard logging |
| Tier 1: internal read | knowledge search, ticket lookup, CRM read within scope | user-context access, data class filtering, standard audit |
| Tier 2: internal write | create ticket, update internal note, draft report | schema validation, ownership check, enhanced audit |
| Tier 3: external or customer-visible action | send email, update customer status, publish response | approval or delayed execution, rollback plan |
| Tier 4: financial, security, legal, or production-impacting | issue refund, change IAM group, deploy code, modify billing | proposal-only by default, human approval, separation of duties |
The goal is not to block useful automation.
The goal is to make the execution path match the blast radius.
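The tier table can be reduced to a lookup that the broker consults before choosing an execution path. The pattern names below are shorthand invented for this sketch, and treating unknown tiers as the most restrictive case is an assumption, not something the table specifies.

```python
# Shorthand labels for the runtime patterns in the tier table (names are hypothetical).
RUNTIME_PATTERN = {
    0: "direct",
    1: "direct_user_context",
    2: "validated_write",
    3: "approval_or_delay",
    4: "proposal_only",
}


def runtime_pattern(tier: int) -> str:
    """Map a risk tier to its execution path; unknown tiers get the strictest path."""
    return RUNTIME_PATTERN.get(tier, "proposal_only")


def needs_human(tier: int) -> bool:
    """Tiers 3 and 4 involve approval in the table; unknown tiers are treated the same."""
    return runtime_pattern(tier) in {"approval_or_delay", "proposal_only"}
```

Failing closed on an unrecognized tier keeps a registry typo from quietly downgrading a high-risk tool.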
Runtime enforcement
The registry is only useful if the runtime enforces it.
A safe execution flow looks like this:
1. Agent requests a tool call with structured JSON.
The model should never receive broad credentials. It should never choose its own service account. It should never silently bypass the broker. It should never be able to turn a read tool into a write tool by changing arguments.
This runtime pattern aligns with the secure-by-design guidance in the NCSC collection on secure AI system development, which emphasizes security across design, development, deployment, and operation: Guidelines for secure AI system development.
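The broker behavior described earlier (resolve the registry entry, validate arguments, ask the policy engine, then execute under the correct identity) could look roughly like the sketch below. Everything here is hypothetical: the `REGISTRY` layout, the `ticket.create` tool, the `svc-ticket-writer` identity, and the in-process `policy_allows` function standing in for a real policy engine.

```python
from typing import Any, Dict

# Hypothetical in-memory registry; the handler stands in for the real enterprise API.
REGISTRY: Dict[str, Dict[str, Any]] = {
    "ticket.create": {
        "required_args": {"title": str, "priority": str},
        "execution_identity": "svc-ticket-writer",
        "handler": lambda args: {"status": "created", "title": args["title"]},
    }
}


def policy_allows(agent_id: str, tool_id: str, args: Dict[str, Any]) -> bool:
    """Stand-in for a deterministic policy engine (assumed rules)."""
    return agent_id == "support_triage_agent" and args.get("priority") in {"low", "normal"}


def deny(reason: str) -> Dict[str, Any]:
    return {"decision": "deny", "reason": reason}


def broker_call(agent_id: str, tool_id: str, args: Dict[str, Any]) -> Dict[str, Any]:
    entry = REGISTRY.get(tool_id)
    if entry is None:
        return deny("unknown tool")
    required = entry["required_args"]
    # Exact key match: extra arguments cannot smuggle in side effects.
    if set(args) != set(required):
        return deny("unexpected or missing arguments")
    for name, expected_type in required.items():
        if not isinstance(args[name], expected_type):
            return deny(f"wrong type for {name}")
    if not policy_allows(agent_id, tool_id, args):
        return deny("policy denied")
    # The broker, not the model, selects the execution identity.
    result = entry["handler"](args)
    return {"decision": "allow", "identity": entry["execution_identity"], "result": result}
```

The model only ever supplies `tool_id` and `args`. The identity and the handler are resolved from the registry, so a prompt cannot change them.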
A minimal tool contract
A practical tool contract can start with this shape:
{
This contract is intentionally boring.
Boring is good here. It gives platform engineers something to validate, security something to review, and business owners something to sign off.
Failure modes the registry should prevent
| Failure mode | What happens without a registry | Control in the registry |
|---|---|---|
| Prompt injection asks agent to exfiltrate data | Agent calls broad search or export tool | data class filtering, tool scope, deny export |
| Agent calls the wrong system | Similar tool names confuse selection | stable tool IDs, descriptions, allowed workflow state |
| User delegates authority they do not have | Agent executes with overbroad service account | user-context authorization, role and attribute checks |
| Read tool becomes write tool | Arguments trigger side effect | separate read and write tools, schema restrictions |
| Customer-visible action is sent too early | Email or CRM change executes immediately | approval rule, delayed execution, evidence pack |
| Incident response loses evidence | Logs only show final answer | audit fields, trace ID, prompt hash, arguments, result |
| Tool owner changes API behavior | Agent still calls old contract | versioned tool contract, compatibility test |
| Compromised agent loops tool calls | High-volume execution | rate limits, anomaly detection, kill switch |
| Sensitive dataset is queried from wrong region | Data residency violation | region attribute, tenant boundary, policy denial |
MITRE ATLAS is useful here because it frames adversarial AI behavior as concrete tactics and techniques, including prompt injection and abuse of AI-enabled systems: MITRE ATLAS.
The tool registry is not the whole defense. It is the place where the application can make these risks enforceable.
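The looping-agent row is one of the easier failure modes to sketch. The example below assumes a per-agent sliding window and a manual kill switch. The class name `CallLimiter` and the choice to pass the clock value in as a parameter are illustrative decisions, and anomaly detection is out of scope here.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class CallLimiter:
    """Sliding-window rate limit per agent, plus a kill switch (hypothetical sketch)."""

    def __init__(self, max_calls: int, window_seconds: float) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.calls: Dict[str, Deque[float]] = defaultdict(deque)
        self.killed: Set[str] = set()

    def kill(self, agent_id: str) -> None:
        """Block all further calls from this agent until an operator intervenes."""
        self.killed.add(agent_id)

    def allow(self, agent_id: str, now: float) -> bool:
        if agent_id in self.killed:
            return False
        window = self.calls[agent_id]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_calls:
            return False
        window.append(now)
        return True


# Tiny limit so the behavior is easy to follow; a real rate_limit would come from the registry.
limiter = CallLimiter(max_calls=2, window_seconds=3600)
```

Passing `now` explicitly keeps the limiter deterministic in tests; a broker would typically supply a monotonic clock reading.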
Audit logs for tool use
Audit logging should be designed before production launch, not after the first incident.
At minimum, store:
- request ID and trace ID
- timestamp
- user identity
- agent identity
- model and version
- tool ID and version
- tool owner
- prompt or prompt hash
- retrieved sources or evidence IDs
- structured arguments
- policy checks evaluated
- policy decision
- approval ID if applicable
- execution identity
- result class
- external system change ID
- rollback reference
Do not store only the final natural-language answer. That is the least useful artifact during an incident.
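A structured audit event covering most of the fields above might be assembled as follows. This is a sketch under assumptions: the function name, the dictionary layout, and the decision to store a SHA-256 hash in place of the raw prompt are illustrative choices, not a prescribed format.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_audit_event(
    *,
    user_id: str,
    agent_id: str,
    model: str,
    tool_id: str,
    tool_version: str,
    prompt: str,
    arguments: Dict[str, Any],
    policy_decision: str,
    result_class: str,
    approval_id: Optional[str] = None,
    trace_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble one audit record per tool call (field names are hypothetical)."""
    return {
        "trace_id": trace_id or str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "agent_id": agent_id,
        "model": model,
        "tool_id": tool_id,
        "tool_version": tool_version,
        # Store a hash so the event can be matched to a prompt without holding its text.
        "prompt_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "arguments": arguments,
        "policy_decision": policy_decision,
        "approval_id": approval_id,
        "result_class": result_class,
    }


event = build_audit_event(
    user_id="u-123",
    agent_id="support_triage_agent",
    model="example-model-v1",
    tool_id="crm.update_contact_status",
    tool_version="1.2.0",
    prompt="Mark contact 42 as churned",
    arguments={"contact_id": "42", "status": "churned", "reason": "requested"},
    policy_decision="allow",
    result_class="success",
)
serialized = json.dumps(event)  # events should serialize cleanly for the log pipeline
```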
NIST’s Generative AI Profile provides a more specific risk framing for generative AI systems and is a useful reference when deciding what should be measured, governed, and documented: NIST Generative AI Profile.
Operating model
A safe tool registry needs ownership, not just code.
| Responsibility | Owner |
|---|---|
| Registry platform | AI platform team or platform engineering |
| Tool business owner | Business system owner |
| Security policy | CISO team or application security |
| Identity integration | IAM team |
| Approval workflow | Process owner and risk owner |
| Audit retention | Security, legal, compliance, records owner |
| Tool contract tests | Product engineering or platform engineering |
| Incident response | Security operations and system owner |
This is why a center of excellence that only writes AI guidelines is not enough. Tool governance has to live in the software delivery path.
When a team adds a tool, the pull request should include the registry entry, schema tests, risk tier, owner approval, policy mapping, and audit expectations. If those fields are missing, the tool should not ship.
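A pull request gate of this kind can start as a very small check in CI. The sketch below assumes registry entries are available as dictionaries and that the required field names match the registry table. The list of required fields and the function name are hypothetical.

```python
from typing import Any, Dict, List

# Fields assumed to be mandatory before a tool can ship (illustrative list).
REQUIRED_FIELDS = (
    "tool_id",
    "owner",
    "risk_tier",
    "input_schema",
    "output_schema",
    "policy_checks",
    "audit_fields",
)


def missing_registry_fields(entry: Dict[str, Any]) -> List[str]:
    """Return required fields that are absent or empty in a registry entry."""
    return [name for name in REQUIRED_FIELDS if not entry.get(name)]


draft_entry = {
    "tool_id": "email.send_external",
    "owner": "Customer Support",
    "risk_tier": "high",
    "input_schema": {"to": "string", "body": "string"},
    "policy_checks": [],  # empty on purpose: no policy mapping yet
}
problems = missing_registry_fields(draft_entry)
```

Here `problems` lists `output_schema`, `policy_checks`, and `audit_fields`, so a CI job built on this check would fail the pull request until they are filled in.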
Implementation checklist
Use this checklist before allowing any AI agent to call enterprise tools in production.
Registry design
- Every tool has a stable ID, owner, version, and business purpose.
- Read tools and write tools are separate.
- Every tool has a risk tier.
- Every tool has input and output schemas.
- Every tool lists data classes it may access.
- Every tool has a rollback or compensation note.
Identity and permission
- Agents have explicit identities, not shared generic credentials.
- Tool execution uses scoped credentials through a broker.
- User delegation is represented explicitly.
- RBAC handles coarse access.
- ABAC handles record, tenant, data class, region, workflow state, and assurance.
- Privileged tools require separation of duties.
Runtime controls
- The model cannot call enterprise systems directly.
- The broker validates schemas before execution.
- The policy engine can deny calls regardless of model confidence.
- High-risk calls become pending actions until approved.
- Rate limits and kill switches exist.
- Tool versions can be rolled back.
Audit and response
- Every tool call has a trace ID.
- The audit event stores caller, agent, model, tool, arguments, policy decision, and result.
- Approval evidence is linked to the execution event.
- Sensitive arguments are redacted or tokenized where needed.
- Incident responders can reconstruct the chain from user request to enterprise action.
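For the redaction item in the audit checklist, one possible approach is keyed tokenization: sensitive argument values are replaced with an HMAC-derived token before the event is written, so identical values still correlate across events without being readable. The set of sensitive keys, the `tok_` prefix, and the 16-character truncation are assumptions for this sketch. Key management is deliberately left out.

```python
import hashlib
import hmac
from typing import Any, Dict

# Argument names treated as sensitive (hypothetical list).
SENSITIVE_KEYS = {"email", "account_number"}


def redact_arguments(arguments: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Replace sensitive values with stable keyed tokens before audit logging."""
    redacted = {}
    for name, value in arguments.items():
        if name in SENSITIVE_KEYS:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
            redacted[name] = "tok_" + digest[:16]
        else:
            redacted[name] = value
    return redacted


demo_key = b"demo-key-not-for-production"
logged = redact_arguments({"email": "ana@example.com", "status": "active"}, demo_key)
```

Because the token is derived with a secret key, it cannot be recomputed by someone who only has the logs, yet responders with the key can confirm whether a specific value appeared.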
FAQ
Is a tool registry the same as a plugin list?
No. A plugin list describes what an agent could call. A safe tool registry defines what an agent is allowed to call, under which identity, in which context, with which schema, policy checks, approval rules, and audit obligations.
Should the LLM decide whether a tool call is allowed?
No. The model can recommend a tool and produce structured arguments, but authorization should be deterministic and external to the model. The policy gate should be able to deny a call even when the model is confident.
Do we need both RBAC and ABAC?
Usually yes. RBAC gives understandable role boundaries. ABAC adds the context required for production AI agents: tenant, data class, device posture, workflow state, amount, region, and assurance level.
What tools should be proposal-only?
Any tool that affects money, legal exposure, customer-visible communication, security posture, identity, infrastructure, production deployment, or physical operations should normally start as proposal-only until controls, approvals, tests, and rollback are proven.
How does this relate to human-in-the-loop approval?
The registry decides when approval is required. The approval workflow decides who can approve, what evidence they see, how long the request can wait, what happens on timeout, and how the decision is logged.
What is the smallest useful first version?
Start with stable tool IDs, owners, risk tiers, input schemas, allowed agents, allowed roles, approval flags, and audit fields. Then add ABAC, rate limits, rollback metadata, and automated contract tests as the registry matures.