
When a BI dashboard queries your warehouse, it reads data and renders a chart. When an AI agent queries your warehouse, it reads data, reasons over it, calls tools, and takes action. That difference, the jump from "data access" to "data action," is why traditional data governance frameworks fall short for agentic AI.
Most organizations already govern who can see which rows and columns. Fewer have thought through what happens when an autonomous agent, operating on delegated authority, decides to join two datasets, retrieve context from a knowledge graph, invoke an API, and email the result to a customer. The governance surface area expands from a single query to a chain of decisions, each with its own access, interpretation, and side-effect risks. This guide provides a practical, implementation-oriented framework for governing that chain end to end.
Who this is for (and what it is not)
This guide is for data leaders, platform engineers, analytics engineers, and security/compliance teams building or evaluating infrastructure for enterprise AI agents. The scope is narrow on purpose: governing agent data access, tool use, context assembly, and auditability. General AI ethics, model alignment, and responsible AI policy design are adjacent topics with their own literature. If your question is "how do I stop an agent from querying data it shouldn't see, misinterpreting a metric, or taking an action without approval," keep reading.
The core problem: agents turn "data access" into "data action"
A traditional analytics workflow has a human in the loop at every decision point. The analyst writes a query, interprets the result, and decides what to do with it. An AI agent compresses that entire cycle into seconds, often chaining multiple steps autonomously.
Tool use is the key escalation. An agent with access to a SQL connector, an email API, and a CRM write endpoint can query revenue data, draft a message, and update a customer record in a single turn. If any link in that chain has overly broad permissions, stale business logic, or ungoverned context, the blast radius is larger, and the damage unfolds faster, than anything a dashboard could produce.
The OWASP Top 10 for Large Language Model Applications catalogs risks like prompt injection, sensitive information disclosure, and insecure output handling. Each of these risks becomes more dangerous when the LLM has tools at its disposal. Governance must account for that.
A practical governance model (mapped to NIST AI RMF)
The NIST AI Risk Management Framework Playbook organizes risk management into four functions: Govern, Map, Measure, and Manage. The playbook provides suggested actions aligned to subcategories within each function and is designed to be tailored, not followed as a rigid checklist. It offers a useful skeleton for mapping agent-specific data governance controls to a recognized standard.
The table below translates each NIST function into concrete controls for AI agent governance.
| NIST AI RMF Function | Agent Governance Focus | Example Controls |
|---|---|---|
| Govern | Ownership, policy, and organizational accountability | Define agent data owners; codify policies for data access, tool use, and approval gates; establish change management for agent configs |
| Map | Risk identification and threat modeling for agent workflows | Enumerate data sources agents can reach; map tool permissions; identify sensitive joins and high-risk actions; threat-model against the OWASP LLM Top 10 |
| Measure | Evaluation, testing, and monitoring of governance controls | Red-team prompt injection tests; measure policy bypass rates; track metric consistency across agents; audit log completeness checks |
| Manage | Incident response, continuous improvement, and drift remediation | Alert on anomalous access patterns; revoke credentials on policy violation; retrain or reconfigure agents when semantic drift is detected |
The framework is voluntary and broadly scoped. The value here is using it as a shared vocabulary with risk and compliance teams who may already be adopting NIST AI RMF for broader AI governance efforts.
Reference architecture: where governance actually gets enforced
Governance that lives only in a system prompt is governance that can be bypassed with a well-crafted injection. Effective agent governance requires enforcement at multiple layers, each with its own controls and audit surface. The five layers below form a reference architecture spanning identity, runtime, data plane, semantic layer, and retrieval context.
Layer 1: Identity for agents (humans, service accounts, and delegated authority)
Every agent action should be traceable to an identity. In practice, that means agents run under dedicated service accounts (not shared credentials) with roles that map to the minimum required permissions. When an agent acts on behalf of a user, the delegation chain must be explicit: "Agent X is executing with User Y's permissions, scoped to Project Z."
Separation of duties matters here. An agent that can both query sensitive data and write to an external API should have those capabilities governed by distinct roles, so that a compromise of the write credential does not automatically grant read access to PII. Platforms like Databricks Unity Catalog enforce a three-level namespace and centralized access control that can be mapped to agent service accounts.
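To make the delegation chain concrete, here is a minimal Python sketch of an identity record that carries the agent's service account, the delegating user, and the scope together, and that drops any role the delegation never granted. All names (`svc-forecast-agent`, `read:pipeline_snapshot`, and so on) are illustrative, not part of any platform's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DelegatedIdentity:
    """Explicit delegation chain: which agent, acting for whom, in what scope."""
    agent_service_account: str   # dedicated per-agent account, never shared
    on_behalf_of: str            # the human user whose authority is delegated
    scope: str                   # project or workspace the delegation is limited to
    roles: frozenset             # least-privilege roles granted for this run

def effective_roles(identity, requested):
    """An agent may exercise only roles that were explicitly delegated."""
    return set(requested) & set(identity.roles)

ident = DelegatedIdentity(
    agent_service_account="svc-forecast-agent",
    on_behalf_of="user:y.chen",
    scope="project:sales-emea",
    roles=frozenset({"read:pipeline_snapshot"}),
)
# The CRM write role was never delegated, so it is dropped, not honored:
print(effective_roles(ident, {"read:pipeline_snapshot", "write:crm"}))
```

In a real deployment this intersection is enforced by the identity provider or the warehouse's role grants, not by application code; the sketch only shows what "explicit delegation" means as data.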
Layer 2: Tool governance (allowlists, contracts, and sandboxing)
Every tool an agent can invoke should have a narrow contract: what parameters it accepts, what resources it can reach, and what side effects it may produce. Tool allowlists define which tools are available in a given agent configuration; anything not on the list is inaccessible. Parameter constraints prevent an agent from, say, passing arbitrary SQL to a query tool when it should only be calling a parameterized endpoint.
The LangChain security policy codifies this well: scope permissions to the application's need, assume any access may be used in any way the permissions allow, and combine layered defenses rather than relying on a single technique. LangChain's sandbox documentation goes further, describing isolated execution environments where agents can run code without accessing host credentials, filesystems, or the network. These patterns apply regardless of which agent framework you use.
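The allowlist-plus-contract idea can be sketched in a few lines of Python. The registry, tool names, and parameter patterns below are hypothetical; the point is that an unlisted tool is simply unreachable, and listed tools accept only arguments matching their declared contract.

```python
import re

# Hypothetical tool registry: each tool declares a narrow parameter contract.
TOOL_ALLOWLIST = {
    "get_revenue_by_region": {
        "params": {
            "region": re.compile(r"^(emea|amer|apac)$"),
            "quarter": re.compile(r"^20\d{2}-Q[1-4]$"),
        },
    },
}

def validate_tool_call(tool, args):
    """Reject calls to unlisted tools or with out-of-contract arguments."""
    contract = TOOL_ALLOWLIST.get(tool)
    if contract is None:
        return False  # not on the allowlist: inaccessible
    schema = contract["params"]
    if set(args) != set(schema):
        return False  # unexpected or missing parameters
    return all(schema[k].fullmatch(str(v)) for k, v in args.items())

assert validate_tool_call("get_revenue_by_region",
                          {"region": "emea", "quarter": "2024-Q3"})
assert not validate_tool_call("run_sql", {"query": "SELECT * FROM salaries"})
assert not validate_tool_call("get_revenue_by_region",
                              {"region": "emea; DROP TABLE x", "quarter": "2024-Q3"})
```

A production gateway would also check the caller's identity and log every decision; the validation logic itself stays this boring on purpose.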
Layer 3: Data-plane enforcement (RLS/CLS, masking, and purpose-based access)
Row-level security (RLS) and column-level security (CLS) must be enforced where data is stored and queried, not in agent prompts. Snowflake's row access policies are a concrete example: policies determine which rows are returned based on the querying role, using mapping tables for complex access rules. When an agent's service account runs a query, the warehouse enforces the same RLS/CLS policies that would apply to a human analyst with equivalent permissions.
Dynamic masking adds another layer, redacting or tokenizing sensitive columns (SSNs, emails, salary bands) at query time based on role. Purpose-based access goes one step further by requiring the query to declare a purpose (e.g., "customer support resolution") that is validated against the data's allowed-use metadata. Prompts can be overridden; warehouse policies cannot.
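A purpose-binding check is simple to express; assuming each dataset carries allowed-use metadata (dataset and purpose names below are invented for illustration), the check looks like:

```python
# Hypothetical allowed-use metadata attached to a dataset.
DATASET_POLICY = {
    "customer_contacts": {
        "allowed_purposes": {"customer_support_resolution", "billing_dispute"},
    },
}

def purpose_check(dataset, declared_purpose):
    """Purpose-based access: the query must declare a purpose that the
    dataset's allowed-use metadata permits."""
    policy = DATASET_POLICY.get(dataset)
    return policy is not None and declared_purpose in policy["allowed_purposes"]

assert purpose_check("customer_contacts", "customer_support_resolution")
assert not purpose_check("customer_contacts", "marketing_outreach")
```

The important property is where this runs: in the warehouse or data gateway, evaluated against metadata, not in a prompt the model could be talked out of.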
Layer 4: Semantic enforcement (ontology, metrics, and "meaning constraints")
Access control answers "can this agent see the data?" Semantic enforcement answers "will this agent interpret the data correctly?" Without a governed semantic layer, two agents querying the same warehouse can return different answers for "Q3 revenue" because they apply different filters, date ranges, or currency conversions.
Ontology-based governance centralizes business definitions (metrics, entities, relationships, and their valid computation logic) so that every agent assembles context from the same source of truth. Galaxy, as an ontology-driven knowledge graph, serves this role: it defines metrics, entities, and relationships in a shared, versioned layer that agents reference when building queries or interpreting results. Semantic enforcement through Galaxy complements data-plane controls like Unity Catalog by governing meaning and consistency, not just access and metadata.
Layer 5: Context governance for RAG and GraphRAG
Retrieval-augmented generation (RAG) and GraphRAG introduce a distinct governance surface: the documents, embeddings, and graph structures that agents retrieve at inference time. If a vector store contains documents with mixed sensitivity levels and retrieval is based purely on semantic similarity, an agent may surface a confidential board memo when answering a routine sales question.
Governed RAG requires retrieval-time access control, where the agent's identity and permissions are checked against each retrieved chunk before it enters the context window. GraphRAG adds another dimension: graph traversal can expose relationships between entities even when individual nodes are access-controlled. If an agent can traverse a graph from a public product entity to a confidential acquisition target via a shared supplier node, the governance boundary has been breached at the relationship level. Provenance metadata, recording which source documents or records contributed to the retrieved context, is necessary for audit and for debugging incorrect answers.
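Retrieval-time access control reduces to a filter that runs between the vector store and the context window. The sketch below is a minimal version, assuming each chunk carries a sensitivity label and a provenance reference (both invented field names):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    sensitivity: str   # e.g. "public", "internal", "confidential"
    provenance: str    # source document reference, kept for audit

def authorized_context(chunks, agent_clearances):
    """Retrieval-time access control: check each chunk against the agent's
    permissions before it enters the context window."""
    return [c for c in chunks if c.sensitivity in agent_clearances]

retrieved = [
    Chunk("Q3 pipeline summary", "internal", "doc://sales/pipeline-q3"),
    Chunk("Board memo: acquisition target", "confidential", "doc://board/memo-17"),
]
ctx = authorized_context(retrieved, agent_clearances={"public", "internal"})
print([c.provenance for c in ctx])  # the confidential memo never reaches the model
```

Real systems push this filter into the retrieval query itself (metadata pre-filtering) rather than post-filtering results, so that unauthorized chunks never leave the store; the invariant is the same.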
OWASP LLM risks mapped to governance controls
The following table connects the most agent-relevant OWASP LLM Top 10 risks to specific governance controls and enforcement points.
| OWASP LLM Risk | Governance Control | Enforcement Point |
|---|---|---|
| Prompt injection | Tool allowlists, least-privilege credentials, input validation | Agent runtime (Layer 2) |
| Sensitive information disclosure | Data classification, RLS/CLS, dynamic masking, retrieval-time access control | Data plane (Layer 3), RAG layer (Layer 5) |
| Insecure output handling | Output validation, write-action approval gates, no raw SQL execution from model output | Agent runtime (Layer 2), approval workflow |
| Supply chain vulnerabilities | Connector/plugin governance, MCP server allowlists, dependency pinning | Tool governance (Layer 2) |
| Data poisoning (RAG corpus) | Provenance tracking, controlled ingestion pipelines, source validation | RAG/GraphRAG layer (Layer 5) |
Policies that actually work for agents (templates)
Policy-as-code is the goal; policy-as-document is the starting point. Below are three templates that can be translated into enforceable configurations across your agent stack.
Policy template: "What data can be accessed"
This template specifies allowed datasets, sensitivity classifications, row and column restrictions, approved join paths, purpose binding, and a time-to-live for cached access grants.
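Rendered as policy-as-code, the template might look like the following Python structure. Every key and value here is a hypothetical example, not a schema from any particular platform; the point is that each clause of the template becomes a machine-checkable field.

```python
# Hypothetical policy-as-code rendering of the data-access template.
DATA_ACCESS_POLICY = {
    "agent": "svc-forecast-agent",
    "datasets": {
        "pipeline_snapshot": {
            "sensitivity": "internal",
            "row_filter": "region = 'emea'",      # enforced by warehouse RLS
            "masked_columns": ["contact_email"],  # dynamic masking at query time
        },
    },
    "approved_joins": [("pipeline_snapshot", "accounts")],
    "purpose": "sales_forecasting",
    "grant_ttl_seconds": 3600,  # cached access grants expire and must be re-evaluated
}
```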
Policy template: "What tools can be used"
Each tool has parameter constraints. Network egress is denied by default. The agent executes within a sandbox, and credentials are scoped to read-only access on the sales schema.
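The same approach works for the tool template. Again, tool names, credential labels, and field names are illustrative:

```python
# Hypothetical policy-as-code rendering of the tool-use template.
TOOL_POLICY = {
    "agent": "svc-forecast-agent",
    "allowed_tools": {
        "query_sales": {
            "params": {"quarter": r"^20\d{2}-Q[1-4]$"},  # parameter constraints
            "credentials": "readonly:sales_schema",      # scoped, read-only
        },
    },
    "network_egress": "deny",  # default-deny; no outbound calls from the sandbox
    "sandbox": True,
}
```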
Policy template: "What actions require approval"
Write operations, large exports, and external communications require human approval. If approval times out, the default is denial, and the agent owner and security team are notified.
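The approval logic is small enough to sketch directly. This is a simplified model of the gate, with invented action names; the essential behavior is that an absent or timed-out approval resolves to denial, never to silent execution.

```python
def resolve_approval(action, approval):
    """Human-in-the-loop gate: gated actions need explicit approval; a
    timed-out or missing response defaults to denial, not execution."""
    GATED = {"crm_write", "bulk_export", "external_email"}
    if action not in GATED:
        return "allow"
    if approval == "approved":
        return "allow"
    return "deny_and_notify"  # denied or timed out: notify owner + security team

assert resolve_approval("read_query", None) == "allow"
assert resolve_approval("external_email", None) == "deny_and_notify"
assert resolve_approval("bulk_export", "approved") == "allow"
```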
Lineage and auditability: the minimum viable trace
An agent decision that cannot be reconstructed is an agent decision that cannot be audited, debugged, or defended in a compliance review. The minimum viable trace spans three layers.
What to log at the model/provider layer
Provider audit logs capture organizational and platform-level events. The OpenAI Admin and Audit Logs API provides an immutable log of events including API key lifecycle, service account activity, login/logout failures, organization configuration changes, and project lifecycle events. These logs help security teams identify unauthorized access and meet compliance requirements at the provider boundary.
What to log at the agent runtime layer
This is where the reasoning chain lives. Log the full prompt (or a redacted version if prompts contain sensitive data), tool schemas offered to the model, tool call arguments and return values, retrieval sets (which chunks or graph nodes were fetched), redaction or masking decisions applied to retrieved content, and any approval gate triggers. Every log entry should carry a trace ID that links it to the originating user request and the agent's identity.
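A runtime-layer log record built along these lines might look like the sketch below. The field names are one plausible shape, not a standard; what matters is that the trace ID appears in every record and that redaction decisions are logged alongside the events they applied to.

```python
import datetime
import json
import uuid

def runtime_log_entry(trace_id, agent_id, event_type, payload, redactions=()):
    """One structured runtime-layer log record; trace_id links it to the
    originating user request and to provider- and data-layer logs."""
    return {
        "trace_id": trace_id,
        "agent_id": agent_id,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "event_type": event_type,        # "tool_call", "retrieval", "redaction", ...
        "payload": payload,              # tool args, retrieved chunk IDs, etc.
        "redactions": list(redactions),  # masking decisions applied to context
    }

trace_id = str(uuid.uuid4())
entry = runtime_log_entry(
    trace_id, "svc-forecast-agent", "tool_call",
    {"tool": "query_sales", "args": {"quarter": "2024-Q3"}},
    redactions=["contact_email"],
)
print(json.dumps(entry))  # ship to an append-only sink
```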
What to log at the data layer
The data platform should log the SQL queries executed, the policy evaluation outcomes (which RLS/CLS rules fired), any denied access events, and query latency. Unity Catalog, for example, automatically captures user-level audit logs and lineage data tracking how assets are created and used. Cross-referencing data-layer logs with runtime-layer logs via a shared trace ID produces the end-to-end lineage needed for post-hoc analysis.
Minimum viable audit trace checklist:

- Provider layer: org/project/key lifecycle events logged and retained
- Runtime layer: prompts, tool calls, retrieval sets, and redaction decisions logged with trace IDs
- Data layer: query history, policy evaluations, and denied access events logged
- Trace IDs link all three layers for a single agent action
- Logs are immutable and retained per your compliance policy (e.g., 30, 90, or 365 days)
- Sensitive content in logs is redacted or tokenized
Testing and monitoring: prove governance works under attack
Tests to run before production
Before deploying a governed agent, run adversarial scenarios aligned to the OWASP LLM Top 10. At minimum, every agent should pass the following:
- Prompt injection resistance: Attempt to override system instructions via user input to invoke denied tools or access restricted data. The agent should refuse or fall back to safe defaults.
- Over-broad retrieval: Craft queries designed to pull documents or graph nodes outside the agent's access scope. Retrieval-time access control should filter these before they enter the context window.
- Unsafe tool execution: Attempt to pass destructive or exfiltration-oriented parameters (DROP TABLE, network egress to external URLs) through tool calls. Tool contracts and sandboxes should block these.
- Semantic inconsistency: Ask the same business question through different phrasings and verify the agent returns consistent metric definitions. Semantic governance via Galaxy's ontology should prevent metric drift.
- Approval gate bypass: Attempt to trigger a write or export action without hitting the human-in-the-loop gate. The gate should fire regardless of how the action was requested.
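These scenarios belong in an automated suite, not a one-off manual exercise. As a sketch of the unsafe-tool-execution case, assume a hypothetical gateway that only fills parameters of a fixed, parameterized query; the test asserts that a destructive payload is rejected at the contract, before any SQL reaches the warehouse:

```python
import re

def query_tool_gateway(args):
    """Hypothetical gateway: the agent may only fill parameters of a fixed,
    parameterized query; raw SQL from the model never reaches the warehouse."""
    region = str(args.get("region", ""))
    if not re.fullmatch(r"[a-z]{2,10}", region):
        raise PermissionError("out-of-contract parameter")
    return "SELECT SUM(amount) FROM pipeline WHERE region = :region"

# Unsafe tool execution test: destructive input must be rejected by the contract.
try:
    query_tool_gateway({"region": "emea'; DROP TABLE pipeline; --"})
    blocked = False
except PermissionError:
    blocked = True
assert blocked, "tool contract failed to block destructive parameters"
```

Each of the other scenarios follows the same pattern: drive adversarial input at the real enforcement point and assert the control fired.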
Monitors and alerts to keep on
Post-deployment, maintain continuous signals for:
- Policy bypass attempts: Any denied-access event at the data plane or tool layer should generate an alert. Repeated attempts from the same agent identity warrant investigation.
- Semantic drift: Track whether agent-generated metric values diverge from baseline values computed by the governed semantic layer. Galaxy can serve as the reference computation for comparison.
- Anomalous access patterns: Volume spikes, unusual query times, or access to datasets not previously queried by an agent role. Set thresholds based on historical baselines.
- Retrieval scope creep: Monitor the provenance metadata of retrieved context chunks. If an agent begins retrieving from new, ungoverned sources, flag it for review.
How this looks in Galaxy (example implementation)
Galaxy functions as an ontology-driven knowledge graph and shared context layer. The following walkthrough illustrates how Galaxy's architecture maps to the governance layers described above, complementing (not replacing) data platform governance like Unity Catalog.
Galaxy pattern: ontology as the policy surface for meaning
In Galaxy, business concepts, including metrics, entities, relationships, and their valid computation logic, are defined in a versioned ontology. When an agent needs to answer "What is Q3 revenue?", it does not freestyle a SQL query with its own interpretation of fiscal quarters or revenue recognition rules. Instead, it references Galaxy's ontology, which specifies the exact definition: which tables, filters, currency conversions, and date boundaries apply.
The ontology acts as a policy surface for meaning. If a new metric version changes the revenue recognition logic, the ontology is updated, and all agents referencing it pick up the change. Agents that attempt to compute revenue using a non-sanctioned definition can be flagged by comparing their output lineage against the ontology's canonical computation path.
Galaxy pattern: governed context packaging for agents
When Galaxy assembles context for an agent, it packages three things together: the data or metric definitions the agent needs, the permissions metadata confirming the agent's identity has access, and the provenance metadata recording where each piece of context originated. This "governed context package" means the agent receives only what it is authorized to see, with semantic consistency enforced at assembly time.
Consider a concrete scenario. A sales forecasting agent asks Galaxy for pipeline data. Galaxy checks the agent's role against the data access policy (Layer 3), confirms the agent is allowed to see the pipeline_snapshot table with a region filter applied, retrieves the metric definition for "weighted pipeline value" from the ontology (Layer 4), and returns the packaged context with provenance links to the source table, the metric version, and the RLS policy that was evaluated. The agent can then reason over this context without risk of accessing restricted rows or misinterpreting the metric.
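The scenario above can be summarized as a data shape: definitions, permission evidence, and provenance travel together. The following is an illustrative sketch of that idea, not Galaxy's actual API or schema; every field name is invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GovernedContextPackage:
    """Data, permission evidence, and provenance packaged as one unit."""
    metric_definition: dict   # canonical definition pulled from the ontology
    permitted: bool           # outcome of the access-policy check for this agent
    provenance: tuple         # source table, metric version, evaluated policy

pkg = GovernedContextPackage(
    metric_definition={"name": "weighted_pipeline_value",
                       "formula": "SUM(amount * stage_probability)"},
    permitted=True,
    provenance=("table:pipeline_snapshot", "metric:v3", "rls:region_filter"),
)
assert pkg.permitted and "metric:v3" in pkg.provenance
```

Because the package is assembled server-side, an agent cannot receive the metric definition without the access check having already passed, and every downstream answer inherits the provenance links for audit.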
Galaxy pattern: audit-ready agent workflows
The output of an agent workflow in Galaxy carries traceable links back to the sources, policies, and transformations that produced it. If a stakeholder questions an agent's forecast number, the audit trail shows: (1) which ontology version defined the metric, (2) which data access policy was evaluated and which rows were included or excluded, (3) which tool calls were made and with what parameters, and (4) which retrieval context was assembled and from which sources.
This audit chain is what transforms an agent from a black box into a system that can satisfy compliance reviews. The lineage data produced by Galaxy integrates with data platform lineage (e.g., Unity Catalog's built-in lineage tracking) to provide a full picture from raw data through semantic interpretation to agent action.
Operational checklist (ship it without regrets)
Use this checklist before promoting any agent workflow to production.
- Identity: Agent runs under a dedicated service account with least-privilege roles; no shared credentials
- Tool governance: Every tool has an allowlist entry, parameter constraints, and sandbox requirement
- Data access policy: RLS/CLS policies tested with the agent's service account; denied joins verified
- Semantic governance: Agent references a governed ontology (Galaxy or equivalent) for all metric and entity definitions
- Context governance: RAG and GraphRAG retrieval enforces access control at retrieval time; provenance metadata attached
- Approval gates: Write actions, large exports, and external communications routed through human-in-the-loop approval
- Audit trail: Trace IDs link provider, runtime, and data-layer logs; logs are immutable and retained
- Adversarial testing: Prompt injection, over-broad retrieval, unsafe tool execution, and semantic inconsistency tests passed
- Monitoring: Alerts configured for policy bypass attempts, semantic drift, and anomalous access patterns
Common failure modes (and how to avoid them)
Prompt-only guardrails. Putting access rules in the system prompt and hoping the LLM respects them is the most common and most dangerous mistake. Prompt instructions are suggestions to the model, not enforcement points. Move every access rule to a layer the model cannot override: the data plane, the tool gateway, or the semantic layer.
Shared service accounts. When every agent in the org runs under the same service account, you lose the ability to attribute actions, scope permissions, or revoke access granularly. Each agent workflow needs its own identity, even if it adds operational overhead during setup.
Ungoverned retrieval corpora. Many teams stand up a vector store, load it with internal documents, and connect it to an agent without checking document sensitivity labels, access permissions, or freshness. A RAG corpus is a data source and needs the same classification, access control, and lineage treatment as any warehouse table.
Metric drift across agents. Two agents answering "What is monthly churn?" with different computation logic will erode trust in both. Centralizing metric definitions in an ontology, like Galaxy's knowledge graph, prevents conflicting answers and provides a single audit point when definitions change.
FAQ
Is this AI governance or data governance? Both, or more precisely, data governance for AI agents is where the two overlap. AI governance (as framed by NIST AI RMF) covers model risk, safety, and organizational accountability. Data governance covers access, quality, lineage, and policy. When an AI agent accesses and acts on enterprise data, you need controls from both disciplines. The framework in this guide focuses on the data governance controls that plug into a broader AI risk program.
Where should we start? Start with identity and data-plane enforcement. Assign dedicated service accounts to agents, apply RLS/CLS at the warehouse level, and log everything. These two controls alone prevent the most damaging failure modes (unauthorized access and untraceable actions). Add semantic governance and tool governance as your agent workflows grow in complexity.
Do we need a semantic layer if we already have a data catalog? A data catalog (like Unity Catalog) governs access, metadata, and lineage. A semantic layer (like Galaxy) governs meaning: how metrics are computed, how entities relate, and which definitions are canonical. Agents need both. The catalog tells the agent what it can see; the semantic layer tells it what the data means and how to use it correctly.
How does Galaxy relate to our existing data platform governance? Galaxy is designed to complement, not replace, data platform governance. It integrates with your warehouse's access controls and your catalog's metadata. Galaxy adds the ontology layer that ensures agents interpret data consistently and the governed context packaging that bundles permissions, provenance, and semantic definitions into a single assembly step.
What if our agents use multiple LLM providers? The governance architecture described here is provider-agnostic at the runtime, data, and semantic layers. Provider-specific controls (like the OpenAI Audit Logs API) add visibility at the model layer but should not be your only audit surface. Ensure your runtime and data-layer logging works regardless of which LLM the agent calls.
Interested in learning more about Galaxy?