AI Tool Use Security: Your AI agent has unsecured tools. Shell commands, HTTP requests, file writes. Prompt injection → RCE, SSRF, data exfiltration. Your CEO fired the CISO.
Your AI agent has no tool security, no scope restriction, and no human-in-the-loop (HITL). Shell commands without a sandbox, HTTP without an allowlist, file writes without confirmation. 48 hours of incident response, data exfiltration, and your CEO fired the CISO. Here is how to prevent that.
What is Tool Use Security? Explained simply.
Think of Tool Use Security the way you think of safety for physical tools: once an LLM can call tools (shell commands, HTTP requests, database queries), the attack surface explodes. Through unsecured tools, a prompt injection can pivot to the host, the internal network, or sensitive data. Good Tool Use Security means: Principle of Least Tool, sandboxing, HITL.
7 Tool Risk Categories
| Tool | Risk | Attack vector | Defense |
|---|---|---|---|
| Shell / Code Execution | CRITICAL | Prompt injection → arbitrary command execution on host | Run in --read-only container with --cap-drop=ALL. Allowlist permitted commands. 30s hard timeout. Never run as root. |
| HTTP / Web Requests | HIGH | SSRF → internal network access, metadata endpoint, cloud credentials | Allowlist permitted domains/IPs. Block RFC-1918 ranges and link-local (169.254.x.x). Validate URLs before fetch. Log all requests. |
| File System Read | HIGH | Path traversal → read /etc/passwd, ~/.ssh/id_rsa, .env files | Restrict to declared workspace directory. Validate resolved path against workspace root. Block symlink traversal. |
| File System Write | CRITICAL | Overwrite config files, inject malicious code, modify agent behavior | Require human confirmation for all writes. Scope to temp directory only. Audit all write operations. |
| Database Queries | HIGH | SQL injection via LLM-generated queries, data exfiltration | Use parameterized queries only — never string-interpolated SQL. Read-only credentials for read operations. Scope to minimal required tables. |
| Email / Notifications | HIGH | Data exfiltration via email, spam/phishing via LLM-drafted content | Require human approval for all external sends. Allowlist recipients. Content review before send. Rate limit: max 10 emails/hour. |
| Calendar / Scheduling | MEDIUM | Unwanted calendar events, social engineering via agent-created meetings | Human-in-the-loop for all external calendar invites. Scope to own calendar only by default. |
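The "validate resolved path against workspace root" defense from the File System Read row can be sketched as follows. This is a minimal illustration, not a complete file tool: the function name `resolve_in_workspace` and the exception `PathOutsideWorkspace` are hypothetical, and it assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


class PathOutsideWorkspace(Exception):
    """Raised when a requested path resolves outside the workspace root."""


def resolve_in_workspace(workspace: str, requested: str) -> Path:
    """Return the resolved path only if it stays inside the workspace.

    Hypothetical helper: resolve() collapses ".." segments and follows
    symlinks, so the containment check runs on the real target path.
    """
    root = Path(workspace).resolve()
    # An absolute `requested` replaces `root` in the join, so it is
    # rejected by the containment check below unless it is inside root.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PathOutsideWorkspace(f"{requested!r} escapes {root}")
    return candidate
```

Checking the resolved path rather than the raw string is what catches `../` sequences and symlinks that point outside the workspace. A gap remains between the check and the actual open call, so a production tool would likely also open the file in a way that refuses symlinks.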
Principle of Least Tool
Start with zero tools. Add back only what the specific task requires. A summarization agent needs no tools at all. A research agent needs only HTTP read. A coding agent needs only file read and write in a scoped temp directory.
    # Agent and the *Tool classes stand in for your agent framework's own types.

    # BAD: register all tools "just in case"
    agent = Agent(tools=[ShellTool(), FileTool(), HTTPTool(),
                         EmailTool(), DBTool(), CalendarTool()])

    # GOOD: minimum required for the specific task
    summarizer = Agent(tools=[])  # No tools needed
    researcher = Agent(tools=[HTTPTool(allowlist=["arxiv.org", "pubmed.ncbi.nlm.nih.gov"])])
    coder = Agent(tools=[
        FileTool(workspace="/tmp/agent-sandbox", mode="rw"),
        # Shell removed: use an isolated subprocess instead
    ])

Real-World Scars: Production Incidents
Shell tool without a sandbox. Prompt injection → RCE on the host, data exfiltration. Fix: container with --cap-drop=ALL, command allowlist, timeout.
HTTP tool without an allowlist. SSRF → internal network, metadata endpoint, cloud credentials. Fix: domain allowlist, block RFC-1918 ranges.
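The shell fix (container with --cap-drop=ALL, allowlist, timeout) could look roughly like the sketch below. It assumes Docker is installed on the host. The allowlist contents, the image name, the UID `1000:1000`, and the extra `--network=none` flag are illustrative assumptions rather than part of the incident write-up.

```python
import shlex
import subprocess

ALLOWED_COMMANDS = {"ls", "cat", "grep", "wc"}  # hypothetical allowlist
TIMEOUT_SECONDS = 30                            # hard timeout from the table above
IMAGE = "python:3.12-slim"                      # assumed image; use your own


def build_sandbox_argv(command_line: str) -> list[str]:
    """Turn an LLM-proposed command into a locked-down `docker run` argv."""
    argv = shlex.split(command_line)
    # Only the program name is checked; arguments are passed as literal
    # argv entries, never through a shell, so ";" or "&&" are not interpreted.
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowlisted: {command_line!r}")
    return [
        "docker", "run", "--rm",
        "--read-only",          # immutable root filesystem
        "--cap-drop=ALL",       # drop all Linux capabilities
        "--network=none",       # assumption: the tool needs no network
        "--user", "1000:1000",  # never run as root
        IMAGE, *argv,
    ]


def run_sandboxed(command_line: str) -> subprocess.CompletedProcess:
    """Run the command in the container; raises TimeoutExpired after 30s."""
    return subprocess.run(
        build_sandbox_argv(command_line),
        capture_output=True, text=True, timeout=TIMEOUT_SECONDS,
    )
```

Note that `subprocess.run(..., timeout=...)` stops the Docker CLI client. The container itself may keep running, so a real deployment would probably also name the container and kill it explicitly on timeout.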
Immediate actions: what to do today?
Run a tool audit
List all tools, classify them by risk, and remove any that are not needed.
Sandbox dangerous tools
Isolate shell and code tools in a container with --cap-drop=ALL.
HITL for CRITICAL tools
Require human-in-the-loop approval for shell, file-write, and email tools.
Frequently Asked Questions
What is the biggest security risk of LLM function calling?
Unscoped tool access combined with prompt injection. An LLM with access to a shell tool and no sandboxing can be prompted to execute arbitrary commands. The fix: every tool must have a declared scope, run in an isolated container, and dangerous tools (shell, file write, HTTP) require human confirmation or are restricted to an allowlist.
How do I implement human-in-the-loop for AI tool use?
For high-risk tools: before execution, present the proposed tool call (tool name + parameters) to a human operator via a review interface. Only execute after explicit approval. Log: approver identity, approval timestamp, original LLM reasoning. Implement a timeout — if no approval within X minutes, cancel the action.
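The approval flow described above can be sketched as a small gate function. All names here (`ToolCall`, `Decision`, `execute_with_approval`, `AUDIT_LOG`) are hypothetical, and the review interface is abstracted as an `ask_human` callback that is assumed to return `None` when nobody answers within the timeout.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Optional


@dataclass
class ToolCall:
    tool: str
    params: dict
    reasoning: str  # the LLM's stated reason for the call


@dataclass
class Decision:
    approved: bool
    approver: str


AUDIT_LOG: list[dict] = []  # in-memory stand-in for a real audit store


def execute_with_approval(
    call: ToolCall,
    ask_human: Callable[[ToolCall, float], Optional[Decision]],
    run_tool: Callable[[ToolCall], Any],
    timeout_s: float = 300.0,
) -> Any:
    """Run the tool only after explicit approval; log every outcome."""
    decision = ask_human(call, timeout_s)  # None means: no answer in time
    if decision is None:
        outcome = "timeout"
    else:
        outcome = "approved" if decision.approved else "rejected"
    AUDIT_LOG.append({
        "tool": call.tool,
        "params": call.params,
        "reasoning": call.reasoning,
        "approver": decision.approver if decision else None,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "outcome": outcome,
    })
    if outcome != "approved":
        return None  # cancelled: rejected or timed out
    return run_tool(call)
```

The gate fails closed: a timeout and a rejection are treated identically, and the tool runs only on an explicit approval.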
Can I trust tool outputs fed back to the LLM?
Never unconditionally. Tool outputs can contain adversarial content (e.g., a web page with injected instructions). Sanitize all tool outputs before feeding back to the LLM: strip HTML, extract structured data only, apply the same injection detection as user inputs. Treat tool output as untrusted data, not as trusted system context.
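As a rough illustration of this sanitization step, the sketch below strips HTML with the standard library and labels the result as untrusted. The function name, the returned dictionary shape, and the `SUSPICIOUS` pattern are assumptions; the regex is a deliberately crude heuristic and not a substitute for whatever injection detection you apply to user inputs.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text; drop tags, comments, <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


# Crude, illustrative heuristic only.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior) instructions|system prompt", re.I
)


def sanitize_tool_output(raw: str, max_chars: int = 4000) -> dict:
    """Return plain text wrapped as untrusted data, with an injection flag."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())[:max_chars]
    return {
        "role": "tool",
        "trusted": False,  # never promote tool output to system context
        "flagged": bool(SUSPICIOUS.search(text)),
        "content": text,
    }
```

A flagged result might be dropped, quarantined, or routed to human review; which of these fits depends on the agent's task.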
How do I prevent SSRF via AI HTTP tools?
1) Allowlist permitted domains and reject everything else. 2) Resolve the hostname and check that every resolved IP is outside the RFC-1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), loopback (127.0.0.0/8), and link-local (169.254.0.0/16). 3) If you follow redirects, re-validate each redirect target. 4) Block metadata endpoints: 169.254.169.254 (AWS), metadata.google.internal. 5) Log all HTTP tool calls with URL, response code, response size.
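Steps 1 and 2 can be sketched with the standard library alone. The function names and the `ALLOWED_HOSTS` set are hypothetical (the hosts are borrowed from the research-agent example earlier), and the resolver is injectable so the check can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"arxiv.org", "pubmed.ncbi.nlm.nih.gov"}  # example allowlist


def is_public_ip(ip: str) -> bool:
    """False for RFC-1918, loopback, link-local and other special ranges."""
    addr = ipaddress.ip_address(ip)
    return not (
        addr.is_private or addr.is_loopback or addr.is_link_local
        or addr.is_reserved or addr.is_multicast or addr.is_unspecified
    )


def validate_url(url: str, resolve=socket.getaddrinfo) -> list[str]:
    """Return the resolved public IPs for an allowlisted URL, else raise."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    host = (parts.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"host not allowlisted: {host!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    infos = resolve(host, port, type=socket.SOCK_STREAM)
    ips = sorted({info[4][0] for info in infos})
    # Reject if ANY resolved address is internal, not just the first one.
    if not ips or not all(is_public_ip(ip) for ip in ips):
        raise ValueError(f"{host!r} resolves to a non-public address")
    return ips
```

For step 3, each redirect target would be passed through `validate_url` again before it is followed. Connecting to one of the returned IPs directly, rather than resolving the name a second time, is one way to reduce exposure to DNS rebinding.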