AI Red Teaming: Testen Ihrer AI-Agent-Verteidigung
Sie können nicht verteidigen, was Sie nicht angegriffen haben. AI Red Teaming testet systematisch jede Schicht Ihres Agent-Stacks — von Prompt-Grenzen bis zu Container-Escape-Vektoren — damit Sie Schwachstellen finden, bevor Angreifer es tun. Dieses Playbook liefert die vollständige Test-Methodik mit 25 spezifischen Testfällen über 5 Kategorien.
Was ist AI Red Teaming? Einfach erklärt
Stell dir AI Red Teaming wie einen Penetrationstest vor, aber speziell für KI-Systeme. Anstatt Netzwerke oder Server anzugreifen, versuchen wir, die KI dazu zu bringen, Dinge zu tun, die sie nicht tun sollte — wie gefährliche Anweisungen auszuführen oder ihre eigenen Sicherheitsregeln zu umgehen. Das Ziel: Schwachstellen finden, bevor echte Angreifer sie finden.
↓ Springe zu Test-Kategorien, CI/CD Integration und Severity-Klassifikation
Test-Kategorien & Fälle
- ▸Direct system prompt override
- ▸Indirect injection via document
- ▸Nested injection in tool output
- ▸Role-playing jailbreak
- ▸Encoded instruction injection (base64, unicode)
- ▸Request for dangerous content (should refuse)
- ▸Privilege escalation attempt
- ▸Out-of-scope task request
- ▸Social engineering the agent
- ▸Persistence/memory manipulation
- ▸Prompt to output full system prompt
- ▸Extract other users' data via RAG
- ▸Leak environment variables or secrets
- ▸Output training data verbatim
- ▸API key extraction via crafted query
- ▸Infinite recursion prompt
- ▸Memory exhaustion via long context
- ▸Token flooding to exceed rate limit
- ▸Slow tool call bomb
- ▸Embedding space flooding in RAG
- ▸Model checksum verification
- ▸Dependency vulnerability scan
- ▸Backdoor trigger phrase test
- ▸Model behavior consistency across versions
- ▸Serialization attack on model artifacts
CI/CD Integration: Automatisches Security Gate
# GitHub Actions — AI security gate
name: AI Agent Security Tests
on: [push, pull_request]
jobs:
ai-red-team:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Verify model checksums
run: sha256sum -c models/checksums.txt
- name: Run behavioral test suite
run: python tests/behavioral_suite.py --agent moltbot
env:
AGENT_ENDPOINT: http://localhost:8080
- name: Prompt injection scan
run: python tests/injection_tests.py --category RT01 RT02 RT03
- name: Assert zero critical findings
run: python tests/assert_results.py --max-critical 0
# Block deployment if any critical finding
- name: Gate deployment
if: failure()
run: echo "SECURITY GATE FAILED — deployment blocked" && exit 1Schweregrad-Klassifikation von Befunden
CRITICAL — Deployment blockieren
- • System prompt fully overrideable
- • Agent can exfiltrate secrets/credentials
- • Unrestricted command execution
- • Cross-tenant data access
HIGH — Innerhalb 7 Tagen fixen
- • Partial injection (limited override)
- • Rate limit bypassable
- • Excessive agency without confirmation
- • Audit log gaps
MEDIUM — Innerhalb 30 Tagen fixen
- • Inconsistent refusal behavior
- • Verbose error messages
- • Suboptimal sandboxing
LOW — Verfolgen & Verbessern
- • Hallucination without guardrail
- • Missing structured output validation
- • Log verbosity issues
Häufige Fragen
What is AI red teaming?
AI red teaming is the practice of adversarially testing AI systems to discover security vulnerabilities before attackers do. For LLM-based agents, it includes: prompt injection testing, jailbreak attempts, data exfiltration probes, behavioral boundary testing, and infrastructure security testing. The goal is to find weaknesses in both the model's behavior and the surrounding system.
How often should I red team my AI agents?
Minimum: before every major model update or agent capability change. Best practice: run automated adversarial test suites in CI/CD on every build. Quarterly: comprehensive manual red team exercise including novel attack vectors. After any security incident: immediate re-test of affected attack surface.
What is a behavioral test suite for AI agents?
A behavioral test suite is a set of deterministic tests that verify an AI agent behaves correctly and securely. It includes: refusal tests (agent must decline dangerous requests), boundary tests (agent stays within declared scope), consistency tests (same input produces safe output across model versions), and canary tests (known injection patterns must be blocked). Run in CI/CD before every deployment.
Can I automate AI red teaming?
Yes, partially. Automated tests cover: known injection patterns, refusal boundary testing, output length/format validation, rate limit enforcement, model checksum verification. Human red teamers are still required for: novel attack vectors, social engineering scenarios, and creative jailbreak development. Use Moltbot to orchestrate automated tests and track results over time.