Lesson 5 of 5
Safety & Guardrails
Estimated time: 7 minutes
An AI agent that can write, commit, and deploy code is powerful. It's also a system that can do real damage if misconfigured. This lesson is about preventing the bad outcomes — file deletion, secret exposure, runaway deployments, and unexpected costs — before they happen.
This Is the Most Important Lesson
Everything else in this course makes the agent useful. This lesson makes it safe. Skip it and you're one bad prompt away from a production incident. Read every section.
Defense in Depth
Layer 1:             Layer 2:             Layer 3:
Task Restrictions    Sandbox Limits       Pipeline Gates
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ File blocks  │     │ No network   │     │ CI must pass │
│ Branch rules │     │ CPU/mem cap  │     │ Human review │
│ Size limits  │     │ Timeout      │     │ Staging test │
│ Pattern deny │     │ No host FS   │     │ Auto-rollback│
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       └────────────────────┴────────────────────┘
                            │
                  All three layers must
              be configured. No shortcuts.
File and Path Restrictions
Define exactly which files the agent can and cannot touch. This is your first line of defense.
guardrails:
  files:
    # Explicit blocklist: agent CANNOT modify these
    blocked_paths:
      - ".env*"                    # Environment variables / secrets
      - "*.pem"                    # Certificates
      - "*.key"                    # Private keys
      - "prisma/schema.prisma"     # Database schema
      - "src/lib/auth.ts"          # Authentication logic
      - "src/middleware.ts"        # Security middleware
      - "docker-compose*.yml"      # Infrastructure
      - ".github/workflows/*.yml"  # CI/CD pipelines
      - "package-lock.json"        # Dependency lock (indirect changes only)
    # Explicit allowlist (optional, stricter approach)
    # If set, agent can ONLY modify files matching these patterns
    allowed_paths:
      - "src/components/**"
      - "src/app/**"
      - "src/lib/**"
      - "src/types/**"
      - "tests/**"
    # Maximum changes per PR
    max_files_changed: 15
    max_lines_added: 500
    max_lines_deleted: 200

Allowlist vs Blocklist
A blocklist says "touch anything except these." An allowlist says "touch ONLY these." For production systems, use an allowlist — it's safer because unknown files are blocked by default. For development or prototyping, a blocklist is less restrictive.
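To make the difference concrete, here is a minimal Python sketch of how a path check could combine both lists. It is an illustration, not the agent's actual implementation: the function names are hypothetical, the patterns are a subset of the config above, and it assumes the blocklist wins when a file matches both lists (as `src/lib/auth.ts` does here). It also assumes shell-style globbing from Python's `fnmatch` module, where `*` crosses directory separators, which may differ from how the real tool interprets patterns.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Subset of the patterns from the config above.
BLOCKED_PATHS = [".env*", "*.pem", "*.key", "prisma/schema.prisma", "src/lib/auth.ts"]
ALLOWED_PATHS = ["src/components/**", "src/app/**", "src/lib/**", "tests/**"]


def matches_any(path: str, patterns: list[str]) -> bool:
    """True if the full path matches any glob pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def is_blocked(path: str, blocked: list[str]) -> bool:
    """Check the full path and the bare file name, so '.env*' also
    catches 'config/.env.local'."""
    name = PurePosixPath(path).name
    return matches_any(path, blocked) or matches_any(name, blocked)


def may_modify(path: str, blocked=BLOCKED_PATHS, allowed=ALLOWED_PATHS) -> bool:
    """Assumed precedence: blocklist first, then allowlist if one is set."""
    if is_blocked(path, blocked):
        return False
    if allowed:
        # Allowlist mode: anything not explicitly listed is denied.
        return matches_any(path, allowed)
    # Blocklist-only mode: anything not explicitly blocked is permitted.
    return True
```

With an allowlist set, `may_modify("README.md")` is denied simply because nothing lists it. With `allowed=[]`, the same call is permitted, which is the "unknown files" gap the allowlist closes.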
Operation Restrictions
Beyond files, restrict what the agent can do.
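As a rough illustration of how command rules like the ones in the config that follows might be evaluated, here is a small Python sketch. The function name and return shape are hypothetical, and the logic (deny on any blocked glob, then require an exact match against the allowed list) is one reasonable interpretation, not a documented behavior of the tool.

```python
from fnmatch import fnmatchcase

# Values copied from the execution section of the config.
ALLOWED_COMMANDS = ["npm test", "npm run lint", "npm run build", "npx tsc --noEmit"]
BLOCKED_COMMANDS = ["rm -rf *", "curl *", "wget *", "npm publish", "docker *"]


def check_command(command: str) -> tuple[bool, str]:
    """Return (permitted, reason) for a command the agent wants to run."""
    # Collapse repeated whitespace so 'npm   test' equals 'npm test'.
    normalized = " ".join(command.split())
    for pattern in BLOCKED_COMMANDS:
        if fnmatchcase(normalized, pattern):
            return (False, f"blocked by pattern {pattern!r}")
    # Exact match only: no prefix matching, so chained commands are rejected.
    if normalized in ALLOWED_COMMANDS:
        return (True, "allowed")
    return (False, "not in allowed_commands")
```

Because the allowlist uses exact matching, a chained command such as `npm test && rm -rf /` is rejected even though it starts with an allowed command.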
guardrails:
  operations:
    # Git restrictions
    git:
      allowed_actions: [commit, push, create_branch, create_pr]
      blocked_actions: [force_push, delete_branch, rebase_main]
      branch_pattern: "agent/*"    # All agent branches prefixed
      target_branch: "main"        # PRs always target main
      max_commits_per_pr: 10       # Prevent runaway commit loops
    # Code execution
    execution:
      allowed_commands:
        - "npm test"
        - "npm run lint"
        - "npm run build"
        - "npx tsc --noEmit"
      blocked_commands:
        - "rm -rf *"
        - "curl *"                 # No arbitrary downloads
        - "wget *"
        - "npm publish"            # Never publish packages
        - "docker *"               # No container manipulation
      blocked_patterns:
        - "process.env"            # Don't read env vars in code
        - "eval("                  # No dynamic code execution
        - "exec("                  # No shell execution
        - "child_process"          # No spawning processes
    # Dependency management
    dependencies:
      allow_new_packages: false    # Must request approval
      blocked_packages:
        - "is-odd"                 # No trivial packages
        - "event-stream"           # Known compromised packages
      max_new_dependencies: 2      # Per PR

Bot: Task paused. I need to modify src/middleware.ts to add the new API route, but this file is in the blocked list.
Options:
1. You modify the file manually and I'll continue
2. You temporarily unblock it for this task
3. I'll find an alternative approach that doesn't require changing middleware
You: Try option 3
Bot: Found an alternative. I'll add the route handler directly in the API route file with its own auth check instead of relying on middleware. Continuing...

Rate Limits and Kill Switches
Prevent runaway behavior with hard limits and an emergency stop.
guardrails:
  limits:
    # Task limits
    max_concurrent_tasks: 1        # One task at a time
    max_tasks_per_hour: 5          # Prevent spam
    max_tasks_per_day: 20          # Daily ceiling
    # Compute limits
    sandbox_timeout: 600           # 10 min max per task
    max_retries: 3                 # Fix attempts before escalating
    max_api_calls: 100             # LLM API calls per task
    # Cost limits
    max_cost_per_task: 2.00        # USD, based on API usage
    max_cost_per_day: 20.00        # Daily cost ceiling
    alert_at: 15.00                # Alert when daily spend hits $15
  kill_switch:
    # Emergency stop: halts all agent activity immediately
    enabled: true
    triggers:
      - command: "stop agent"              # Chat command
      - command: "kill switch"             # Chat command
      - webhook: "/api/agent/kill"         # API endpoint
      - condition: "error_rate > 10%"      # Auto-trigger
      - condition: "cost > $50"            # Cost runaway

You: Stop agent
Bot: Agent stopped immediately.
Current task: cancelled (no PR created)
Sandbox: terminated
Queue: cleared (2 pending tasks removed)
The agent will not process any tasks until you explicitly restart it with "start agent".
Last activity log:
14:32 - Task started: "Add pagination to users API"
14:33 - Modified 3 files
14:34 - Stopped by kill switch
No changes were pushed to GitHub.

Test Your Kill Switch
Run a kill switch test during setup, not during an emergency. Send "stop agent" while a benign task is running and verify it halts cleanly. Like a fire drill — practice before you need it.
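The kill switch behavior described in this section can be pictured as a small state machine. The Python sketch below is only a model for reasoning about it: the class and method names are hypothetical, the thresholds are taken from the example config, and the real agent's internals may work differently. It shows the chat triggers, the two automatic conditions, and the explicit restart.

```python
from dataclasses import dataclass

# Chat commands from the example config and transcript.
STOP_COMMANDS = {"stop agent", "kill switch"}
START_COMMAND = "start agent"


@dataclass
class KillSwitch:
    max_error_rate: float = 0.10   # mirrors: condition "error_rate > 10%"
    max_cost_usd: float = 50.00    # mirrors: condition "cost > $50"
    stopped: bool = False
    reason: str = ""

    def trip(self, reason: str) -> None:
        """Stop the agent; the first reason recorded is kept."""
        if not self.stopped:
            self.stopped = True
            self.reason = reason

    def on_chat_message(self, text: str) -> None:
        command = text.strip().lower()
        if command in STOP_COMMANDS:
            self.trip(f"chat command: {command}")
        elif command == START_COMMAND:
            # Only an explicit restart clears the stop.
            self.stopped = False
            self.reason = ""

    def on_metrics(self, failed_tasks: int, total_tasks: int, cost_usd: float) -> None:
        """Automatic triggers, evaluated whenever metrics are updated."""
        if total_tasks and failed_tasks / total_tasks > self.max_error_rate:
            self.trip(f"error_rate > {self.max_error_rate:.0%}")
        if cost_usd > self.max_cost_usd:
            self.trip(f"cost > ${self.max_cost_usd:.2f}")

    def may_start_task(self) -> bool:
        return not self.stopped
```

A drill then amounts to calling `on_chat_message("Stop agent")` while a harmless task is running and asserting that `may_start_task()` returns `False` until someone sends "start agent".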
Audit Logging
Every agent action should be logged. When something goes wrong (or when you want to understand what happened), the audit log is your source of truth.
audit:
  enabled: true
  log_level: detailed              # Log every file read, write, command
  retention: 90                    # Days to keep logs
  storage: local                   # or: s3, cloudwatch
  log_events:
    - task_received                # What was requested
    - files_read                   # What the agent looked at
    - files_modified               # What changed
    - commands_executed            # What ran in sandbox
    - api_calls                    # LLM calls with prompts
    - pr_created                   # PR details
    - guardrail_triggered          # Any blocked actions
    - error                        # Failures
  alerts:
    channel: "#agent-alerts"
    on:
      - guardrail_triggered
      - error
      - kill_switch_activated
      - cost_limit_approached

You: Show me the audit log for today
Bot: Agent Audit Log - March 10, 2026
14:02 TASK  "Add search bar to header" - started
14:02 READ  src/components/layout/Header.tsx
14:02 READ  src/components/ui/input.tsx
14:03 WRITE src/components/layout/Header.tsx (+42 lines)
14:03 WRITE src/components/layout/SearchDropdown.tsx (new, 87 lines)
14:03 WRITE src/components/layout/tests/Header.test.tsx (+34 lines)
14:04 EXEC  npm test - passed (47/47)
14:04 EXEC  npm run lint - passed
14:05 GIT   Branch: agent/add-search-bar
14:05 GIT   PR #147 created → main
14:05 TASK  Complete (3m 12s, $0.34 API cost)
14:30 TASK  "Update user API pagination" - started
14:31 GUARD Blocked: attempted to modify .env.local
14:31 GUARD Agent found alternative approach
14:33 TASK  Complete (2m 45s, $0.28 API cost)
Daily total: 2 tasks, $0.62 spent, 1 guardrail triggered

Before using the agent on a real production codebase, verify every item:
- Sandbox isolation confirmed (Docker, no host filesystem access)
- Network allowlist configured (only GitHub + package registry)
- File blocklist includes all secrets, config, and infra files
- Operation blocklist includes destructive commands
- Branch protection rules require PR reviews for main
- CI pipeline includes security scanning (Semgrep or equivalent)
- Kill switch tested and working
- Cost limits set with alerts
- Audit logging enabled and writing to persistent storage
- Human review required before any merge
- Auto-rollback configured for production deploys
- Team members know how to trigger the kill switch
If the agent causes an issue in production:
- Trigger kill switch — stop all agent activity
- Rollback — revert the production deploy (auto-rollback should handle this)
- Review audit log — understand exactly what the agent did
- Root cause — was it a bad task description, missing guardrail, or agent error?
- Update guardrails — add new restrictions to prevent recurrence
- Post-mortem — document what happened and share with the team
The goal is never "zero incidents" — it's "fast detection, fast recovery, permanent fix."
Coding Agent Security Guide
Comprehensive security documentation for autonomous coding agents, including threat models and hardening guides.
https://docs.openclaw.ai/agents/coding/security