Safety & Guardrails

An AI agent that can write, commit, and deploy code is powerful. It's also a system that can do real damage if misconfigured. This lesson is about preventing the bad outcomes — file deletion, secret exposure, runaway deployments, and unexpected costs — before they happen.

This Is the Most Important Lesson

Everything else in this course makes the agent useful. This lesson makes it safe. Skip it and you're one bad prompt away from a production incident. Read every section.

Defense in Depth

  Layer 1:              Layer 2:              Layer 3:
  Task Restrictions     Sandbox Limits        Pipeline Gates
  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
  │ File blocks  │     │ No network   │     │ CI must pass │
  │ Branch rules │     │ CPU/mem cap  │     │ Human review │
  │ Size limits  │     │ Timeout      │     │ Staging test │
  │ Pattern deny │     │ No host FS   │     │ Auto-rollback│
  └──────┬───────┘     └──────┬───────┘     └──────┬───────┘
         │                    │                    │
         └────────────────────┴────────────────────┘
                              │
                     All three layers must
                     be configured. No shortcuts.

File and Path Restrictions

Define exactly which files the agent can and cannot touch. This is your first line of defense.

file-guardrails.yaml

guardrails:
  files:
    # Explicit blocklist: agent CANNOT modify these
    blocked_paths:
      - ".env*"                    # Environment variables / secrets
      - "*.pem"                    # Certificates
      - "*.key"                    # Private keys
      - "prisma/schema.prisma"     # Database schema
      - "src/lib/auth.ts"          # Authentication logic
      - "src/middleware.ts"        # Security middleware
      - "docker-compose*.yml"      # Infrastructure
      - ".github/workflows/*.yml"  # CI/CD pipelines
      - "package-lock.json"        # Dependency lock (indirect changes only)

    # Explicit allowlist (optional, stricter approach)
    # If set, agent can ONLY modify files matching these patterns
    allowed_paths:
      - "src/components/**"
      - "src/app/**"
      - "src/lib/**"
      - "src/types/**"
      - "tests/**"

    # Maximum changes per PR
    max_files_changed: 15
    max_lines_added: 500
    max_lines_deleted: 200

Allowlist vs Blocklist

A blocklist says "touch anything except these." An allowlist says "touch ONLY these." For production systems, use an allowlist — it's safer because unknown files are blocked by default. For development or prototyping, a blocklist is less restrictive.
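To make the difference concrete, here is a minimal Python sketch of how a path check could combine the two lists from file-guardrails.yaml. It is not the agent's actual implementation. The function names are hypothetical, and two behaviors are assumptions: the blocklist always wins over the allowlist (so src/lib/auth.ts stays blocked even though src/lib/** is allowed), and a pattern without a slash, such as ".env*", applies in every directory.

```python
import posixpath
from fnmatch import fnmatchcase

# Patterns copied from file-guardrails.yaml.
BLOCKED_PATHS = [
    ".env*", "*.pem", "*.key", "prisma/schema.prisma", "src/lib/auth.ts",
    "src/middleware.ts", "docker-compose*.yml", ".github/workflows/*.yml",
    "package-lock.json",
]
ALLOWED_PATHS = [
    "src/components/**", "src/app/**", "src/lib/**", "src/types/**", "tests/**",
]


def _matches(path: str, pattern: str) -> bool:
    """Match the whole path; slash-free patterns also match the file name."""
    if fnmatchcase(path, pattern):
        return True
    # Assumption: "*.pem" or ".env*" should apply in any directory.
    return "/" not in pattern and fnmatchcase(posixpath.basename(path), pattern)


def is_path_allowed(path: str, blocked=BLOCKED_PATHS, allowed=ALLOWED_PATHS) -> bool:
    """Return True if the agent may modify `path` (repo-relative, POSIX style)."""
    # Collapse "./" and "../" segments so "src/app/../../.env" cannot slip through.
    path = posixpath.normpath(path)
    # Reject anything that points outside the repository.
    if path.startswith("/") or path == ".." or path.startswith("../"):
        return False
    # Assumption: the blocklist takes precedence over the allowlist.
    if any(_matches(path, pattern) for pattern in blocked):
        return False
    # An empty allowlist means blocklist-only mode: everything else is permitted.
    if not allowed:
        return True
    # Allowlist mode: unknown files are denied by default.
    return any(_matches(path, pattern) for pattern in allowed)
```

With both lists set, a file such as README.md is denied because nothing on the allowlist matches it. Passing allowed=[] switches to blocklist-only behavior, where the same file is permitted.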

Operation Restrictions

Beyond files, restrict what the agent can do.

operation-guardrails.yaml

guardrails:
  operations:
    # Git restrictions
    git:
      allowed_actions: [commit, push, create_branch, create_pr]
      blocked_actions: [force_push, delete_branch, rebase_main]
      branch_pattern: "agent/*"   # All agent branches prefixed
      target_branch: "main"       # PRs always target main
      max_commits_per_pr: 10      # Prevent runaway commit loops

    # Code execution
    execution:
      allowed_commands:
        - "npm test"
        - "npm run lint"
        - "npm run build"
        - "npx tsc --noEmit"
      blocked_commands:
        - "rm -rf *"
        - "curl *"                # No arbitrary downloads
        - "wget *"
        - "npm publish"           # Never publish packages
        - "docker *"              # No container manipulation
      blocked_patterns:
        - "process.env"           # Don't read env vars in code
        - "eval("                 # No dynamic code execution
        - "exec("                 # No shell execution
        - "child_process"         # No spawning processes

    # Dependency management
    dependencies:
      allow_new_packages: false   # Must request approval
      blocked_packages:
        - "is-odd"                # No trivial packages
        - "event-stream"          # Known compromised packages
      max_new_dependencies: 2     # Per PR

Blocked operation

Bot:  Task paused: I need to modify src/middleware.ts to add
      the new API route, but this file is in the blocked list.

      Options:
      1. You modify the file manually and I'll continue
      2. You temporarily unblock it for this task
      3. I'll find an alternative approach that doesn't
         require changing middleware

You:  Try option 3

Bot:  Found an alternative: I'll add the route handler
      directly in the API route file with its own auth check
      instead of relying on middleware. Continuing...
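The execution rules in operation-guardrails.yaml could be enforced along the following lines. This is a sketch under stated assumptions, not the agent's real checker. The function names are hypothetical, the command allowlist is treated as authoritative (a command runs only if it equals an allowlisted entry token for token), and blocked patterns are matched as plain substrings against the lines the agent adds.

```python
import shlex

# Values copied from operation-guardrails.yaml.
ALLOWED_COMMANDS = ["npm test", "npm run lint", "npm run build", "npx tsc --noEmit"]
BLOCKED_PATTERNS = ["process.env", "eval(", "exec(", "child_process"]


def is_command_allowed(command: str, allowed=ALLOWED_COMMANDS) -> bool:
    """Allow a command only if its tokens exactly equal an allowlisted command."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: refuse rather than guess what was meant.
        return False
    # Exact token comparison rejects chained input such as "npm test && curl ...".
    return any(tokens == shlex.split(entry) for entry in allowed)


def find_blocked_patterns(added_lines, patterns=BLOCKED_PATTERNS):
    """Return (line_number, pattern) pairs for each blocked substring found."""
    hits = []
    for number, line in enumerate(added_lines, start=1):
        for pattern in patterns:
            if pattern in line:
                hits.append((number, pattern))
    return hits
```

Under this design the blocked_commands list acts as documentation and a second check, since anything not on the allowlist is refused anyway. Substring matching is deliberately crude: "exec(" also flags a harmless call like regex.exec(text), so a hit is better treated as a reason to pause for review than as proof of misuse.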

Rate Limits and Kill Switches

Prevent runaway behavior with hard limits and an emergency stop.

rate-limits.yaml

guardrails:
  limits:
    # Task limits
    max_concurrent_tasks: 1     # One task at a time
    max_tasks_per_hour: 5       # Prevent spam
    max_tasks_per_day: 20       # Daily ceiling

    # Compute limits
    sandbox_timeout: 600        # 10 min max per task
    max_retries: 3              # Fix attempts before escalating
    max_api_calls: 100          # LLM API calls per task

    # Cost limits
    max_cost_per_task: 2.00     # USD, based on API usage
    max_cost_per_day: 20.00     # Daily cost ceiling
    alert_at: 15.00             # Alert when daily spend hits $15

  kill_switch:
    # Emergency stop: halts all agent activity immediately
    enabled: true
    triggers:
      - command: "stop agent"           # Chat command
      - command: "kill switch"          # Chat command
      - webhook: "/api/agent/kill"      # API endpoint
      - condition: "error_rate > 10%"   # Auto-trigger
      - condition: "cost > $50"         # Cost runaway
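As an illustration of how the API-call and cost limits above might be enforced, here is a small Python sketch. The class and exception names are hypothetical, and it assumes the cost of a call can be estimated before the call is made, so that a call which would cross a limit is refused rather than billed.

```python
from dataclasses import dataclass


class LimitExceeded(Exception):
    """Raised when a hard limit from rate-limits.yaml would be crossed."""


@dataclass
class SpendTracker:
    max_api_calls: int = 100          # LLM API calls per task
    max_cost_per_task: float = 2.00   # USD
    max_cost_per_day: float = 20.00   # USD
    alert_at: float = 15.00           # USD, soft threshold for a warning
    task_calls: int = 0
    task_cost: float = 0.0
    day_cost: float = 0.0

    def start_task(self) -> None:
        """Reset the per-task counters; the daily total keeps accumulating."""
        self.task_calls = 0
        self.task_cost = 0.0

    def record_call(self, cost_usd: float) -> bool:
        """Record one LLM call. Returns True once daily spend reaches alert_at."""
        # Check every hard limit first so a refused call changes no counters.
        if self.task_calls + 1 > self.max_api_calls:
            raise LimitExceeded("max_api_calls")
        if self.task_cost + cost_usd > self.max_cost_per_task:
            raise LimitExceeded("max_cost_per_task")
        if self.day_cost + cost_usd > self.max_cost_per_day:
            raise LimitExceeded("max_cost_per_day")
        self.task_calls += 1
        self.task_cost += cost_usd
        self.day_cost += cost_usd
        return self.day_cost >= self.alert_at
```

The caller would treat LimitExceeded as a signal to stop the task and escalate, and a True return value as the cue to post the spend alert.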

Kill switch in action

You:  Stop agent

Bot:  Agent stopped immediately.
      Current task: cancelled (no PR created)
      Sandbox: terminated
      Queue: cleared (2 pending tasks removed)

      The agent will not process any tasks until you
      explicitly restart it with "start agent".

      Last activity log:
      14:32 - Task started: "Add pagination to users API"
      14:33 - Modified 3 files
      14:34 - Stopped by kill switch

      No changes were pushed to GitHub.

Test Your Kill Switch

Run a kill switch test during setup, not during an emergency. Send "stop agent" while a benign task is running and verify it halts cleanly. Like a fire drill — practice before you need it.
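A kill switch is easier to test when its logic is small and separate from the agent loop. The sketch below is one possible shape, with hypothetical class and method names. It assumes chat commands are matched case-insensitively (the transcript sends "Stop agent") and that the two automatic conditions use the thresholds from rate-limits.yaml.

```python
import threading

STOP_COMMANDS = {"stop agent", "kill switch"}
START_COMMAND = "start agent"


class KillSwitch:
    def __init__(self, max_error_rate: float = 0.10, max_cost: float = 50.0):
        self.max_error_rate = max_error_rate
        self.max_cost = max_cost
        self.reason = None
        # An Event lets the agent loop and a webhook handler share the flag
        # safely across threads.
        self._stopped = threading.Event()

    @property
    def stopped(self) -> bool:
        return self._stopped.is_set()

    def trip(self, reason: str) -> None:
        """Halt the agent; keep the first reason if tripped more than once."""
        if not self.stopped:
            self.reason = reason
        self._stopped.set()

    def handle_chat(self, message: str) -> None:
        """React to the chat commands; every other message is ignored."""
        text = message.strip().lower()
        if text in STOP_COMMANDS:
            self.trip(f"chat command: {text}")
        elif text == START_COMMAND:
            # Restart is always an explicit human action.
            self.reason = None
            self._stopped.clear()

    def check_conditions(self, errors: int, tasks: int, cost: float) -> None:
        """Auto-trigger on error rate or total cost."""
        if tasks and errors / tasks > self.max_error_rate:
            self.trip("error_rate")
        elif cost > self.max_cost:
            self.trip("cost")
```

The agent loop would check stopped before every step, and a handler for the /api/agent/kill webhook would only need to call trip(). A drill then amounts to sending "stop agent" and asserting that stopped is true and that no further step runs.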

Audit Logging

Every agent action should be logged. When something goes wrong (or when you want to understand what happened), the audit log is your source of truth.

audit-config.yaml

audit:
  enabled: true
  log_level: detailed          # Log every file read, write, command
  retention: 90                # Days to keep logs
  storage: local               # or: s3, cloudwatch

  log_events:
    - task_received             # What was requested
    - files_read                # What the agent looked at
    - files_modified            # What changed
    - commands_executed         # What ran in sandbox
    - api_calls                 # LLM calls with prompts
    - pr_created                # PR details
    - guardrail_triggered       # Any blocked actions
    - error                     # Failures

  alerts:
    channel: "#agent-alerts"
    on:
      - guardrail_triggered
      - error
      - kill_switch_activated
      - cost_limit_approached

Viewing the audit log

You:  Show me the audit log for today

Bot:  Agent Audit Log - March 10, 2026

      14:02  TASK  "Add search bar to header" - started
      14:02  READ  src/components/layout/Header.tsx
      14:02  READ  src/components/ui/input.tsx
      14:03  WRITE src/components/layout/Header.tsx (+42 lines)
      14:03  WRITE src/components/layout/SearchDropdown.tsx (new, 87 lines)
      14:03  WRITE src/components/layout/tests/Header.test.tsx (+34 lines)
      14:04  EXEC  npm test - passed (47/47)
      14:04  EXEC  npm run lint - passed
      14:05  GIT   Branch: agent/add-search-bar
      14:05  GIT   PR #147 created → main
      14:05  TASK  Complete (3m 12s, $0.34 API cost)

      14:30  TASK  "Update user API pagination" - started
      14:31  GUARD Blocked: attempted to modify .env.local
      14:31  GUARD Agent found alternative approach
      14:33  TASK  Complete (2m 45s, $0.28 API cost)

      Daily total: 2 tasks, $0.62 spent, 1 guardrail triggered
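A log like the one above is easiest to query when each event is stored as one structured record. The following Python sketch writes JSON Lines, which is an assumption about the storage format rather than something the audit config specifies. The function names and record fields are hypothetical; the event names come from audit-config.yaml.

```python
import json
from datetime import datetime, timezone

# Event names copied from audit-config.yaml.
LOG_EVENTS = {
    "task_received", "files_read", "files_modified", "commands_executed",
    "api_calls", "pr_created", "guardrail_triggered", "error",
}
ALERT_EVENTS = {
    "guardrail_triggered", "error", "kill_switch_activated", "cost_limit_approached",
}


def audit_line(event: str, detail: dict, now=None) -> str:
    """Build one JSON Lines record; `alert` marks events that should notify."""
    if event not in LOG_EVENTS | ALERT_EVENTS:
        raise ValueError(f"unknown audit event: {event}")
    now = now or datetime.now(timezone.utc)
    record = {
        "ts": now.isoformat(),
        "event": event,
        "alert": event in ALERT_EVENTS,
        "detail": detail,
    }
    return json.dumps(record, sort_keys=True)


def append_audit(path: str, line: str) -> None:
    """Append a record to the log file; existing records are never rewritten."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(line + "\n")
```

Opening the file in append mode keeps the log write-once from the agent's side, and the alert flag gives whatever posts to #agent-alerts a single field to filter on.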

Coding Agent Security Guide

Comprehensive security documentation for autonomous coding agents, including threat models and hardening guides.

https://docs.openclaw.ai/agents/coding/security

