Autonomous Coding

Lesson 5 of 5

Safety & Guardrails

Estimated time: 7 minutes

An AI agent that can write, commit, and deploy code is powerful. It's also a system that can do real damage if misconfigured. This lesson is about preventing the bad outcomes — file deletion, secret exposure, runaway deployments, and unexpected costs — before they happen.

    This Is the Most Important Lesson

    Everything else in this course makes the agent useful. This lesson makes it safe. Skip it and you're one bad prompt away from a production incident. Read every section.

    Defense in Depth

      Layer 1:              Layer 2:              Layer 3:
      Task Restrictions     Sandbox Limits        Pipeline Gates
      ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
      │ File blocks  │     │ No network   │     │ CI must pass │
      │ Branch rules │     │ CPU/mem cap  │     │ Human review │
      │ Size limits  │     │ Timeout      │     │ Staging test │
      │ Pattern deny │     │ No host FS   │     │ Auto-rollback│
      └──────┬───────┘     └──────┬───────┘     └──────┬───────┘
             │                    │                    │
             └────────────────────┴────────────────────┘
                                  │
                         All three layers must
                         be configured. No shortcuts.
    

    File and Path Restrictions

    Define exactly which files the agent can and cannot touch. This is your first line of defense.

    file-guardrails.yaml
    guardrails:
      files:
        # Explicit blocklist: agent CANNOT modify these
        blocked_paths:
          - ".env*"                    # Environment variables / secrets
          - "*.pem"                    # Certificates
          - "*.key"                    # Private keys
          - "prisma/schema.prisma"     # Database schema
          - "src/lib/auth.ts"          # Authentication logic
          - "src/middleware.ts"        # Security middleware
          - "docker-compose*.yml"      # Infrastructure
          - ".github/workflows/*.yml"  # CI/CD pipelines
          - "package-lock.json"        # Dependency lock (indirect changes only)

        # Explicit allowlist (optional, stricter approach)
        # If set, agent can ONLY modify files matching these patterns
        allowed_paths:
          - "src/components/**"
          - "src/app/**"
          - "src/lib/**"
          - "src/types/**"
          - "tests/**"

        # Maximum changes per PR
        max_files_changed: 15
        max_lines_added: 500
        max_lines_deleted: 200

    Allowlist vs Blocklist

    A blocklist says "touch anything except these." An allowlist says "touch ONLY these." For production systems, use an allowlist: it is safer because any file nobody thought to list is blocked by default. For development or prototyping, a blocklist is less restrictive, since the agent can still create and edit files in locations you did not anticipate.
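
    To make the difference concrete, here is a minimal Python sketch of the check an enforcement layer might run before each write. The function names (`can_modify`, `_matches`) and the matching semantics are assumptions made for this example, not the agent's actual implementation; the patterns are copied from file-guardrails.yaml.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Patterns copied from file-guardrails.yaml.
BLOCKED_PATHS = [
    ".env*", "*.pem", "*.key", "prisma/schema.prisma", "src/lib/auth.ts",
    "src/middleware.ts", "docker-compose*.yml", ".github/workflows/*.yml",
    "package-lock.json",
]
ALLOWED_PATHS = ["src/components/**", "src/app/**", "src/lib/**", "src/types/**", "tests/**"]


def _matches(path: str, pattern: str) -> bool:
    """Assumed semantics: a pattern without "/" is tested against the file
    name only; any other pattern is tested against the repo-relative path."""
    if "/" not in pattern:
        return fnmatch(PurePosixPath(path).name, pattern)
    # fnmatch's "*" also matches "/", so "src/lib/**" covers nested folders.
    return fnmatch(path, pattern)


def can_modify(path: str, blocked=BLOCKED_PATHS, allowed=None) -> bool:
    """Return True if the agent may modify `path`.

    The blocklist always wins. If an allowlist is supplied, the path must
    also match one of its patterns; without one, unknown paths are permitted.
    """
    if any(_matches(path, pattern) for pattern in blocked):
        return False
    if allowed is None:
        return True
    return any(_matches(path, pattern) for pattern in allowed)
```

    With only the blocklist, `can_modify("scripts/deploy.sh")` returns True. Passing `allowed=ALLOWED_PATHS` makes the same call return False, which is the blocked-by-default behavior an allowlist gives you. Note that `src/lib/auth.ts` stays blocked even though `src/lib/**` is on the allowlist, because the sketch checks the blocklist first.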

    Operation Restrictions

    Beyond files, restrict what the agent can do.

    operation-guardrails.yaml
    guardrails:
      operations:
        # Git restrictions
        git:
          allowed_actions: [commit, push, create_branch, create_pr]
          blocked_actions: [force_push, delete_branch, rebase_main]
          branch_pattern: "agent/*"   # All agent branches prefixed
          target_branch: "main"       # PRs always target main
          max_commits_per_pr: 10      # Prevent runaway commit loops

        # Code execution
        execution:
          allowed_commands:
            - "npm test"
            - "npm run lint"
            - "npm run build"
            - "npx tsc --noEmit"
          blocked_commands:
            - "rm -rf *"
            - "curl *"                # No arbitrary downloads
            - "wget *"
            - "npm publish"           # Never publish packages
            - "docker *"              # No container manipulation
          blocked_patterns:
            - "process.env"           # Don't read env vars in code
            - "eval("                 # No dynamic code execution
            - "exec("                 # No shell execution
            - "child_process"         # No spawning processes

        # Dependency management
        dependencies:
          allow_new_packages: false   # Must request approval
          blocked_packages:
            - "is-odd"                # No trivial packages
            - "event-stream"          # Known compromised packages
          max_new_dependencies: 2     # Per PR
    Blocked operation
    Bot:  Task paused. I need to modify src/middleware.ts to add
          the new API route, but this file is in the blocked list.

          Options:
          1. You modify the file manually and I'll continue
          2. You temporarily unblock it for this task
          3. I'll find an alternative approach that doesn't
             require changing middleware

    You:  Try option 3

    Bot:  Found an alternative. I'll add the route handler
          directly in the API route file with its own auth check
          instead of relying on middleware. Continuing...
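
    The command and pattern rules in operation-guardrails.yaml can be sketched as two small checks. This is an illustrative Python sketch under stated assumptions, not the agent's real enforcement code: `is_command_allowed` and `find_blocked_patterns` are hypothetical names, commands are assumed to require an exact match against the allowlist, and patterns are assumed to be plain substring matches on added lines.

```python
from fnmatch import fnmatch

# Values copied from operation-guardrails.yaml.
ALLOWED_COMMANDS = {"npm test", "npm run lint", "npm run build", "npx tsc --noEmit"}
BLOCKED_COMMANDS = ["rm -rf *", "curl *", "wget *", "npm publish", "docker *"]
BLOCKED_PATTERNS = ["process.env", "eval(", "exec(", "child_process"]


def is_command_allowed(command: str) -> bool:
    """Allow a command only if it is not blocked AND exactly matches the allowlist.

    Exact matching (an assumption of this sketch) also rejects chained
    commands such as "npm test && curl ...".
    """
    normalized = " ".join(command.split())  # collapse repeated whitespace
    if any(fnmatch(normalized, pattern) for pattern in BLOCKED_COMMANDS):
        return False
    return normalized in ALLOWED_COMMANDS


def find_blocked_patterns(added_lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, pattern) for every blocked pattern found in added code."""
    hits = []
    for number, line in enumerate(added_lines, start=1):
        for pattern in BLOCKED_PATTERNS:
            if pattern in line:
                hits.append((number, pattern))
    return hits
```

    Substring matching is deliberately crude: `"exec("` also flags harmless calls such as `regex.exec(...)`. A real setup would probably pair a cheap check like this with a static analyzer in CI rather than rely on it alone.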

    Rate Limits and Kill Switches

    Prevent runaway behavior with hard limits and an emergency stop.

    rate-limits.yaml
    guardrails:
      limits:
        # Task limits
        max_concurrent_tasks: 1     # One task at a time
        max_tasks_per_hour: 5       # Prevent spam
        max_tasks_per_day: 20       # Daily ceiling

        # Compute limits
        sandbox_timeout: 600        # 10 min max per task
        max_retries: 3              # Fix attempts before escalating
        max_api_calls: 100          # LLM API calls per task

        # Cost limits
        max_cost_per_task: 2.00     # USD, based on API usage
        max_cost_per_day: 20.00     # Daily cost ceiling
        alert_at: 15.00             # Alert when daily spend hits $15

      kill_switch:
        # Emergency stop: halts all agent activity immediately
        enabled: true
        triggers:
          - command: "stop agent"           # Chat command
          - command: "kill switch"          # Chat command
          - webhook: "/api/agent/kill"      # API endpoint
          - condition: "error_rate > 10%"   # Auto-trigger
          - condition: "cost > $50"         # Cost runaway
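
    As an illustration of how the per-task and daily limits might be enforced, here is a small Python sketch. `TaskBudget`, `BudgetExceeded`, and `daily_status` are hypothetical names invented for this example; only the numeric defaults come from rate-limits.yaml.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a task goes over one of its hard limits."""


@dataclass
class TaskBudget:
    max_api_calls: int = 100     # max_api_calls in rate-limits.yaml
    max_cost_usd: float = 2.00   # max_cost_per_task in rate-limits.yaml
    api_calls: int = 0
    cost_usd: float = 0.0

    def record_call(self, cost_usd: float) -> None:
        """Record one LLM API call and stop the task if a limit is exceeded."""
        self.api_calls += 1
        self.cost_usd += cost_usd
        if self.api_calls > self.max_api_calls:
            raise BudgetExceeded(f"API call limit reached ({self.max_api_calls})")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(
                f"task cost ${self.cost_usd:.2f} exceeds ${self.max_cost_usd:.2f}"
            )


def daily_status(spent_today_usd: float,
                 alert_at: float = 15.00,
                 max_per_day: float = 20.00) -> str:
    """Classify today's spend as "ok", "alert" (notify humans), or "blocked"."""
    if spent_today_usd >= max_per_day:
        return "blocked"
    if spent_today_usd >= alert_at:
        return "alert"
    return "ok"
```

    Raising an exception instead of returning a flag is a design choice of this sketch: the task loop cannot silently ignore a breached limit, and the caller decides whether to escalate to a human or trip the kill switch.
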
    Kill switch in action
    You:  Stop agent

    Bot:  Agent stopped immediately.
          Current task: cancelled (no PR created)
          Sandbox: terminated
          Queue: cleared (2 pending tasks removed)

          The agent will not process any tasks until you
          explicitly restart it with "start agent".

          Last activity log:
          14:32 - Task started: "Add pagination to users API"
          14:33 - Modified 3 files
          14:34 - Stopped by kill switch

          No changes were pushed to GitHub.

    Test Your Kill Switch

    Run a kill switch test during setup, not during an emergency. Send "stop agent" while a benign task is running and verify it halts cleanly. Like a fire drill — practice before you need it.
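
    A drill like this can be automated. The Python sketch below shows the general shape, assuming the agent checks a shared stop flag between steps; `KillSwitch`, `run_benign_task`, and `kill_switch_drill` are hypothetical names for this example and do not describe how the real agent is wired.

```python
import threading
import time


class KillSwitch:
    """A shared stop flag that any trigger (chat command, webhook, monitor) can set."""

    def __init__(self) -> None:
        self._stopped = threading.Event()
        self.reason = None

    def trigger(self, reason: str) -> None:
        self.reason = reason
        self._stopped.set()

    @property
    def active(self) -> bool:
        return self._stopped.is_set()


def run_benign_task(switch: KillSwitch, steps: int, log: list) -> None:
    """Stand-in for an agent task: it checks the switch before every step."""
    for step in range(steps):
        if switch.active:
            log.append(f"stopped before step {step}")
            return
        log.append(f"step {step}")
        time.sleep(0.01)  # simulate work
    log.append("completed")


def kill_switch_drill() -> bool:
    """Start a harmless long task, trigger the switch, and verify a clean halt."""
    switch, log = KillSwitch(), []
    worker = threading.Thread(target=run_benign_task, args=(switch, 1000, log))
    worker.start()
    time.sleep(0.05)                  # let the task get going
    switch.trigger("drill: stop agent")
    worker.join(timeout=2)            # a clean stop should be near-instant
    return (not worker.is_alive()
            and "completed" not in log
            and log[-1].startswith("stopped"))
```

    `kill_switch_drill()` returns True only if the task halted early and never reached completion. The key property to test in a real system is the same: the stop signal is checked often enough that the agent cannot push or deploy after it fires.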

    Audit Logging

    Every agent action should be logged. When something goes wrong (or when you want to understand what happened), the audit log is your source of truth.

    audit-config.yaml
    audit:
      enabled: true
      log_level: detailed          # Log every file read, write, command
      retention: 90                # Days to keep logs
      storage: local               # or: s3, cloudwatch

      log_events:
        - task_received             # What was requested
        - files_read                # What the agent looked at
        - files_modified            # What changed
        - commands_executed         # What ran in sandbox
        - api_calls                 # LLM calls with prompts
        - pr_created                # PR details
        - guardrail_triggered       # Any blocked actions
        - error                     # Failures

      alerts:
        channel: "#agent-alerts"
        on:
          - guardrail_triggered
          - error
          - kill_switch_activated
          - cost_limit_approached
    Viewing the audit log
    You:  Show me the audit log for today

    Bot:  Agent Audit Log - March 10, 2026

          14:02  TASK  "Add search bar to header" - started
          14:02  READ  src/components/layout/Header.tsx
          14:02  READ  src/components/ui/input.tsx
          14:03  WRITE src/components/layout/Header.tsx (+42 lines)
          14:03  WRITE src/components/layout/SearchDropdown.tsx (new, 87 lines)
          14:03  WRITE src/components/layout/tests/Header.test.tsx (+34 lines)
          14:04  EXEC  npm test - passed (47/47)
          14:04  EXEC  npm run lint - passed
          14:05  GIT   Branch: agent/add-search-bar
          14:05  GIT   PR #147 created → main
          14:05  TASK  Complete (3m 12s, $0.34 API cost)

          14:30  TASK  "Update user API pagination" - started
          14:31  GUARD Blocked: attempted to modify .env.local
          14:31  GUARD Agent found alternative approach
          14:33  TASK  Complete (2m 45s, $0.28 API cost)

          Daily total: 2 tasks, $0.62 spent, 1 guardrail triggered
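
    One common way to store such a log is one JSON object per line, appended and never rewritten. The Python sketch below assumes that layout; `format_event`, `daily_summary`, and the `cost_usd` field are hypothetical, while the event names come from `log_events` in audit-config.yaml.

```python
import json
from collections import Counter
from typing import Iterable


def format_event(time: str, kind: str, detail: str, cost_usd: float = 0.0) -> str:
    """Serialize one audit event as a single JSON line (without the newline)."""
    return json.dumps(
        {"time": time, "kind": kind, "detail": detail, "cost_usd": cost_usd}
    )


def daily_summary(lines: Iterable[str]) -> dict:
    """Aggregate JSON-lines audit events into the daily totals shown to the user."""
    kinds = Counter()
    cost = 0.0
    for line in lines:
        event = json.loads(line)
        kinds[event["kind"]] += 1
        cost += event["cost_usd"]
    return {
        "tasks": kinds["task_received"],
        "guardrails_triggered": kinds["guardrail_triggered"],
        "cost_usd": round(cost, 2),  # round to cents to hide float noise
    }
```

    In practice each line returned by `format_event` would be written to a file opened in append mode, or shipped to whichever `storage` backend is configured. Feeding the summary two `task_received` events, one `guardrail_triggered` event, and costs of 0.34 and 0.28 reproduces the daily total from the transcript: 2 tasks, $0.62 spent, 1 guardrail triggered.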

    Before using the agent on a real production codebase, verify every item:

    • Sandbox isolation confirmed (Docker, no host filesystem access)
    • Network allowlist configured (only GitHub + package registry)
    • File blocklist includes all secrets, config, and infra files
    • Operation blocklist includes destructive commands
    • Branch protection rules require PR reviews for main
    • CI pipeline includes security scanning (Semgrep or equivalent)
    • Kill switch tested and working
    • Cost limits set with alerts
    • Audit logging enabled and writing to persistent storage
    • Human review required before any merge
    • Auto-rollback configured for production deploys
    • Team members know how to trigger the kill switch
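
    Some of these items can be checked mechanically from the configuration files. The Python sketch below is one possible preflight check, assuming the YAML files have already been loaded and merged into a single dictionary with `guardrails` and `audit` keys; `CONFIG_CHECKS` and `preflight` are hypothetical names. Items such as sandbox isolation, branch protection, and team training still need manual verification.

```python
# Each check receives the merged config dict and returns True when satisfied.
CONFIG_CHECKS = {
    "file blocklist covers .env files":
        lambda c: ".env*" in c.get("guardrails", {}).get("files", {}).get("blocked_paths", []),
    "destructive commands are blocked":
        lambda c: "rm -rf *" in c.get("guardrails", {}).get("operations", {})
                                 .get("execution", {}).get("blocked_commands", []),
    "daily cost limit is set":
        lambda c: "max_cost_per_day" in c.get("guardrails", {}).get("limits", {}),
    "kill switch is enabled":
        lambda c: c.get("guardrails", {}).get("kill_switch", {}).get("enabled") is True,
    "audit logging is enabled":
        lambda c: c.get("audit", {}).get("enabled") is True,
}


def preflight(config: dict) -> list:
    """Return the names of all failed checks; an empty list means all passed."""
    return [name for name, check in CONFIG_CHECKS.items() if not check(config)]
```

    Running `preflight` in CI, and failing the build when the returned list is not empty, is one way to keep a guardrail from being dropped by an unrelated config change.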

    If the agent causes an issue in production:

    1. Trigger kill switch — stop all agent activity
    2. Rollback — revert the production deploy (auto-rollback should handle this)
    3. Review audit log — understand exactly what the agent did
    4. Root cause — was it a bad task description, missing guardrail, or agent error?
    5. Update guardrails — add new restrictions to prevent recurrence
    6. Post-mortem — document what happened and share with the team

    The goal is never "zero incidents" — it's "fast detection, fast recovery, permanent fix."

    Coding Agent Security Guide

    Comprehensive security documentation for autonomous coding agents, including threat models and hardening guides.

    https://docs.openclaw.ai/agents/coding/security

    Knowledge Check

    Which guardrail layer is most critical for preventing a coding agent from causing production damage?
