Lesson 4 of 5
Review & Deploy Pipeline
Estimated time: 10 minutes
The agent writes code and opens PRs. But between "PR opened" and "shipped to production," there's a critical pipeline: automated checks, human review, staging validation, and deployment. In this lesson, you'll build a pipeline that ensures agent-written code meets the same bar as human-written code.
The Pipeline
```
 Agent PR             Automated Checks         Human Review
┌──────────┐         ┌──────────────────┐     ┌──────────────┐
│ PR #142  │────────>│ ✓ Lint           │────>│ Code review  │
│ 5 files  │         │ ✓ Type check     │     │ by human     │
│ 3 tests  │         │ ✓ Unit tests     │     │ developer    │
└──────────┘         │ ✓ Integration    │     └──────┬───────┘
                     │ ✓ Security scan  │            │
                     │ ✓ Build          │            ▼
                     └──────────────────┘     ┌──────────────┐
                                              │ Staging      │
                                              │ deploy +     │
                                              │ smoke test   │
                                              └──────┬───────┘
                                                     │
                                                     ▼
                                              ┌──────────────┐
                                              │ Production   │
                                              │ deploy       │
                                              └──────────────┘
```
Configure CI Checks for Agent PRs
Agent PRs should go through the same (or stricter) CI pipeline as human PRs. Add an OpenClaw-specific workflow that includes extra validation.
```yaml
name: Agent PR Checks

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  validate:
    # Only run on agent-created PRs
    if: contains(github.event.pull_request.labels.*.name, 'openclaw-agent')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history, so the diff against origin/main works

      - name: Install dependencies
        run: npm ci

      - name: Lint
        run: npm run lint

      - name: Type check
        run: npx tsc --noEmit

      - name: Unit tests
        run: npm test -- --coverage

      - name: Coverage threshold
        run: |
          # Ensure agent didn't reduce coverage
          npx nyc check-coverage --lines 80 --branches 75

      - name: Security audit
        run: npm audit --audit-level=moderate

      - name: Build
        run: npm run build

      - name: Diff size check
        run: |
          # Alert if PR is suspiciously large
          LINES=$(git diff --stat origin/main | tail -1 | grep -oP '\d+(?= insertion)')
          if [ "$LINES" -gt 500 ]; then
            echo "::warning::Large PR ($LINES insertions). Review carefully."
          fi
```

Label-Based Triggers
The openclaw-agent label is automatically added to agent PRs. This lets you run additional checks (like diff size limits) that you might not need for human PRs.
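The workflow's `if:` expression reduces to simple membership testing on the PR's label names. A minimal sketch in plain JavaScript, assuming the label name from this lesson (`needsAgentChecks` is a hypothetical helper, not an OpenClaw or GitHub API):

```javascript
// Sketch of the workflow's label gate: run the extra agent checks
// only when the PR carries the 'openclaw-agent' label.
function needsAgentChecks(labelNames) {
  return labelNames.includes("openclaw-agent");
}

console.log(needsAgentChecks(["bug", "openclaw-agent"])); // true
console.log(needsAgentChecks(["bug", "docs"]));           // false
```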
Set Up Automated Code Review
Before a human looks at the PR, run automated review tools to catch common issues.
```yaml
review:
  auto_reviewers:
    # Static analysis
    - tool: eslint
      block_on: error        # Block merge on lint errors
      warn_on: warning       # Comment but don't block on warnings

    # Security scanning
    - tool: semgrep
      rules: ["p/typescript", "p/jwt", "p/sql-injection"]
      block_on: error

    # Dependency check
    - tool: socket
      block_on: high_risk    # Block on risky new dependencies

    # Complexity check
    - tool: complexity
      max_cyclomatic: 15     # Flag functions over 15 complexity
      block_on: exceeded

  # Auto-assign human reviewer
  assign_reviewer:
    strategy: codeowners     # Use CODEOWNERS file
    fallback: "team-lead"    # If no owner matches
    required_approvals: 1    # At least 1 human approval
```

The automated review catches things before your eyes see the code:
| Check | What It Catches | Action |
|---|---|---|
| Lint | Style violations, unused vars | Block or auto-fix |
| Type check | Missing types, wrong types | Block |
| Semgrep | SQL injection, XSS, hardcoded secrets | Block |
| Complexity | Functions too long or nested | Flag for review |
| Diff size | PRs over 500 lines of changes | Warn reviewer |
| New deps | Unexpected npm packages added | Flag for review |
Human Review Best Practices
Automated checks passed. Now you review. Here's what to focus on for agent-written code.
```
SECURITY (most important for agent code)
[ ] No hardcoded secrets, tokens, or API keys
[ ] Input validation on all user-facing inputs
[ ] No SQL injection or XSS vectors
[ ] Auth checks on new endpoints
[ ] No elevated permissions or privilege escalation

ARCHITECTURE
[ ] Changes are in the right location for your project structure
[ ] No unnecessary abstractions or over-engineering
[ ] Consistent with existing patterns (naming, imports, exports)
[ ] No circular dependencies introduced

EDGE CASES
[ ] Handles empty/null inputs gracefully
[ ] Error states covered (network failures, invalid data)
[ ] Concurrent access considered where relevant

TESTS
[ ] Tests actually test the behavior, not just the implementation
[ ] Edge cases tested (empty input, max length, auth failure)
[ ] No flaky tests (timeouts, race conditions)
```

The Agent's Blind Spot
AI coding agents are excellent at implementing the happy path but can miss subtle edge cases, race conditions, and security implications. Your review should focus on what the agent is weakest at: things that require understanding the broader system context and adversarial thinking.
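To make the TESTS items on the checklist concrete, here's the shape of edge-case coverage to look for in an agent's tests. `parsePageParam` is a hypothetical helper invented for illustration, not part of this project:

```javascript
// A function like this is easy for an agent to get right on the
// happy path while silently mishandling empty, invalid, or
// out-of-range input. Good tests assert the behavior at each edge.
function parsePageParam(raw) {
  const n = Number.parseInt(raw, 10);
  if (Number.isNaN(n) || n < 1) return 1; // default on empty/invalid input
  return Math.min(n, 1000);               // clamp runaway values
}

console.assert(parsePageParam("3") === 3);         // happy path
console.assert(parsePageParam("") === 1);          // empty input
console.assert(parsePageParam("abc") === 1);       // invalid input
console.assert(parsePageParam("-5") === 1);        // negative boundary
console.assert(parsePageParam("999999") === 1000); // upper clamp
```

Notice that each assertion pins down observable behavior rather than implementation details, which is exactly the distinction the checklist asks you to verify.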
Staging Deploy and Production
After approval, deploy to staging for a final sanity check before production.
```yaml
deployment:
  staging:
    trigger: on_approval    # Deploy to staging when PR approved
    url: "https://staging.yourapp.com"
    smoke_tests:
      - "curl -f https://staging.yourapp.com/health"
      - "npx playwright test --project=smoke"
    hold_time: 30m          # Wait 30 min for manual testing

  production:
    trigger: on_merge       # Deploy when PR merged to main
    strategy: rolling       # Zero-downtime rolling deploy
    rollback:
      auto: true            # Auto-rollback if error rate > 5%
      window: 10m           # Monitor for 10 minutes post-deploy
    notifications:
      channel: "#deploys"
      message: "Deployed PR #: "
```

A successful cycle looks like this in chat:

Bot: PR #142 — "Add dark mode toggle"

✓ CI passed (lint, types, tests, security)
✓ You approved the PR
✓ Deployed to staging: staging.yourapp.com
✓ Smoke tests passed

Ready to merge to production. Reply "merge" or "hold" if you want more staging time.

You: Merge

Bot: Merged and deploying to production...

✓ Production deploy complete
✓ Health check passing
✓ Error rate: 0% (monitoring for 10 min)

Dark mode is live! 🌙

If something goes wrong after deploy:

Bot: ⚠️ Alert — Error rate spiked to 8% after deploying PR #142
Auto-rollback initiated...

✓ Rolled back to previous version
✓ Error rate returning to normal (0.2%)

The dark mode PR caused errors on Safari 16. I've reopened the PR and added a comment with the error logs. Want me to attempt a fix?

You: Yes, fix the Safari compatibility issue

Bot: Investigating... found the issue. Safari 16 doesn't support the 'color-mix()' CSS function. Replacing
with fallback values. New commit pushed to PR #142.

Why should agent-created PRs go through stricter automated checks than human PRs?
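One last note on the deployment config: the auto-rollback rule boils down to a threshold check inside the monitoring window. A minimal sketch, with the 5% threshold and 10-minute window taken from the config above (`shouldRollback` is a hypothetical name, not an OpenClaw API):

```javascript
// Sketch of the auto-rollback decision: roll back when the error
// rate exceeds the threshold within the post-deploy monitoring window.
function shouldRollback(errorRate, minutesSinceDeploy,
                        { threshold = 0.05, windowMinutes = 10 } = {}) {
  return minutesSinceDeploy <= windowMinutes && errorRate > threshold;
}

console.log(shouldRollback(0.08, 3));  // true  — the 8% spike from the transcript
console.log(shouldRollback(0.002, 3)); // false — healthy error rate
console.log(shouldRollback(0.08, 15)); // false — outside the monitoring window
```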