Lesson 4 of 5
Review & Deploy Pipeline
Estimated time: 10 minutes
The agent writes code and opens PRs. But between "PR opened" and "shipped to production," there's a critical pipeline: automated checks, human review, staging validation, and deployment. In this lesson, you'll build a pipeline that ensures agent-written code meets the same bar as human-written code.
The Pipeline
```
 Agent PR             Automated Checks         Human Review
┌──────────┐         ┌──────────────────┐     ┌──────────────┐
│ PR #142  │────────>│ ✓ Lint           │────>│ Code review  │
│ 5 files  │         │ ✓ Type check     │     │ by human     │
│ 3 tests  │         │ ✓ Unit tests     │     │ developer    │
└──────────┘         │ ✓ Integration    │     └──────┬───────┘
                     │ ✓ Security scan  │            │
                     │ ✓ Build          │            ▼
                     └──────────────────┘     ┌──────────────┐
                                              │ Staging      │
                                              │ deploy +     │
                                              │ smoke test   │
                                              └──────┬───────┘
                                                     │
                                                     ▼
                                              ┌──────────────┐
                                              │ Production   │
                                              │ deploy       │
                                              └──────────────┘
```
Configure CI Checks for Agent PRs
Agent PRs should go through the same (or stricter) CI pipeline as human PRs. Add an OpenClaw-specific workflow that includes extra validation.
```yaml
name: Agent PR Checks

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  validate:
    # Only run on agent-created PRs
    if: contains(github.event.pull_request.labels.*.name, 'openclaw-agent')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history, so the diff against origin/main works

      - name: Install dependencies
        run: npm ci

      - name: Lint
        run: npm run lint

      - name: Type check
        run: npx tsc --noEmit

      - name: Unit tests
        run: npm test -- --coverage

      - name: Coverage threshold
        run: |
          # Ensure agent didn't reduce coverage
          npx nyc check-coverage --lines 80 --branches 75

      - name: Security audit
        run: npm audit --audit-level=moderate

      - name: Build
        run: npm run build

      - name: Diff size check
        run: |
          # Alert if PR is suspiciously large
          LINES=$(git diff --stat origin/main | tail -1 | grep -oP '\d+(?= insertion)')
          if [ "$LINES" -gt 500 ]; then
            echo "::warning::Large PR ($LINES insertions). Review carefully."
          fi
```

Label-Based Triggers
The openclaw-agent label is automatically added to agent PRs. This lets you run additional checks (like diff size limits) that you might not need for human PRs.
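The workflow's `if:` expression reduces to simple membership testing on the PR's label names. A minimal sketch in plain JavaScript, assuming the label name from this lesson (`needsAgentChecks` is a hypothetical helper, not an OpenClaw or GitHub API):

```javascript
// Sketch of the workflow's label gate: run the extra agent checks
// only when the PR carries the 'openclaw-agent' label.
function needsAgentChecks(labelNames) {
  return labelNames.includes("openclaw-agent");
}

console.log(needsAgentChecks(["bug", "openclaw-agent"])); // true
console.log(needsAgentChecks(["bug", "docs"]));           // false
```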
Set Up Automated Code Review
Before a human looks at the PR, run automated review tools to catch common issues.
```yaml
review:
  auto_reviewers:
    # Static analysis
    - tool: eslint
      block_on: error        # Block merge on lint errors
      warn_on: warning       # Comment but don't block on warnings

    # Security scanning
    - tool: semgrep
      rules: ["p/typescript", "p/jwt", "p/sql-injection"]
      block_on: error

    # Dependency check
    - tool: socket
      block_on: high_risk    # Block on risky new dependencies

    # Complexity check
    - tool: complexity
      max_cyclomatic: 15     # Flag functions over 15 complexity
      block_on: exceeded

  # Auto-assign human reviewer
  assign_reviewer:
    strategy: codeowners     # Use CODEOWNERS file
    fallback: "team-lead"    # If no owner matches
    required_approvals: 1    # At least 1 human approval
```

The automated review catches things before your eyes see the code:
| Check | What It Catches | Action |
|---|---|---|
| Lint | Style violations, unused vars | Block or auto-fix |
| Type check | Missing types, wrong types | Block |
| Semgrep | SQL injection, XSS, hardcoded secrets | Block |
| Complexity | Functions too long or nested | Flag for review |
| Diff size | PRs over 500 lines of changes | Warn reviewer |
| New deps | Unexpected npm packages added | Flag for review |
Human Review Best Practices
Automated checks passed. Now you review. Here's what to focus on for agent-written code.
```
SECURITY (most important for agent code)
[ ] No hardcoded secrets, tokens, or API keys
[ ] Input validation on all user-facing inputs
[ ] No SQL injection or XSS vectors
[ ] Auth checks on new endpoints
[ ] No elevated permissions or privilege escalation

ARCHITECTURE
[ ] Changes are in the right location for your project structure
[ ] No unnecessary abstractions or over-engineering
[ ] Consistent with existing patterns (naming, imports, exports)
[ ] No circular dependencies introduced

EDGE CASES
[ ] Handles empty/null inputs gracefully
[ ] Error states covered (network failures, invalid data)
[ ] Concurrent access considered where relevant

TESTS
[ ] Tests actually test the behavior, not just the implementation
[ ] Edge cases tested (empty input, max length, auth failure)
[ ] No flaky tests (timeouts, race conditions)
```

The Agent's Blind Spot
AI coding agents are excellent at implementing the happy path but can miss subtle edge cases, race conditions, and security implications. Your review should focus on what the agent is weakest at: things that require understanding the broader system context and adversarial thinking.
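To make the TESTS items on the checklist concrete, here's the shape of edge-case coverage to look for in an agent's tests. `parsePageParam` is a hypothetical helper invented for illustration, not part of this project:

```javascript
// A function like this is easy for an agent to get right on the
// happy path while silently mishandling empty, invalid, or
// out-of-range input. Good tests assert the behavior at each edge.
function parsePageParam(raw) {
  const n = Number.parseInt(raw, 10);
  if (Number.isNaN(n) || n < 1) return 1; // default on empty/invalid input
  return Math.min(n, 1000);               // clamp runaway values
}

console.assert(parsePageParam("3") === 3);         // happy path
console.assert(parsePageParam("") === 1);          // empty input
console.assert(parsePageParam("abc") === 1);       // invalid input
console.assert(parsePageParam("-5") === 1);        // negative boundary
console.assert(parsePageParam("999999") === 1000); // upper clamp
```

Notice that each assertion pins down observable behavior rather than implementation details, which is exactly the distinction the checklist asks you to verify.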
Staging Deploy and Production
After approval, deploy to staging for a final sanity check before production.
```yaml
deployment:
  staging:
    trigger: on_approval    # Deploy to staging when PR approved
    url: "https://staging.yourapp.com"
    smoke_tests:
      - "curl -f https://staging.yourapp.com/health"
      - "npx playwright test --project=smoke"
    hold_time: 30m          # Wait 30 min for manual testing

  production:
    trigger: on_merge       # Deploy when PR merged to main
    strategy: rolling       # Zero-downtime rolling deploy
    rollback:
      auto: true            # Auto-rollback if error rate > 5%
      window: 10m           # Monitor for 10 minutes post-deploy
    notifications:
      channel: "#deploys"
      message: "Deployed PR #: "
```

A successful cycle looks like this in chat:

Bot: PR #142 — "Add dark mode toggle"

✓ CI passed (lint, types, tests, security)
✓ You approved the PR
✓ Deployed to staging: staging.yourapp.com
✓ Smoke tests passed

Ready to merge to production. Reply "merge" or "hold" if you want more staging time.

You: Merge

Bot: Merged and deploying to production...

✓ Production deploy complete
✓ Health check passing
✓ Error rate: 0% (monitoring for 10 min)

Dark mode is live! 🌙

If something goes wrong after deploy:

Bot: ⚠️ Alert — Error rate spiked to 8% after deploying PR #142
Auto-rollback initiated...

✓ Rolled back to previous version
✓ Error rate returning to normal (0.2%)

The dark mode PR caused errors on Safari 16. I've reopened the PR and added a comment with the error logs. Want me to attempt a fix?

You: Yes, fix the Safari compatibility issue

Bot: Investigating... found the issue. Safari 16 doesn't support the 'color-mix()' CSS function. Replacing
with fallback values. New commit pushed to PR #142.

Why should agent-created PRs go through stricter automated checks than human PRs?
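One last note on the deployment config: the auto-rollback rule boils down to a threshold check inside the monitoring window. A minimal sketch, with the 5% threshold and 10-minute window taken from the config above (`shouldRollback` is a hypothetical name, not an OpenClaw API):

```javascript
// Sketch of the auto-rollback decision: roll back when the error
// rate exceeds the threshold within the post-deploy monitoring window.
function shouldRollback(errorRate, minutesSinceDeploy,
                        { threshold = 0.05, windowMinutes = 10 } = {}) {
  return minutesSinceDeploy <= windowMinutes && errorRate > threshold;
}

console.log(shouldRollback(0.08, 3));  // true  — the 8% spike from the transcript
console.log(shouldRollback(0.002, 3)); // false — healthy error rate
console.log(shouldRollback(0.08, 15)); // false — outside the monitoring window
```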