db6cbbdec1
Initial commit tracking session context, playbooks, and automation specs for claude-config and agent-builder Claude Code conversations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
155 lines
6.7 KiB
Markdown
# Playbook: Agent Builder Agent

## Purpose
Builds `claude_agent` and `script` type automations from the `automation_ideas` table. Uses Ollama (llama3.1:8b) as the primary code generator and `claude -p` as the overseer/validator. Deploys each build to the sandbox, runs all 4 test levels, and notifies the user for promotion approval.

## Trigger
Scheduled or manual. Each run queries `automation_ideas` for rows where:
- `type IN ('claude_agent', 'script')`
- `status = 'ready_to_build'`
- `builder_status = 'not_started'`

Matching rows are ordered by `priority ASC NULLS LAST, id ASC`, and the first one is selected.

Only one automation is processed per run.

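The selection rule above can be sketched as follows. The SQL string assumes a PostgreSQL-style dialect (suggested by `NULLS LAST`, but not confirmed by this playbook), and `pick_next` is a hypothetical in-memory equivalent that is convenient for unit-testing the ordering without a database.

```python
from typing import Optional

# Assumed SQL form of the trigger query; the column list is illustrative.
NEXT_AUTOMATION_SQL = """
SELECT id, name, type, task_description, infrastructure_requirement
FROM automation_ideas
WHERE type IN ('claude_agent', 'script')
  AND status = 'ready_to_build'
  AND builder_status = 'not_started'
ORDER BY priority ASC NULLS LAST, id ASC
LIMIT 1
"""


def pick_next(rows: list[dict]) -> Optional[dict]:
    """Hypothetical in-memory equivalent of NEXT_AUTOMATION_SQL."""
    eligible = [
        r for r in rows
        if r["type"] in ("claude_agent", "script")
        and r["status"] == "ready_to_build"
        and r["builder_status"] == "not_started"
    ]
    # `priority is None` sorts False before True, which places NULLs last;
    # ties fall back to the lowest id.
    eligible.sort(key=lambda r: (r["priority"] is None, r["priority"] or 0, r["id"]))
    return eligible[0] if eligible else None
```

A row with a `NULL` priority is only picked when no prioritized row is eligible.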
## Infrastructure
- Runs on: server-01 (n8n-sandbox, port 5679)
- Ollama endpoint: http://localhost:11434 (server-01 local)
- Model: llama3.1:8b
- Claude overseer: `claude -p` (non-interactive; consumes SDK credits, so use sparingly)
- Vault: sandbox AppRole at /opt/appdata/docker/docker-compose/vault/approle/
- Database: production api_business (read automation_ideas and update its builder_status, write agent_test_results)
- NTFY: production NTFY instance for notifications

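For orientation, a minimal sketch of an AppRole login against Vault's HTTP API is shown here. It assumes the AppRole auth method is mounted at the default `approle` path and that the directory above holds `role-id` and `secret-id` files. The Vault address is not given in this playbook, so `vault_addr` is a required, hypothetical parameter.

```python
import json
import urllib.request
from pathlib import Path

APPROLE_DIR = Path("/opt/appdata/docker/docker-compose/vault/approle")


def build_approle_login(vault_addr: str, approle_dir: Path = APPROLE_DIR) -> urllib.request.Request:
    """Build the login request from the role-id / secret-id files."""
    role_id = (approle_dir / "role-id").read_text().strip()
    secret_id = (approle_dir / "secret-id").read_text().strip()
    payload = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode("utf-8")
    return urllib.request.Request(
        # Assumes the default mount path "approle".
        f"{vault_addr.rstrip('/')}/v1/auth/approle/login",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(vault_addr: str, approle_dir: Path = APPROLE_DIR, timeout: float = 10.0) -> str:
    """Return a client token; the caller uses it to read secrets."""
    request = build_approle_login(vault_addr, approle_dir)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["auth"]["client_token"]
```

Splitting request construction from the network call keeps the credential handling testable without a running Vault.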
## Step-by-Step

### Step 1 — Claim the automation
```sql
UPDATE automation_ideas
SET builder_status = 'queued'
WHERE id = <selected_id> AND builder_status = 'not_started';
```
If 0 rows are updated, another builder has already claimed the row: stop, notify, and exit.

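The claim works because the `WHERE` clause re-checks `builder_status`, so only one of two competing builders can see a row count of 1. A minimal sketch of that check follows, using `sqlite3` purely as a stand-in for the production database driver (an assumption made so the example is self-contained); the `claim` helper is hypothetical.

```python
import sqlite3


def claim(conn: sqlite3.Connection, automation_id: int) -> bool:
    """Try to claim the row; True only if this caller won the claim."""
    cur = conn.execute(
        "UPDATE automation_ideas SET builder_status = 'queued' "
        "WHERE id = ? AND builder_status = 'not_started'",
        (automation_id,),
    )
    conn.commit()
    # rowcount is 0 when another builder already changed builder_status.
    return cur.rowcount == 1
```

When `claim` returns `False`, the run should stop, notify, and exit as described above.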
### Step 2 — Build the prompt for Ollama
Construct a generation prompt using all available fields from the automation row:
- `name`: what the automation is called
- `task_description`: the full structured spec; this is the primary instruction
- `type`: `claude_agent` or `script`
- `infrastructure_requirement`: what infra it needs access to

Prompt structure:
```
You are an expert automation engineer. Build a {type} automation with the following specification.

Name: {name}
Infrastructure: {infrastructure_requirement}

Specification:
{task_description}

Requirements:
- If type is claude_agent: output a complete shell-executable claude -p command with full system prompt and all logic. The agent must be self-contained.
- If type is script: output a complete Python script. Include a Dockerfile if the script has dependencies beyond stdlib.
- Output ONLY the code. No explanation, no markdown fences, no commentary.
- The code must handle its own error cases and log to stdout.
- Secrets must be fetched from Vault via AppRole — never hardcoded. AppRole credentials at /opt/appdata/docker/docker-compose/vault/approle/role-id and secret-id.
```

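Filling the template could look like the sketch below. The `build_prompt` helper is hypothetical, the requirement lines are passed in rather than repeated here, and the `"none specified"` fallback for a missing `infrastructure_requirement` is an assumption, since the playbook does not say how empty fields are handled.

```python
PROMPT_TEMPLATE = (
    "You are an expert automation engineer. Build a {type} automation "
    "with the following specification.\n\n"
    "Name: {name}\n"
    "Infrastructure: {infrastructure_requirement}\n\n"
    "Specification:\n"
    "{task_description}\n\n"
    "Requirements:\n"
    "{requirements}\n"
)

ALLOWED_TYPES = ("claude_agent", "script")


def build_prompt(row: dict, requirements: list[str]) -> str:
    """Fill the prompt template from one automation_ideas row."""
    if row["type"] not in ALLOWED_TYPES:
        raise ValueError(f"unsupported automation type: {row['type']!r}")
    return PROMPT_TEMPLATE.format(
        type=row["type"],
        name=row["name"],
        # Assumed fallback; the playbook does not define one.
        infrastructure_requirement=row.get("infrastructure_requirement") or "none specified",
        task_description=row["task_description"],
        requirements="\n".join(f"- {line}" for line in requirements),
    )
```

Because `str.format` only interprets braces in the template itself, a `task_description` that contains `{` or `}` is inserted unchanged.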
### Step 3 — Generate with Ollama
Set `builder_status = 'building'` before calling, then send:
```
POST http://localhost:11434/api/generate
{
  "model": "llama3.1:8b",
  "prompt": "<constructed prompt>",
  "stream": false
}
```

If the Ollama call fails or times out (>120s): set `builder_status = 'failed'`, log the error, notify via NTFY, and stop.

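A minimal sketch of this call using only the standard library is given here. With `"stream": false`, Ollama returns a single JSON object whose `response` field holds the generated text. The `generate` helper and its injectable `urlopen` parameter are hypothetical, and the `timeout` passed to `urlopen` bounds each blocking socket operation rather than total wall-clock time, so it only approximates the 120s limit.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


class GenerationError(RuntimeError):
    """Raised when the Ollama call fails, times out, or returns bad JSON."""


def generate(prompt: str, model: str = "llama3.1:8b", timeout: float = 120.0,
             urlopen=urllib.request.urlopen) -> str:
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urlopen(request, timeout=timeout) as resp:
            return json.loads(resp.read())["response"]
    except (OSError, ValueError, KeyError) as exc:
        # OSError covers URLError and socket timeouts; ValueError covers
        # malformed JSON; KeyError covers a missing "response" field.
        raise GenerationError(str(exc)) from exc
```

The caller would catch `GenerationError`, set `builder_status = 'failed'`, and send the NTFY alert.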
### Step 4 — Overseer validation with claude -p
Pass the generated code to `claude -p` for structural review. Keep the prompt minimal to conserve SDK credits:

```bash
claude -p "Review this {type} automation code for the following only:
1. Does it correctly fetch secrets from Vault via AppRole (never hardcoded)?
2. Are there any obvious syntax errors or missing imports?
3. Does the logic match this spec summary: {name} — {task_description[:200]}

Respond with: PASS or FAIL, then one sentence explaining why.
Do not rewrite the code."
```

Placeholders in braces are filled in by the builder before the command runs; `{task_description[:200]}` denotes the first 200 characters of the spec.

- If FAIL: log Claude's reason, set `builder_status = 'failed'`, notify via NTFY with the failure reason, and stop.
- If PASS: proceed to Step 5.

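Interpreting the overseer's reply could be sketched as below. Two points are assumptions rather than part of this playbook: the generated code is piped to `claude -p` on stdin, and any reply that does not start with PASS or FAIL is treated as a failure. The 300-second timeout is likewise illustrative.

```python
import re
import subprocess

# Verdict word, optional separators, then the one-sentence reason.
_VERDICT = re.compile(r"\s*(PASS|FAIL)\b[\s:.,\-\u2013\u2014]*(.*)", re.S | re.I)


def parse_verdict(reply: str) -> tuple[bool, str]:
    """Return (passed, reason) from the overseer's reply text."""
    match = _VERDICT.match(reply)
    if match is None:
        # Assumed policy: an unparseable reply counts as FAIL.
        return False, f"unparseable overseer reply: {reply.strip()[:80]!r}"
    return match.group(1).upper() == "PASS", match.group(2).strip()


def run_overseer(prompt: str, code: str, timeout: float = 300.0) -> tuple[bool, str]:
    """Run claude -p once with the code on stdin (an assumption)."""
    result = subprocess.run(
        ["claude", "-p", prompt],
        input=code,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        return False, f"claude exited {result.returncode}: {result.stderr.strip()}"
    return parse_verdict(result.stdout)
```

Failing closed on an ambiguous reply avoids deploying code the overseer never clearly approved.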
### Step 5 — Deploy to sandbox
**For `script` type:**
1. Write the generated code to a temp directory on server-01
2. If a Dockerfile was generated, build the image: `docker build -t agent-{id}-{slug} .`
3. Run a test container: `docker run --rm agent-{id}-{slug}` (dry run, no side effects)

**For `claude_agent` type:**
1. Write the generated `claude -p` command to a shell script
2. Make it executable
3. Run it once with a `--dry-run` flag if the generated script supports one, or with a test input that produces no side effects

If deployment fails: set `builder_status = 'failed'`, log the error, notify via NTFY, and stop.

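The `agent-{id}-{slug}` image name has to be a valid Docker repository name, which must be lowercase. How `{slug}` is derived is not specified in this playbook, so the sketch below assumes it comes from the automation's `name`; `slugify`, `image_tag`, and `docker_commands` are hypothetical helpers.

```python
import re


def slugify(name: str) -> str:
    """Lowercase the name and collapse other characters into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    # Assumed fallback for names with no usable characters.
    return slug or "automation"


def image_tag(automation_id: int, name: str) -> str:
    return f"agent-{automation_id}-{slugify(name)}"


def docker_commands(automation_id: int, name: str, build_dir: str) -> list[list[str]]:
    """Return the build and test-run commands as argument lists."""
    tag = image_tag(automation_id, name)
    return [
        ["docker", "build", "-t", tag, build_dir],
        ["docker", "run", "--rm", tag],
    ]
```

Passing argument lists to `subprocess.run` instead of a shell string avoids quoting problems if a name contains spaces or shell metacharacters.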
### Step 6 — Run 4-level automated tests
Run each level in order. Stop and mark the build as failed as soon as any level fails. Log every result to `agent_test_results`.

**Level 1 — Structure**
Validate the generated artifact:
- For scripts: `python3 -m py_compile script.py` must exit 0
- For claude agents: verify the shell script is syntactically valid bash (for example with `bash -n`)
- For Dockerfiles: `docker build --check` if available, else verify FROM and key directives exist
- Insert result, with `status` set to `'pass'` or `'fail'`: `INSERT INTO agent_test_results (automation_id, test_level, status, execution_log) VALUES ({id}, 1, '{status}', '{log}')`

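The Level 1 checks for scripts and shell wrappers might be run as in this sketch. `check_python` and `check_bash` are hypothetical helpers; `bash -n` parses a script without executing it, and using it here is an assumption about how "syntactically valid bash" is verified.

```python
import subprocess
import sys


def check_python(path: str) -> tuple[bool, str]:
    """Byte-compile the script; a non-zero exit means a syntax error."""
    result = subprocess.run(
        [sys.executable, "-m", "py_compile", path],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, (result.stderr or result.stdout).strip()


def check_bash(path: str) -> tuple[bool, str]:
    """Parse the shell script without running it (requires bash on PATH)."""
    result = subprocess.run(["bash", "-n", path], capture_output=True, text=True)
    return result.returncode == 0, result.stderr.strip()
```

Each helper returns a `(passed, log)` pair, which maps directly onto the `status` and `execution_log` columns.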
**Level 2 — Deployment**
- Verify the artifact can be deployed cleanly (no missing dependencies, image builds successfully, script runs without import errors)
- Must complete without crashing
- Insert the result into `agent_test_results`

**Level 3 — Smoke**
- Execute the automation with minimal/test inputs
- Must run to completion without an unhandled exception or non-zero exit
- Insert the result into `agent_test_results`

**Level 4 — Assertion**
- Verify that the correct side effect occurred by checking the actual system state, not by string-matching output
- Examples: a file was created, a DB row was written, an API call returned 200, a container is running
- Insert the result into `agent_test_results`

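The run-in-order, stop-on-first-failure rule could be implemented along these lines. This is a sketch under stated assumptions: `sqlite3` stands in for the production database, each level is a hypothetical callable returning `(passed, log)`, and results are written with bound parameters so that quotes in a log cannot break the `INSERT`.

```python
import sqlite3
from typing import Callable

LevelCheck = Callable[[], tuple[bool, str]]

INSERT_RESULT = (
    "INSERT INTO agent_test_results "
    "(automation_id, test_level, status, execution_log) VALUES (?, ?, ?, ?)"
)


def run_levels(conn: sqlite3.Connection, automation_id: int,
               levels: list[LevelCheck]) -> int:
    """Run levels 1..n in order; return 0 if all pass, else the failing level."""
    for level, check in enumerate(levels, start=1):
        passed, log = check()
        conn.execute(INSERT_RESULT,
                     (automation_id, level, "pass" if passed else "fail", log))
        conn.commit()
        if not passed:
            # Later levels are skipped and therefore never logged.
            return level
    return 0
```

Exceptions raised by a check are deliberately not caught here, so they reach whatever top-level handler records unhandled errors.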
### Step 7 — Notify user for promotion approval
If all 4 levels pass:
1. Set `builder_status = 'awaiting_approval'`
2. Send NTFY notification:
```
Title: Agent Ready for Promotion — {name}
Body: All 4 test levels passed in sandbox. Automation id={id} ({type}) is ready for production promotion. Reply to approve or reject.
```

The user must explicitly approve before any production deployment. No auto-promotion in v1.

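Sending the notification might look like the sketch below, which uses ntfy's JSON publishing format (a POST to the server's root URL with `topic`, `title`, and `message` fields). The server URL and topic name are not given in this playbook, so both are hypothetical parameters. How the user's approval reply is collected is out of scope here.

```python
import json
import urllib.request


def build_ntfy_request(base_url: str, topic: str, title: str,
                       message: str) -> urllib.request.Request:
    """Build a JSON publish request; base_url and topic are assumptions."""
    payload = json.dumps({"topic": topic, "title": title, "message": message})
    return urllib.request.Request(
        base_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(base_url: str, topic: str, title: str, message: str,
           timeout: float = 10.0) -> int:
    """Send the notification and return the HTTP status code."""
    request = build_ntfy_request(base_url, topic, title, message)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Publishing as JSON keeps non-ASCII characters in the title out of HTTP headers, where `urllib` would otherwise fail to encode them.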
### Step 8 — On approval
Set `builder_status = 'approved'`. Once production deployment completes, set `builder_status = 'deployed'` and update the `automation_ideas` row to `status = 'deployed'`.

## Error handling
- Any unhandled exception: set `builder_status = 'failed'`, log to `agent_test_results` with `test_level=0` and `status='fail'`, and send an NTFY alert
- If the failure occurs before Step 3, always release the claim (reset `builder_status` to `'not_started'`) so another run can retry
- After Step 3: leave as `'failed'`; manual review is required before retry

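The status rules above can be summarized as a small transition table. This is one reading of the playbook, not a confirmed schema: the table lists only the transitions the steps mention, it has no rejection state because none is defined, and the helper names are hypothetical.

```python
# Allowed builder_status transitions, as read from Steps 1-8.
TRANSITIONS: dict[str, set[str]] = {
    "not_started": {"queued"},
    "queued": {"building", "not_started"},  # released if failing before Step 3
    "building": {"awaiting_approval", "failed"},
    "awaiting_approval": {"approved"},      # no rejection state is specified
    "approved": {"deployed"},
    "deployed": set(),
    "failed": set(),                        # manual review before any retry
}


def can_transition(current: str, new: str) -> bool:
    return new in TRANSITIONS.get(current, set())


def status_after_failure(step: int) -> str:
    """Release the claim before Step 3; otherwise leave the row failed."""
    return "not_started" if step < 3 else "failed"
```

Guarding every status update with `can_transition` would catch a run that tries to skip a step or to retry a failed build without review.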
## NTFY notification patterns
- Build started: `[Agent Builder] Building {name} (id={id}, type={type})`
- Overseer FAIL: `[Agent Builder] FAIL — Overseer rejected {name}: {reason}`
- Test level fail: `[Agent Builder] FAIL — {name} failed Level {n}: {error}`
- Ready for approval: `[Agent Builder] READY — {name} passed all tests, awaiting your approval`
- Unhandled error: `[Agent Builder] ERROR — {name}: {exception}`

## SDK credit budget
`claude -p` is called once per automation (Step 4 only). Keep the overseer prompt under 500 tokens. Do not call `claude -p` for retries or debugging; use it only for the initial validation pass.