
Backtalk6858 db6cbbdec1 init: add claude-config and agent-builder context files

Initial commit tracking session context, playbooks, and automation specs
for claude-config and agent-builder Claude Code conversations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-06-17 23:08:23 -05:00

6.7 KiB


Playbook: Agent Builder Agent

Purpose

Builds claude_agent and script type automations from the automation_ideas table. Uses Ollama (llama3.1:8b) as the primary code generator and claude -p as the overseer/validator. Deploys to the sandbox, runs all 4 test levels, and notifies the user for promotion approval.

Trigger

Scheduled or manual. Queries automation_ideas for the next row where:

type IN ('claude_agent', 'script')
status = 'ready_to_build'
builder_status = 'not_started'

Rows are ordered by priority ASC NULLS LAST, then id ASC; the first row is taken.

Processes only one automation per run.
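The selection rule above fits in a single query. This is a minimal sketch that uses Python's stdlib sqlite3 only so it runs standalone; the engine and driver behind api_business are not stated in this playbook, and `select_next_automation` is a hypothetical name. `ORDER BY priority IS NULL, priority` emulates `NULLS LAST` on engines that lack that clause.

```python
import sqlite3
from typing import Optional, Tuple

# Trigger rule: eligible types, ready to build, not yet claimed by any builder.
# "priority IS NULL" evaluates to 0 for set priorities and 1 for NULL, so NULL
# priorities sort last even where NULLS LAST is not supported.
NEXT_AUTOMATION_SQL = """
    SELECT id, name, type, task_description, infrastructure_requirement
    FROM automation_ideas
    WHERE type IN ('claude_agent', 'script')
      AND status = 'ready_to_build'
      AND builder_status = 'not_started'
    ORDER BY priority IS NULL, priority ASC, id ASC
    LIMIT 1
"""


def select_next_automation(conn: sqlite3.Connection) -> Optional[Tuple]:
    """Return the one row this run should build, or None if nothing is eligible."""
    return conn.execute(NEXT_AUTOMATION_SQL).fetchone()
```

On PostgreSQL the playbook's own `ORDER BY priority ASC NULLS LAST, id ASC` can be used as written.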

Infrastructure

Runs on: server-01 (n8n-sandbox, port 5679)
Ollama endpoint: http://localhost:11434 (server-01 local)
Model: llama3.1:8b
Claude overseer: claude -p (non-interactive, SDK credits — use sparingly)
Vault: sandbox AppRole at /opt/appdata/docker/docker-compose/vault/approle/
Database: production api_business (read automation_ideas and update its builder_status and status columns; write agent_test_results)
NTFY: production NTFY instance for notifications
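Every generated automation is required (Step 2) to fetch its secrets through this AppRole. A minimal stdlib sketch of that flow, assuming Vault's standard HTTP API: `POST /v1/auth/approle/login` returns a client token, and a KV version 2 read returns the secret under `data.data`. The Vault address, KV mount, and secret path are not given in this playbook, so `vault_addr`, `mount`, and `path` are placeholders; the file names `role-id` and `secret-id` come from Step 2.

```python
import json
import urllib.request
from pathlib import Path

# Directory named in the Infrastructure section; file names from Step 2.
APPROLE_DIR = Path("/opt/appdata/docker/docker-compose/vault/approle")


def _vault_request(req: urllib.request.Request, timeout: float) -> dict:
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


def approle_login(vault_addr: str, approle_dir: Path = APPROLE_DIR,
                  timeout: float = 10.0) -> str:
    """Exchange role-id + secret-id for a Vault client token."""
    role_id = (approle_dir / "role-id").read_text().strip()
    secret_id = (approle_dir / "secret-id").read_text().strip()
    req = urllib.request.Request(
        vault_addr.rstrip("/") + "/v1/auth/approle/login",
        data=json.dumps({"role_id": role_id, "secret_id": secret_id}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return _vault_request(req, timeout)["auth"]["client_token"]


def read_kv2_secret(vault_addr: str, token: str, mount: str, path: str,
                    timeout: float = 10.0) -> dict:
    """Read one KV v2 secret; its key/value pairs live under data.data."""
    req = urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/{mount}/data/{path}",
        headers={"X-Vault-Token": token},
    )
    return _vault_request(req, timeout)["data"]["data"]
```

A generated script would call `approle_login(vault_addr)` once and pass the token to `read_kv2_secret` for each secret it needs, with the address, mount, and path filled in from the actual Vault setup.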

Step-by-Step

Step 1 — Claim the automation

UPDATE automation_ideas
SET builder_status = 'queued'
WHERE id = <selected_id> AND builder_status = 'not_started';

If 0 rows updated: another builder claimed it — stop, notify, exit.
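The claim is race-safe because the UPDATE re-checks builder_status in its WHERE clause, so at most one concurrent builder can match the row and the loser sees zero affected rows. A sketch with stdlib sqlite3 standing in for the production database (placeholders are `?` here; a PostgreSQL driver such as psycopg uses `%s`); `claim_automation` is a hypothetical name.

```python
import sqlite3


def claim_automation(conn: sqlite3.Connection, automation_id: int) -> bool:
    """Move a row from 'not_started' to 'queued'; False means another builder won."""
    cur = conn.execute(
        "UPDATE automation_ideas SET builder_status = 'queued' "
        "WHERE id = ? AND builder_status = 'not_started'",
        (automation_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```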

Step 2 — Build the prompt for Ollama

Construct a generation prompt using all available fields from the automation row:

name: what the automation is called
task_description: full structured spec — this is the primary instruction
type: claude_agent or script
infrastructure_requirement: what infra it needs access to

Prompt structure:

You are an expert automation engineer. Build a {type} automation with the following specification.

Name: {name}
Infrastructure: {infrastructure_requirement}

Specification:
{task_description}

Requirements:
- If type is claude_agent: output a complete shell-executable claude -p command with full system prompt and all logic. The agent must be self-contained.
- If type is script: output a complete Python script. Include a Dockerfile if the script has dependencies beyond stdlib.
- Output ONLY the code. No explanation, no markdown fences, no commentary.
- The code must handle its own error cases and log to stdout.
- Secrets must be fetched from Vault via AppRole — never hardcoded. AppRole credentials at /opt/appdata/docker/docker-compose/vault/approle/role-id and secret-id.
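Filling the prompt structure above from a row can be sketched as follows. `str.format` parses only the template, so braces inside `task_description` are inserted literally rather than treated as fields. Rendering a NULL `infrastructure_requirement` as "none specified" is an assumption, and `build_prompt` is a hypothetical name.

```python
# Mirrors the Step 2 prompt structure ("\u2014" is the em dash in the secrets line).
PROMPT_TEMPLATE = """You are an expert automation engineer. Build a {type} automation with the following specification.

Name: {name}
Infrastructure: {infrastructure_requirement}

Specification:
{task_description}

Requirements:
- If type is claude_agent: output a complete shell-executable claude -p command with full system prompt and all logic. The agent must be self-contained.
- If type is script: output a complete Python script. Include a Dockerfile if the script has dependencies beyond stdlib.
- Output ONLY the code. No explanation, no markdown fences, no commentary.
- The code must handle its own error cases and log to stdout.
- Secrets must be fetched from Vault via AppRole \u2014 never hardcoded. AppRole credentials at /opt/appdata/docker/docker-compose/vault/approle/role-id and secret-id.
"""

SUPPORTED_TYPES = ("claude_agent", "script")


def build_prompt(row: dict) -> str:
    """Fill the generation prompt from one automation_ideas row (as a dict)."""
    if row["type"] not in SUPPORTED_TYPES:
        raise ValueError(f"agent builder does not handle type {row['type']!r}")
    return PROMPT_TEMPLATE.format(
        type=row["type"],
        name=row["name"],
        # Assumption: a NULL infrastructure_requirement is rendered explicitly.
        infrastructure_requirement=row.get("infrastructure_requirement") or "none specified",
        task_description=row["task_description"],
    )
```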

Step 3 — Generate with Ollama

POST http://localhost:11434/api/generate
{
  "model": "llama3.1:8b",
  "prompt": "<constructed prompt>",
  "stream": false
}

Set builder_status = 'building' before calling.

If the Ollama call fails or times out (>120 s): set builder_status = 'failed', log the error, notify via NTFY, and stop.
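A stdlib sketch of the Step 3 call. Ollama's `/api/generate` with `"stream": false` returns one JSON object whose `response` field holds the generated text. `GenerationError` and `generate` are hypothetical names; the caller is expected to map the exception to builder_status = 'failed' plus the NTFY alert.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # server-01 local endpoint


class GenerationError(RuntimeError):
    """Ollama failed, timed out, or returned nothing; the caller marks the row 'failed'."""


def generate(prompt: str, endpoint: str = OLLAMA_URL, model: str = "llama3.1:8b",
             timeout: float = 120.0) -> str:
    """One non-streaming /api/generate call; returns the generated text."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        # The timeout bounds each blocking socket operation. With stream=false the
        # reply is expected only after generation finishes, so in practice this
        # approximates the playbook's 120 s limit.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.loads(resp.read())
    except (OSError, ValueError) as exc:  # URLError/HTTPError, timeouts, bad JSON
        raise GenerationError(f"Ollama call failed: {exc}") from exc
    text = body.get("response", "")
    if not text.strip():
        raise GenerationError("Ollama returned an empty response")
    return text
```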

Step 4 — Overseer validation with claude -p

Pass the generated code to claude -p for structural review. Keep the prompt minimal to conserve SDK credits:

claude -p "Review this {type} automation code for the following only:
1. Does it correctly fetch secrets from Vault via AppRole (never hardcoded)?
2. Are there any obvious syntax errors or missing imports?
3. Does the logic match this spec summary: {name} — {task_description[:200]}

Respond with: PASS or FAIL, then one sentence explaining why.
Do not rewrite the code."

If FAIL: log Claude's reason, set builder_status = 'failed', notify via NTFY with the failure reason, and stop. If PASS: proceed.
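A sketch of the Step 4 call with a fail-closed parser for the one-line verdict. Two assumptions: the generated code is piped to `claude -p` on stdin (print mode accepts piped input alongside the prompt argument), and any reply that does not start with PASS counts as FAIL. The 300-second timeout and the function names are illustrative.

```python
import re
import subprocess
from typing import Tuple

# Mirrors the Step 4 review prompt; {summary} receives task_description[:200].
OVERSEER_TEMPLATE = """Review this {type} automation code for the following only:
1. Does it correctly fetch secrets from Vault via AppRole (never hardcoded)?
2. Are there any obvious syntax errors or missing imports?
3. Does the logic match this spec summary: {name} \u2014 {summary}

Respond with: PASS or FAIL, then one sentence explaining why.
Do not rewrite the code."""

# Leading punctuation (e.g. markdown "**") is tolerated before the verdict word.
_VERDICT = re.compile(r"^\W*(PASS|FAIL)\b\W*(.*)", re.IGNORECASE | re.DOTALL)


def parse_verdict(output: str) -> Tuple[bool, str]:
    """Return (passed, reason). Anything unparseable fails closed."""
    text = output.strip()
    match = _VERDICT.match(text)
    if match is None:
        return False, "unparseable overseer reply: " + text[:200]
    return match.group(1).upper() == "PASS", match.group(2).strip()


def run_overseer(automation_type: str, name: str, task_description: str,
                 code: str, timeout: float = 300.0) -> Tuple[bool, str]:
    """The single claude -p call per automation (see SDK credit budget)."""
    prompt = OVERSEER_TEMPLATE.format(
        type=automation_type, name=name, summary=task_description[:200]
    )
    result = subprocess.run(
        ["claude", "-p", prompt],
        input=code,          # generated code arrives on stdin
        capture_output=True,
        text=True,
        timeout=timeout,     # raises subprocess.TimeoutExpired; treat as a failure
    )
    if result.returncode != 0:
        return False, f"claude -p exited {result.returncode}: {result.stderr.strip()[:200]}"
    return parse_verdict(result.stdout)
```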

Step 5 — Deploy to sandbox

For script type:

Write the generated code to a temp directory on server-01
If a Dockerfile was generated, build the image: docker build -t agent-{id}-{slug} .
Run a test container: docker run --rm agent-{id}-{slug}, using test inputs or a dry-run mode so the run has no side effects

For claude_agent type:

Write the generated claude -p command to a shell script
Make it executable
Run it once with a --dry-run flag if the generated agent supports one; otherwise use a test input that produces no side effects

If deployment fails: set builder_status = 'failed', log error, notify via NTFY, stop.
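A sketch of the Step 5 mechanics. Docker image names must be lowercase and built from letters, digits, and separators, so the `{slug}` in `agent-{id}-{slug}` has to be derived from the automation name. The slug rules and 40-character cap, the `set -euo pipefail` header for claude_agent scripts, and all function names are assumptions.

```python
import re
import subprocess
from pathlib import Path
from typing import Tuple


def image_tag(automation_id: int, name: str) -> str:
    """Build agent-{id}-{slug} using only characters Docker accepts in image names."""
    # Collapse each run of non [a-z0-9] characters into one hyphen, cap the
    # length, then trim hyphens so the slug starts and ends alphanumerically.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())[:40].strip("-")
    return f"agent-{automation_id}-{slug or 'automation'}"


def write_agent_script(workdir: Path, command: str) -> Path:
    """claude_agent type: wrap the generated claude -p command in an executable script."""
    path = workdir / "agent.sh"
    path.write_text("#!/usr/bin/env bash\nset -euo pipefail\n" + command.rstrip() + "\n")
    path.chmod(0o755)
    return path


def build_and_run(workdir: Path, tag: str, timeout: float = 600.0) -> Tuple[bool, str]:
    """script type with a Dockerfile: docker build, then one docker run --rm."""
    log = []
    for argv in (["docker", "build", "-t", tag, "."], ["docker", "run", "--rm", tag]):
        proc = subprocess.run(argv, cwd=workdir, capture_output=True, text=True,
                              timeout=timeout)
        log.append("$ " + " ".join(argv) + "\n" + proc.stdout + proc.stderr)
        if proc.returncode != 0:
            return False, "\n".join(log)
    return True, "\n".join(log)
```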

Step 6 — Run 4-level automated tests

Run each level in order. Stop and fail if any level fails. Log every result to agent_test_results.

Level 1 — Structure

Validate the generated artifact:

For scripts: python3 -m py_compile script.py — must exit 0
For claude agents: verify the shell script is syntactically valid bash
For Dockerfiles: docker build --check if available, else verify FROM and key directives exist
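Insert result: INSERT INTO agent_test_results (automation_id, test_level, status, execution_log) VALUES ({id}, 1, 'pass'/'fail', {log}), binding each value as a query parameter rather than splicing it into the SQL string (execution logs routinely contain quotes)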
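The Level 1 checks can be sketched as three small functions, each returning `(passed, log)` for the agent_test_results row. `bash -n` parses a script without executing it. The Dockerfile fallback encodes one reading of "FROM and key directives": FROM must be the first instruction (only ARG may precede it) and CMD or ENTRYPOINT must be present; treat that second rule as an assumption. Function names are hypothetical.

```python
import py_compile
import subprocess
from pathlib import Path
from typing import Tuple


def check_python(path: Path) -> Tuple[bool, str]:
    """Same check as `python3 -m py_compile script.py` (writes a .pyc into __pycache__)."""
    try:
        py_compile.compile(str(path), doraise=True)
    except py_compile.PyCompileError as exc:
        return False, str(exc)
    return True, "py_compile OK"


def check_bash(path: Path) -> Tuple[bool, str]:
    """bash -n reads and parses the script without executing any commands."""
    proc = subprocess.run(["bash", "-n", str(path)], capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr.strip() or "bash -n OK"


def check_dockerfile(path: Path) -> Tuple[bool, str]:
    """Fallback when `docker build --check` is unavailable."""
    keywords = []
    continued = False
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank and comment lines do not end a continuation
        if not continued:
            keywords.append(line.split()[0].upper())
        continued = line.endswith("\\")
    first = next((k for k in keywords if k != "ARG"), None)
    if first != "FROM":
        return False, f"first instruction is {first!r}, expected FROM"
    if not {"CMD", "ENTRYPOINT"} & set(keywords):
        return False, "no CMD or ENTRYPOINT instruction"
    return True, "Dockerfile directives OK"
```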

Level 2 — Deployment

Verify the artifact can be deployed cleanly (no missing dependencies, image builds successfully, script runs without import errors)
Must complete without crashing
Insert result to agent_test_results

Level 3 — Smoke

Execute the automation with minimal/test inputs
Must run to completion without an unhandled exception or non-zero exit
Insert result to agent_test_results
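A sketch of the Level 3 rule. An uncaught Python exception already ends the process with a non-zero status, so checking the return code covers both conditions. The command to run, the 300-second timeout, and the 4,000-character log tail are assumptions; `smoke_test` is a hypothetical name.

```python
import subprocess
from typing import Sequence, Tuple


def smoke_test(argv: Sequence[str], timeout: float = 300.0,
               tail_chars: int = 4000) -> Tuple[bool, str]:
    """Level 3: run the automation once; pass only on exit status 0 within the timeout."""
    try:
        proc = subprocess.run(list(argv), capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    except OSError as exc:  # e.g. the executable does not exist
        return False, f"could not start: {exc}"
    output = proc.stdout + proc.stderr
    return proc.returncode == 0, output[-tail_chars:]
```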

Level 4 — Assertion

Verify the correct side effect occurred (not string matching — check the actual system state)
Examples: a file was created, a DB row was written, an API call returned 200, a container is running
Insert result to agent_test_results
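The stop-on-first-failure rule from the top of Step 6, with every level logged, can be sketched as a small runner. Each level is a hypothetical callable returning `(passed, log)`, and a check that raises counts as a failure of its level. `row_exists` illustrates a Level 4 check against system state (a DB row) rather than against the automation's output text. stdlib sqlite3 stands in for the production database.

```python
import sqlite3
from typing import Callable, List, Sequence, Tuple

Check = Callable[[], Tuple[bool, str]]


def record_result(conn: sqlite3.Connection, automation_id: int, level: int,
                  passed: bool, log: str) -> None:
    # Bound parameters, not string interpolation: logs routinely contain quotes.
    conn.execute(
        "INSERT INTO agent_test_results (automation_id, test_level, status, execution_log)"
        " VALUES (?, ?, ?, ?)",
        (automation_id, level, "pass" if passed else "fail", log),
    )
    conn.commit()


def run_levels(conn: sqlite3.Connection, automation_id: int,
               checks: List[Check]) -> Tuple[bool, int]:
    """Run levels 1..N in order, log each, stop at the first failure.

    Returns (all_passed, last_level_run).
    """
    for level, check in enumerate(checks, start=1):
        try:
            passed, log = check()
        except Exception as exc:  # a crashing check fails its own level
            passed, log = False, f"{type(exc).__name__}: {exc}"
        record_result(conn, automation_id, level, passed, log)
        if not passed:
            return False, level
    return True, len(checks)


def row_exists(conn: sqlite3.Connection, query: str, params: Sequence = ()) -> Check:
    """Level 4 example: pass only if the expected row is actually present."""
    return lambda: (conn.execute(query, params).fetchone() is not None,
                    f"state query: {query} {tuple(params)}")
```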

Step 7 — Notify user for promotion approval

If all 4 levels pass:

Set builder_status = 'awaiting_approval'

Send NTFY notification:

Title: Agent Ready for Promotion — {name}
Body: All 4 test levels passed in sandbox. Automation id={id} ({type}) is ready for production promotion. Reply to approve or reject.

User must explicitly approve before any production deployment. No auto-promotion in v1.

Step 8 — On approval

Set builder_status = 'approved', then set it to 'deployed' once the production deployment completes. Also set automation_ideas.status = 'deployed'.
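The builder_status values used across Steps 1 to 8 and the error-handling rules form a small state machine. A sketch that rejects illegal jumps and applies each move as a guarded UPDATE, the same idea as the Step 1 claim. The reject path after awaiting_approval is not specified in this playbook, so it is not modeled; approved to failed (a failed production deployment) is an assumption. Names are hypothetical and sqlite3 stands in for the production database.

```python
import sqlite3

# builder_status moves implied by Steps 1-8 and the error-handling rules.
TRANSITIONS = {
    "not_started": {"queued"},                    # Step 1 claim
    "queued": {"building", "not_started"},        # Step 3, or claim released before Step 3
    "building": {"awaiting_approval", "failed"},  # Steps 3-7
    "awaiting_approval": {"approved"},            # Step 8; reject path unspecified
    "approved": {"deployed", "failed"},           # production deployment
    "failed": set(),                              # manual review before any retry
    "deployed": set(),
}


def set_builder_status(conn: sqlite3.Connection, automation_id: int,
                       current: str, new: str) -> bool:
    """Validate the move, then apply it only if the row is still in `current`."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal builder_status transition {current!r} -> {new!r}")
    cur = conn.execute(
        "UPDATE automation_ideas SET builder_status = ? WHERE id = ? AND builder_status = ?",
        (new, automation_id, current),
    )
    moved = cur.rowcount == 1
    if moved and new == "deployed":
        # Step 8: the automation idea itself is marked deployed as well.
        conn.execute("UPDATE automation_ideas SET status = 'deployed' WHERE id = ?",
                     (automation_id,))
    conn.commit()
    return moved
```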

Error handling

Any unhandled exception: log it to agent_test_results with test_level=0 and status='fail', and send an NTFY alert
Failure before Step 3: release the claim (reset builder_status to 'not_started') so another run can retry
Failure at Step 3 or later: set builder_status = 'failed'; the automation requires manual review before retry
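A sketch of the failure path: log a test_level=0 row, then either release the claim (failure before Step 3) or mark the row failed (Step 3 onward). It should only be called for a row this run actually claimed; otherwise resetting to not_started could undo another builder's claim. Sending the NTFY alert is left to the caller; names are hypothetical and sqlite3 stands in for the production database.

```python
import sqlite3


def handle_failure(conn: sqlite3.Connection, automation_id: int, step: int,
                   error: BaseException) -> str:
    """Apply the error-handling rules; returns the new builder_status."""
    conn.execute(
        "INSERT INTO agent_test_results (automation_id, test_level, status, execution_log)"
        " VALUES (?, 0, 'fail', ?)",
        (automation_id, f"step {step}: {type(error).__name__}: {error}"),
    )
    # Before Step 3 nothing has been generated, so the row can go back to the queue.
    new_status = "not_started" if step < 3 else "failed"
    conn.execute(
        "UPDATE automation_ideas SET builder_status = ? WHERE id = ?",
        (new_status, automation_id),
    )
    conn.commit()
    return new_status
```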

NTFY notification patterns

Build started: [Agent Builder] Building {name} (id={id}, type={type})
Overseer FAIL: [Agent Builder] FAIL — Overseer rejected {name}: {reason}
Test level fail: [Agent Builder] FAIL — {name} failed Level {n}: {error}
Ready for approval: [Agent Builder] READY — {name} passed all tests, awaiting your approval
Unhandled error: [Agent Builder] ERROR — {name}: {exception}
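A sketch that renders the patterns above and publishes them. It assumes the production NTFY instance accepts ntfy's JSON publishing format (a POST to the server root with `topic`, `message`, and optional `title` fields), which keeps non-ASCII characters such as the em dash out of HTTP headers. The base URL, topic, and any authentication are not given in this playbook and are left to the caller; `render` and `publish` are hypothetical names.

```python
import json
import urllib.request
from typing import Optional

# "\u2014" is the em dash used in the documented patterns.
PATTERNS = {
    "build_started": "[Agent Builder] Building {name} (id={id}, type={type})",
    "overseer_fail": "[Agent Builder] FAIL \u2014 Overseer rejected {name}: {reason}",
    "test_fail": "[Agent Builder] FAIL \u2014 {name} failed Level {n}: {error}",
    "ready": "[Agent Builder] READY \u2014 {name} passed all tests, awaiting your approval",
    "error": "[Agent Builder] ERROR \u2014 {name}: {exception}",
}


def render(kind: str, **fields) -> str:
    """Fill one notification pattern; raises KeyError on a missing field."""
    return PATTERNS[kind].format(**fields)


def publish(base_url: str, topic: str, message: str,
            title: Optional[str] = None, timeout: float = 10.0) -> int:
    """POST a JSON-format ntfy message to the server root; returns the HTTP status."""
    payload = {"topic": topic, "message": message}
    if title is not None:
        payload["title"] = title
    req = urllib.request.Request(
        base_url.rstrip("/") + "/",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For the Step 7 message, the `title` argument would carry the "Agent Ready for Promotion" title and `message` the READY pattern or the longer body text.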

SDK credit budget

claude -p is called once per automation (Step 4 only). Keep the overseer prompt under 500 tokens. Do not call claude -p for retries or debugging — only for the initial validation pass.
