project_name: agent-builder

# Agent Builder — Session Context

## What this project does
Design, build, and test autonomous N8N agents on server-01 sandbox before any production promotion. First two agents: Agent Builder Agent + N8N Builder Agent.

## Scheduled work (2026-06-16, running behind — started ~6:38 PM)
1. Vision Alignment Grill-Me — Agent Builder + N8N Builder vision + testing methodology
2. Agent Builder Agent — Deploy + Test (server-01 sandbox)
3. N8N Builder Agent — Deploy + Test (server-01 sandbox)

## Architecture
- Agents run as N8N workflows on server-01 (n8n-sandbox, port 5679)
- Sandbox-first: all agents tested in sandbox before any production promotion
- server-01 sandbox stack: n8n-sandbox, postgres-sandbox, vault-sandbox, bitwarden-bridge-sandbox, vaultwarden-sandbox
- Sandbox N8N API key: prod Vault at secret/sandbox/n8n
- Sandbox reachable at 192.168.1.90

## Key decisions (set during vision grill-me — 2026-06-16)
- Agent Builder Agent: builds `claude_agent` and `script` types — Ollama (llama3.1:8b) does the building, `claude -p` is overseer/validator
- N8N Builder Agent: builds `n8n_automation` types — Ollama generates workflow JSON, imports via N8N API, assigns credentials
- automation_ideas schema changes needed: rename `description` → `task_description` (full structured spec), add `type` (n8n_automation/claude_agent/script), add `builder_status`
- New `agent_test_results` table needed in api_business DB
- Sandbox must mirror production: AppRole, Vaultwarden, bridge all configured before any agent deploys
- Promotion = user approval required after all 4 test levels pass (not auto-promote in v1)
- Dedicated backfill session needed for all 48 existing automation_ideas rows (type + task_description)
- claude -p uses SDK credits (Pro = $20/month hard limit) — use sparingly, Ollama does the heavy lifting
- Local model: llama3.1:8b already pulled on server-01 (4.9GB, fits in RTX 2060 Super 8GB VRAM)

## Testing methodology
- Four levels: Structure
→ Deployment → Smoke → Assertion
- LLM outputs validated on structure/side-effects only, never exact string match
- All results logged to agent_test_results table
- NTFY notification on pass and fail
- Full methodology: .claude/playbook_testing_methodology.md

## Agents

### Agent Builder Agent
- Status: pending — prereqs not complete
- Purpose: Receives automation spec from automation_ideas DB, uses Ollama to build claude_agent or script type automations, deploys to sandbox, runs automated tests, notifies user for promotion approval
- Builds: claude agents (via claude -p) and Python scripts (Docker containers)

### N8N Builder Agent
- Status: pending — prereqs not complete
- Purpose: Receives automation spec from automation_ideas DB, uses Ollama to generate N8N workflow JSON using n8n_automations playbook as context, imports to sandbox N8N via API, assigns credentials, runs automated tests
- Will be used to build: id=12 (Media Pipeline Learning), id=7 (Friday Research Session Prep)

## Related personal_projects DB rows
- id=4: N8N Workflow Builder Script (pending, weekend_block1)

## Prereq checklist (must complete before any agent deployment)
- [x] Schema: rename automation_ideas.description → task_description, add type, add builder_status, add priority
- [x] Create agent_test_results table in api_business
- [x] Sandbox Vault: set up AppRole auth method (credentials at /opt/appdata/docker/docker-compose/vault/approle/ on server-01)
- [x] Sandbox Vault: store sandbox N8N API key at secret/sandbox/n8n (key name: claude-sandbox, verified working)
- [x] Verify sandbox Bitwarden bridge ↔ Vaultwarden sandbox end-to-end (bridge on port 8080, returns [] for empty vault — correct)
- [x] Write Agent Builder Agent playbook → .claude/playbook_agent_builder_agent.md
- [x] Write N8N Builder Agent playbook → .claude/playbook_n8n_builder_agent.md
- [~] Backfill session: Resume from priority 28 (id=34, CalDAV Auto-Refresh Trigger) next session. ~30 automations remain.
  - This session reviewed ids 28, 40, 53, 4, 7, 5, 14, 11, 44, 33, 29, 20, 41, 12, 1, 19, 39, 6.
  - Key changes:
    - id=7+50 merged into id=18 (full 10-stage business pipeline)
    - id=18 expanded with Business Research Agent + Development Agent + parked idea email-to-Tyler flow
    - id=14+20 blocked (already built by schedule workflows)
    - id=12+1+41 blocked (redundant)
    - id=19 blocked (pending Jenkins)
    - id=5 blocked (Obsidian vault not set up yet)
    - id=40 pending with Jenkins conditional note
  - New rows added:
    - id=54 (NTFY Topic Provisioner, p54)
    - id=55 (Business Research Agent, p51)
    - id=56 (Business Development Agent, p52)
    - id=57 (Sandbox Environment Deployment Completion, p9 — NOTE: conflicts with id=28 priority 9, fix next session)
  - Calendar events pushed to Nextcloud: Thu 6/18 12:45-3PM backfill + 3-4:30PM readiness, Fri 6/19 12:45-4:30PM agent builds.

## Final readiness check items (scheduled June 18)
- All 8 prereq checklist items verified complete
- Sandbox mirrors production: AppRole, bridge, Vaultwarden all confirmed functional
- **Sensitive output interception system** — design and implement before any agent goes live:
  - Agents must scan their own stdout/logs before writing/sending output and redact anything matching secret patterns (tokens, keys, passwords, API keys)
  - Pattern list at minimum: `hvs\.`, `eyJ`, bearer tokens, anything from known env var names (BRIDGE_API_KEY, VAULT_TOKEN, N8N_ENCRYPTION_KEY, etc.)
  - Root cause: `docker inspect --format '{{range .Config.Env}}...'` dumps all env vars including secrets; agents will reach for broad diagnostic commands without filtering — local models even more so
  - Production exposure is a serious risk; sandbox exposure is acceptable but still undesirable
  - This system needs to exist at the agent level (not just Claude Code rules) because once agents run autonomously the user will not be watching

## Update instructions
Update at the end of every agent-builder session. Keep agent status, key decisions, and prereq checklist current.
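
For the sensitive output interception item in the final readiness check, a minimal Python sketch of the redaction step might look like the following. The function name `redact`, the `[REDACTED]` placeholder, and the exact regular expressions are assumptions for illustration: the notes only name the prefixes (`hvs\.`, `eyJ`), bearer tokens, and the env var names, not the full token shapes. The bearer rule is deliberately broad and will also redact the word after "bearer" in ordinary prose.

```python
import re

REDACTED = "[REDACTED]"  # hypothetical placeholder text

# Env var names listed in the readiness notes; extend as more are identified.
SENSITIVE_ENV_NAMES = ("BRIDGE_API_KEY", "VAULT_TOKEN", "N8N_ENCRYPTION_KEY")

# Assumed token shapes built around the prefixes named in the notes.
_TOKEN_PATTERNS = [
    re.compile(r"\bhvs\.[A-Za-z0-9_-]+"),                     # Vault-style tokens
    re.compile(r"\beyJ[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*"),  # JWT-like blobs
]
# Keeps the word "Bearer" and replaces the value that follows it.
_BEARER = re.compile(r"(bearer\s+)\S+", re.IGNORECASE)
# Matches NAME=value for the known sensitive env var names.
_ENV_ASSIGNMENT = re.compile(
    r"\b(%s)=\S+" % "|".join(re.escape(name) for name in SENSITIVE_ENV_NAMES)
)


def redact(text: str) -> str:
    """Return text with anything matching the secret patterns replaced."""
    text = _ENV_ASSIGNMENT.sub(lambda m: f"{m.group(1)}={REDACTED}", text)
    text = _BEARER.sub(lambda m: m.group(1) + REDACTED, text)
    for pattern in _TOKEN_PATTERNS:
        text = pattern.sub(REDACTED, text)
    return text


if __name__ == "__main__":
    sample = "VAULT_TOKEN=hvs.abc123\nTZ=UTC\nAuthorization: Bearer eyJhbGciOi.payload.sig"
    print(redact(sample))
```

An agent would pass every stdout or log string through `redact` before writing or sending it, so that broad diagnostic output such as an environment dump is filtered even when nobody is watching.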