project_name: agent-builder
# Agent Builder — Session Context

## What this project does
Design, build, and test autonomous N8N agents on server-01 sandbox before any production promotion.
First two agents: Agent Builder Agent + N8N Builder Agent.

## Scheduled work (2026-06-16, running behind — started ~6:38 PM)
1. Vision Alignment Grill-Me — Agent Builder + N8N Builder vision + testing methodology
2. Agent Builder Agent — Deploy + Test (server-01 sandbox)
3. N8N Builder Agent — Deploy + Test (server-01 sandbox)

## Architecture
- Agents run as N8N workflows on server-01 (n8n-sandbox, port 5679)
- Sandbox-first: all agents tested in sandbox before any production promotion
- server-01 sandbox stack: n8n-sandbox, postgres-sandbox, vault-sandbox, bitwarden-bridge-sandbox, vaultwarden-sandbox
- Sandbox N8N API key: prod Vault at secret/sandbox/n8n
- Sandbox reachable at 192.168.1.90

## Key decisions (set during vision grill-me — 2026-06-16)
- Agent Builder Agent: builds `claude_agent` and `script` types — Ollama (llama3.1:8b) does the building, `claude -p` is overseer/validator
- N8N Builder Agent: builds `n8n_automation` types — Ollama generates workflow JSON, imports via N8N API, assigns credentials
- automation_ideas schema changes needed: rename `description` → `task_description` (full structured spec), add `type` (n8n_automation/claude_agent/script), add `builder_status`
- New `agent_test_results` table needed in api_business DB
- Sandbox must mirror production: AppRole, Vaultwarden, bridge all configured before any agent deploys
- Promotion = user approval required after all 4 test levels pass (not auto-promote in v1)
- Dedicated backfill session needed for all 48 existing automation_ideas rows (type + task_description)
- `claude -p` uses SDK credits (Pro = $20/month hard limit): use sparingly; Ollama does the heavy lifting
- Local model: llama3.1:8b already pulled on server-01 (4.9GB, fits in RTX 2060 Super 8GB VRAM)

## Testing methodology
- Four levels: Structure → Deployment → Smoke → Assertion
- LLM outputs validated on structure/side-effects only, never exact string match
- All results logged to agent_test_results table
- NTFY notification on pass and fail
- Full methodology: .claude/playbook_testing_methodology.md
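
A minimal sketch of how the four levels could be sequenced, assuming each level is a callable that returns a pass/fail result and that a failure stops later levels from running (the stop-on-failure rule is an assumption, not stated in the methodology). The names `LevelResult`, `check_workflow_structure`, `run_levels`, and `ready_for_approval` are hypothetical, and the structure check only looks at the shape of the generated JSON, never at exact LLM wording.

```python
from dataclasses import dataclass
from typing import Callable

# The four levels from the methodology, in the order they run.
LEVELS = ("structure", "deployment", "smoke", "assertion")


@dataclass
class LevelResult:
    level: str
    passed: bool
    detail: str = ""


def check_workflow_structure(workflow: dict) -> LevelResult:
    """Structure level: validate the shape of the output, not its text."""
    missing = [key for key in ("name", "nodes", "connections") if key not in workflow]
    if missing:
        return LevelResult("structure", False, f"missing keys: {missing}")
    if not isinstance(workflow["nodes"], list) or not workflow["nodes"]:
        return LevelResult("structure", False, "nodes must be a non-empty list")
    return LevelResult("structure", True)


def run_levels(checks: dict[str, Callable[[], LevelResult]]) -> list[LevelResult]:
    """Run the levels in order and stop at the first failure."""
    results = []
    for level in LEVELS:
        result = checks[level]()
        results.append(result)
        if not result.passed:
            break
    return results


def ready_for_approval(results: list[LevelResult]) -> bool:
    """True only when all four levels ran and passed."""
    return len(results) == len(LEVELS) and all(r.passed for r in results)


# Usage with stubbed deployment/smoke/assertion checks.
workflow = {"name": "demo", "nodes": [{"type": "n8n-nodes-base.manualTrigger"}], "connections": {}}
checks = {
    "structure": lambda: check_workflow_structure(workflow),
    "deployment": lambda: LevelResult("deployment", True),
    "smoke": lambda: LevelResult("smoke", True),
    "assertion": lambda: LevelResult("assertion", True),
}
results = run_levels(checks)
```

Each `LevelResult` maps naturally onto one row in `agent_test_results`, though the table's actual columns are not described here.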

## Agents
### Agent Builder Agent
- Status: pending — prereqs not complete
- Purpose: Receives automation spec from automation_ideas DB, uses Ollama to build claude_agent or script type automations, deploys to sandbox, runs automated tests, notifies user for promotion approval
- Builds: claude agents (via `claude -p`) and Python scripts (Docker containers)
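
As a rough sketch of the "uses Ollama to build" step, the request below targets Ollama's `/api/generate` endpoint with the `llama3.1:8b` model named in the key decisions. The URL assumes Ollama's default port (11434) on the local host, and the prompt wording and the function names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama listens on its default port on the same host.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.1:8b"
BUILDABLE_TYPES = ("claude_agent", "script")


def build_generate_request(task_description: str, target_type: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    if target_type not in BUILDABLE_TYPES:
        raise ValueError(f"Agent Builder Agent does not build type {target_type!r}")
    # Hypothetical prompt; the real one would come from the playbook.
    prompt = f"Build a {target_type} automation for this spec:\n{task_description}"
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(request: urllib.request.Request, timeout: float = 300.0) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Keeping request construction separate from sending makes the builder testable without a running model; the `claude -p` validation pass would run on whatever `generate` returns.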

### N8N Builder Agent
- Status: pending — prereqs not complete
- Purpose: Receives automation spec from automation_ideas DB, uses Ollama to generate N8N workflow JSON using n8n_automations playbook as context, imports to sandbox N8N via API, assigns credentials, runs automated tests
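- Will be used to build: id=12 (Media Pipeline Learning), id=7 (Friday Research Session Prep). NOTE: the backfill notes mark id=12 as blocked (redundant) and id=7 as merged into id=18; re-confirm these targets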
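
The import step could look roughly like the sketch below, which builds a request against the N8N public REST API (`POST /api/v1/workflows`, authenticated with the `X-N8N-API-KEY` header). Combining 192.168.1.90 with port 5679 over plain HTTP is an assumption drawn from the Architecture section, `build_import_request` is a hypothetical name, and forwarding only these four fields is an assumption about what the API accepts. Credential assignment is not covered here.

```python
import json
import urllib.request

# Assumption: sandbox N8N is served over HTTP at this host and port.
N8N_BASE_URL = "http://192.168.1.90:5679"


def build_import_request(workflow: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a workflow-create request for sandbox N8N."""
    # Forward only the top-level fields the create call is assumed to need,
    # dropping anything else the model may have generated.
    body = {
        "name": workflow["name"],
        "nodes": workflow["nodes"],
        "connections": workflow["connections"],
        "settings": workflow.get("settings", {}),
    }
    return urllib.request.Request(
        f"{N8N_BASE_URL}/api/v1/workflows",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-N8N-API-KEY": api_key},
        method="POST",
    )
```

The API key would be read from Vault at `secret/sandbox/n8n` at call time rather than hard-coded, and the request sent with `urllib.request.urlopen`.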

## Related personal_projects DB rows
- id=4: N8N Workflow Builder Script (pending, weekend_block1)

## Prereq checklist (must complete before any agent deployment)
- [x] Schema: rename automation_ideas.description → task_description, add type, add builder_status, add priority
- [x] Create agent_test_results table in api_business
- [x] Sandbox Vault: set up AppRole auth method (credentials at /opt/appdata/docker/docker-compose/vault/approle/ on server-01)
- [x] Sandbox Vault: store sandbox N8N API key at secret/sandbox/n8n (key name: claude-sandbox, verified working)
- [x] Verify sandbox Bitwarden bridge ↔ Vaultwarden sandbox end-to-end (bridge on port 8080, returns [] for empty vault — correct)
- [x] Write Agent Builder Agent playbook → .claude/playbook_agent_builder_agent.md
- [x] Write N8N Builder Agent playbook → .claude/playbook_n8n_builder_agent.md
- [~] Backfill session: resume from priority 28 (id=34, CalDAV Auto-Refresh Trigger) next session. ~30 automations remain.
  - Reviewed this session: ids 28, 40, 53, 4, 7, 5, 14, 11, 44, 33, 29, 20, 41, 12, 1, 19, 39, 6
  - Key changes:
    - id=7+50 merged into id=18 (full 10-stage business pipeline)
    - id=18 expanded with Business Research Agent + Development Agent + parked idea email-to-Tyler flow
    - id=14+20 blocked (already built by schedule workflows)
    - id=12+1+41 blocked (redundant)
    - id=19 blocked (pending Jenkins)
    - id=5 blocked (Obsidian vault not set up yet)
    - id=40 pending with Jenkins conditional note
  - New rows added:
    - id=54 (NTFY Topic Provisioner, p54)
    - id=55 (Business Research Agent, p51)
    - id=56 (Business Development Agent, p52)
    - id=57 (Sandbox Environment Deployment Completion, p9). NOTE: conflicts with id=28 priority 9, fix next session
  - Calendar events pushed to Nextcloud: Thu 6/18 12:45-3PM backfill + 3-4:30PM readiness, Fri 6/19 12:45-4:30PM agent builds
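
Priority collisions like the id=57 / id=28 one noted in the backfill item can be caught mechanically. The sketch below assumes the rows are available as `(id, priority)` pairs; `find_priority_conflicts` is a hypothetical helper, and the sample pairs are the ones mentioned in the backfill notes.

```python
from collections import defaultdict


def find_priority_conflicts(rows: list[tuple[int, int]]) -> dict[int, list[int]]:
    """Map each priority used more than once to the sorted ids sharing it."""
    ids_by_priority = defaultdict(list)
    for row_id, priority in rows:
        ids_by_priority[priority].append(row_id)
    return {p: sorted(ids) for p, ids in ids_by_priority.items() if len(ids) > 1}


# (id, priority) pairs taken from the backfill notes.
rows = [(28, 9), (57, 9), (54, 54), (55, 51), (56, 52), (34, 28)]
conflicts = find_priority_conflicts(rows)
print(conflicts)  # {9: [28, 57]}
```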

## Final readiness check items (scheduled June 18)
- All 8 prereq checklist items verified complete
- Sandbox mirrors production: AppRole, bridge, Vaultwarden all confirmed functional
- **Sensitive output interception system** — design and implement before any agent goes live:
  - Agents must scan their own stdout/logs before writing or sending any output, and redact anything matching secret patterns (tokens, API keys, passwords)
  - Pattern list at minimum: `hvs\.`, `eyJ`, bearer tokens, anything from known env var names (BRIDGE_API_KEY, VAULT_TOKEN, N8N_ENCRYPTION_KEY, etc.)
  - Root cause: `docker inspect --format '{{range .Config.Env}}...'` dumps all env vars including secrets; agents will reach for broad diagnostic commands without filtering — local models even more so
  - Production exposure is a serious risk; sandbox exposure is acceptable but still undesirable
  - This system needs to exist at the agent level (not just Claude Code rules) because once agents run autonomously the user will not be watching
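
A minimal sketch of the redaction pass, assuming it runs as a plain text filter over anything an agent is about to write or send. The patterns cover the minimum list above (`hvs\.`, `eyJ`, bearer tokens, known env var names); the exact regexes, the `[REDACTED]` marker, and the `redact` function name are assumptions, and the list would need extending before production use.

```python
import re

# Env var names called out in the readiness notes; extend as needed.
SENSITIVE_ENV_NAMES = ("BRIDGE_API_KEY", "VAULT_TOKEN", "N8N_ENCRYPTION_KEY")
REDACTED = "[REDACTED]"

# (pattern, replacement) pairs, applied in order.
SECRET_PATTERNS = [
    # NAME=value pairs, e.g. from a `docker inspect` env dump; keep the name.
    (
        re.compile(r"\b(%s)=\S+" % "|".join(map(re.escape, SENSITIVE_ENV_NAMES))),
        r"\1=" + REDACTED,
    ),
    # Bearer tokens in headers or logs; keep the scheme word.
    (re.compile(r"\b(bearer)\s+[A-Za-z0-9._~+/=-]+", re.IGNORECASE), r"\1 " + REDACTED),
    # Vault tokens with the hvs. prefix.
    (re.compile(r"\bhvs\.[A-Za-z0-9_-]+"), REDACTED),
    # JWT-like strings, which start with eyJ.
    (re.compile(r"\beyJ[A-Za-z0-9_.-]+"), REDACTED),
]


def redact(text: str) -> str:
    """Return text with anything matching a secret pattern replaced."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(redact("VAULT_TOKEN=hvs.CAESIabc123"))           # VAULT_TOKEN=[REDACTED]
print(redact("Authorization: Bearer eyJhbGci.pay.sig"))  # Authorization: Bearer [REDACTED]
```

Pattern matching only catches secrets in known shapes, so it complements, rather than replaces, avoiding broad diagnostic commands such as a full env dump in the first place.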

## Update instructions
Update at the end of every agent-builder session. Keep agent status, key decisions, and prereq checklist current.