db6cbbdec1
Initial commit tracking session context, playbooks, and automation specs for claude-config and agent-builder Claude Code conversations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
74 lines
5.9 KiB
Markdown
project_name: agent-builder

# Agent Builder — Session Context

## What this project does

Design, build, and test autonomous N8N agents on the server-01 sandbox before any production promotion.

First two agents: Agent Builder Agent + N8N Builder Agent.

## Scheduled work (2026-06-16, running behind — started ~6:38 PM)

1. Vision Alignment Grill-Me — Agent Builder + N8N Builder vision + testing methodology
2. Agent Builder Agent — Deploy + Test (server-01 sandbox)
3. N8N Builder Agent — Deploy + Test (server-01 sandbox)

## Architecture

- Agents run as N8N workflows on server-01 (n8n-sandbox, port 5679)
- Sandbox-first: all agents are tested in the sandbox before any production promotion
- server-01 sandbox stack: n8n-sandbox, postgres-sandbox, vault-sandbox, bitwarden-bridge-sandbox, vaultwarden-sandbox
- Sandbox N8N API key: stored in prod Vault at `secret/sandbox/n8n`
- Sandbox reachable at 192.168.1.90

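As a rough illustration of how a client might address the sandbox described above, the following sketch builds (but does not send) a request that lists workflows. It assumes the N8N public REST API prefix `/api/v1` and the `X-N8N-API-KEY` header, and it assumes that port 5679 is served on 192.168.1.90 over plain HTTP. The function name and the placeholder key are hypothetical; in practice the key would be read from Vault rather than written into code.

```python
from urllib.request import Request

# Host and port taken from the architecture notes; pairing them is an assumption.
SANDBOX_HOST = "192.168.1.90"
SANDBOX_PORT = 5679


def build_list_workflows_request(api_key: str) -> Request:
    """Build a GET request for the sandbox workflow list without sending it."""
    # Assumption: plain HTTP and the public API prefix /api/v1.
    url = f"http://{SANDBOX_HOST}:{SANDBOX_PORT}/api/v1/workflows"
    headers = {"X-N8N-API-KEY": api_key, "Accept": "application/json"}
    return Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # The key below is a placeholder, not a real credential.
    req = build_list_workflows_request("PLACEHOLDER_KEY")
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the URL and headers easy to check before anything touches the network.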
## Key decisions (set during vision grill-me — 2026-06-16)

- Agent Builder Agent: builds `claude_agent` and `script` types — Ollama (llama3.1:8b) does the building, `claude -p` is overseer/validator
- N8N Builder Agent: builds `n8n_automation` types — Ollama generates workflow JSON, imports via N8N API, assigns credentials
- `automation_ideas` schema changes needed: rename `description` → `task_description` (full structured spec), add `type` (`n8n_automation`/`claude_agent`/`script`), add `builder_status`
- New `agent_test_results` table needed in the `api_business` DB
- Sandbox must mirror production: AppRole, Vaultwarden, and bridge all configured before any agent deploys
- Promotion requires user approval after all 4 test levels pass (no auto-promotion in v1)
- Dedicated backfill session needed for all 48 existing `automation_ideas` rows (`type` + `task_description`)
- `claude -p` uses SDK credits (Pro = $20/month hard limit) — use sparingly; Ollama does the heavy lifting
- Local model: llama3.1:8b already pulled on server-01 (4.9GB, fits in RTX 2060 Super 8GB VRAM)

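The schema changes listed above can be sketched as a small migration. This example runs against an in-memory SQLite database purely as a stand-in so that it is self-contained; the real `api_business` database engine is not stated here and its syntax may differ. The column types and every column of `agent_test_results` are hypothetical, since the notes only say that the table is needed.

```python
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Apply the planned automation_ideas changes and create agent_test_results."""
    conn.execute(
        "ALTER TABLE automation_ideas RENAME COLUMN description TO task_description"
    )
    conn.execute("ALTER TABLE automation_ideas ADD COLUMN type TEXT")
    conn.execute("ALTER TABLE automation_ideas ADD COLUMN builder_status TEXT")
    # Hypothetical layout: one row per automation per test level.
    conn.execute(
        """CREATE TABLE agent_test_results (
               id INTEGER PRIMARY KEY,
               automation_id INTEGER NOT NULL,
               level TEXT NOT NULL,
               passed INTEGER NOT NULL,
               detail TEXT
           )"""
    )


def column_names(conn: sqlite3.Connection, table: str) -> list:
    """Return the column names of a table in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Minimal stand-in for the existing table before migration.
    conn.execute(
        "CREATE TABLE automation_ideas (id INTEGER PRIMARY KEY, description TEXT)"
    )
    migrate(conn)
    print(column_names(conn, "automation_ideas"))
```

Renaming the column in place, rather than adding a new one and copying, keeps existing rows intact for the later backfill.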
## Testing methodology

- Four levels: Structure → Deployment → Smoke → Assertion
- LLM outputs validated on structure/side-effects only, never exact string match
- All results logged to `agent_test_results` table
- NTFY notification on pass and fail
- Full methodology: .claude/playbook_testing_methodology.md

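A minimal sketch of the four-level sequence, assuming each level is a callable that returns a boolean and that a failed level stops the run (the notes do not say whether later levels still execute after a failure). The `output_has_structure` helper and its `summary`/`items` keys are hypothetical and only illustrate checking shape instead of exact text.

```python
from typing import Callable, Dict, List, Tuple

# Level order from the notes above.
LEVELS = ("structure", "deployment", "smoke", "assertion")


def run_levels(checks: Dict[str, Callable[[], bool]]) -> List[Tuple[str, bool]]:
    """Run checks in level order and stop at the first failure (assumed policy)."""
    results = []
    for level in LEVELS:
        passed = bool(checks[level]())
        results.append((level, passed))
        if not passed:
            break
    return results


def output_has_structure(output: dict) -> bool:
    """Validate keys and types only, never the exact generated text."""
    return isinstance(output.get("summary"), str) and isinstance(
        output.get("items"), list
    )


if __name__ == "__main__":
    sample = {"summary": "any wording is acceptable", "items": []}
    checks = {
        "structure": lambda: output_has_structure(sample),
        "deployment": lambda: True,
        "smoke": lambda: True,
        "assertion": lambda: True,
    }
    print(run_levels(checks))
```

The returned list of `(level, passed)` pairs maps naturally onto rows for `agent_test_results` and onto the pass/fail notification.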
## Agents

### Agent Builder Agent

- Status: pending — prereqs not complete
- Purpose: Receives an automation spec from the `automation_ideas` DB, uses Ollama to build `claude_agent` or `script` type automations, deploys to sandbox, runs automated tests, and notifies the user for promotion approval
- Builds: Claude agents (via `claude -p`) and Python scripts (Docker containers)

### N8N Builder Agent

- Status: pending — prereqs not complete
- Purpose: Receives an automation spec from the `automation_ideas` DB, uses Ollama to generate N8N workflow JSON with the n8n_automations playbook as context, imports it to sandbox N8N via API, assigns credentials, and runs automated tests
- Will be used to build: id=12 (Media Pipeline Learning), id=7 (Friday Research Session Prep)

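The generate-then-validate step of the N8N Builder Agent might look roughly like the sketch below. It assumes Ollama is reachable on its default port 11434 on the same host and uses the `/api/generate` endpoint with `stream` disabled and `format` set to `json`. The function names are hypothetical, and the structural check covers only the top-level `nodes` and `connections` fields of a workflow export, not a full schema.

```python
import json
from urllib.request import Request

# Assumption: Ollama listens on its default port on the local host.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "llama3.1:8b") -> Request:
    """Build (but do not send) a non-streaming JSON-mode generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False, "format": "json"}
    return Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_workflow(raw: str) -> dict:
    """Parse generated text and check it has the basic shape of a workflow."""
    workflow = json.loads(raw)  # raises ValueError on invalid JSON
    if not isinstance(workflow, dict):
        raise ValueError("workflow must be a JSON object")
    if not isinstance(workflow.get("nodes"), list) or not workflow["nodes"]:
        raise ValueError("workflow needs a non-empty 'nodes' list")
    if not isinstance(workflow.get("connections"), dict):
        raise ValueError("workflow needs a 'connections' object")
    return workflow


if __name__ == "__main__":
    req = build_generate_request("Generate an N8N workflow for ...")
    print(req.get_method(), req.full_url)
```

Rejecting malformed output before the import call keeps broken workflows out of the sandbox and gives the Structure test level something concrete to record.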
## Related personal_projects DB rows

- id=4: N8N Workflow Builder Script (pending, weekend_block1)

## Prereq checklist (must complete before any agent deployment)

- [x] Schema: rename `automation_ideas.description` → `task_description`, add `type`, add `builder_status`, add `priority`
- [x] Create `agent_test_results` table in `api_business`
- [x] Sandbox Vault: set up AppRole auth method (credentials at /opt/appdata/docker/docker-compose/vault/approle/ on server-01)
- [x] Sandbox Vault: store sandbox N8N API key at `secret/sandbox/n8n` (key name: claude-sandbox, verified working)
- [x] Verify sandbox Bitwarden bridge ↔ Vaultwarden sandbox end-to-end (bridge on port 8080, returns `[]` for empty vault — correct)
- [x] Write Agent Builder Agent playbook → .claude/playbook_agent_builder_agent.md
- [x] Write N8N Builder Agent playbook → .claude/playbook_n8n_builder_agent.md
- [~] Backfill session (~30 automations remain):
  - Resume next session from priority 28 (id=34, CalDAV Auto-Refresh Trigger)
  - Reviewed this session: ids 28, 40, 53, 4, 7, 5, 14, 11, 44, 33, 29, 20, 41, 12, 1, 19, 39, 6
  - Key changes:
    - id=7 and id=50 merged into id=18 (full 10-stage business pipeline)
    - id=18 expanded with Business Research Agent, Development Agent, and parked idea: email-to-Tyler flow
    - id=14 and id=20 blocked (already built by schedule workflows)
    - id=12, id=1, and id=41 blocked (redundant)
    - id=19 blocked (pending Jenkins)
    - id=5 blocked (Obsidian vault not set up yet)
    - id=40 pending, with Jenkins conditional note
  - New rows added:
    - id=54 (NTFY Topic Provisioner, p54)
    - id=55 (Business Research Agent, p51)
    - id=56 (Business Development Agent, p52)
    - id=57 (Sandbox Environment Deployment Completion, p9). NOTE: conflicts with id=28 priority 9, fix next session
  - Calendar events pushed to Nextcloud:
    - Thu 6/18: 12:45-3 PM backfill, 3-4:30 PM readiness
    - Fri 6/19: 12:45-4:30 PM agent builds

## Final readiness check items (scheduled June 18)

- All 8 prereq checklist items verified complete
- Sandbox mirrors production: AppRole, bridge, Vaultwarden all confirmed functional
- **Sensitive output interception system** — design and implement before any agent goes live:
  - Agents must scan their own stdout/logs before writing/sending output and redact anything matching secret patterns (tokens, keys, passwords, API keys)
  - Pattern list at minimum: `hvs\.`, `eyJ`, bearer tokens, anything from known env var names (BRIDGE_API_KEY, VAULT_TOKEN, N8N_ENCRYPTION_KEY, etc.)
  - Root cause: `docker inspect --format '{{range .Config.Env}}...'` dumps all env vars including secrets; agents will reach for broad diagnostic commands without filtering — local models even more so
  - Production exposure is a serious risk; sandbox exposure is acceptable but still undesirable
  - This system needs to exist at the agent level (not just Claude Code rules) because once agents run autonomously the user will not be watching

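The interception requirement above could start from a redaction filter like the following sketch. It covers only the minimum pattern list from the notes and assumes that env vars appear in `NAME=value` form, as they do in env dumps. The `[REDACTED]` marker, the exact regular expressions, and the rule order are illustrative choices, not a settled design.

```python
import re

REDACTED = "[REDACTED]"

# Env var names mentioned in the notes; the real list would be longer.
SECRET_ENV_NAMES = ("BRIDGE_API_KEY", "VAULT_TOKEN", "N8N_ENCRYPTION_KEY")

# (pattern, replacement) pairs, applied in order.
RULES = [
    # NAME=value for known secret env vars: keep the name, drop the value.
    (re.compile(r"\b(" + "|".join(SECRET_ENV_NAMES) + r")=\S+"), r"\1=" + REDACTED),
    # Bearer tokens in headers or logs: keep the scheme word, drop the token.
    (re.compile(r"\b(Bearer)\s+\S+", re.IGNORECASE), r"\1 " + REDACTED),
    # Strings starting with the hvs. prefix (Vault tokens).
    (re.compile(r"hvs\.[A-Za-z0-9_-]+"), REDACTED),
    # Strings starting with eyJ (base64url-encoded JSON, typically a JWT).
    (re.compile(r"eyJ[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*"), REDACTED),
]


def redact(text: str) -> str:
    """Return text with anything matching a secret pattern replaced."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    # Sample values are made up for the demonstration.
    print(redact("VAULT_TOKEN=hvs.abc123 ok"))
    print(redact("Authorization: Bearer eyJhbGciOi.payload.sig"))
```

Running every outgoing string through one function at the agent's output boundary keeps the safeguard in place even when a broad diagnostic command is used unfiltered.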
## Update instructions

Update at the end of every agent-builder session. Keep agent status, key decisions, and prereq checklist current.