commit db6cbbdec1ecbc0e8c49621e08436f18d1bdb68e
Author: Backtalk6858 <megafreeman11@tutamail.com>
Date:   Wed Jun 17 23:08:23 2026 -0500

    init: add claude-config and agent-builder context files
    
    Initial commit tracking session context, playbooks, and automation specs
    for claude-config and agent-builder Claude Code conversations.
    
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

diff --git a/.claude/context.md b/.claude/context.md
new file mode 100644
index 0000000..64c65ac
--- /dev/null
+++ b/.claude/context.md
@@ -0,0 +1,78 @@
+project_name: claude-config
+# Claude Config Audit — Context
+
+## What this project does
+Saturday/Sunday Claude configuration and improvement sessions — behavior_changes review, hook development, memory system upgrades, skills development, and config audit workflow.
+
+## Key files
+- `/opt/appdata/docker/.claude/hooks/`: all hooks live here (globally referenced)
+- `/opt/appdata/docker/.claude/scripts/session_log_insert.sh`: validated insert helper for session_log table
+- `/home/administrator/.claude/settings.json`: global settings, hook registration, plugin registration
+- `/home/administrator/.claude/projects/-opt-appdata-docker/memory/`: primary memory directory
+- `/home/administrator/.claude/projects/-home-administrator-Desktop-claude/memory/`: this session's memory
+- `project_claude_config_workflow.md`: Saturday workflow, behavior_changes schema
+- `AUDIT_CONTEXT.md` (Desktop/claude/): handoff file written at start of each audit session
+
+## Patterns to follow
+- Validate Python hooks with `python3 -c "import ast; ast.parse(open('file').read())"` before saving
+- New memory files that should be globally accessible go in both project memory directories
+- behavior_changes INSERT: copy template from project_claude_config_workflow.md exactly
+- New rules go to behavior_changes DB with status='applied' after being written to memory
+- Evolution instructions required on every new rule, playbook, or system
+- Coolify API key: Vault `secret/coolify → api_key` via AppRole — NOT a file
+- Always back up hooks before modifying: `cp hook.sh hook.sh.bak.YYYYMMDD`
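+
+A minimal sketch that combines the two hook rules above into one pre-edit helper: a dated backup, then the `ast.parse` syntax check. The function names are hypothetical and nothing here is wired into the existing hooks. A second backup taken on the same day overwrites the first.
+
+```python
+import ast
+import shutil
+from datetime import date
+from pathlib import Path
+from typing import Optional, Tuple
+
+
+def backup_hook(path: str, today: Optional[date] = None) -> Path:
+    """Copy <hook> to <hook>.bak.YYYYMMDD beside it; copy2 keeps permissions and mtime."""
+    src = Path(path)
+    stamp = (today or date.today()).strftime("%Y%m%d")
+    dst = src.with_name(f"{src.name}.bak.{stamp}")
+    shutil.copy2(src, dst)
+    return dst
+
+
+def python_syntax_ok(path: str) -> Tuple[bool, str]:
+    """Parse without executing, like the ast.parse one-liner; returns (ok, message)."""
+    try:
+        ast.parse(Path(path).read_text(encoding="utf-8"), filename=path)
+    except SyntaxError as exc:
+        return False, f"{exc.msg} (line {exc.lineno})"
+    return True, "ok"
+```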
+
+## Known issues / gotchas
+- Semantic memory Phase 1 hooks NOT YET BUILT — Stop hook and SessionStart hook extensions pending
+- `/recall` skill IS built at `/opt/appdata/docker/.claude/skills/recall/SKILL.md` — registered under homelab-skills plugin; invoke via Skill tool with name "recall" (not "homelab-skills@homelab-skills-local:recall")
+- Feedback rule retrofit complete — all 35 files now have evolution instructions
+- session_log_insert.sh validates project_id before inserting — use --dry-run to test
+- pgvector ivfflat index warns about low recall with little data — normal until table is populated
+- N8N work that results from claude-config design decisions is IN SCOPE for Sunday dev session
+
+## What NOT to break
+- Hook registration in settings.json — all five hooks must remain wired
+- Memory file indexing in both MEMORY.md files — every new memory file must be indexed
+- Ollama on localhost:11434 (internal-only) — Coolify UUID: mbhuoyt968m23qt4x7lx01pc
+- pgvector extension in claude_config DB — required for claude_memories table
+
+## Current state
+2026-06-04 Thursday: git criteria universal playbook built (personal_projects id=38 → completed). playbook_git_criteria_universal.md created in both memory dirs — single authorized trigger (checklist only), partial commit on hard block, universal pre-stage registry, dynamic co-author line, two-layer evolution. playbook_git_commits.md updated to extend it. media-api.py + related files still uncommitted — containerized script, testing not confirmed. Next session: security vulnerability patching (~6:30 PM Thursday).
+
+## Sunday Dev Plan (13 items)
+**Hooks:**
+1. Stop hook — MEMORY_EMBED tag detection + Ollama embed (from prior session)
+2. SessionStart hook — semantic query + global handoff file + recent 48h summaries injection
+3. UserPromptSubmit hook — add [CURRENT TIME] injection + 60% context checkpoint threshold
+4. PostToolUse hook (new) — conflict-detector.py fires on Write/Edit to */memory/*.md
+
+**Skills:**
+5. /recall skill — confidence-gated semantic memory query
+
+**New files:**
+6. playbook_checklist_decision.md — merged checklist (grill-me finalized 2026-05-30):
+   DESIGN: old end-of-project + end-of-session checklists MERGED into one. Finishing a project = ending that conversation's session.
+   Trigger table:
+     - Built + verified working (project in DB)    → Full merged checklist, all steps
+     - Built + verified working (no DB entry)      → Full merged checklist, skip DB steps
+     - Planning/research only, nothing built       → Summaries + context.md only
+     - Switching conversations mid-project         → Summaries + context.md only
+     - 80% context hit, project mid-flight         → Lightweight checkpoint: WIP commit, context.md "stopped at X", log follow-ups, routing
+   "Project finished" = code/config/infra built AND basic functional check passes. "User says done" alone is not sufficient.
+   Also update feedback_end_of_project_checklist.md + feedback_end_of_session_checklist.md to point to new merged playbook.
+7. feedback_secrets_lookup_order.md — Vault before Bitwarden for any credential lookup
+8. session_handoff.md template + write step added to conversation routing checklist
+
+**DB / Schema:**
+9. schedule_actuals table DDL in claude_config DB
+10. proposed_schedule JSON template update (time_estimate_minutes + estimate_basis) + WF1b N8N node
+
+**Playbook update:**
+11. playbook_vault_token_rotation.md — add N8N manual rotation section
+
+**Behavior changes entries:**
+12. Pre-task context estimation rule (flag if >15% remaining context)
+13. Per-project time estimation at session start (query completed-today projects)
+
+## Update instructions
+Update at the end of every config audit session. Keep "Current state" section and pending items current.
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..26d064f
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,7 @@
+*.env
+*.key
+*.secret
+*credentials*
+*token*
+__pycache__/
+*.pyc
diff --git a/AUDIT_CONTEXT.md b/AUDIT_CONTEXT.md
new file mode 100644
index 0000000..a38cbf4
--- /dev/null
+++ b/AUDIT_CONTEXT.md
@@ -0,0 +1,55 @@
+# Saturday 2026-05-23 — Config Audit Context Handoff
+
+## What happened today (API Idea conversation)
+Extended Saturday session covering Friday's missed work (user was sick).
+
+**Completed:**
+- N8N AppRole migration — all 3 outreach workflows migrated from expired static Vault token to dynamic AppRole auth (30-min TTL, revoked after each execution). Execution 86 confirmed working.
+- Dry run gate restructure — all 3 workflows restructured so dry run tests the full AppRole + Vault path
+- Orchestrator audit — `amHPv6iQ5HmIexVt` uses `$vars` (paid feature, never worked); 3 rotation sub-workflows still on expired static token → logged as personal_projects id=42
+- Vault Policy Registry DB (`vault_registry`) — new Postgres DB with pgcrypto encryption on sensitive columns; all 8 known policies pre-populated; passphrase at Vault `secret/data/vault-registry`
+- Postgres DB creation playbook written: `playbook_postgres_databases.md` (data classification + encryption rules for all future DBs)
+- business_projects id=7 (Follow-up Outreach workflow) and id=12 (Brevo Unsubscribe Handler) logged for next week — not built today, out of time
+- unsubscribed_at column added to contacted_leads
+
+**Not done (ran out of time):**
+- Claude config audit ← YOU ARE HERE
+- Scheduling playbook finalization → Nextcloud conversation after this
+- Schedule generation agent Phase 1 → Nextcloud conversation after this
+
+---
+
+## Config Audit Agenda
+
+### 1. behavior_changes DB review (normal Saturday start)
+Query: `docker exec $(docker ps --format '{{.Names}}' | grep '^postgres-') psql -U postgres -d claude_config -c "SELECT * FROM behavior_changes ORDER BY session_date DESC LIMIT 20"`
+
+Review recent behavior changes — identify what's working, what to retire, what needs development.
+
+### 2. Memory organization problem (DISCUSS — don't pre-solve)
+Today's session spent significant tokens trying to find Friday's session summaries. Root cause: session summaries are named by project (e.g. `session_summary_api_business.md`), not by date. Finding "what happened on Friday May 22" required brute-force JSONL searching. 
+
+Discuss options during the audit, decide what to build tomorrow.
+
+### 3. Memory upgrade discussion
+User wants to discuss the broader memory upgrade (Obsidian, mem0, MemGPT candidates — see `project_claude_memory_upgrade.md`). May or may not be blocked on local AI hardware.
+
+### 4. CLAUDE.md GitHub repo (user will bring transcript)
+Someone posted their CLAUDE.md files to GitHub — repo got ~68k stars in 24 hours. User wants to review and see if any of the patterns are usable here. User will paste the transcript into the conversation.
+
+### 5. Tomorrow's development agenda (Sunday)
+Whatever the audit identifies as highest value to develop. Memory org fix is a strong candidate.
+
+---
+
+## Key project DB IDs for reference
+- personal_projects id=40: Vault Policy Registry DB — COMPLETED today
+- personal_projects id=42: N8N Rotation Workflows AppRole Migration — pending
+- personal_projects id=43: AI-Assisted DB Encryption Scanner — pending (blocked on local AI? user says Claude can do it now — discuss)
+- business_projects id=7: Follow-up Outreach Emails workflow — pending
+- business_projects id=12: Brevo Unsubscribe Handler — pending
+
+## After the audit
+Come back to the **Nextcloud conversation** (`~/Desktop` or wherever it lives) for:
+1. Scheduling playbook finalization (recurring_events DB table design + machine-readable rule format)
+2. Schedule generation agent Phase 1 build (N8N → NTFY approval → Nextcloud CalDAV push)
diff --git a/agent-builder/.claude/context.md b/agent-builder/.claude/context.md
new file mode 100644
index 0000000..c1661c6
--- /dev/null
+++ b/agent-builder/.claude/context.md
@@ -0,0 +1,73 @@
+project_name: agent-builder
+# Agent Builder — Session Context
+
+## What this project does
+Design, build, and test autonomous N8N agents on server-01 sandbox before any production promotion.
+First two agents: Agent Builder Agent + N8N Builder Agent.
+
+## Scheduled work (2026-06-16, running behind — started ~6:38 PM)
+1. Vision Alignment Grill-Me — Agent Builder + N8N Builder vision + testing methodology
+2. Agent Builder Agent — Deploy + Test (server-01 sandbox)
+3. N8N Builder Agent — Deploy + Test (server-01 sandbox)
+
+## Architecture
+- Agents run as N8N workflows on server-01 (n8n-sandbox, port 5679)
+- Sandbox-first: all agents tested in sandbox before any production promotion
+- server-01 sandbox stack: n8n-sandbox, postgres-sandbox, vault-sandbox, bitwarden-bridge-sandbox, vaultwarden-sandbox
+- Sandbox N8N API key: prod Vault at secret/sandbox/n8n
+- Sandbox reachable at 192.168.1.90
+
+## Key decisions (set during vision grill-me — 2026-06-16)
+- Agent Builder Agent: builds `claude_agent` and `script` types — Ollama (llama3.1:8b) does the building, `claude -p` is overseer/validator
+- N8N Builder Agent: builds `n8n_automation` types — Ollama generates workflow JSON, imports via N8N API, assigns credentials
+- automation_ideas schema changes needed: rename `description` → `task_description` (full structured spec), add `type` (n8n_automation/claude_agent/script), add `builder_status`
+- New `agent_test_results` table needed in api_business DB
+- Sandbox must mirror production: AppRole, Vaultwarden, bridge all configured before any agent deploys
+- Promotion = user approval required after all 4 test levels pass (not auto-promote in v1)
+- Dedicated backfill session needed for all 48 existing automation_ideas rows (type + task_description)
+- claude -p uses SDK credits (Pro = $20/month hard limit) — use sparingly, Ollama does the heavy lifting
+- Local model: llama3.1:8b already pulled on server-01 (4.9GB, fits in RTX 2060 Super 8GB VRAM)
+
+## Testing methodology
+- Four levels: Structure → Deployment → Smoke → Assertion
+- LLM outputs validated on structure/side-effects only, never exact string match
+- All results logged to agent_test_results table
+- NTFY notification on pass and fail
+- Full methodology: .claude/playbook_testing_methodology.md
+
+## Agents
+### Agent Builder Agent
+- Status: pending — prereqs not complete
+- Purpose: Receives automation spec from automation_ideas DB, uses Ollama to build claude_agent or script type automations, deploys to sandbox, runs automated tests, notifies user for promotion approval
+- Builds: claude agents (via claude -p) and Python scripts (Docker containers)
+
+### N8N Builder Agent
+- Status: pending — prereqs not complete
+- Purpose: Receives automation spec from automation_ideas DB, uses Ollama to generate N8N workflow JSON using n8n_automations playbook as context, imports to sandbox N8N via API, assigns credentials, runs automated tests
+- Will be used to build: id=12 (Media Pipeline Learning), id=7 (Friday Research Session Prep)
+
+## Related personal_projects DB rows
+- id=4: N8N Workflow Builder Script (pending, weekend_block1)
+
+## Prereq checklist (must complete before any agent deployment)
+- [x] Schema: rename automation_ideas.description → task_description, add type, add builder_status, add priority
+- [x] Create agent_test_results table in api_business
+- [x] Sandbox Vault: set up AppRole auth method (credentials at /opt/appdata/docker/docker-compose/vault/approle/ on server-01)
+- [x] Sandbox Vault: store sandbox N8N API key at secret/sandbox/n8n (key name: claude-sandbox, verified working)
+- [x] Verify sandbox Bitwarden bridge ↔ Vaultwarden sandbox end-to-end (bridge on port 8080, returns [] for empty vault — correct)
+- [x] Write Agent Builder Agent playbook → .claude/playbook_agent_builder_agent.md
+- [x] Write N8N Builder Agent playbook → .claude/playbook_n8n_builder_agent.md
+- [~] Backfill session: resume from priority 28 (id=34, CalDAV Auto-Refresh Trigger) next session. ~30 automations remain.
+  - Reviewed this session: ids 28, 40, 53, 4, 7, 5, 14, 11, 44, 33, 29, 20, 41, 12, 1, 19, 39, 6
+  - Key changes:
+    - id=7+50 merged into id=18 (full 10-stage business pipeline)
+    - id=18 expanded with Business Research Agent + Development Agent + parked idea email-to-Tyler flow
+    - id=14+20 blocked (already built by schedule workflows)
+    - id=12+1+41 blocked (redundant)
+    - id=19 blocked (pending Jenkins)
+    - id=5 blocked (Obsidian vault not set up yet)
+    - id=40 pending with Jenkins conditional note
+  - New rows added:
+    - id=54 (NTFY Topic Provisioner, p54)
+    - id=55 (Business Research Agent, p51)
+    - id=56 (Business Development Agent, p52)
+    - id=57 (Sandbox Environment Deployment Completion, p9). NOTE: conflicts with id=28 priority 9, fix next session.
+  - Calendar events pushed to Nextcloud: Thu 6/18 12:45-3PM backfill + 3-4:30PM readiness; Fri 6/19 12:45-4:30PM agent builds
+
+## Final readiness check items (scheduled June 18)
+- All 8 prereq checklist items verified complete
+- Sandbox mirrors production: AppRole, bridge, Vaultwarden all confirmed functional
+- **Sensitive output interception system** — design and implement before any agent goes live:
+  - Agents must scan their own stdout/logs before writing/sending output and redact anything matching secret patterns (tokens, keys, passwords, API keys)
+  - Pattern list at minimum: `hvs\.`, `eyJ`, bearer tokens, anything from known env var names (BRIDGE_API_KEY, VAULT_TOKEN, N8N_ENCRYPTION_KEY, etc.)
+  - Root cause: `docker inspect --format '{{range .Config.Env}}...'` dumps all env vars including secrets; agents will reach for broad diagnostic commands without filtering — local models even more so
+  - Production exposure is a serious risk; sandbox exposure is acceptable but still undesirable
+  - This system needs to exist at the agent level (not just Claude Code rules) because once agents run autonomously the user will not be watching
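+
+One possible shape for the interception step is regex-based redaction, sketched below under the assumption that agents pass all outgoing text through it. The patterns cover the minimum list above: `hvs.` Vault tokens, `eyJ`-prefixed JWT-style strings, bearer tokens, and `NAME=value` pairs for the listed env var names. Treat the list as a starting point rather than a complete secret detector.
+
+```python
+import re
+
+# Env var names from the readiness notes; extend as new secrets appear.
+SENSITIVE_ENV_NAMES = ("BRIDGE_API_KEY", "VAULT_TOKEN", "N8N_ENCRYPTION_KEY")
+
+SECRET_PATTERNS = [
+    re.compile(r"hvs\.[A-Za-z0-9_-]+"),                          # Vault service tokens
+    re.compile(r"eyJ[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+){0,2}"),   # JWT-style base64 JSON
+    re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+"),             # "Bearer <token>"
+    # NAME=value pairs, as printed by `docker inspect` env dumps or `env`
+    re.compile(r"\b(" + "|".join(SENSITIVE_ENV_NAMES) + r")=\S+"),
+]
+
+
+def redact(text: str, placeholder: str = "[REDACTED]") -> str:
+    """Return text with every match of a known secret pattern replaced."""
+    for pattern in SECRET_PATTERNS:
+        if pattern.groups:  # keep the variable name, hide only the value
+            text = pattern.sub(lambda m: f"{m.group(1)}={placeholder}", text)
+        else:
+            text = pattern.sub(placeholder, text)
+    return text
+```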
+
+## Update instructions
+Update at the end of every agent-builder session. Keep agent status, key decisions, and prereq checklist current.
diff --git a/agent-builder/.claude/playbook_agent_builder_agent.md b/agent-builder/.claude/playbook_agent_builder_agent.md
new file mode 100644
index 0000000..97bfbc8
--- /dev/null
+++ b/agent-builder/.claude/playbook_agent_builder_agent.md
@@ -0,0 +1,154 @@
+# Playbook: Agent Builder Agent
+
+## Purpose
+Builds `claude_agent` and `script` type automations from the `automation_ideas` table. Uses Ollama (llama3.1:8b) as the primary code generator and `claude -p` as the overseer/validator. Deploys to sandbox, runs all 4 test levels, notifies user for promotion approval.
+
+## Trigger
+Scheduled or manual. Queries `automation_ideas` for the next row where:
+- `type IN ('claude_agent', 'script')`
+- `status = 'ready_to_build'`
+- `builder_status = 'not_started'`
+- Ordered by `priority ASC NULLS LAST, id ASC`
+
+Only processes one automation per run.
+
+## Infrastructure
+- Runs on: server-01 (n8n-sandbox, port 5679)
+- Ollama endpoint: http://localhost:11434 (server-01 local)
+- Model: llama3.1:8b
+- Claude overseer: `claude -p` (non-interactive, SDK credits — use sparingly)
+- Vault: sandbox AppRole at /opt/appdata/docker/docker-compose/vault/approle/
+- Database: production api_business (read automation_ideas, write agent_test_results)
+- NTFY: production NTFY instance for notifications
+
+## Step-by-Step
+
+### Step 1 — Claim the automation
+```sql
+UPDATE automation_ideas
+SET builder_status = 'queued'
+WHERE id = <selected_id> AND builder_status = 'not_started';
+```
+If 0 rows updated: another builder claimed it — stop, notify, exit.
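+
+The trigger query and the Step 1 claim can be sketched together. To keep the example runnable anywhere, it uses `sqlite3` as a stand-in for the api_business Postgres DB; with psycopg the placeholder would be `%s` instead of `?`. `ORDER BY priority IS NULL, priority, id` is a portable spelling of `priority ASC NULLS LAST, id ASC`. The guarded `UPDATE ... AND builder_status = 'not_started'` is what makes the claim safe: if another builder won the race, `rowcount` is 0.
+
+```python
+import sqlite3
+from typing import Optional
+
+NEXT_ROW_SQL = """
+    SELECT id FROM automation_ideas
+    WHERE type IN ('claude_agent', 'script')
+      AND status = 'ready_to_build'
+      AND builder_status = 'not_started'
+    ORDER BY priority IS NULL, priority, id
+    LIMIT 1
+"""
+
+CLAIM_SQL = """
+    UPDATE automation_ideas SET builder_status = 'queued'
+    WHERE id = ? AND builder_status = 'not_started'
+"""
+
+
+def claim_next(conn: sqlite3.Connection) -> Optional[int]:
+    """Pick the next buildable row and claim it; None if nothing is left or the race was lost."""
+    row = conn.execute(NEXT_ROW_SQL).fetchone()
+    if row is None:
+        return None
+    cur = conn.execute(CLAIM_SQL, (row[0],))
+    conn.commit()
+    # rowcount == 0 means another builder flipped builder_status first
+    return row[0] if cur.rowcount == 1 else None
+```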
+
+### Step 2 — Build the prompt for Ollama
+Construct a generation prompt using all available fields from the automation row:
+- `name`: what the automation is called
+- `task_description`: full structured spec — this is the primary instruction
+- `type`: claude_agent or script
+- `infrastructure_requirement`: what infra it needs access to
+
+Prompt structure:
+```
+You are an expert automation engineer. Build a {type} automation with the following specification.
+
+Name: {name}
+Infrastructure: {infrastructure_requirement}
+
+Specification:
+{task_description}
+
+Requirements:
+- If type is claude_agent: output a complete shell-executable claude -p command with full system prompt and all logic. The agent must be self-contained.
+- If type is script: output a complete Python script. Include a Dockerfile if the script has dependencies beyond stdlib.
+- Output ONLY the code. No explanation, no markdown fences, no commentary.
+- The code must handle its own error cases and log to stdout.
+- Secrets must be fetched from Vault via AppRole — never hardcoded. AppRole credentials are in the files role-id and secret-id under /opt/appdata/docker/docker-compose/vault/approle/.
+```
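+
+A sketch of assembling that prompt from one `automation_ideas` row. The function and constant names are hypothetical. Only the header goes through `str.format`, so braces inside `task_description` (JSON samples, f-strings) cannot break the template. The fallback text for an empty `infrastructure_requirement` is an assumption, since the playbook does not define one.
+
+```python
+from typing import Dict, List
+
+SUPPORTED_TYPES = ("claude_agent", "script")
+HEADER = "You are an expert automation engineer. Build a {type} automation with the following specification."
+
+
+def build_generation_prompt(row: Dict[str, str], requirements: List[str]) -> str:
+    """Assemble the Step 2 prompt; row values are inserted verbatim, never re-formatted."""
+    missing = [field for field in ("name", "type", "task_description") if not row.get(field)]
+    if missing:
+        raise ValueError(f"automation row is missing: {', '.join(missing)}")
+    if row["type"] not in SUPPORTED_TYPES:
+        raise ValueError(f"unsupported type for this builder: {row['type']}")
+    lines = [
+        HEADER.format(type=row["type"]),
+        "",
+        f"Name: {row['name']}",
+        # Hypothetical fallback: the playbook does not say what to send when this is empty
+        f"Infrastructure: {row.get('infrastructure_requirement') or 'none listed'}",
+        "",
+        "Specification:",
+        row["task_description"],
+        "",
+        "Requirements:",
+    ]
+    lines += [f"- {req}" for req in requirements]
+    return "\n".join(lines)
+```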
+
+### Step 3 — Generate with Ollama
+```
+POST http://localhost:11434/api/generate
+{
+  "model": "llama3.1:8b",
+  "prompt": "<constructed prompt>",
+  "stream": false
+}
+```
+Set builder_status = 'building' before calling.
+
+If Ollama call fails or times out (>120s): set builder_status = 'failed', log error, notify via NTFY, stop.
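+
+The same request can be made with Python's standard library, sketched below. With `"stream": false`, Ollama returns a single JSON object whose generated text is in the `response` field. Request building and response parsing are split out so they can be checked without a running server. The caller still sets `builder_status = 'building'` first and maps any exception to `'failed'`. `urlopen`'s `timeout` applies to each blocking socket operation rather than to the whole request, so it only approximates the 120 s limit.
+
+```python
+import json
+import urllib.request
+
+OLLAMA_URL = "http://localhost:11434/api/generate"
+
+
+def build_generate_request(prompt: str, model: str = "llama3.1:8b") -> urllib.request.Request:
+    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
+    return urllib.request.Request(
+        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
+    )
+
+
+def parse_generate_response(raw: bytes) -> str:
+    payload = json.loads(raw)
+    if "error" in payload:
+        raise RuntimeError(f"ollama error: {payload['error']}")
+    return payload["response"]
+
+
+def generate(prompt: str, timeout: float = 120.0) -> str:
+    # Raises URLError / TimeoutError on connection problems; caller sets builder_status = 'failed'
+    with urllib.request.urlopen(build_generate_request(prompt), timeout=timeout) as resp:
+        return parse_generate_response(resp.read())
+```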
+
+### Step 4 — Overseer validation with claude -p
+Pass the generated code to `claude -p` for structural review. Keep the prompt minimal to conserve SDK credits:
+
+```bash
+# Pipe the generated code in on stdin so claude actually sees it ($GENERATED_CODE is a placeholder)
+printf '%s' "$GENERATED_CODE" | claude -p "Review this {type} automation code for the following only:
+1. Does it correctly fetch secrets from Vault via AppRole (never hardcoded)?
+2. Are there any obvious syntax errors or missing imports?
+3. Does the logic match this spec summary: {name} — {task_description[:200]}
+
+Respond with: PASS or FAIL, then one sentence explaining why.
+Do not rewrite the code."
+```
+
+If FAIL: log claude's reason, set builder_status = 'failed', notify via NTFY with the failure reason, stop.
+If PASS: proceed.
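+
+The PASS/FAIL reply can be parsed deterministically, as in the sketch below. It treats anything that does not start with a verdict as FAIL. That is a conservative choice made here, not something the playbook specifies, and it keeps an unreviewed artifact out of the sandbox.
+
+```python
+import re
+from typing import Tuple
+
+# First word must be PASS or FAIL; the rest is the one-sentence reason
+VERDICT_RE = re.compile(r"^\s*(PASS|FAIL)\b[\s:.,-]*(.*)", re.IGNORECASE | re.DOTALL)
+
+
+def parse_verdict(output: str) -> Tuple[bool, str]:
+    match = VERDICT_RE.match(output)
+    if match is None:
+        return False, f"unparseable overseer output: {output[:120]!r}"
+    reason = " ".join(match.group(2).split()) or "no reason given"
+    return match.group(1).upper() == "PASS", reason
+```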
+
+### Step 5 — Deploy to sandbox
+**For `script` type:**
+1. Write the generated code to a temp directory on server-01
+2. If a Dockerfile was generated, build the image: `docker build -t agent-{id}-{slug} .`
+3. Run a test container: `docker run --rm agent-{id}-{slug}` (dry run, no side effects)
+
+**For `claude_agent` type:**
+1. Write the generated claude -p command to a shell script
+2. Make it executable
+3. Run it once with `--dry-run` flag if supported, or with a test input that produces no side effects
+
+If deployment fails: set builder_status = 'failed', log error, notify via NTFY, stop.
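+
+Docker repository names must be lowercase letters, digits and a few separators, so `{slug}` cannot simply be the automation name. The sketch below derives it with a hypothetical rule: lowercase, collapse runs of other characters to `-`, and cap the length.
+
+```python
+import re
+
+
+def image_name(automation_id: int, name: str, max_slug: int = 40) -> str:
+    """Build agent-{id}-{slug} using only characters Docker accepts in repository names."""
+    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")[:max_slug].rstrip("-")
+    return f"agent-{automation_id}-{slug or 'unnamed'}"
+```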
+
+### Step 6 — Run 4-level automated tests
+Run each level in order. Stop and fail if any level fails. Log every result to `agent_test_results`.
+
+**Level 1 — Structure**
+Validate the generated artifact:
+- For scripts: `python3 -m py_compile script.py` — must exit 0
+- For claude agents: `bash -n` on the generated shell script must exit 0 (syntax check only; nothing is executed)
+- For Dockerfiles: `docker build --check` if available, else verify FROM and key directives exist
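+- Insert result with a parameterized query (execution logs routinely contain quotes): `INSERT INTO agent_test_results (automation_id, test_level, status, execution_log) VALUES (%s, 1, %s, %s)` with parameters `({id}, 'pass' or 'fail', {log})`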
+
+**Level 2 — Deployment**
+- Verify the artifact can be deployed cleanly (no missing dependencies, image builds successfully, script runs without import errors)
+- Must complete without crashing
+- Insert result to agent_test_results
+
+**Level 3 — Smoke**
+- Execute the automation with minimal/test inputs
+- Must run to completion without an unhandled exception or non-zero exit
+- Insert result to agent_test_results
+
+**Level 4 — Assertion**
+- Verify the correct side effect occurred (not string matching — check the actual system state)
+- Examples: a file was created, a DB row was written, an API call returned 200, a container is running
+- Insert result to agent_test_results
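+
+The stop-at-first-failure rule for Step 6 is sketched below, with Level 1 for `script` types done through `py_compile`, the module behind `python3 -m py_compile`. The connection is any DB-API connection; the test uses `sqlite3` as a stand-in for api_business, and with psycopg the placeholders become `%s`. Parameters instead of string interpolation keep quotes in execution logs from breaking the INSERT.
+
+```python
+import py_compile
+from typing import Callable, List, Tuple
+
+LevelResult = Tuple[bool, str]
+
+
+def check_python_structure(path: str) -> LevelResult:
+    """Level 1 for script types: the same check as `python3 -m py_compile script.py`."""
+    try:
+        py_compile.compile(path, doraise=True)
+    except py_compile.PyCompileError as exc:
+        return False, exc.msg
+    return True, "compiled cleanly"
+
+
+def run_levels(conn, automation_id: int, levels: List[Tuple[int, Callable[[], LevelResult]]]) -> bool:
+    """Run levels in order, log every result, stop at the first failure."""
+    for level, check in levels:
+        try:
+            passed, log = check()
+        except Exception as exc:  # a crashing check counts as a failed level
+            passed, log = False, f"{type(exc).__name__}: {exc}"
+        conn.execute(
+            "INSERT INTO agent_test_results (automation_id, test_level, status, execution_log)"
+            " VALUES (?, ?, ?, ?)",
+            (automation_id, level, "pass" if passed else "fail", log),
+        )
+        conn.commit()
+        if not passed:
+            return False
+    return True
+```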
+
+### Step 7 — Notify user for promotion approval
+If all 4 levels pass:
+1. Set builder_status = 'awaiting_approval'
+2. Send NTFY notification:
+   ```
+   Title: Agent Ready for Promotion — {name}
+   Body: All 4 test levels passed in sandbox. Automation id={id} ({type}) is ready for production promotion. Reply to approve or reject.
+   ```
+
+User must explicitly approve before any production deployment. No auto-promotion in v1.
+
+### Step 8 — On approval
+Set builder_status = 'approved', then 'deployed' after production deployment completes.
+Update automation_ideas status = 'deployed'.
+
+## Error handling
+- Any unhandled exception: set builder_status = 'failed', log to agent_test_results with test_level=0 and status='fail', send NTFY alert
+- Always release the claim (reset builder_status to 'not_started') if failing before Step 3 so another run can retry
+- After Step 3: leave as 'failed' — requires manual review before retry
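+
+The release-or-park rule above is sketched below as one helper that updates `builder_status` and writes the `test_level=0` failure row. The generation step is a parameter because the boundary is Step 3 here and Step 5 in the N8N Builder playbook. Function names are hypothetical; the test uses `sqlite3` placeholders (`?`).
+
+```python
+def status_after_failure(step: int, generation_step: int = 3) -> str:
+    """Before the generation step the claim is released so another run can retry;
+    from that step on the row stays 'failed' until someone reviews it."""
+    if step < 1:
+        raise ValueError(f"unknown step: {step}")
+    return "not_started" if step < generation_step else "failed"
+
+
+def record_failure(conn, automation_id: int, step: int, error: str, generation_step: int = 3) -> str:
+    """Update builder_status and log a test_level=0 failure row."""
+    new_status = status_after_failure(step, generation_step)
+    conn.execute(
+        "UPDATE automation_ideas SET builder_status = ? WHERE id = ?", (new_status, automation_id)
+    )
+    conn.execute(
+        "INSERT INTO agent_test_results (automation_id, test_level, status, execution_log)"
+        " VALUES (?, 0, 'fail', ?)",
+        (automation_id, f"step {step}: {error}"),
+    )
+    conn.commit()
+    return new_status
+```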
+
+## NTFY notification patterns
+- Build started: `[Agent Builder] Building {name} (id={id}, type={type})`
+- Overseer FAIL: `[Agent Builder] FAIL — Overseer rejected {name}: {reason}`
+- Test level fail: `[Agent Builder] FAIL — {name} failed Level {n}: {error}`
+- Ready for approval: `[Agent Builder] READY — {name} passed all tests, awaiting your approval`
+- Unhandled error: `[Agent Builder] ERROR — {name}: {exception}`
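+
+A sketch of sending these notifications with ntfy's JSON publishing: POST to the server root with `topic`, `title`, `message` and an integer `priority` (1-5) in the body. The JSON form is used because Python's `http.client` encodes header values as Latin-1, so a `Title` header containing the dash characters in the patterns above would fail. `NTFY_BASE` and `NTFY_TOPIC` are hypothetical placeholders for the production instance.
+
+```python
+import json
+import urllib.request
+
+NTFY_BASE = "https://ntfy.example.internal"  # hypothetical: production NTFY URL
+NTFY_TOPIC = "agent-builder"                 # hypothetical topic name
+
+
+def build_ntfy_request(title: str, message: str, priority: int = 3) -> urllib.request.Request:
+    body = {"topic": NTFY_TOPIC, "title": title, "message": message, "priority": priority}
+    return urllib.request.Request(
+        NTFY_BASE,
+        data=json.dumps(body).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def notify(title: str, message: str, timeout: float = 10.0) -> None:
+    with urllib.request.urlopen(build_ntfy_request(title, message), timeout=timeout):
+        pass  # a non-2xx response raises HTTPError
+```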
+
+## SDK credit budget
+`claude -p` is called once per automation (Step 4 only). Keep the overseer prompt under 500 tokens. Do not call claude -p for retries or debugging — only for the initial validation pass.
diff --git a/agent-builder/.claude/playbook_n8n_builder_agent.md b/agent-builder/.claude/playbook_n8n_builder_agent.md
new file mode 100644
index 0000000..89d8243
--- /dev/null
+++ b/agent-builder/.claude/playbook_n8n_builder_agent.md
@@ -0,0 +1,209 @@
+# Playbook: N8N Builder Agent
+
+## Purpose
+Builds `n8n_automation` type automations from the `automation_ideas` table. Uses Ollama (llama3.1:8b) to generate N8N workflow JSON, imports to sandbox N8N via API, assigns credentials, runs all 4 test levels, notifies user for promotion approval.
+
+## Trigger
+Scheduled or manual. Queries `automation_ideas` for the next row where:
+- `type = 'n8n_automation'`
+- `status = 'ready_to_build'`
+- `builder_status = 'not_started'`
+- Ordered by `priority ASC NULLS LAST, id ASC`
+
+Only processes one automation per run.
+
+## Infrastructure
+- Runs on: server-01 (n8n-sandbox, port 5679)
+- Ollama endpoint: http://localhost:11434 (server-01 local)
+- Model: llama3.1:8b
+- Claude overseer: `claude -p` (non-interactive, SDK credits — use sparingly)
+- Sandbox N8N API: http://192.168.1.90:5679 — API key from Vault at secret/sandbox/n8n
+- Vault: sandbox AppRole at /opt/appdata/docker/docker-compose/vault/approle/
+- Database: production api_business (read automation_ideas, write agent_test_results)
+- NTFY: production NTFY instance for notifications
+
+## N8N Workflow JSON Structure (required knowledge)
+Every valid N8N workflow JSON must include:
+```json
+{
+  "name": "Workflow Name",
+  "nodes": [...],
+  "connections": {...},
+  "active": false,
+  "settings": {"executionOrder": "v1"},
+  "tags": []
+}
+```
+Nodes have: `id` (UUID), `name`, `type` (e.g. n8n-nodes-base.httpRequest), `typeVersion`, `position` ([x, y]), `parameters`.
+Connections map node outputs to node inputs by node name.
+All workflows imported as `active: false` — never activate automatically in sandbox.
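+
+Most of the structural rules above can be checked deterministically before any SDK credits are spent on the overseer. The sketch below is a hypothetical pre-check. It assumes the connection format n8n uses in exported workflows, where connections are keyed by source node name and each output is a list of branches of `{"node", "type", "index"}` links.
+
+```python
+REQUIRED_TOP_LEVEL = ("name", "nodes", "connections", "active", "settings")
+REQUIRED_NODE_FIELDS = ("id", "name", "type", "typeVersion", "position", "parameters")
+
+
+def workflow_problems(wf: dict) -> list:
+    """Return structural problems; an empty list means every check passed."""
+    problems = [f"missing top-level field: {key}" for key in REQUIRED_TOP_LEVEL if key not in wf]
+    if "active" in wf and wf["active"] is not False:
+        problems.append("active must be false")
+    names, ids = set(), set()
+    for index, node in enumerate(wf.get("nodes") or []):
+        problems += [f"node {index} missing {field}" for field in REQUIRED_NODE_FIELDS if field not in node]
+        if node.get("name") in names:
+            problems.append(f"duplicate node name: {node.get('name')}")
+        if node.get("id") in ids:
+            problems.append(f"duplicate node id: {node.get('id')}")
+        names.add(node.get("name"))
+        ids.add(node.get("id"))
+    # {"Source": {"main": [[{"node": "Target", "type": "main", "index": 0}]]}}
+    for source, outputs in (wf.get("connections") or {}).items():
+        if source not in names:
+            problems.append(f"connection from unknown node: {source}")
+        for branches in outputs.values():
+            for branch in branches:
+                for link in branch or []:
+                    if link.get("node") not in names:
+                        problems.append(f"connection to unknown node: {link.get('node')}")
+    return problems
+```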
+
+## Step-by-Step
+
+### Step 1 — Claim the automation
+```sql
+UPDATE automation_ideas
+SET builder_status = 'queued'
+WHERE id = <selected_id> AND builder_status = 'not_started';
+```
+If 0 rows updated: another builder claimed it — stop, notify, exit.
+
+### Step 2 — Fetch sandbox N8N API key from Vault
+Use sandbox AppRole to read secret/sandbox/n8n. Extract `api_key` and `base_url`.
+Never log the key value. Pass it in memory only.
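+
+A standard-library sketch of this step using Vault's HTTP API: log in with `POST /v1/auth/approle/login` and read `auth.client_token`, then read the secret from `/v1/secret/data/sandbox/n8n` with an `X-Vault-Token` header. The key/value pairs are in `data.data`. This assumes `secret/` is a KV v2 mount, as the `secret/data/...` paths elsewhere in these notes suggest. `VAULT_ADDR` is a hypothetical address, and the `role-id` / `secret-id` file names come from the Agent Builder prompt.
+
+```python
+import json
+import urllib.request
+from pathlib import Path
+from typing import Dict, Tuple
+
+VAULT_ADDR = "http://192.168.1.90:8200"  # hypothetical: sandbox Vault address, default port
+APPROLE_DIR = Path("/opt/appdata/docker/docker-compose/vault/approle")
+
+
+def login_request(role_id: str, secret_id: str) -> urllib.request.Request:
+    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode("utf-8")
+    return urllib.request.Request(
+        f"{VAULT_ADDR}/v1/auth/approle/login", data=body,
+        headers={"Content-Type": "application/json"}, method="POST",
+    )
+
+
+def token_from_login(raw: bytes) -> str:
+    return json.loads(raw)["auth"]["client_token"]
+
+
+def kv2_read_request(token: str, path: str = "sandbox/n8n", mount: str = "secret") -> urllib.request.Request:
+    # KV v2 inserts "data/" between the mount and the secret path
+    return urllib.request.Request(f"{VAULT_ADDR}/v1/{mount}/data/{path}", headers={"X-Vault-Token": token})
+
+
+def secret_from_kv2(raw: bytes) -> Dict[str, str]:
+    return json.loads(raw)["data"]["data"]
+
+
+def fetch_sandbox_n8n_credentials(timeout: float = 10.0) -> Tuple[str, str]:
+    role_id = (APPROLE_DIR / "role-id").read_text().strip()
+    secret_id = (APPROLE_DIR / "secret-id").read_text().strip()
+    with urllib.request.urlopen(login_request(role_id, secret_id), timeout=timeout) as resp:
+        token = token_from_login(resp.read())
+    with urllib.request.urlopen(kv2_read_request(token), timeout=timeout) as resp:
+        secret = secret_from_kv2(resp.read())
+    return secret["api_key"], secret["base_url"]  # never print or log these
+```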
+
+### Step 3 — Discover available N8N credentials
+Before generating, query the sandbox N8N for existing credentials so the generated workflow references them by name:
+```
+GET {base_url}/api/v1/credentials
+X-N8N-API-KEY: {api_key}
+```
+Extract credential names and types. Pass this list to the Ollama prompt so the generated workflow uses real credential names.
+
+### Step 4 — Build the prompt for Ollama
+```
+You are an expert N8N workflow engineer. Generate a valid N8N workflow JSON for the following automation.
+
+Name: {name}
+Infrastructure available: {infrastructure_requirement}
+Available N8N credentials: {credential_names_and_types}
+
+Specification:
+{task_description}
+
+Requirements:
+- Output ONLY valid N8N workflow JSON. No explanation, no markdown fences, no commentary.
+- The workflow must be importable via the N8N API without modification.
+- Set active: false.
+- Reference credentials by the exact names listed above — do not invent credential names.
+- Use realistic node positions (spread nodes 200px apart on x-axis starting at x=250).
+- Every node must have a unique UUID for its id field.
+- The workflow must fully implement the specification — do not stub or placeholder any steps.
+```
+
+### Step 5 — Generate with Ollama
+```
+POST http://localhost:11434/api/generate
+{
+  "model": "llama3.1:8b",
+  "prompt": "<constructed prompt>",
+  "stream": false
+}
+```
+Set builder_status = 'building' before calling.
+
+Extract JSON from response — strip any surrounding text if Ollama adds it.
+Validate it parses as JSON before proceeding. If invalid JSON: set builder_status = 'failed', log error, notify, stop.
+
+If Ollama call fails or times out (>120s): set builder_status = 'failed', log error, notify, stop.
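+
+A sketch of the "strip surrounding text, then validate" step. It prefers a fenced block if the model added one, otherwise takes the span from the first `{` to the last `}`. Ollama's `format: "json"` request option can also constrain output to valid JSON, which makes stray prose less likely; the extraction is still a cheap safety net.
+
+```python
+import json
+import re
+
+# `{3} matches a triple-backtick fence without writing one literally
+FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL | re.IGNORECASE)
+
+
+def extract_workflow_json(text: str) -> dict:
+    """Pull the workflow object out of a model reply that may wrap it in prose or fences."""
+    fenced = FENCE_RE.search(text)
+    if fenced:
+        text = fenced.group(1)
+    start, end = text.find("{"), text.rfind("}")
+    if start == -1 or end < start:
+        raise ValueError("no JSON object found in model output")
+    return json.loads(text[start:end + 1])  # JSONDecodeError is a ValueError subclass
+```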
+
+### Step 6 — Overseer validation with claude -p
+Pass the generated JSON to `claude -p` for structural review. Keep prompt minimal to conserve SDK credits:
+
+```bash
+# Pipe the generated JSON in on stdin so claude actually sees it ($WORKFLOW_JSON is a placeholder)
+printf '%s' "$WORKFLOW_JSON" | claude -p "Review this N8N workflow JSON for the following only:
+1. Is it valid N8N workflow JSON with required fields (name, nodes, connections, active, settings)?
+2. Do all nodes have id, name, type, typeVersion, position, parameters?
+3. Do connections reference node names that exist in the nodes array?
+4. Does the workflow logic match this spec: {name} — {task_description[:200]}
+
+Respond with: PASS or FAIL, then one sentence explaining why.
+Do not rewrite the workflow."
+```
+
+If FAIL: log claude's reason, set builder_status = 'failed', notify via NTFY with reason, stop.
+If PASS: proceed.
+
+### Step 7 — Import to sandbox N8N
+```
+POST {base_url}/api/v1/workflows
+X-N8N-API-KEY: {api_key}
+Content-Type: application/json
+Body: {generated workflow JSON}
+```
+
+On success: capture the returned workflow `id` from N8N. Store in notes or a temp variable.
+On failure (non-2xx): set builder_status = 'failed', log the N8N error response, notify, stop.
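+
+A sketch of building the import request. One assumption is worth checking against the sandbox n8n version: the public API treats fields such as `id`, `active` and `tags` as read-only and may reject them on create. The sketch therefore strips them, relying on API-created workflows starting inactive. The request carries the key only in the `X-N8N-API-KEY` header, and the workflow id is read from the response body.
+
+```python
+import json
+import urllib.request
+
+# Assumed read-only on create; verify against the deployed n8n version
+READ_ONLY_FIELDS = ("id", "active", "tags", "createdAt", "updatedAt")
+
+
+def import_body(workflow: dict) -> dict:
+    return {key: value for key, value in workflow.items() if key not in READ_ONLY_FIELDS}
+
+
+def import_request(base_url: str, api_key: str, workflow: dict) -> urllib.request.Request:
+    return urllib.request.Request(
+        f"{base_url.rstrip('/')}/api/v1/workflows",
+        data=json.dumps(import_body(workflow)).encode("utf-8"),
+        headers={"X-N8N-API-KEY": api_key, "Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def imported_workflow_id(raw: bytes) -> str:
+    return str(json.loads(raw)["id"])
+```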
+
+### Step 8 — Assign credentials
+Fetch the imported workflow once, then check every node that references a credential:
+```
+GET {base_url}/api/v1/workflows/{workflow_id}
+```
+Verify credential references resolved correctly. If any credential reference is broken (credential name not found), attempt to match by type — if unresolvable, set builder_status = 'failed', notify user with list of missing credentials, stop.
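+
+The check above can be sketched against the credential list gathered in Step 3. n8n nodes store references as `{"<credentialType>": {"id": ..., "name": ...}}`. The sketch accepts a name match and falls back to a type match only when exactly one credential of that type exists. That ambiguity rule is a choice made here, not part of the playbook.
+
+```python
+from typing import Dict, List, Tuple
+
+
+def resolve_credentials(workflow: dict, available: List[dict]) -> Tuple[Dict[Tuple[str, str], dict], List[str]]:
+    """Return (fixes, missing): type-matched replacements and unresolvable references."""
+    by_name = {cred["name"]: cred for cred in available}
+    by_type: Dict[str, List[dict]] = {}
+    for cred in available:
+        by_type.setdefault(cred["type"], []).append(cred)
+    fixes: Dict[Tuple[str, str], dict] = {}
+    missing: List[str] = []
+    for node in workflow.get("nodes", []):
+        for cred_type, ref in (node.get("credentials") or {}).items():
+            if ref.get("name") in by_name:
+                continue  # resolved by name
+            candidates = by_type.get(cred_type, [])
+            if len(candidates) == 1:  # unambiguous type match
+                fixes[(node["name"], cred_type)] = candidates[0]
+            else:
+                missing.append(f"{node['name']}: {cred_type} ({ref.get('name')})")
+    return fixes, missing
+```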
+
+### Step 9 — Run 4-level automated tests
+Run each level in order. Stop and fail if any level fails. Log every result to `agent_test_results`.
+
+**Level 1 — Structure**
+Validate the imported workflow via the N8N API:
+- `GET {base_url}/api/v1/workflows/{workflow_id}` returns 200
+- Response contains correct node count
+- All required fields present
+- Insert result to agent_test_results (test_level=1)
+
+**Level 2 — Deployment**
+- Verify workflow exists in sandbox N8N and is not active
+- Verify all credential references are valid (no broken credential links)
+- Insert result to agent_test_results (test_level=2)
+
+**Level 3 — Smoke**
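+- Trigger a manual execution via N8N API (confirm the sandbox n8n version exposes this endpoint; the public API may not support manual runs, in which case start the workflow through a webhook instead):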
+  ```
+  POST {base_url}/api/v1/workflows/{workflow_id}/run
+  ```
+- Poll execution status until complete or timeout (60s)
+- Must reach status 'success' or 'waiting' (not 'error' or 'crashed')
+- Insert result to agent_test_results (test_level=3)
+
+**Level 4 — Assertion**
+- Verify the correct side effect occurred based on what the workflow is supposed to do
+- Check system state, not output strings: DB row written, API called, file created, webhook fired, etc.
+- The specific assertion depends on the automation — derive it from task_description
+- Insert result to agent_test_results (test_level=4)
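+
+The Level 3 polling loop is sketched below. `fetch_status` is a hypothetical callable that wraps `GET {base_url}/api/v1/executions/{execution_id}` and returns its `status` field. That field name is an assumption; older n8n versions may expose only `finished`. `sleep` and `clock` are injectable so the timeout logic can be tested without waiting 60 seconds.
+
+```python
+import time
+from typing import Callable, Optional, Tuple
+
+OK_STATES = {"success", "waiting"}  # Level 3 passes on either
+BAD_STATES = {"error", "crashed"}   # Level 3 fails immediately on either
+
+
+def poll_execution(
+    fetch_status: Callable[[], Optional[str]],
+    timeout: float = 60.0,
+    interval: float = 2.0,
+    sleep: Callable[[float], None] = time.sleep,
+    clock: Callable[[], float] = time.monotonic,
+) -> Tuple[bool, str]:
+    """Call fetch_status until it reports a terminal state or the timeout passes."""
+    deadline = clock() + timeout
+    while True:
+        status = fetch_status()
+        if status in OK_STATES:
+            return True, status
+        if status in BAD_STATES:
+            return False, status
+        if clock() >= deadline:
+            return False, f"timeout after {timeout:.0f}s (last status: {status})"
+        sleep(interval)
+```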
+
+### Step 10 — Notify user for promotion approval
+If all 4 levels pass:
+1. Set builder_status = 'awaiting_approval'
+2. Send NTFY notification:
+   ```
+   Title: N8N Workflow Ready for Promotion — {name}
+   Body: All 4 test levels passed in sandbox. Automation id={id} (n8n_automation) is ready for production promotion. Sandbox workflow id={n8n_workflow_id}. Reply to approve or reject.
+   ```
+
+User must explicitly approve before production import. No auto-promotion in v1.
+
+### Step 11 — On approval
+1. Import the same workflow JSON to production N8N (port 5678)
+2. Assign production credentials (different credential names from sandbox)
+3. Set builder_status = 'deployed'
+4. Update automation_ideas status = 'deployed'
+
+## Error handling
+- Any unhandled exception: set builder_status = 'failed', log to agent_test_results (test_level=0, status='fail'), send NTFY alert
+- Always release the claim (reset to 'not_started') if failing before Step 5 so another run can retry
+- After Step 5: leave as 'failed' — requires manual review before retry
+- If workflow was imported before failure: delete it from sandbox N8N to keep sandbox clean
+  ```
+  DELETE {base_url}/api/v1/workflows/{workflow_id}
+  ```
+
+## NTFY notification patterns
+- Build started: `[N8N Builder] Building {name} (id={id})`
+- Overseer FAIL: `[N8N Builder] FAIL — Overseer rejected {name}: {reason}`
+- Import FAIL: `[N8N Builder] FAIL — {name} failed N8N import: {error}`
+- Missing credentials: `[N8N Builder] BLOCKED — {name} needs credentials: {list}`
+- Test level fail: `[N8N Builder] FAIL — {name} failed Level {n}: {error}`
+- Ready for approval: `[N8N Builder] READY — {name} passed all tests, awaiting your approval`
+- Unhandled error: `[N8N Builder] ERROR — {name}: {exception}`
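One way to keep these messages consistent is to hold the patterns in a single table and fill them with `str.format`. The template strings are copied from the list above; the dictionary keys and the `render` helper are hypothetical names.

```python
# Message patterns from the list above; keys are illustrative event names.
NTFY_TEMPLATES = {
    "build_started": "[N8N Builder] Building {name} (id={id})",
    "overseer_fail": "[N8N Builder] FAIL — Overseer rejected {name}: {reason}",
    "import_fail": "[N8N Builder] FAIL — {name} failed N8N import: {error}",
    "missing_credentials": "[N8N Builder] BLOCKED — {name} needs credentials: {list}",
    "level_fail": "[N8N Builder] FAIL — {name} failed Level {n}: {error}",
    "ready": "[N8N Builder] READY — {name} passed all tests, awaiting your approval",
    "error": "[N8N Builder] ERROR — {name}: {exception}",
}


def render(event: str, **fields: object) -> str:
    """Fill one template; str.format raises KeyError if a placeholder is not supplied."""
    return NTFY_TEMPLATES[event].format(**fields)
```

A missing field raises an error instead of producing a half-filled alert.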
+
+## SDK credit budget
+`claude -p` is called once per automation (Step 6 only). Keep the overseer prompt under 500 tokens. Do not call `claude -p` for retries or debugging; use it only for the initial validation pass.
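A budget guard can refuse an oversized prompt before the single Step 6 call is made. This is a sketch under one loud assumption: the token count is estimated at roughly 4 characters per token. That is a rule of thumb, not the model's tokenizer. The function names are hypothetical.

```python
MAX_PROMPT_TOKENS = 500


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def overseer_argv(prompt: str) -> list[str]:
    """Build the one Step 6 `claude -p` invocation, refusing prompts over budget."""
    tokens = estimate_tokens(prompt)
    if tokens > MAX_PROMPT_TOKENS:
        raise ValueError(f"overseer prompt is ~{tokens} tokens; budget is {MAX_PROMPT_TOKENS}")
    return ["claude", "-p", prompt]
```

The returned argument list can be passed to `subprocess.run(..., capture_output=True, text=True, check=True)`. Passing a list rather than a shell string avoids quoting problems in the prompt.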
+
+## N8N credential naming convention
+Sandbox credentials must be named with a `-sandbox` suffix to distinguish them from production credentials:
+- `postgres-sandbox` (not `postgres`)
+- `vault-sandbox` (not `vault`)
+- `n8n-internal-sandbox` (not `n8n-internal`)
+
+This prevents the N8N Builder Agent from accidentally referencing production credentials when building sandbox workflows.
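The naming convention can be enforced before a sandbox import with a check like the one below. It assumes n8n's node-level `credentials` map and reports every credential whose name lacks the suffix. The function name is hypothetical.

```python
SANDBOX_SUFFIX = "-sandbox"


def non_sandbox_credentials(workflow: dict) -> list[tuple[str, str]]:
    """Return (node name, credential name) pairs that lack the -sandbox suffix."""
    offenders = []
    for node in workflow.get("nodes", []):
        for ref in node.get("credentials", {}).values():
            name = ref.get("name", "")
            if not name.endswith(SANDBOX_SUFFIX):
                offenders.append((node.get("name", "?"), name))
    return offenders
```

An empty list means the workflow is safe to import into sandbox. Anything else names the exact node to fix.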
diff --git a/agent-builder/.claude/playbook_testing_methodology.md b/agent-builder/.claude/playbook_testing_methodology.md
new file mode 100644
index 0000000..03cec99
--- /dev/null
+++ b/agent-builder/.claude/playbook_testing_methodology.md
@@ -0,0 +1,156 @@
+---
+name: Agent & Automation Testing Methodology
+description: Mandatory testing methodology for all built automations before production promotion — covers N8N automations, claude agents, and scripts; self-evolving document updated after every test session
+type: project
+version: 1.0
+---
+
+# Playbook: Agent & Automation Testing Methodology
+
+**Self-evolution rule:** After every test session, update this playbook — add new known failure modes, refine assertion patterns, increment the version number. The methodology improves every time something breaks in a new way.
+
+**Applies to:** All automations in the `automation_ideas` table with types: `n8n_automation`, `claude_agent`, `script`
+
+---
+
+## Before You Test — Required Reading Gate
+
+| Task type | Read first |
+|---|---|
+| Testing any automation | Sandbox isolation rule · Four test levels · Type-specific section |
+| Promoting to production | Promotion gate checklist |
+| Adding a new failure mode | Known failure modes section + update rule |
+
+**Sandbox isolation rule (HARD):** All testing happens in sandbox (server-01, 192.168.1.90). Sandbox Vault, Postgres, and N8N are test-only. No production credentials, no production data. See `feedback_sandbox_isolation.md`.
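A test harness can enforce this rule mechanically by checking every target URL before calling it. The sketch below assumes sandbox services are addressed by the IP given above. A hostname alias would need to be added to the allowed set. The names are illustrative.

```python
from urllib.parse import urlsplit

# server-01 from the isolation rule; add aliases here if services use hostnames.
SANDBOX_HOSTS = {"192.168.1.90"}


def assert_sandbox_target(url: str) -> None:
    """Raise before any test call whose target host is not the sandbox."""
    host = urlsplit(url).hostname
    if host not in SANDBOX_HOSTS:
        raise RuntimeError(f"refusing to test against {host!r}; sandbox hosts are {sorted(SANDBOX_HOSTS)}")
```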
+
+---
+
+## The Four Test Levels
+
+Every automation must pass all four levels before promotion. Run the levels in order, and stop and log the failure at the first level that fails.
+
+### Level 1 — Structure Test
+Does the built artifact have valid structure?
+
+| Type | Check |
+|---|---|
+| `n8n_automation` | Workflow JSON is valid JSON; contains `nodes`, `connections`, `settings` keys; all node types exist in sandbox N8N |
+| `claude_agent` | The `claude -p` call string is syntactically valid; prompt references correct tools/paths; output schema is defined |
+| `script` | Python syntax check passes (`python3 -m py_compile script.py`); all imports are available in the target container image |
+
+**Pass criteria:** No structural errors. **Fail action:** Log to `agent_test_results`, NTFY user, do NOT proceed to Level 2.
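The `n8n_automation` and `script` rows of this table can be sketched as two checks that return a list of errors, where an empty list means pass:

- The first check validates the workflow JSON. It assumes the set of node types installed in sandbox N8N is fetched separately and passed in as `known_node_types`.
- The second check uses `py_compile.compile(..., doraise=True)`, the library form of `python3 -m py_compile`.

Function names are hypothetical.

```python
import json
import py_compile

REQUIRED_KEYS = ("nodes", "connections", "settings")


def check_n8n_structure(raw: str, known_node_types: set[str]) -> list[str]:
    """Level 1 for n8n_automation: return structural errors (empty list = pass)."""
    try:
        workflow = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(workflow, dict):
        return ["top level must be a JSON object"]
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in workflow]
    for node in workflow.get("nodes", []):
        node_type = node.get("type")
        if node_type not in known_node_types:
            errors.append(f"unknown node type: {node_type}")
    return errors


def check_script_syntax(path: str) -> list[str]:
    """Level 1 for script: same check as `python3 -m py_compile path`."""
    try:
        py_compile.compile(path, doraise=True)
    except py_compile.PyCompileError as exc:
        return [str(exc)]
    return []
```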
+
+---
+
+### Level 2 — Deployment Test
+Does it deploy to sandbox without errors?
+
+| Type | Check |
+|---|---|
+| `n8n_automation` | `POST /api/v1/workflows` succeeds; workflow appears in sandbox N8N UI; all credentials are assigned (no empty credential IDs) |
+| `claude_agent` | Container builds and starts; `docker ps` shows healthy; `claude -p "echo ok"` returns without error from within the agent's execution context |
+| `script` | Docker image builds; container starts; first log line appears within 30 seconds; exit code is 0 for one-shot scripts or container stays running for daemon scripts |
+
+**Pass criteria:** No deployment errors, artifact is reachable. **Fail action:** Log to `agent_test_results`, NTFY user, tear down partial deployment in sandbox.
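For `n8n_automation`, the "no empty credential IDs" check can run against the workflow JSON read back after import, for example from `GET /api/v1/workflows/{id}` on the n8n public API. This sketch assumes the node-level `credentials` map shape. The function name is hypothetical.

```python
def empty_credential_ids(workflow: dict) -> list[str]:
    """Level 2 (n8n_automation): nodes whose credential references have no id."""
    missing = []
    for node in workflow.get("nodes", []):
        for cred_type, ref in node.get("credentials", {}).items():
            if not (ref or {}).get("id"):
                missing.append(f"{node.get('name', '?')} ({cred_type})")
    return missing
```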
+
+---
+
+### Level 3 — Smoke Test
+Does it execute without crashing on minimal input?
+
+| Type | Check |
+|---|---|
+| `n8n_automation` | Trigger one manual execution via N8N API (`POST /api/v1/workflows/{id}/run`); execution completes with status `success` (not `error` or `crashed`) |
+| `claude_agent` | Run agent with a trivial, safe test input defined in the `task_description`; agent completes without exception; output is non-empty |
+| `script` | Run with `--dry-run` flag if supported, or with a clearly safe test input; exits 0; no unhandled exceptions in logs |
+
+**Pass criteria:** Execution completes, no crashes, no unhandled exceptions. **Fail action:** Capture full execution log, log to `agent_test_results`, NTFY user with error excerpt.
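Judging the `n8n_automation` smoke run comes down to reading the execution record's `status` field, for example from `GET /api/v1/executions/{id}`. The grouping below is an assumption based on the statuses named in the table. `new`, `running`, and `waiting` are treated as "keep polling", and everything other than `success` counts as a failure. The function name is hypothetical.

```python
PASS_STATUSES = {"success"}
PENDING_STATUSES = {"new", "running", "waiting"}  # assumed in-progress states


def classify_execution(execution: dict) -> str:
    """Map an n8n execution record to 'pass', 'pending', or 'fail'."""
    status = execution.get("status")
    if status in PASS_STATUSES:
        return "pass"
    if status in PENDING_STATUSES:
        return "pending"
    return "fail"  # error, crashed, canceled, or anything unexpected
```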
+
+---
+
+### Level 4 — Assertion Test
+Does it produce the correct side effects?
+
+This is the type-specific level. For each automation, the `task_description` must include at least one verifiable assertion. The builder agents are responsible for generating these assertions at build time.
+
+| Type | Assertion patterns |
+|---|---|
+| `n8n_automation` | DB row was written/updated · NTFY notification received · HTTP response status was 200 · File was created at expected path |
+| `claude_agent` | Output JSON contains required fields · The artifact the agent built exists and passes the Level 1 structure check for its type · Side-effect DB row exists |
+| `script` | Expected output file exists · DB was updated · Expected log line present |
+
+**LLM output validation rule (claude_agent):** Never assert exact string match on LLM output — outputs are non-deterministic. Assert on: JSON schema validity, presence of required keys, value types, side effects produced.
+
+**Pass criteria:** All assertions defined in `task_description` pass. **Fail action:** Log which assertions failed, NTFY user with details.
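The LLM output validation rule can be sketched as a schema check over the agent's JSON output. It checks that the output parses, that required keys are present, and that each value has the expected type, with no comparison against exact text. The required-field names in the usage are hypothetical and would come from each automation's `task_description`.

```python
import json


def validate_agent_output(raw: str, required: dict[str, type]) -> list[str]:
    """Check agent output by structure and types, never by exact string match."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["output must be a JSON object"]
    errors = []
    for key, expected in required.items():
        if key not in data:
            errors.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            errors.append(f"{key}: expected {expected.__name__}, got {type(data[key]).__name__}")
    return errors
```

For example, `validate_agent_output(out, {"status": str, "artifact_path": str})` passes any well-formed answer regardless of wording.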
+
+---
+
+## Promotion Gate
+
+When all four levels pass, the following checklist must be completed before the automation goes to production.
+
+- [ ] All 4 test levels logged as `pass` in `agent_test_results`
+- [ ] NTFY notification sent to user with test summary
+- [ ] **User reviews and approves** (NTFY → user replies or confirms in next session)
+- [ ] For `n8n_automation`: all sandbox credentials re-pointed to production equivalents (see `project_sandbox_workflow_credential_rule.md`)
+- [ ] For `claude_agent`: production paths/URLs substituted for sandbox paths
+- [ ] For `script`: production env vars set in Coolify; no hardcoded sandbox values
+- [ ] Production deployment verified (Level 2 re-run against production)
+- [ ] `automation_ideas` status updated to `deployed`
+- [ ] `agent_test_results` promotion record written
+
+**Promotion is not automatic.** User approval is required after Level 4 passes. This is the v1.0 rule; it can be relaxed to allow auto-promotion for specific low-risk automation types once they have an established track record.
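The first checklist item can be checked mechanically from `agent_test_results` rows. This sketch assumes the most recent row per level is the one that counts, so that a re-run after a fix supersedes an earlier failure. The function name and row shape are illustrative. User approval remains a separate, manual gate.

```python
def ready_for_promotion(results: list[dict]) -> bool:
    """True when the latest result for each of levels 1-4 is 'pass'.

    `results` are rows for one automation with test_level, status, and
    tested_at (any comparable timestamp).
    """
    latest: dict[int, dict] = {}
    for row in results:
        level = row["test_level"]
        if level not in latest or row["tested_at"] > latest[level]["tested_at"]:
            latest[level] = row
    return all(level in latest and latest[level]["status"] == "pass" for level in (1, 2, 3, 4))
```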
+
+---
+
+## Test Result Storage
+
+All test results are written to the `agent_test_results` table (to be created in the `api_business` DB).
+
+**Required schema:**
+```sql
+CREATE TABLE agent_test_results (
+    id              SERIAL PRIMARY KEY,
+    automation_id   INTEGER NOT NULL REFERENCES automation_ideas(id),
+    test_level      INTEGER NOT NULL CHECK (test_level BETWEEN 0 AND 4),  -- 0 = unhandled build error, outside the four test levels
+    status          TEXT NOT NULL CHECK (status IN ('pass', 'fail', 'skip')),
+    error_message   TEXT,
+    execution_log   TEXT,
+    tested_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
+    promoted_at     TIMESTAMPTZ,
+    notes           TEXT
+);
+```
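Writes to this table should use a parameterized query rather than string interpolation, because error messages and logs can contain quotes. This sketch assumes a psycopg-style driver (`%s` placeholders). It validates `status` early and leaves the level range to the table's `CHECK` constraint. The helper name is hypothetical.

```python
from typing import Optional

ALLOWED_STATUSES = {"pass", "fail", "skip"}

INSERT_SQL = """
INSERT INTO agent_test_results
    (automation_id, test_level, status, error_message, execution_log)
VALUES (%s, %s, %s, %s, %s)
RETURNING id
"""


def build_result_params(automation_id: int, test_level: int, status: str,
                        error_message: Optional[str] = None,
                        execution_log: Optional[str] = None) -> tuple:
    """Return the parameter tuple for INSERT_SQL, rejecting unknown statuses early."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status must be one of {sorted(ALLOWED_STATUSES)}, got {status!r}")
    return (automation_id, test_level, status, error_message, execution_log)
```

With psycopg, usage would be `cur.execute(INSERT_SQL, build_result_params(42, 3, "fail", err))`. `tested_at` is filled by its column default.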
+
+---
+
+## NTFY Notification Patterns
+
+| Event | Topic | Message format |
+|---|---|---|
+| Level fail | `homelab-alerts` | `[AGENT TEST FAIL] {name} — Level {N}: {error excerpt}` |
+| All levels pass | `homelab-alerts` | `[AGENT TEST PASS] {name} — ready for your review and promotion` |
+| Promotion complete | `homelab-alerts` | `[AGENT DEPLOYED] {name} — now live in production` |
+
+---
+
+## Known Failure Modes
+
+*(Updated as new failures are discovered during testing)*
+
+| ID | Type | Failure | Root cause | Fix |
+|---|---|---|---|---|
+| — | — | None yet — first test session will populate this | — | — |
+
+---
+
+## Self-Evolution Instructions
+
+After every test session:
+1. Add each new failure mode to the Known Failure Modes table with its ID, type, failure, root cause, and fix
+2. If a Level assertion was too loose (passed but shouldn't have) or too strict (failed but should have passed), update the assertion pattern for that level and type
+3. Increment the version number in the frontmatter
+4. Record the date and what changed in the change log at the bottom of this file
+
+**Change log:**
+- v1.0 (2026-06-16): Initial methodology — four levels, user-approval promotion gate, NTFY notifications, self-evolution rule