init: add claude-config and agent-builder context files

Initial commit tracking session context, playbooks, and automation specs for claude-config and agent-builder Claude Code conversations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-17 23:08:23 -05:00
commit db6cbbdec1
7 changed files with 732 additions and 0 deletions
@@ -0,0 +1,156 @@
+---
+name: Agent & Automation Testing Methodology
+description: Mandatory testing methodology for all built automations before production promotion — covers N8N automations, claude agents, and scripts; self-evolving document updated after every test session
+type: project
+version: 1.0
+---
+
+# Playbook: Agent & Automation Testing Methodology
+
+**Self-evolution rule:** After every test session, update this playbook — add new known failure modes, refine assertion patterns, increment the version number. The methodology improves every time something breaks in a new way.
+
+**Applies to:** All automations in the `automation_ideas` table with types: `n8n_automation`, `claude_agent`, `script`
+
+---
+
+## Before You Test — Required Reading Gate
+
+| Task type | Read first |
+|---|---|
+| Testing any automation | Sandbox isolation rule · Four test levels · Type-specific section |
+| Promoting to production | Promotion gate checklist |
+| Adding a new failure mode | Known failure modes section + update rule |
+
+**Sandbox isolation rule (HARD):** All testing happens in sandbox (server-01, 192.168.1.90). Sandbox Vault, Postgres, and N8N are test-only. No production credentials, no production data. See `feedback_sandbox_isolation.md`.
+
+---
+
+## The Four Test Levels
+
+Every automation must pass all four levels before promotion. Run in order — stop and log failure at the first level that fails.
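A minimal sketch of the "run in order, stop at the first failure" rule. The names `run_levels` and `LevelCheck` are hypothetical, and recording the levels that never ran as `skip` is only one possible reading of the `skip` status allowed by the results schema.

```python
from typing import Callable, List, Optional, Tuple

# A level check returns (passed, error_message). Hypothetical interface.
LevelCheck = Callable[[], Tuple[bool, Optional[str]]]


def run_levels(checks: List[LevelCheck]) -> List[dict]:
    """Run level checks in order and stop executing after the first failure.

    Levels after the failing one are not executed; they are recorded as 'skip'.
    """
    results = []
    failed = False
    for level, check in enumerate(checks, start=1):
        if failed:
            results.append(
                {"test_level": level, "status": "skip", "error_message": None}
            )
            continue
        passed, error = check()
        results.append(
            {
                "test_level": level,
                "status": "pass" if passed else "fail",
                "error_message": None if passed else error,
            }
        )
        failed = not passed
    return results
```

Each returned dict uses the same field names as the result columns, so a caller could log the rows directly.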
+
+### Level 1 — Structure Test
+Does the built artifact have valid structure?
+
+| Type | Check |
+|---|---|
+| `n8n_automation` | Workflow JSON is valid JSON; contains `nodes`, `connections`, `settings` keys; all node types exist in sandbox N8N |
+| `claude_agent` | The `claude -p` call string is syntactically valid; prompt references correct tools/paths; output schema is defined |
+| `script` | Python syntax check passes (`python3 -m py_compile script.py`); all imports are available in the target container image |
+
+**Pass criteria:** No structural errors. **Fail action:** Log to `agent_test_results`, NTFY user, do NOT proceed to Level 2.
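As a sketch, two of the Level 1 checks can be expressed in a few lines of Python. The function names are hypothetical. Checking that node types exist in sandbox N8N, or that imports resolve in the target image, needs access to those environments and is not covered here.

```python
import json
import py_compile
import tempfile
from pathlib import Path

# Top-level keys the playbook requires in an N8N workflow export.
REQUIRED_WORKFLOW_KEYS = ("nodes", "connections", "settings")


def check_workflow_structure(raw: str) -> list:
    """Return a list of structural problems; an empty list means pass."""
    try:
        workflow = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(workflow, dict):
        return ["top-level JSON value is not an object"]
    return [f"missing key: {k}" for k in REQUIRED_WORKFLOW_KEYS if k not in workflow]


def check_script_syntax(path: str) -> list:
    """Same check as `python3 -m py_compile`, writing bytecode to a temp dir."""
    try:
        with tempfile.TemporaryDirectory() as tmp:
            py_compile.compile(path, cfile=str(Path(tmp) / "out.pyc"), doraise=True)
    except py_compile.PyCompileError as exc:
        return [f"syntax error: {exc.msg}"]
    return []
```

Returning a list of problems rather than a boolean keeps the failure detail available for the `error_message` column.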
+
+---
+
+### Level 2 — Deployment Test
+Does it deploy to sandbox without errors?
+
+| Type | Check |
+|---|---|
+| `n8n_automation` | `POST /api/v1/workflows` succeeds; workflow appears in sandbox N8N UI; all credentials are assigned (no empty credential IDs) |
+| `claude_agent` | Container builds and starts; `docker ps` shows healthy; `claude -p "echo ok"` returns without error from within the agent's execution context |
+| `script` | Docker image builds; container starts; first log line appears within 30 seconds; exit code is 0 for one-shot scripts or container stays running for daemon scripts |
+
+**Pass criteria:** No deployment errors, artifact is reachable. **Fail action:** Log to `agent_test_results`, NTFY user, tear down partial deployment in sandbox.
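The "no empty credential IDs" check can be sketched against the workflow JSON. This assumes each node stores its credentials as a mapping of credential type to an object with `id` and `name` fields. That shape should be confirmed against an export from the sandbox N8N version, and the function name is hypothetical.

```python
def nodes_with_missing_credentials(workflow: dict) -> list:
    """Return the names of nodes that reference a credential without an ID."""
    offenders = []
    for node in workflow.get("nodes", []):
        for cred in node.get("credentials", {}).values():
            # Assumed shape: {"id": "...", "name": "..."}.
            cred_id = cred.get("id") if isinstance(cred, dict) else None
            if not cred_id:
                offenders.append(node.get("name", "<unnamed>"))
                break  # one report per node is enough
    return offenders
```

Nodes with no `credentials` entry at all are treated as not needing any, so they never appear in the result.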
+
+---
+
+### Level 3 — Smoke Test
+Does it execute without crashing on minimal input?
+
+| Type | Check |
+|---|---|
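+| `n8n_automation` | Trigger one manual execution via N8N API (`POST /api/v1/workflows/{id}/run`; confirm the sandbox N8N version exposes this endpoint, and fall back to the workflow's webhook trigger if it does not); execution completes with status `success` (not `error` or `crashed`) |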
+| `claude_agent` | Run agent with a trivial, safe test input defined in the `task_description`; agent completes without exception; output is non-empty |
+| `script` | Run with `--dry-run` flag if supported, or with a clearly safe test input; exits 0; no unhandled exceptions in logs |
+
+**Pass criteria:** Execution completes, no crashes, no unhandled exceptions. **Fail action:** Capture full execution log, log to `agent_test_results`, NTFY user with error excerpt.
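For the `script` row, a smoke test could look roughly like this. The function name and the 60-second default timeout are assumptions, and the traceback check only recognises unhandled Python exceptions.

```python
import subprocess


def smoke_test(cmd: list, timeout_s: int = 60) -> dict:
    """Run a command once; pass only on exit code 0 with no traceback in output."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {
            "status": "fail",
            "error_message": f"timed out after {timeout_s}s",
            "execution_log": "",
        }
    log = proc.stdout + proc.stderr
    if proc.returncode != 0:
        error = f"exit code {proc.returncode}"
    elif "Traceback (most recent call last)" in log:
        error = "unhandled exception in logs"
    else:
        error = None
    return {
        "status": "fail" if error else "pass",
        "error_message": error,
        "execution_log": log,
    }
```

A caller would pass something like `["python3", "script.py", "--dry-run"]` when the script supports that flag. The full captured log is returned so it can be stored alongside the failure.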
+
+---
+
+### Level 4 — Assertion Test
+Does it produce the correct side effects?
+
+This is the type-specific level. For each automation, the `task_description` must include at least one verifiable assertion. The builder agents are responsible for generating these assertions at build time.
+
+| Type | Assertion patterns |
+|---|---|
+| `n8n_automation` | DB row was written/updated · NTFY notification received · HTTP response status was 200 · File was created at expected path |
+| `claude_agent` | Output JSON contains required fields · Built artifact exists and passes the Level 1 structure check for its own type · Side-effect DB row exists |
+| `script` | Expected output file exists · DB was updated · Expected log line present |
+
+**LLM output validation rule (claude_agent):** Never assert exact string match on LLM output — outputs are non-deterministic. Assert on: JSON schema validity, presence of required keys, value types, side effects produced.
+
+**Pass criteria:** All assertions defined in `task_description` pass. **Fail action:** Log which assertions failed, NTFY user with details.
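The LLM output validation rule can be illustrated with a small checker that looks only at structure, never at exact wording. This is a sketch: `validate_agent_output` and the shape of `required` (key name mapped to expected Python type) are hypothetical.

```python
import json


def validate_agent_output(raw: str, required: dict) -> list:
    """Check required keys and value types; never compare exact strings.

    Returns a list of failed assertions; an empty list means pass.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]
    failures = []
    for key, expected_type in required.items():
        if key not in data:
            failures.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            failures.append(
                f"wrong type for {key}: expected {expected_type.__name__}, "
                f"got {type(data[key]).__name__}"
            )
    return failures
```

One caveat of `isinstance`: a JSON `true` or `false` would satisfy a field declared as `int`, because `bool` is a subclass of `int` in Python. A stricter check would be needed where that distinction matters.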
+
+---
+
+## Promotion Gate
+
+When all four levels pass, the following checklist must be completed before the automation goes to production.
+
+- [ ] All 4 test levels logged as `pass` in `agent_test_results`
+- [ ] NTFY notification sent to user with test summary
+- [ ] **User reviews and approves** (NTFY → user replies or confirms in next session)
+- [ ] For `n8n_automation`: all sandbox credentials re-pointed to production equivalents (see `project_sandbox_workflow_credential_rule.md`)
+- [ ] For `claude_agent`: production paths/URLs substituted for sandbox paths
+- [ ] For `script`: production env vars set in Coolify; no hardcoded sandbox values
+- [ ] Production deployment verified (Level 2 re-run against production)
+- [ ] `automation_ideas` status updated to `deployed`
+- [ ] `agent_test_results` promotion record written
+
+**Promotion is not automatic.** User approval is required after Level 4 passes. This is the v1.0 rule; it can be relaxed to auto-promote specific low-risk automation types once a track record is established.
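A sketch of the first checklist item, assuming test results are available as a list of dicts ordered oldest to newest with `test_level` and `status` keys. The function name is hypothetical. A `True` result only means the automation is ready for user review; it does not replace the approval step.

```python
def ready_for_review(results: list) -> bool:
    """True only if the latest result for each of levels 1-4 is 'pass'."""
    latest = {}
    for row in results:  # rows assumed ordered oldest to newest
        latest[row["test_level"]] = row["status"]
    return all(latest.get(level) == "pass" for level in (1, 2, 3, 4))
```

Using the latest result per level means a re-test that passes after an earlier failure counts, while a missing level always blocks.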
+
+---
+
+## Test Result Storage
+
+All test results are written to the `agent_test_results` table (to be created in the `api_business` DB).
+
+**Required schema:**
+```sql
+CREATE TABLE agent_test_results (
+    id              SERIAL PRIMARY KEY,
+    automation_id   INTEGER NOT NULL REFERENCES automation_ideas(id),
+    test_level      INTEGER NOT NULL CHECK (test_level BETWEEN 1 AND 4),
+    status          TEXT NOT NULL CHECK (status IN ('pass', 'fail', 'skip')),
+    error_message   TEXT,
+    execution_log   TEXT,
+    tested_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
+    promoted_at     TIMESTAMPTZ,
+    notes           TEXT
+);
+```
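Writing a row to this table could be sketched as follows. The helper name is hypothetical, and the `%s` placeholders assume a DB-API driver with that parameter style, such as psycopg. The helper mirrors the table's CHECK constraints so bad values are rejected before the database is involved.

```python
# Columns with defaults (id, tested_at) and promoted_at are left to the database.
INSERT_SQL = (
    "INSERT INTO agent_test_results "
    "(automation_id, test_level, status, error_message, execution_log, notes) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)

VALID_STATUSES = ("pass", "fail", "skip")


def build_result_params(
    automation_id: int,
    test_level: int,
    status: str,
    error_message: str = None,
    execution_log: str = None,
    notes: str = None,
) -> tuple:
    """Validate values against the CHECK constraints and return query parameters."""
    if test_level not in (1, 2, 3, 4):
        raise ValueError(f"test_level must be 1-4, got {test_level!r}")
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {VALID_STATUSES}, got {status!r}")
    return (automation_id, test_level, status, error_message, execution_log, notes)
```

With an open cursor, the call would be `cursor.execute(INSERT_SQL, build_result_params(12, 3, "fail", "exit code 1"))`. Passing values as parameters, not by string formatting, keeps execution logs with quotes in them from breaking the statement.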
+
+---
+
+## NTFY Notification Patterns
+
+| Event | Topic | Message format |
+|---|---|---|
+| Level fail | `homelab-alerts` | `[AGENT TEST FAIL] {name} — Level {N}: {error excerpt}` |
+| All levels pass | `homelab-alerts` | `[AGENT TEST PASS] {name} — ready for your review and promotion` |
+| Promotion complete | `homelab-alerts` | `[AGENT DEPLOYED] {name} — now live in production` |
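The three message formats could be rendered and sent roughly as below. This is a sketch: the function names, the event keys (`fail`, `pass`, `deployed`), the 200-character excerpt cap, and the server URL in the usage note are all assumptions. It relies on ntfy accepting a plain POST of the message body to `<server>/<topic>`.

```python
import urllib.request

TOPIC = "homelab-alerts"  # topic named in the table
MAX_EXCERPT = 200  # assumed cap on the error excerpt length


def format_message(event: str, name: str, level: int = 0, error: str = "") -> str:
    """Render one of the three formats; \\u2014 is the dash used in the table."""
    if event == "fail":
        return f"[AGENT TEST FAIL] {name} \u2014 Level {level}: {error[:MAX_EXCERPT]}"
    if event == "pass":
        return f"[AGENT TEST PASS] {name} \u2014 ready for your review and promotion"
    if event == "deployed":
        return f"[AGENT DEPLOYED] {name} \u2014 now live in production"
    raise ValueError(f"unknown event: {event}")


def build_request(base_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the alerts topic."""
    url = f"{base_url.rstrip('/')}/{TOPIC}"
    return urllib.request.Request(url, data=message.encode("utf-8"), method="POST")
```

Sending is then `urllib.request.urlopen(build_request("http://ntfy.example.internal", msg), timeout=10)`, where the host is a placeholder for the real NTFY server.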
+
+---
+
+## Known Failure Modes
+
+*(Updated as new failures are discovered during testing)*
+
+| ID | Type | Failure | Root cause | Fix |
+|---|---|---|---|---|
+| — | — | None yet — first test session will populate this | — | — |
+
+---
+
+## Self-Evolution Instructions
+
+After every test session:
+1. Add any new failure mode to the Known Failure Modes table with ID, type, failure, root cause, and fix
+2. If a Level assertion was too loose (passed but shouldn't have) or too strict (failed but should have passed), update the assertion pattern for that level and type
+3. Increment the version number in the frontmatter
+4. Note the date and what changed at the bottom of this file
+
+**Change log:**
+- v1.0 (2026-06-16): Initial methodology — four levels, user-approval promotion gate, NTFY notifications, self-evolution rule