Tessera
10 sessions · 915 active min · 7 claude · 2 codex · 1 gemini · 2026-04-27 · demo (fabricated)
7d window · min ≥20 events · excluded: 2 short

Test environment setup is the #1 friction source this week — 4 of 10 sessions burned 12-56 events on the same class of one-line config issues.

If you do one thing this week
Add `pytest-env` to acme-checkout/pyproject.toml with dummy test secrets (whsec_test_dummy, sk_test_dummy) — eliminates the dominant friction class across every Stripe-related session, both Claude and Codex.
Week-over-week (vs run from 2026-04-20): 5 new · 1 continuing · 1 stable · 2 resolved since last
10 sessions · 6 observations · 5 quick wins · 4 projects · 915 active min · 0 fabrications

Observations (6)

§1 Test environment setup is the #1 friction source: same pattern in 4 of 10 sessions
high · environmental · 4 sessions · ~2.7h friction · trend: stable

Across 4 distinct sessions, the agent burned 12-56 events on missing test config: STRIPE_WEBHOOK_SECRET (S001), live-key-in-test (S002), `python` not on PATH on macOS (S009), database-locked migrations (S004). Each is a one-line config fix that would eliminate the entire class of error.

S001 claude:8a1b9c12-d3e4-… · S002 claude:7c2d8e34-f4a5-… · S004 codex:019f8b21-3c45-… · S009 codex:019fe3b7-2c45-…
Why: Test environment 'just works' for humans because we set it up manually once. Agents start fresh every session and rediscover each constraint. The fix isn't smarter agents; it's surfacing the constraint as code (CLAUDE.md, pyproject.toml [tool.pytest_env], shell aliases). Cross-session pattern: 4 different sessions, 4 different constraints, all in the same shape.
Next: Set up `pytest-env` in acme-api + acme-checkout pyproject.toml with all required test secrets (use dummy values: `whsec_test_dummy`, `sk_test_dummy`, etc.). Add `alias python=python3` to ~/.zshrc. One evening of setup eliminates the dominant friction class for the rest of the year.
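As a minimal sketch of "surfacing the constraint as code", a `conftest.py` can load `.env.test` and fall back to dummy secrets before any test imports application code. The helper name `load_test_env` and the `STRIPE_KEY` variable name are assumptions for illustration; only `STRIPE_WEBHOOK_SECRET`, `.env.test`, and the dummy values come from the findings above. It uses only the standard library, so it is an alternative to `pytest-env` or `python-dotenv`, not a description of either.

```python
# conftest.py (sketch): make test secrets available before tests import app code.
import os
from pathlib import Path

# Dummy fallbacks. STRIPE_KEY as a variable name is an assumption.
DUMMY_SECRETS = {
    "STRIPE_WEBHOOK_SECRET": "whsec_test_dummy",
    "STRIPE_KEY": "sk_test_dummy",
}


def load_test_env(env_file=".env.test", environ=None):
    """Load KEY=VALUE pairs from env_file, then fill gaps with dummy secrets.

    Values already present in the environment are never overwritten.
    """
    environ = os.environ if environ is None else environ
    path = Path(env_file)
    if path.is_file():
        for raw in path.read_text().splitlines():
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            environ.setdefault(key.strip(), value.strip().strip("'\""))
    for key, value in DUMMY_SECRETS.items():
        environ.setdefault(key, value)
    return environ


load_test_env()  # runs once, when pytest imports conftest.py
```

Because every assignment goes through `setdefault`, a value exported in the shell or in CI still wins over the file and over the dummy fallback.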
§2 Blind retry without changing the input: 2 sessions, 4+ hours wasted
high · tooling · 2 sessions · ~2.3h friction

S005 (mobile SDK upgrade) ran `npx expo install --fix` 14 times in 4 hours hoping the peer-deps would self-resolve. S004 (refund migration) ran `alembic upgrade head` 6 times against a locked database. Both eventually resolved by reading docs / fixing config.

S005 claude:5d3e8c4f-1a2b-… · S004 codex:019f8b21-3c45-…
Why: Both cases share a structure: agent runs a command, it fails, agent runs the same command again. After 2-3 retries, the issue is never the command's flakiness; it's missing context. The agent needs a heuristic: 'after N identical retries, stop and read docs / change strategy'.
Next: The reflect-coach hook already detects this via the `retry_without_change` rule (≥3 identical retries triggers a nudge). Confirm the hook is installed in your Claude Code; install if not: `cd tessera-live && /plugin add .`
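The "after N identical retries, stop" heuristic can be sketched as a scan over a session's command history. This is an illustration only, not the reflect-coach hook's actual `retry_without_change` implementation; the function name `find_blind_retries` is hypothetical, and it assumes "identical" means the same command string after whitespace normalization, counted as consecutive runs.

```python
from typing import Iterable, List, Tuple


def find_blind_retries(commands: Iterable[str], threshold: int = 3) -> List[Tuple[str, int]]:
    """Return (command, run_length) for each run of identical consecutive
    commands whose length is at least `threshold`."""
    flagged = []
    previous = None
    run_length = 0
    for command in commands:
        normalized = " ".join(command.split())  # ignore whitespace differences
        if normalized == previous:
            run_length += 1
        else:
            # A different command ends the current run; record it if long enough.
            if previous is not None and run_length >= threshold:
                flagged.append((previous, run_length))
            previous, run_length = normalized, 1
    # Record the final run, which has no following command to close it.
    if previous is not None and run_length >= threshold:
        flagged.append((previous, run_length))
    return flagged


# Made-up history shaped like S004: six identical runs, then a change of approach.
history = ["alembic upgrade head"] * 6 + ["cat alembic.ini", "alembic upgrade head"]
print(find_blind_retries(history))  # [('alembic upgrade head', 6)]
```

Comparing only the command text is deliberately crude: a retry after editing a config file looks identical here, so a real rule would probably also check whether anything changed between runs.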
§3 Late-night sessions correlate with low-quality work: sharp contrast, but only 1 session
low · workflow · 1 session · ~1.8h friction

S005 (Saturday 10pm-1:30am) produced this week's longest tool_error_loop (203 events / 38 min on Pod resolution) and the only blind_retry waste signature in the dataset. The other 9 sessions were morning/afternoon and showed varied, productive patterns.

S005 claude:5d3e8c4f-1a2b-…
Why: This is a single-session signal, so confidence is weak, but the contrast is sharp. Late-night sessions on framework upgrades may be the worst-case combo: high cognitive load + low patience for reading docs first. Easy enough to track over more weeks.
Next: Don't start framework upgrades after 9pm. If you must, start by reading the official changelog (one Read call) before any install command.
§4 Subagent over-delegation on small codebases: 487 events for what 3 reads would answer
medium · tooling · 2 sessions · ~47m friction

S007 spawned 4 parallel exploration subagents on a 19-file middleware folder; the existing RateLimiter pattern was missed. 145 wasted events. The session that did it RIGHT (S003) used subagents for a 9-file PR review in parallel — appropriate scale, saved 25 minutes.

S007 claude:9c8d7e6f-5a4b-… · S003 claude:6e1f7a9b-c2d3-…
Why: Subagents shine for genuinely parallel work on broad codebases. They flop on 'I'm not sure where to look' questions in small codebases, where lossy summaries replace the context a direct Read would give you. Rule of thumb: <50 files = direct Read+Grep; >50 files OR genuinely parallel = subagents.
Next: Add to acme-api/CLAUDE.md: 'middleware patterns: src/api/middleware.py — use the existing RateLimiter base class for new throttling middleware. For codebase exploration on this repo (<25 files in any single dir), prefer direct Read+Grep over subagent delegation.'
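The rule of thumb above is simple enough to check mechanically before delegating. The sketch below is one possible reading of it; the function name `exploration_strategy` and its return labels are hypothetical, and treating exactly 50 files as "direct" is an assumption, since the rule leaves that case open.

```python
from pathlib import Path


def exploration_strategy(root, max_direct_files=50, genuinely_parallel=False):
    """Pick an exploration approach from the number of files under `root`.

    Returns (strategy, file_count). Counts every regular file recursively,
    including hidden ones, so point it at a source folder, not a repo root.
    """
    file_count = sum(1 for p in Path(root).rglob("*") if p.is_file())
    if genuinely_parallel or file_count > max_direct_files:
        return "subagents", file_count
    return "direct_read_grep", file_count
```

On a 19-file folder like the one in S007 this returns `("direct_read_grep", 19)`; a parallel task such as the 9-file PR review in S003 would pass `genuinely_parallel=True` and get `"subagents"` regardless of size.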
§5 Stack-trace-driven debugging vs adjacency-driven debugging: clean signal in S010
medium · workflow · 1 session · ~10m friction

S010 (DB pool leak) initially investigated middleware.py because it was 'related to the API' even though the stack trace pointed at src/db/session.py. 57 events of dead-end exploration before pivoting to follow the trace.

S010 claude:f3e4d5c6-b7a8-…
Why: When a stack trace exists, it's the highest-precision signal in debugging. Adjacency reasoning ('middleware is related to APIs') is lower-precision. The agent reverts to adjacency when the trace is in a less-familiar module. Worth flagging because S010 was otherwise the best debugging session of the week; this was its only friction.
Next: Add a personal habit: when investigating a production bug, the FIRST file read should be the one named in the stack trace, not 'related' files. Ask the agent to do this explicitly: 'Start by reading the file named in the stack trace.'
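"Read the file named in the stack trace first" can be made concrete by extracting that file from the trace text. The sketch below assumes a standard Python traceback (innermost frame last) and prefers frames whose path contains a project marker; the function name `first_file_to_read`, the `src/` marker default, and the sample trace used to exercise it are all hypothetical.

```python
import re

# Matches the frame header lines of a standard Python traceback.
FRAME_RE = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+)', re.MULTILINE)


def first_file_to_read(trace_text, project_marker="src/"):
    """Return (path, line) of the innermost project frame, or None.

    Falls back to the innermost frame of any kind when no frame path
    contains `project_marker`.
    """
    frames = [(m["path"], int(m["line"])) for m in FRAME_RE.finditer(trace_text)]
    project_frames = [f for f in frames if project_marker in f[0]]
    candidates = project_frames or frames
    return candidates[-1] if candidates else None
```

Given a trace that passes through `src/api/middleware.py` and then fails inside `src/db/session.py`, this returns the `session.py` frame, which is the file S010 should have opened first.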
§6 Explicit verification asks consistently improve outcome quality
high · prompting · 6 sessions · ~2.5h friction · trend: stable

S001 (Stripe), S006 (S3 cleanup), S008 (coupon stacking), S010 (pool leak) all did thorough verification (test runs + dry-runs + staging). They had 0-1 friction moments each. Sessions without explicit verification (S002 e2e tests claimed_only, S009 migration script claimed_only) shipped issues the user caught.

S001 claude:8a1b9c12-d3e4-… · S006 gemini:a4b9c3d2-e1f0-… · S008 claude:1a2b3c4d-5e6f-… · S010 claude:f3e4d5c6-b7a8-… · S002 claude:7c2d8e34-f4a5-… · S009 codex:019fe3b7-2c45-…
Why: The verification_completeness field is a strong outcome predictor on this dataset. Sessions tagged 'thoroughly_verified' cleared with minimal user-caught errors; 'claimed_only' sessions had user-caught errors. This is a habit-level signal, not a model-capability one.
Next: Make 'verify before claiming complete' explicit in the prompt: 'After you believe the task is done, run the relevant test/lint/build BEFORE saying you're done. If you can't verify, say so explicitly.' Add this to your global CLAUDE.md.
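A prompt instruction is easier to follow when the verification step is a single call. The sketch below runs a list of checks and reports pass/fail per check, including the "could not verify" case the prompt asks the agent to state explicitly. The function name `verify` and the report format are assumptions; which commands count as "the relevant test/lint/build" depends on the project.

```python
import subprocess


def verify(commands):
    """Run each (name, argv) check and return (all_passed, report_lines).

    A check passes only when its process exits with code 0. A command
    that cannot be started is reported as a failure, not skipped.
    """
    report = []
    all_passed = True
    for name, argv in commands:
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
            passed = result.returncode == 0
            detail = f"exit {result.returncode}"
        except FileNotFoundError:
            passed, detail = False, "could not run (command not found)"
        all_passed = all_passed and passed
        report.append(f"{'PASS' if passed else 'FAIL'} {name}: {detail}")
    return all_passed, report


# Example call (not executed here); the pytest invocation is illustrative:
#   ok, report = verify([("tests", ["python3", "-m", "pytest", "-q"])])
#   print("\n".join(report))
```

Only when `all_passed` is true should the task be reported as complete; otherwise the report lines are what goes back to the user.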

Quick wins (5)

  1. (2 sessions) Add `pytest-env` to acme-checkout/pyproject.toml + populate test secrets (whsec_test_dummy, sk_test_dummy). Eliminates the test-env-discovery loop in every webhook session.
  2. (1 session) Add `alias python=python3` to ~/.zshrc. Eliminates 'command not found: python' across every Codex session on macOS.
  3. (1 session) Edit your code-review skill prompt: change `git diff main...origin/BRANCH` to `git fetch origin main && git diff origin/main...origin/BRANCH`. One line, eliminates the stale-main blind_retry pattern.
  4. (1 session) Add to acme-api/CLAUDE.md: 'Stripe error handling: use `src/payments/decorators.StripeRetry`. Migrations: alembic autogen + always include downgrade(). DB locked errors: stop dev server first.'
  5. (1 session) Add to acme-mobile/AGENTS.md: 'For SDK upgrades: read expo.dev/changelog/sdk-N FIRST. Never run `expo install --fix` more than 2 times; pin manually after that.'

Per project (4)

work/acme-checkout · 4 sessions
Stripe integration is the dominant work area: productive, but consistently slowed by missing test environment setup.
Biggest friction: tool_error_loop on STRIPE_WEBHOOK_SECRET (S001) and live-key-in-test confusion (S002). Pattern: agent assumes test env is fully bootstrapped; it isn't. Fix is one-time: pytest-env in pyproject.toml.
work/acme-api · 3 sessions
Three sessions, two waste patterns: exploration_drift (twice) and over_delegation. The codebase is small enough that subagents and exhaustive exploration are usually overkill.
Biggest friction: Reimplementing patterns that already exist (S004 StripeRetry, S007 RateLimiter). A short CLAUDE.md mapping common patterns to their location would eliminate this.
work/acme-mobile · 1 session
Saturday late-night SDK upgrade went sideways: 4 hours of blind retry on peer-deps that needed manual pinning.
Biggest friction: blind_retry on `expo install --fix`. Context: framework upgrades + late hours + no docs-first habit.
work/acme-tooling · 2 sessions
One-off scripts. Cleanest sessions of the week: clear scope, dry-run-first habits, and no friction apart from S009 (python vs python3).
Biggest friction: macOS python/python3 alias missing. Single-line fix.
Notes: Demo dataset. Fictional persona 'Alex at Acme Payments' with 10 fabricated sessions covering varied friction patterns. All session_ids, project paths, file names, and quotes are synthetic. Run `tessera run` against your own ~/.claude/projects/, ~/.codex/sessions/, ~/.gemini/tmp/ to see real findings from your own work.
5 env issues across 5 sessions
work/acme-checkout · 7× in session
STRIPE_WEBHOOK_SECRET missing in test env: pytest conftest doesn't auto-load .env.test. Add `python-dotenv` autoload or `pytest-env` to pyproject.toml.
work/acme-tooling · 7× in session
macOS doesn't ship `python`, only `python3`. Add `alias python=python3` to ~/.zshrc, OR start scripts with a `#!/usr/bin/env python3` shebang.
work/acme-api · 5× in session
`alembic upgrade head` fails on first try with 'database is locked' when dev server is running. Fix: stop dev server before migrations OR use SQLite WAL mode in dev.
work/acme-mobile · 5× in session
iOS Podfile resolution flakes after major RN upgrades. Workaround: `rm -rf ios/Pods ios/Podfile.lock`, then run `pod install --repo-update` from ios/.
work/acme-checkout · 4× in session
Playwright's chromium browser dies between test files when test isolation is set to 'process'. Fix: set `fullyParallel: false` in playwright.config.ts for the e2e suite.
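For the 'database is locked' entry, the WAL option can be sketched with the standard-library `sqlite3` module. This assumes the dev database is a SQLite file, which the fix above implies; the function name `open_dev_db` and the 5-second timeout are hypothetical. WAL lets readers proceed while one writer is active, and `busy_timeout` makes a second writer wait instead of failing immediately, so it may reduce the lock errors but does not guarantee a migration succeeds while the dev server is writing.

```python
import sqlite3


def open_dev_db(path, busy_timeout_ms=5000):
    """Open a SQLite file in WAL mode with a busy timeout.

    Returns (connection, journal_mode) so the caller can confirm that
    WAL was actually enabled.
    """
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode now in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # Wait up to busy_timeout_ms for a lock instead of erroring at once.
    conn.execute(f"PRAGMA busy_timeout = {int(busy_timeout_ms)}")
    return conn, mode
```

The journal mode is stored in the database file, so enabling WAL once persists for later connections; the busy timeout is per connection and has to be set each time.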

Counterfactuals (10)

At event 342, if the agent had asked 'should I keep this simple or build for future coupon types?' before introducing heapq, the 46-event abstraction detour would have been avoided.
At event 78, reading the official Expo SDK 53 upgrade guide first (one Read of expo.dev/changelog/sdk-53) would have surfaced the 4 known breaking changes and prevented 47 minutes of blind retries.
If the array-pr-review skill had `git fetch origin main` as step 0, the 23-event blind_retry at events 12-34 would not have happened.
At event 78, if the agent had checked the .env.test STRIPE keys (one Read call) before launching browser sessions, it would have caught the live-key issue immediately and avoided 65 events of confused browser_spiral.
If conftest.py had loaded .env.test at session start (event 0), the entire 37-event KeyError loop at events 42-78 would have evaporated — that's 8+ minutes saved on the very first try.
If the agent had Read middleware.py + Grep'd for 'usage|track' first (3 events), instead of spawning 4 subagents (145 events), it would have found the existing RateLimiter pattern in under 30 seconds.
At event 142, reading the actual error stack trace (one Read of the prod log) before guessing at middleware would have skipped 57 events of unrelated investigation.
If the prompt had said 'use existing StripeRetry decorator from src/payments/' (one sentence at the start), the 84-event reimplementation drift at events 304-387 would not have happened.
If the prompt or AGENTS.md had specified 'this machine uses python3, not python', the 56-event tool_error_loop at events 12-67 would have been zero.
No meaningful counterfactual — this was a clean session with appropriate dry-run verification before --apply.

Lessons for user (8)

Late-night SDK upgrade sessions correlate with blind retries — try framework upgrades during morning hours when the cost of pausing to read docs is lower.
Edit your code-review skill prompt to start with: 'Step 1: git fetch origin main && git diff origin/main...HEAD'.
Add a fail-fast assertion at top of playwright.config.ts: `if (process.env.STRIPE_KEY?.startsWith('sk_live')) throw new Error('production stripe key in test config')`.
Add `pytest-env` to pyproject.toml dev deps with a `[tool.pytest_env]` table containing `STRIPE_WEBHOOK_SECRET = "whsec_test_dummy"` (TOML string values need quotes). This eliminates this class of error in every webhook-related session.
Add to acme-api/CLAUDE.md: 'middleware patterns: src/api/middleware.py — use the existing RateLimiter base class for new throttling middleware.'
Add a `tessera-rule` that flags when an investigation diverges from the actual stack trace — would have caught this earlier.
Add to acme-api's CLAUDE.md: 'Stripe error handling: use src/payments/decorators.StripeRetry. Don't roll your own.'
Add `alias python=python3` to your ~/.zshrc once — eliminates this for every Codex session on macOS.
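The fail-fast check suggested for playwright.config.ts carries over to the Python test suites. A minimal sketch, assuming it is called from `conftest.py` before any test runs: the function name `assert_no_live_stripe_keys` is hypothetical, and only the `sk_live` prefix comes from the lesson above.

```python
import os


def assert_no_live_stripe_keys(environ=None, prefixes=("sk_live",)):
    """Raise RuntimeError if any environment value looks like a live Stripe key.

    The error names the offending variables but never echoes their values.
    """
    environ = os.environ if environ is None else environ
    offenders = sorted(
        name for name, value in environ.items() if value.startswith(prefixes)
    )
    if offenders:
        raise RuntimeError(
            "production Stripe key in test config: " + ", ".join(offenders)
        )
```

Scanning every variable, not just one named key, is a design choice: it also catches a live key that was exported under an unexpected name.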

Lessons for agent (10)

When introducing a data structure (priority queue, dispatch table, factory), pause and ask if a 3-case if/else suffices. Default to the simplest thing that works.
For framework version upgrades, read the official changelog/migration guide BEFORE running upgrade commands. 'expo install --fix' loops are almost always missing-context bugs.
Always `git fetch origin main` before diffing for PR review. Single command, prevents stale-main diff inflation.
Before doing browser automation against an external service, verify which API keys / test mode flags are loaded — it's a 1-event check that prevents 30+ event spirals.
When tests fail with KeyError on an env var, check conftest/dotenv loading before assuming the var is missing from the env file.
Default to direct Read+Grep on codebases <50 files. Subagents are for parallelism on broad questions, not 'I'm not sure where to look'.
When debugging a production issue with a stack trace, the trace IS the contract. Follow it precisely; don't generalize to 'related' code.
When implementing error handling for an external service, grep for existing utilities (`*Retry`, `*Backoff`, `with_retry`) before writing inline.
On macOS, default to `python3` invocation. Never assume `python` is on PATH.
When asked for a one-off cleanup script touching production data, default to --dry-run-first and ask for explicit --apply confirmation. This session did that correctly.
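The dry-run-first default from the last lesson can be sketched with `argparse`: the script only reports what it would do unless `--apply` is passed. The names `build_parser` and `run_cleanup`, and the injected `delete` callable, are hypothetical; the actual S3 deletion is deliberately left out.

```python
import argparse


def build_parser():
    """CLI where the destructive path is opt-in via --apply."""
    parser = argparse.ArgumentParser(description="One-off cleanup (sketch)")
    parser.add_argument(
        "--apply",
        action="store_true",
        help="actually delete; without this flag nothing is changed",
    )
    return parser


def run_cleanup(candidates, delete, apply=False):
    """Call `delete` on each candidate only when apply is True.

    Returns the list of items that were (or would be) deleted.
    """
    items = list(candidates)
    for item in items:
        if apply:
            delete(item)
            print(f"deleted {item}")
        else:
            print(f"[dry-run] would delete {item}")
    return items


# Example wiring (not executed here); s3_delete is a placeholder:
#   args = build_parser().parse_args()
#   run_cleanup(keys_older_than_30_days, delete=s3_delete, apply=args.apply)
```

Passing the delete function in as a parameter keeps the dry-run path testable without touching production data.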
work/acme-checkout · 4 claude · 385 active min · 2 env issues · 7 user-caught errors
4 sessions in this project

Sessions (newest first)

claude:6e1f7a9b-c2d3-4e45-67ab-123456789803
  Review PR #421 (refund endpoint): code review + suggest improvements before approval.
  2026-04-24 · 584ev · blind_retry
claude:7c2d8e34-f4a5-4b67-89cd-ef1234567802
  Get the checkout-flow Playwright e2e test passing reliably across local + CI.
  2026-04-23 · 1284ev · exploration_drift
claude:8a1b9c12-d3e4-4f56-789a-bcdef0123401
  Add webhook signature verification + retry logic to acme-checkout's Stripe integration; ship behind a feature flag.
  2026-04-22 · 734ev · tool_error_loop
claude:1a2b3c4d-5e6f-7890-abcd-ef0123456808
  Implement percentage + flat coupon discount logic with stacking rules.
  2026-04-19 · 568ev · wrong_abstraction
work/acme-api · 2 claude · 1 codex · 332 active min · 1 env issue · 5 user-caught errors
3 sessions in this project

Sessions (newest first)

claude:f3e4d5c6-b7a8-9012-3456-789abcdef810
  Diagnose + fix the connection pool leak that's spiking on the prod /refunds endpoint.
  2026-04-26 · 1089ev · exploration_drift
codex:019f8b21-3c45-7d89-ab12-cdef34567804
  Add POST /refunds endpoint to acme-api with proper Stripe integration + alembic migration.
  2026-04-21 · 488ev · exploration_drift
claude:9c8d7e6f-5a4b-3c12-de56-789012345807
  Add per-tenant usage metering middleware to acme-api with persistent daily aggregates.
  2026-04-20 · 921ev · over_delegation
work/acme-tooling · 1 codex · 1 gemini · 62 active min · 1 env issue · 1 user-caught error
2 sessions in this project

Sessions (newest first)

codex:019fe3b7-2c45-7891-bc34-def567890809
  Write a one-time migration script to backfill `created_at` on legacy user records.
  2026-04-22 · 263ev · tool_error_loop
gemini:a4b9c3d2-e1f0-4567-89ab-cdef98765806
  Write a one-off Python script to clean up production log files older than 30 days from S3.
  2026-04-21 · 156ev · none
work/acme-mobile · 1 claude · 134 active min · 1 env issue · 1 user-caught error
1 session in this project

Sessions (newest first)

claude:5d3e8c4f-1a2b-4c56-78de-fabc12345805
  Upgrade acme-mobile from Expo SDK 50 → 53; resolve all peer-dep warnings; ship a working build.
  2026-04-25 · 1174ev · blind_retry