Tessera

Test environment setup is the #1 friction source this week — 4 of 10 sessions burned 12-56 events on the same class of one-line config issues.

If you do one thing this week

Add `pytest-env` to acme-checkout/pyproject.toml with dummy test secrets (whsec_test_dummy, sk_test_dummy) — eliminates the dominant friction class across every Stripe-related session, both Claude and Codex.

Week-over-week (vs run from 2026-04-20): 5 new, 1 continuing, 1 stable, 2 resolved since last

Active min: 915

Observations (6)

§1 Test environment setup is the #1 friction source: same pattern in 4 of 10 sessions
high, environmental, 4 sessions, ~2.7h friction, trend: stable

Across 4 distinct sessions, the agent burned 12-56 events on missing test config: STRIPE_WEBHOOK_SECRET (S001), live-key-in-test (S002), `python` not on PATH on macOS (S009), database-locked migrations (S004). Each is a one-line config fix that would eliminate the entire class of error.

Evidence (4 sessions): S001 claude:8a1b9c12-d3e4-…, S002 claude:7c2d8e34-f4a5-…, S004 codex:019f8b21-3c45-…, S009 codex:019fe3b7-2c45-…

Why: Test environment setup 'just works' for humans because we set it up manually once. Agents start fresh every session and rediscover each constraint. The fix isn't smarter agents; it's surfacing the constraint as code (CLAUDE.md, pyproject.toml [tool.pytest_env], shell aliases). Cross-session pattern: 4 different sessions, 4 different constraints, all in the same shape.

Next: Set up `pytest-env` in the pyproject.toml of both acme-api and acme-checkout with all required test secrets (use dummy values: `whsec_test_dummy`, `sk_test_dummy`, etc.). Add `alias python=python3` to ~/.zshrc. One evening of setup eliminates the dominant friction class for the rest of the year.

§2 Blind retry without changing the input: 2 sessions, 4+ hours wasted
high, tooling, 2 sessions, ~2.3h friction

S005 (mobile SDK upgrade) ran `npx expo install --fix` 14 times in 4 hours hoping the peer-deps would self-resolve. S004 (refund migration) ran `alembic upgrade head` 6 times against a locked database. Both were eventually resolved by reading the docs or fixing config.

Evidence (2 sessions): S005 claude:5d3e8c4f-1a2b-…, S004 codex:019f8b21-3c45-…

Why: Both cases share a structure: the agent runs a command, it fails, and the agent runs the same command again. After 2-3 retries, the issue is almost never flakiness in the command; it's missing context. The agent needs a heuristic: 'after N identical retries, stop and read docs / change strategy'.

Next: The reflect-coach hook already detects this via the `retry_without_change` rule (≥3 identical retries triggers a nudge). Confirm the hook is installed in your Claude Code setup; if it is not, install it: `cd tessera-live && /plugin add .`
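
To make the 'after N identical retries, stop' heuristic concrete, here is a minimal sketch. It is not the reflect-coach hook's implementation; `CommandEvent`, `identical_failure_streak`, and `should_nudge` are hypothetical names, and it assumes that "3 identical retries" means three consecutive failed runs of the same command string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandEvent:
    """One shell command the agent ran, with its exit code (hypothetical shape)."""
    command: str
    exit_code: int


def identical_failure_streak(events) -> int:
    """Count the trailing run of failed events that share one command string."""
    streak = 0
    last_command = None
    for event in reversed(events):
        if event.exit_code == 0:
            break  # a success ends the streak
        if last_command is not None and event.command != last_command:
            break  # the input changed, so this is not a blind retry
        last_command = event.command
        streak += 1
    return streak


def should_nudge(events, threshold: int = 3) -> bool:
    """True once the same failing command has been run `threshold` times in a row."""
    return identical_failure_streak(events) >= threshold


if __name__ == "__main__":
    history = [CommandEvent("alembic upgrade head", 1)] * 3
    print(should_nudge(history))
```

Comparing raw command strings is deliberately strict: any change to flags or arguments resets the streak, which matches the "without changing the input" framing of this observation.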

§3 Late-night sessions correlate with low-quality work: strong signal even on 1 session
low, workflow, 1 session, ~1.8h friction

S005 (Saturday 10pm-1:30am) produced this week's longest tool_error_loop (203 events / 38 min on Pod resolution) and the only blind_retry waste signature in the dataset. The other 9 sessions were morning/afternoon and showed varied, productive patterns.

Evidence (1 session): S005 claude:5d3e8c4f-1a2b-…

Why: This is a single-session signal, so confidence is weak, but the contrast is sharp. Late-night sessions on framework upgrades may be the worst-case combo: high cognitive load plus low patience for reading docs first. Easy enough to track over more weeks.

Next: Don't start framework upgrades after 9pm. If you must, start by reading the official changelog (one Read call) before any install command.

§4 Subagent over-delegation on small codebases: 487 events for what 3 reads would answer
medium, tooling, 2 sessions, ~47m friction

S007 spawned 4 parallel exploration subagents on a 19-file middleware folder and still missed the existing RateLimiter pattern, at a cost of 145 wasted events. The session that got it right (S003) used subagents to review a 9-file PR in parallel: appropriate scale, saved 25 minutes.

Evidence (2 sessions): S007 claude:9c8d7e6f-5a4b-…, S003 claude:6e1f7a9b-c2d3-…

Why: Subagents shine for genuinely parallel work on broad codebases. They flop on 'I'm not sure where to look' questions in small codebases, where lossy summaries replace what a direct Read would give you. Rule of thumb: <50 files = direct Read+Grep; >50 files OR genuinely parallel = subagents.

Next: Add to acme-api/CLAUDE.md: 'middleware patterns: src/api/middleware.py; use the existing RateLimiter base class for new throttling middleware. For codebase exploration on this repo (<25 files in any single dir), prefer direct Read+Grep over subagent delegation.'
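
The rule of thumb above can be written down as a small decision function. This is only a sketch: the function and return labels are hypothetical, and because the rule does not say what happens at exactly 50 files, the sketch assumes that case falls on the direct-read side.

```python
from pathlib import Path


def count_files(root: str) -> int:
    """Count regular files under `root`, recursively."""
    return sum(1 for path in Path(root).rglob("*") if path.is_file())


def choose_exploration_strategy(file_count: int, genuinely_parallel: bool = False) -> str:
    """Apply the report's rule of thumb.

    More than 50 files, or genuinely parallel work, suggests subagents.
    Anything else (including exactly 50 files, an assumption) suggests
    direct Read+Grep.
    """
    if genuinely_parallel or file_count > 50:
        return "subagents"
    return "direct_read_grep"


if __name__ == "__main__":
    # S007: 19-file middleware folder, exploratory rather than parallel work.
    print(choose_exploration_strategy(19))
    # S003: 9-file PR review that splits cleanly into parallel pieces.
    print(choose_exploration_strategy(9, genuinely_parallel=True))
```

Under these assumptions the S007 case maps to direct Read+Grep and the S003 case to subagents, which matches how the report judged the two sessions.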

§5 Stack-trace-driven debugging vs adjacency-driven debugging: clean signal in S010
medium, workflow, 1 session, ~10m friction

S010 (DB pool leak) initially investigated middleware.py because it was 'related to the API' even though the stack trace pointed at src/db/session.py. 57 events of dead-end exploration before pivoting to follow the trace.

Evidence (1 session): S010 claude:f3e4d5c6-b7a8-…

Why: When a stack trace exists, it's the highest-precision signal in debugging. Adjacency reasoning ('middleware is related to APIs') is lower-precision. The agent reverts to adjacency when the trace is in a less-familiar module. Worth flagging because S010 was otherwise the best debugging session of the week; this was its only friction.

Next: Add a personal habit: when investigating a production bug, the FIRST file read should be the one named in the stack trace, not 'related' files. Ask the agent to do this explicitly: 'Start by reading the file named in the stack trace.'
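
One way to mechanize "read the file named in the stack trace first" is to pull the innermost project frame out of a Python traceback. The sketch below assumes the standard CPython traceback format; the sample trace is hypothetical (only `src/db/session.py` comes from the report), and `first_file_to_read` is an illustrative name.

```python
import re

# Matches the 'File "<path>", line <n>' lines of a standard CPython traceback.
FRAME_PATTERN = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+)', re.MULTILINE)

# Hypothetical traceback for illustration; frames are listed outermost first.
SAMPLE_TRACE = '''Traceback (most recent call last):
  File "src/api/routes.py", line 41, in list_orders
    session = get_session()
  File "src/db/session.py", line 88, in get_session
    return pool.checkout()
  File ".venv/lib/site-packages/somepool/core.py", line 12, in checkout
    raise TimeoutError("pool exhausted")
TimeoutError: pool exhausted
'''


def first_file_to_read(trace_text: str, skip_markers=("site-packages",)):
    """Return the innermost frame's file that is not third-party code, or None."""
    paths = [match.group("path") for match in FRAME_PATTERN.finditer(trace_text)]
    project_paths = [
        path for path in paths if not any(marker in path for marker in skip_markers)
    ]
    # CPython prints the most recent call last, so the last entry is innermost.
    return project_paths[-1] if project_paths else None


if __name__ == "__main__":
    print(first_file_to_read(SAMPLE_TRACE))
```

Skipping `site-packages` frames is a design choice: the deepest frame is often inside a library, while the deepest frame in your own code is usually the better first read.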

§6 Explicit verification asks consistently improve outcome quality
high, prompting, 6 sessions, ~2.5h friction, trend: stable

S001 (Stripe), S006 (S3 cleanup), S008 (coupon stacking), and S010 (pool leak) all did thorough verification (test runs + dry-runs + staging). They had 0-1 friction moments each. Sessions without explicit verification (the S002 e2e tests and the S009 migration script, both tagged claimed_only) shipped issues the user caught.

Evidence (6 sessions): S001 claude:8a1b9c12-d3e4-…, S006 gemini:a4b9c3d2-e1f0-…, S008 claude:1a2b3c4d-5e6f-…, S010 claude:f3e4d5c6-b7a8-…, S002 claude:7c2d8e34-f4a5-…, S009 codex:019fe3b7-2c45-…

Why: The verification_completeness field is a strong outcome predictor on this dataset. Sessions tagged 'thoroughly_verified' cleared with minimal user-caught errors; 'claimed_only' sessions had user-caught errors. This is a habit-level signal, not a model-capability one.

Next: Make 'verify before claiming complete' explicit in the prompt: 'After you believe the task is done, run the relevant test/lint/build BEFORE saying you're done. If you can't verify, say so explicitly.' Add this to your global CLAUDE.md.
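
The same 'verify before claiming complete' habit can be sketched as a small gate that runs checks and only reports "done" when they pass. This is an illustrative sketch, not part of Tessera: the function names and message strings are hypothetical, and the demo commands are stand-ins for a real test, lint, or build command.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each (label, argv) check; return (label, exit_code) for the failures."""
    failures = []
    for label, argv in commands:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((label, result.returncode))
    return failures


def completion_message(commands) -> str:
    """Report 'done' only when at least one check ran and all of them passed."""
    if not commands:
        # Mirrors "If you can't verify, say so explicitly."
        return "unverified: no checks were run"
    failures = run_checks(commands)
    if failures:
        names = ", ".join(label for label, _ in failures)
        return f"not done: failing checks: {names}"
    return "done: all checks passed"


if __name__ == "__main__":
    # Stand-in commands; a real project might pass its own test and lint commands.
    demo = [
        ("tests", [sys.executable, "-c", "raise SystemExit(0)"]),
        ("lint", [sys.executable, "-c", "raise SystemExit(3)"]),
    ]
    print(completion_message(demo))
```

Treating an empty check list as "unverified" rather than "done" keeps a claimed-only completion from looking the same as a verified one.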

Quick wins (5)

2 sessions: Add `pytest-env` to acme-checkout/pyproject.toml + populate test secrets (whsec_test_dummy, sk_test_dummy). Eliminates the test-env-discovery loop in every webhook session.
1 session: Add `alias python=python3` to ~/.zshrc. Eliminates 'command not found: python' across every Codex session on macOS.
1 session: Edit your code-review skill prompt: change `git diff main...origin/BRANCH` to `git fetch origin main && git diff origin/main...origin/BRANCH`. One line, eliminates the stale-main blind_retry pattern.
1 session: Add to acme-api/CLAUDE.md: 'Stripe error handling: use `src/payments/decorators.StripeRetry`. Migrations: alembic autogen + always include downgrade(). DB locked errors: stop dev server first.'
1 session: Add to acme-mobile/AGENTS.md: 'For SDK upgrades: read expo.dev/changelog/sdk-N FIRST. Never run `expo install --fix` more than 2 times; pin manually after that.'

Per project (4)

work/acme-checkout (4 sessions)

Stripe integration is the dominant work area — productive but consistently friction-loaded by missing test environment setup.

Biggest friction

tool_error_loop on STRIPE_WEBHOOK_SECRET (S001) and live-key-in-test confusion (S002). Pattern: agent assumes test env is fully bootstrapped; it isn't. Fix is one-time: pytest-env in pyproject.toml.

work/acme-api (3 sessions)

Three sessions, three different waste patterns — exploration_drift, over_delegation, exploration_drift. The codebase is small enough that subagents and exhaustive exploration are usually overkill.

Biggest friction

Reimplementing patterns that already exist (S004 StripeRetry, S007 RateLimiter). A short CLAUDE.md mapping common patterns to their location would eliminate this.

work/acme-mobile (1 session)

Saturday late-night SDK upgrade went sideways — 4 hours of blind retry on peer-deps that needed manual pinning.

Biggest friction

blind_retry on `expo install --fix`. Context: framework upgrades + late hours + no docs-first habit.

work/acme-tooling (2 sessions)

One-off scripts. Cleanest sessions of the week: clear scope, dry-run-first habits, and near-zero friction. The one exception was S009 (python vs python3).

Biggest friction

macOS python/python3 alias missing. Single-line fix.

Notes: This is a demo dataset built around a fictional persona, 'Alex at Acme Payments', with 10 fabricated sessions covering varied friction patterns. All session_ids, project paths, file names, and quotes are synthetic. Run `tessera run` against your own ~/.claude/projects/, ~/.codex/sessions/, ~/.gemini/tmp/ to see real findings from your own work.