10 sessions · 915 active min · 7 claude · 2 codex · 1 gemini · 2026-04-27 · demo (fabricated) 7d window · min ≥20 events · excluded: 2 short
Test environment setup is the #1 friction source this week — 4 of 10 sessions burned 12-56 events on the same class of one-line config issues.
If you do one thing this week
Add `pytest-env` to acme-checkout/pyproject.toml with dummy test secrets (whsec_test_dummy, sk_test_dummy) — eliminates the dominant friction class across every Stripe-related session, both Claude and Codex.
Week-over-week: 5 new · 1 continuing · 1 stable · 2 resolved since last · vs run from 2026-04-20
10 Sessions · 6 Observations · 5 Quick wins · 4 Projects · 915 Active min · 0 Fabrications
Observations (6)
§1 Test environment setup is the #1 friction source: same pattern in 4 of 10 sessions
high · environmental · 4 sessions · ~2.7h friction · trend: stable
Across 4 distinct sessions, the agent burned 12-56 events on missing test config: STRIPE_WEBHOOK_SECRET (S001), live-key-in-test (S002), `python` not on PATH on macOS (S009), database-locked migrations (S004). Each is a one-line config fix that would eliminate the entire class of error.
Why: Test environment 'just works' for humans because we set it up manually once. Agents start fresh every session and rediscover each constraint. The fix isn't smarter agents; it's surfacing the constraint as code (CLAUDE.md, pyproject.toml [tool.pytest_env], shell aliases). Cross-session pattern: 4 different sessions, 4 different constraints, all in the same shape.
Next: Set up `pytest-env` in the acme-api and acme-checkout pyproject.toml files with all required test secrets (use dummy values: `whsec_test_dummy`, `sk_test_dummy`, etc.). Add `alias python=python3` to ~/.zshrc. One evening of setup eliminates the dominant friction class for the rest of the year.
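Where adding `pytest-env` has to wait, the same constraint can be surfaced from a `conftest.py`. The following is a minimal sketch, not the report's prescribed fix: the dummy values come from the text above, while pairing `sk_test_dummy` with the `STRIPE_KEY` variable and the helper name `apply_test_env_defaults` are assumptions made for illustration.

```python
import os

# Dummy values taken from the report; the STRIPE_KEY pairing is an assumption.
TEST_ENV_DEFAULTS = {
    "STRIPE_WEBHOOK_SECRET": "whsec_test_dummy",
    "STRIPE_KEY": "sk_test_dummy",
}


def apply_test_env_defaults(environ=None):
    """Fill in missing test secrets without overriding values already set.

    Returns the names that were filled in, which is handy for logging.
    """
    target = os.environ if environ is None else environ
    applied = []
    for name, value in TEST_ENV_DEFAULTS.items():
        if name not in target:
            target[name] = value
            applied.append(name)
    return applied


# In conftest.py this would be called at import time, before test modules load:
# apply_test_env_defaults()
```

Because existing values are never overwritten, a developer who exports real test-mode keys keeps them, while a fresh agent session gets working dummies.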
§2 Blind retry without changing the input: 2 sessions, 4+ hours wasted
high · tooling · 2 sessions · ~2.3h friction
S005 (mobile SDK upgrade) ran `npx expo install --fix` 14 times in 4 hours hoping the peer-deps would self-resolve. S004 (refund migration) ran `alembic upgrade head` 6 times against a locked database. Both eventually resolved by reading docs / fixing config.
Why: Both cases share a structure: the agent runs a command, it fails, and the agent runs the same command again. After 2-3 retries, the cause is almost never the command's flakiness; it's missing context. The agent needs a heuristic: 'after N identical retries, stop and read docs / change strategy'.
Next: The reflect-coach hook already detects this via the `retry_without_change` rule (≥3 identical retries triggers a nudge). Confirm the hook is installed in your Claude Code; install if not: `cd tessera-live && /plugin add .`
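The 'after N identical retries' heuristic can be sketched in a few lines. This is an illustration only, not the reflect-coach hook's actual implementation: the function name is hypothetical, the input is assumed to be the ordered list of failed commands from one session, and '≥3 identical retries' is read here as three consecutive identical runs.

```python
from typing import Iterable, Optional

RETRY_THRESHOLD = 3  # assumption: three identical runs in a row trigger the nudge


def first_blind_retry(commands: Iterable[str],
                      threshold: int = RETRY_THRESHOLD) -> Optional[int]:
    """Return the index at which a command has run `threshold` times in a row.

    `commands` is assumed to contain only failed invocations, in order.
    Returns None when no streak reaches the threshold.
    """
    previous = None
    streak = 0
    for index, command in enumerate(commands):
        normalized = " ".join(command.split())  # ignore whitespace-only differences
        if normalized == previous:
            streak += 1
        else:
            previous, streak = normalized, 1
        if streak >= threshold:
            return index
    return None
```

Applied to a run of six identical `alembic upgrade head` failures, the check fires on the third attempt, which is the point where switching to reading docs or config would have paid off.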
§3 Late-night sessions correlate with low-quality work: strong signal even on 1 session
low · workflow · 1 session · ~1.8h friction
S005 (Saturday 10pm-1:30am) produced this week's longest tool_error_loop (203 events / 38 min on Pod resolution) and the only blind_retry waste signature in the dataset. The other 9 sessions were morning/afternoon and showed varied, productive patterns.
Evidence (1 session): S005 · claude:5d3e8c4f-1a2b-…
Why: This is a single-session signal, so confidence is weak, but the contrast is sharp. Late-night sessions on framework upgrades may be the worst-case combo: high cognitive load plus low patience for reading docs first. It is easy enough to track over more weeks.
Next: Don't start framework upgrades after 9pm. If you must, start by reading the official changelog (one Read call) before any install command.
§4 Subagent over-delegation on small codebases: 487 events for what 3 reads would answer
medium · tooling · 2 sessions · ~47m friction
S007 spawned 4 parallel exploration subagents on a 19-file middleware folder; the existing RateLimiter pattern was missed. 145 wasted events. The session that did it RIGHT (S003) used subagents for a 9-file PR review in parallel — appropriate scale, saved 25 minutes.
Why: Subagents shine for genuinely parallel work on broad codebases. They flop on 'I'm not sure where to look' questions in small codebases, where context-lossy summaries replace what a direct Read would give you. Rule of thumb: <50 files = direct Read+Grep; >50 files OR genuinely parallel = subagents.
Next: Add to acme-api/CLAUDE.md: 'middleware patterns: src/api/middleware.py — use the existing RateLimiter base class for new throttling middleware. For codebase exploration on this repo (<25 files in any single dir), prefer direct Read+Grep over subagent delegation.'
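The rule of thumb above reduces to a small decision function. This is a sketch under stated assumptions: the function name and return labels are hypothetical, and the boundary case of exactly 50 files, which the rule leaves open, is treated here as 'direct'.

```python
FILE_THRESHOLD = 50  # from the rule of thumb; exactly 50 is treated as "direct" here


def exploration_strategy(file_count: int, genuinely_parallel: bool = False) -> str:
    """Choose between direct reads and subagent delegation.

    Subagents are chosen only for broad codebases or work that really
    splits into independent parallel pieces.
    """
    if genuinely_parallel or file_count > FILE_THRESHOLD:
        return "subagents"
    return "direct Read+Grep"
```

With these inputs, the 19-file middleware folder from S007 maps to direct Read+Grep, while the parallel 9-file PR review from S003 still maps to subagents.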
§5 Stack-trace-driven debugging vs adjacency-driven debugging: clean signal in S010
medium · workflow · 1 session · ~10m friction
S010 (DB pool leak) initially investigated middleware.py because it was 'related to the API' even though the stack trace pointed at src/db/session.py. 57 events of dead-end exploration before pivoting to follow the trace.
Evidence (1 session): S010 · claude:f3e4d5c6-b7a8-…
Why: When a stack trace exists, it's the highest-precision signal in debugging. Adjacency reasoning ('middleware is related to APIs') is lower-precision. The agent reverts to adjacency when the trace is in a less-familiar module. Worth flagging because S010 was otherwise the best debugging session of the week; this was its only friction.
Next: Add a personal habit: when investigating a production bug, the FIRST file read should be the one named in the stack trace, not 'related' files. Ask the agent to do this explicitly: 'Start by reading the file named in the stack trace.'
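The 'read the file named in the stack trace first' habit can be made mechanical. The sketch below assumes a standard Python traceback (most recent call last) and that project code lives under a `src/` prefix, as the paths in this report suggest; the function name and the prefix parameter are hypothetical.

```python
import re
from typing import Optional

# Matches the 'File "path", line N' frames of a standard Python traceback.
FRAME_RE = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+)', re.MULTILINE)


def first_file_to_read(traceback_text: str, project_prefix: str = "src/") -> Optional[str]:
    """Return the innermost project file named in a traceback.

    Frames are listed outermost to innermost, so the last matching frame
    is closest to the failure. Falls back to the last frame of any kind
    when no frame matches the project prefix.
    """
    paths = [match.group("path") for match in FRAME_RE.finditer(traceback_text)]
    project_paths = [path for path in paths if project_prefix in path]
    candidates = project_paths or paths
    return candidates[-1] if candidates else None
```

Filtering on the project prefix skips library frames, which usually sit deepest in the trace but are rarely where the fix belongs.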
Why: The verification_completeness field is a strong outcome predictor on this dataset. Sessions tagged 'thoroughly_verified' cleared with minimal user-caught errors; 'claimed_only' sessions had user-caught errors. This is a habit-level signal, not a model-capability one.
Next: Make 'verify before claiming complete' explicit in the prompt: 'After you believe the task is done, run the relevant test/lint/build BEFORE saying you're done. If you can't verify, say so explicitly.' Add this to your global CLAUDE.md.
Quick wins (5)
2 sessions · Add `pytest-env` to acme-checkout/pyproject.toml + populate test secrets (whsec_test_dummy, sk_test_dummy). Eliminates the test-env-discovery loop in every webhook session.
1 session · Add `alias python=python3` to ~/.zshrc. Eliminates 'command not found: python' across every Codex session on macOS.
1 session · Edit your code-review skill prompt: change `git diff main...origin/BRANCH` to `git fetch origin main && git diff origin/main...origin/BRANCH`. One line, eliminates the stale-main blind_retry pattern.
1 session · Add to acme-api/CLAUDE.md: 'Stripe error handling: use `src/payments/decorators.StripeRetry`. Migrations: alembic autogen + always include downgrade(). DB locked errors: stop dev server first.'
1 session · Add to acme-mobile/AGENTS.md: 'For SDK upgrades: read expo.dev/changelog/sdk-N FIRST. Never run `expo install --fix` more than 2 times; pin manually after that.'
Per project (4)
work/acme-checkout · 4 sessions
Stripe integration is the dominant work area — productive but consistently friction-loaded by missing test environment setup.
Biggest friction
tool_error_loop on STRIPE_WEBHOOK_SECRET (S001) and live-key-in-test confusion (S002). Pattern: agent assumes test env is fully bootstrapped; it isn't. Fix is one-time: pytest-env in pyproject.toml.
work/acme-api · 3 sessions
Three sessions, three different waste patterns — exploration_drift, over_delegation, exploration_drift. The codebase is small enough that subagents and exhaustive exploration are usually overkill.
Biggest friction
Reimplementing patterns that already exist (S004 StripeRetry, S007 RateLimiter). A short CLAUDE.md mapping common patterns to their location would eliminate this.
work/acme-mobile · 1 session
Saturday late-night SDK upgrade went sideways — 4 hours of blind retry on peer-deps that needed manual pinning.
Biggest friction
blind_retry on `expo install --fix`. Context: framework upgrades + late hours + no docs-first habit.
work/acme-tooling · 2 sessions
One-off scripts. Cleanest sessions of the week — clear scope, dry-run-first habits, zero friction. The exception was S009 (python vs python3).
Biggest friction
macOS python/python3 alias missing. Single-line fix.
Notes: Demo dataset with a fictional persona, 'Alex at Acme Payments', and 10 fabricated sessions covering varied friction patterns. All session_ids, project paths, file names, and quotes are synthetic. Run `tessera run` against your own ~/.claude/projects/, ~/.codex/sessions/, ~/.gemini/tmp/ to see real findings from your own work.
Session · Agent · Project · Date · Events · Active min · Bursts · Waste · Friction · User caught
5 env issues across 5 sessions
work/acme-checkout · 7× in session
STRIPE_WEBHOOK_SECRET missing in test env — pytest conftest doesn't auto-load .env.test. Add `python-dotenv` autoload or `pytest-env` to pyproject.
alembic upgrade head fails on first try with 'database is locked' when dev server is running. Fix: stop dev server before migrations OR use SQLite WAL mode in dev.
Playwright's chromium browser dies between test files when test isolation is set to 'process'. Fix: set fullyParallel: false in playwright.config.ts for the e2e suite.
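The 'SQLite WAL mode in dev' option for the 'database is locked' error can be tried with the standard library alone. This is a sketch that assumes the dev database is a SQLite file; the helper name is hypothetical. WAL lets readers proceed while a write is in progress, but SQLite still allows only one writer at a time, so stopping the dev server remains the more conservative fix.

```python
import sqlite3


def enable_wal(db_path: str) -> str:
    """Switch a SQLite database file to write-ahead logging.

    Returns the journal mode reported by SQLite after the change
    ("wal" on success). The setting is stored in the database file,
    so it persists across later connections.
    """
    conn = sqlite3.connect(db_path)
    try:
        mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    finally:
        conn.close()
    return mode
```

Checking the returned mode matters because SQLite reports the mode actually in effect, which can differ from the one requested (in-memory databases, for example, cannot use WAL).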
At event 342, if the agent had asked 'should I keep this simple or build for future coupon types?' before introducing heapq, the 46-event abstraction detour would have been avoided.
At event 78, reading the official Expo SDK 53 upgrade guide first (one Read of expo.dev/changelog/sdk-53) would have surfaced the 4 known breaking changes and prevented 47 minutes of blind retries.
At event 78, if the agent had checked the .env.test STRIPE keys (one Read call) before launching browser sessions, it would have caught the live-key issue immediately and avoided 65 events of confused browser_spiral.
If conftest.py had loaded .env.test at session start (event 0), the entire 37-event KeyError loop at events 42-78 would have evaporated — that's 8+ minutes saved on the very first try.
If the agent had Read middleware.py + Grep'd for 'usage|track' first (3 events), instead of spawning 4 subagents (145 events), it would have found the existing RateLimiter pattern in under 30 seconds.
At event 142, reading the actual error stack trace (one Read of the prod log) before guessing at middleware would have skipped 57 events of unrelated investigation.
If the prompt had said 'use existing StripeRetry decorator from src/payments/' (one sentence at the start), the 84-event reimplementation drift at events 304-387 would not have happened.
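The counterfactual about conftest.py loading .env.test at session start can be sketched without extra dependencies. This is an assumption-laden illustration: it handles only plain KEY=VALUE lines (no `export` prefixes, multi-line values, or variable expansion), and both function names are hypothetical. A library such as `python-dotenv` covers the fuller format.

```python
import os
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env.test", environ=None) -> Dict[str, str]:
    """Load `path` into the environment if it exists; existing variables win."""
    target = os.environ if environ is None else environ
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        values = parse_env_file(handle.read())
    for key, value in values.items():
        target.setdefault(key, value)
    return values
```

Calling `load_env_file()` at the top of conftest.py would make the variables available before any test module imports code that reads them.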
Late-night SDK upgrade sessions correlate with blind retries — try framework upgrades during morning hours when the cost of pausing to read docs is lower.
Add a fail-fast assertion at top of playwright.config.ts: `if (process.env.STRIPE_KEY?.startsWith('sk_live')) throw new Error('production stripe key in test config')`.
Add `pytest-env` to pyproject.toml dev deps with a `[tool.pytest_env]` table containing `STRIPE_WEBHOOK_SECRET = "whsec_test_dummy"` (TOML string values must be quoted). This eliminates this class of error in every webhook-related session.
When introducing a data structure (priority queue, dispatch table, factory), pause and ask if a 3-case if/else suffices. Default to the simplest thing that works.
For framework version upgrades, read the official changelog/migration guide BEFORE running upgrade commands. 'expo install --fix' loops are almost always missing-context bugs.
Before doing browser automation against an external service, verify which API keys / test mode flags are loaded — it's a 1-event check that prevents 30+ event spirals.
When asked for a one-off cleanup script touching production data, default to --dry-run-first and ask for explicit --apply confirmation. This session did that correctly.
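The 'verify which API keys are loaded' check from the suggestions above has a straightforward Python counterpart to the Playwright guard. This is a minimal sketch: the function names are hypothetical, and it flags any environment value starting with `sk_live`, the only prefix the report mentions.

```python
import os
from typing import List, Mapping, Optional

LIVE_PREFIX = "sk_live"  # the only live-key prefix named in the report


def find_live_keys(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of variables whose values look like live Stripe keys."""
    env = os.environ if environ is None else environ
    return sorted(name for name, value in env.items() if value.startswith(LIVE_PREFIX))


def assert_no_live_keys(environ: Optional[Mapping[str, str]] = None) -> None:
    """Fail fast, before any test or browser session starts, if a live key is loaded."""
    offenders = find_live_keys(environ)
    if offenders:
        raise RuntimeError(
            "production Stripe key in test config: " + ", ".join(offenders)
        )
```

Only variable names appear in the error message, so the key itself never ends up in logs or session transcripts.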