Jun 8 2026

Six things got done. The headline: taught the app to actually grow more sure of itself over time.

Taught the app to actually grow more sure of itself over time

Done
the problem

The app keeps a "how sure am I about this?" rating for every exercise, every movement pattern (squat, hinge, the presses and pulls), and every muscle group. The rating is supposed to climb from "no idea yet" → "getting a feel for it" → "I know this now" as you train. Except it never climbed. Every rating was stuck on "no idea yet" forever: the steps that would move it up were designed a long time ago but never actually built. So the coach treated even someone with months of history as a total stranger, and a downstream feature (letting you review your projected targets) was stuck waiting for ratings that never moved.

what I changed

I built the whole "grow more sure" system for all three: exercises, patterns, and muscles.

  • Exercises become "known" once you've done them enough times and your estimated strength has settled down (stopped bouncing around).
  • Patterns become "known" after about six sessions with a real, data-backed read on how they're trending. This is the important one: once enough of your main patterns are "known," the app can finally offer to review your targets, the thing that was blocked.
  • Muscles don't judge themselves; they inherit confidence from the patterns that train them. A nice catch here: biceps only get trained by isolation moves, so an earlier "only count the big patterns" idea would have left biceps permanently stuck. The final rule counts all the patterns a muscle actually uses.

A rating only ever goes up, never sneaks back down (going backwards would yank away targets you'd already been shown). And the whole thing is conservative on purpose: it would rather say "not sure yet" than wrongly claim "I know this" off thin data.
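To make the "only goes up" and "inherit from every pattern" rules concrete, here is a minimal Python sketch, not the server's actual code. The names (Confidence, promote, pattern_candidate, muscle_confidence), the "any session at all counts as learning" step, and the take-the-weakest rule for muscles are assumptions for illustration; the entry itself only states "about six sessions", that ratings never drop, and that a muscle counts all the patterns it uses.

```python
from enum import IntEnum


class Confidence(IntEnum):
    UNKNOWN = 0   # "no idea yet"
    LEARNING = 1  # "getting a feel for it"
    KNOWN = 2     # "I know this now"


# Assumed threshold: the entry only says "about six sessions".
PATTERN_SESSIONS_FOR_KNOWN = 6


def promote(current, candidate):
    """Ratchet: a rating can rise but never drops back."""
    return max(current, candidate)


def pattern_candidate(sessions, has_trend):
    """Conservative read for one pattern from its session count and trend."""
    if sessions >= PATTERN_SESSIONS_FOR_KNOWN and has_trend:
        return Confidence.KNOWN
    if sessions > 0:  # assumed rule for the middle step
        return Confidence.LEARNING
    return Confidence.UNKNOWN


def muscle_confidence(pattern_ratings):
    """Inherit from every pattern that trains the muscle.

    Taking the weakest rating is an assumption; the entry does not say
    how the inherited ratings are combined.
    """
    if not pattern_ratings:
        return Confidence.UNKNOWN
    return min(pattern_ratings)
```

With a "big patterns only" filter, biceps would pass an empty list to muscle_confidence and stay on UNKNOWN forever; counting the isolation pattern gives it something to inherit from.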

how I tested it

Built in five small, separately shipped pieces, each test-first. 81 fast unit tests plus end-to-end tests that drive real sessions and watch the ratings climb. Every piece passed the reliable server test before merging. No database change and no iPhone-app change needed: it's all server logic.

How it was decided. Before writing code I ran a long structured design interview, and for every question I got three independent takes (my own, a reviewer starting fresh each time, and a reviewer that remembered the whole conversation), then picked the best. All the decisions are written down in a new decision record (ADR-0020).

Done. All five pieces merged (287–291), umbrella 166 closed. This unblocks the target-review feature (269). Two related volume items (164/165) stay separate on purpose. Found and filed one pre-existing bug along the way (292: a list that grows without limit).

Turned on two coaching rules the AI already had the data for

Done
the problem

Two bits of coaching guidance had been written months ago but never actually switched on in the live AI. The interesting part: in both cases the app was already sending the AI the data it needed; it just wasn't told to use it. So we were paying to ship the data and getting nothing back.

what I changed
  • #221: react to what you tell it mid-workout. If you flag "that hurt" or "my form broke down" on a set, or you swap to a different set type than it suggested, the AI now reacts on your next set. Pain is one-strike: it backs off the weight and flags it, rather than pushing you through. This was close to a safety gap before: the app knew you'd flagged pain but never told the coach to do anything about it. (One small correction baked in: the old draft pointed the AI at the wrong part of the data; I fixed it to read the arrays that actually carry those signals.)
  • #222: stop prescribing impossible machine weights. Gym weight-stack machines go up in 5 kg steps, but the AI's rules wrongly let it ask for things like 37.5 kg on a leg press, a weight that physically isn't on the stack. I split machines into their own "round to the nearest 5 kg" rule. This matches what the app already knows in its own weight tables.
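For reference, the machine rule boils down to snapping a weight to the nearest step. A minimal Python sketch follows; it is not the prompt rule itself and not the code planned for any follow-up. The function name, the 2.5 kg free-weight step, and the choice to round exact halves up are assumptions.

```python
import math

# Assumed increments: 5 kg comes from the entry, 2.5 kg is inferred
# from the 37.5 kg example and is not confirmed by it.
MACHINE_STEP_KG = 5.0
FREE_WEIGHT_STEP_KG = 2.5


def snap_to_step(weight_kg, step_kg):
    """Round to the nearest multiple of step_kg.

    Exact halves round up, which is an assumption; a real rule might
    prefer rounding down to stay on the lighter side.
    """
    return math.floor(weight_kg / step_kg + 0.5) * step_kg


# 37.5 kg is not on a 5 kg stack, so it snaps to 40.0.
leg_press_kg = snap_to_step(37.5, MACHINE_STEP_KG)
```

The same 37.5 kg passes through unchanged when the step is 2.5 kg, which is why machines need their own rule.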
how I tested it

I had two fresh agents research each one first, confirming the data already flows and the change is low-risk, then you made the product calls (adopt the full pain/form/deviation set; prompt-fix the machines now). Both changes are golden-locked with tests that assert the new rules are actually in the live prompt. Tests green.

Both merged and closed: 221 (PR 278), 222 (PR 280). Filed a follow-up (279) to make the machine-weight rounding bulletproof in code (right now it's the AI's best effort). Also flagged that cable machines have the same 5-kg-vs-2.5-kg question; I left that one for a separate decision since it's genuinely debatable.

Swept an old dead label out of saved user data

Shipped
the problem

Way back, each saved "trainee model" carried a label called reassessmentRecords. The app stopped using it months ago and removed it from the code, but the label could still be sitting inside existing users' saved rows in the database: harmless clutter, but clutter.

what I changed

A one-time database cleanup that deletes that dead label from any rows that still have it. I copied an already-proven cleanup we'd done before (the exact same shape, just a different label name), so there were no surprises. I left one copy of the old label alone on purpose: it lives in a test file where it actually does a job, proving the app safely ignores old labels it no longer understands.
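The shape of a cleanup like this, sketched in Python against an in-memory SQLite database. This is an illustration only: the real migration, its database engine, and its schema are not shown in this entry, so the table name trainee_models, the model column, and both function names are made up. Only the label name reassessmentRecords comes from the entry.

```python
import json
import sqlite3

DEAD_KEY = "reassessmentRecords"


def make_demo_db():
    """In-memory stand-in for the real table (table and column names are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trainee_models (id INTEGER PRIMARY KEY, model TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO trainee_models (id, model) VALUES (?, ?)",
        [
            (1, json.dumps({"goal": "strength", DEAD_KEY: []})),
            (2, json.dumps({"goal": "hypertrophy"})),
        ],
    )
    conn.commit()
    return conn


def strip_dead_key(conn, key=DEAD_KEY):
    """Remove `key` from every saved model that still has it; return rows changed."""
    changed = 0
    rows = conn.execute("SELECT id, model FROM trainee_models").fetchall()
    for row_id, raw in rows:
        model = json.loads(raw)
        if key in model:
            del model[key]
            conn.execute(
                "UPDATE trainee_models SET model = ? WHERE id = ?",
                (json.dumps(model), row_id),
            )
            changed += 1
    conn.commit()
    return changed
```

Because it only touches rows that still carry the label, running it a second time changes nothing, which is the "could it run twice safely?" property the review checks for.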

how I tested it
  • One agent traced every place the label is used and confirmed nothing reads or writes it anymore: truly dead.
  • A second agent did an adversarial review of the cleanup, poking at eight different ways it could go wrong (could it run twice safely? could it touch the wrong data? etc.). All clean.
  • The build system spun up a real throwaway database, applied the cleanup, and it worked.
  • Then the live deploy applied it to the real database; I watched the "apply" step go green (it needed one re-run because an unrelated setup step flaked the first time).
Merged and live (PR 276). Backlog P5-D07 ticked done.

Finished the prompt-loader cleanup (the 6th copy)

Done
the problem

Earlier today I merged five copies of "find and read a prompt file" into one shared helper, but left a sixth, odder copy in the exercise-swap code for its own ticket. This finishes the job: the sixth one is folded in too, so there's now exactly one place that does it.

what I changed

Pointed the exercise-swap prompt at the shared PromptLoader. The tricky part: unlike the other five, this one is meant not to error. If the file is missing it quietly falls back to a short default prompt. I kept that exact behaviour (both "file missing" and "couldn't read it" still land on the fallback), along with its little habit of stripping comment lines. Before changing it I checked where the prompt files actually live in the built app; they sit flat at the top level, not in a sub-folder, which confirmed the swap doesn't change which file gets loaded.
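The fallback behaviour described above looks roughly like this, sketched in Python rather than the app's own code. The function names and the "#" comment marker are assumptions; the entry does not say what a comment line looks like in the real prompt files.

```python
from pathlib import Path


def strip_comment_lines(text, marker="#"):
    """Drop lines that start with the comment marker (the marker is assumed)."""
    kept = [
        line for line in text.splitlines()
        if not line.lstrip().startswith(marker)
    ]
    return "\n".join(kept).strip()


def load_prompt_or_default(path, default):
    """Never raises for a missing or unreadable file; lands on `default` instead."""
    try:
        raw = Path(path).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError):
        # Covers both "file missing" and "couldn't read it".
        return default
    return strip_comment_lines(raw)
```

The other five callers would want the opposite behaviour (surface the failure), which is why this copy needed its own ticket.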

how I tested it

App builds clean. A grep confirms the low-level "open a bundled file" call now appears in only one file (the shared loader); the shared loader already has its own tests from earlier today.

Merged (PR 274). Backlog P5-D08 ticked done; the prompt-loader consolidation is fully complete across all six places.

Two small cleanups: tidy lift names, and one prompt loader

Done
the problem

Two bits of leftover duplication, both loose ends from earlier work.

  1. Movement names like "hip_hinge" were being turned into "Hip Hinge" by hand in two different screens, even though we'd just built one proper place to do that.
  2. Five different services each kept their own near-identical copy of the code that finds and reads a prompt file out of the app bundle.
what I changed
  • #268: lift names. Pointed both screens at the shared displayName and deleted the two hand-rolled helpers. Same words on screen, less code.
  • #220: one prompt loader. Made one small PromptLoader that finds and reads a bundled prompt, and had all five services call it instead of repeating themselves. Each service keeps its own error message and any extra tweaks it makes to the text, so nothing behaves differently. The ticket only named three services; I found two more identical copies and folded them in too (you okayed doing all five). While in there I spotted a sixth, odder copy in the exercise-swap code: it quietly falls back to a default instead of erroring, so I left that one for its own ticket (P5-D08) rather than force it into the same mould.
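For the lift names, the conversion both screens were doing by hand is tiny. A Python sketch of the idea, assuming plain snake_case keys; the app's real displayName helper may handle cases this does not.

```python
def display_name(raw):
    """Turn a snake_case movement key into title-cased words."""
    return " ".join(word.capitalize() for word in raw.split("_") if word)


# "hip_hinge" becomes "Hip Hinge".
label = display_name("hip_hinge")
```

Keeping this in one place means a future naming tweak changes every screen at once.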
how I tested it

Both built clean. #268: app builds, no test depended on the old text. #220: a new little test (a real prompt loads; a missing one returns nothing instead of crashing) plus the existing prompt-content tests stayed green: 97 + 9 tests passing. (One gotcha: the test target doesn't automatically pick up new test files the way the app target does, so I had to register the new test in the Xcode project by hand before it would run.)

Both merged and closed: 268 (PR 271), 220 (PR 272).

Built the "your training leveled up" goal check-in

Done
the problem

When someone's lifts all move up around the same time, the app treats it as a milestone, a good moment to step back and rethink what you're training for. But the old version was broken: the coach would nudge you every single workout for about six sessions to "go revisit your targets," except there was no screen to do that, no numeric targets to set, and no way to tell the app "okay, got it." So it just nagged. This was the whole job: build the real check-in, end to end.

what I changed
  • Remembering you dealt with it. Added a record of which level-up moments you've already acknowledged, so the coach stops bringing it up once you've handled it. (A)
  • Saving that on the server. When you update your goal, the server now also files the acknowledgment. (B)
  • Human-readable lift names. "Hip Hinge" instead of "hip_hinge" for anything shown to you. (C)
  • The banner. A friendly "your training has leveled up" card on the pre-workout screen that names the lifts that moved up, in its own colour so it doesn't blur into the other cards. (D+E1)
  • Hiding it instantly. The logic that makes the banner disappear the moment you save, without waiting on the server. (F1)
  • The goal-review screen itself. Edit your goal in plain words, pick your focus areas, see your current strength numbers for context, and save. (F2)
  • Connecting the button. Wired the banner's "Review goals" button to open that screen, and made the banner vanish after you save (a plain cancel leaves it up). (E2)
  • Fixing the coach's script. Pointed the AI at the real "Review goals" button instead of nonexistent "targets," taught it what to say when there aren't specific lifts to name, and made it stay gentle (not pushy) if you haven't gotten around to it yet. (G)
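The banner logic in the slices above (show it while a level-up is unacknowledged, hide it the moment you save, leave it up on cancel) can be sketched in Python like this. It is only an illustration: the class, its method names, and the string IDs are hypothetical, and the real app and server code are not shown in this entry.

```python
class LevelUpBanner:
    """Tracks which level-up moments are still waiting for a goal review."""

    def __init__(self, level_up_ids, acknowledged_ids):
        self.level_up_ids = set(level_up_ids)
        self.acknowledged_ids = set(acknowledged_ids)

    def pending(self):
        """Level-up moments that have not been acknowledged yet."""
        return self.level_up_ids - self.acknowledged_ids

    def is_visible(self):
        return bool(self.pending())

    def save_goal(self):
        """Mark pending moments acknowledged locally, before any server round trip."""
        self.acknowledged_ids |= self.pending()

    def cancel(self):
        """A plain cancel changes nothing, so the banner stays up."""


banner = LevelUpBanner({"level-up-1"}, set())
banner.cancel()     # still visible
banner.save_goal()  # hidden immediately
```

The server-side acknowledgment would then catch up in the background, so the banner does not reappear on the next launch.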
how I tested it

Every step was written test-first and merged green on its own. Then, because the pieces had each been built against a moving baseline, I rebuilt the whole app with all eight together and ran every related test suite at once: 122 tests plus the banner-copy and lift-name suites, all passing, app builds clean.

All eight slices merged: PRs 259, 260, 261, 262, 263, 264, 265, 266. The umbrella issue 258 now has every box ticked and is ready to close once you've confirmed it feels right in the app. Built from the plan we grilled out and wrote up in 258.