six things got done: Taught the app to actually grow more sure of itself over time. Tap any of these to read the whole thing.
The app keeps a "how sure am I about this?" rating for every exercise, every movement pattern (squat, hinge, the presses and pulls), and every muscle group. The rating is supposed to climb from "no idea yet" → "getting a feel for it" → "I know this now" as you train. Except it never climbed. Every rating was stuck on "no idea yet" forever, the steps that would move it up were designed a long time ago but never actually built. So the coach treated even someone with months of history as a total stranger, and a downstream feature (letting you review your projected targets) was stuck waiting for ratings that never moved.
I built the whole "grow more sure" system for all three: exercises, patterns, and muscles.
A rating only ever goes up, never sneaks back down (going backwards would yank away targets you'd already been shown). And the whole thing is conservative on purpose: it would rather say "not sure yet" than wrongly claim "I know this" off thin data.
Built in five small, separately-shipped pieces, each test-first. 81 fast unit tests plus end-to-end tests that drive real sessions and watch the ratings climb. Every piece passed the reliable server test before merging. No database change and no iPhone-app change needed, it's all server logic.
How it was decided. Before writing code I ran a long structured design interview, and for every question I got three independent takes, my own, a reviewer starting fresh each time, and a reviewer that remembered the whole conversation, then picked the best. All the decisions are written down in a new decision record (ADR-0020).
Two bits of coaching guidance had been written months ago but never actually switched on in the live AI. The interesting part: in both cases the app was already sending the AI the data it needed, it just wasn't told to use it. So we were paying to ship the data and getting nothing back.
I had two fresh agents research each one first, confirming the data already flows and the change is low-risk, then you made the product calls (adopt the full pain/form/deviation set; prompt-fix the machines now). Both changes are golden-locked with tests that assert the new rules are actually in the live prompt. Tests green.
Way back, each saved "trainee model" carried a label called reassessmentRecords. The app
stopped using it months ago and removed it from the code, but the label could still be
sitting inside existing users' saved rows in the database, harmless clutter, but clutter.
A one-time database cleanup that deletes that dead label from any rows that still have it. I copied an already-proven cleanup we'd done before (same exact shape, just a different label name), so there were no surprises. I left one copy of the old label alone on purpose, it lives in a test file where it actually does a job: it proves the app safely ignores old labels it no longer understands.
Earlier today I merged five copies of "find and read a prompt file" into one shared helper, but left a sixth, odder copy in the exercise-swap code for its own ticket. This finishes that, folds the sixth one in too, so there's now exactly one place that does it.
Pointed the exercise-swap prompt at the shared PromptLoader. The tricky part: unlike the
other five, this one is meant to not error, if the file is missing it quietly falls back
to a short default prompt. I kept that exact behaviour (both "file missing" and "couldn't
read it" still land on the fallback), along with its little habit of stripping comment lines.
Before changing it I checked where the prompt files actually live in the built app, they sit
flat at the top, not in a sub-folder, which confirmed the swap doesn't change which file
gets loaded.
App builds clean. A grep confirms the low-level "open a bundled file" call now appears in only one file (the shared loader); the shared loader already has its own tests from earlier today.
Two bits of leftover duplication, both loose ends from earlier work.
displayName and deleted the
two hand-rolled helpers. Same words on screen, less code.PromptLoader that finds and reads a
bundled prompt, and had all five services call it instead of repeating themselves. Each
service keeps its own error message and any extra tweaks it makes to the text, so nothing
behaves differently. The ticket only named three services; I found two more identical
copies and folded them in too (you okayed doing all five). While in there I spotted a
sixth, odder copy in the exercise-swap code, it quietly falls back to a default instead
of erroring, so I left that one for its own ticket (P5-D08) rather than force it into the
same mould.Both built clean. #268: app builds, no test depended on the old text. #220: a new little test (a real prompt loads; a missing one returns nothing instead of crashing) plus the existing prompt-content tests stayed green, 97 + 9 tests passing. (One gotcha: the test target doesn't auto-pick-up new test files like the app does, so I had to register the new test in the Xcode project by hand before it would run.)
When someone's lifts all move up around the same time, the app treats it as a milestone, a good moment to step back and rethink what you're training for. But the old version was broken: the coach would nudge you every single workout for about six sessions to "go revisit your targets," except there was no screen to do that, no numeric targets to set, and no way to tell the app "okay, got it." So it just nagged. This was the whole job: build the real check-in, end to end.
Every step was written test-first and merged green on its own. Then, because the pieces had each been built against a moving baseline, I rebuilt the whole app with all eight together and ran every related test suite at once: 122 tests plus the banner-copy and lift-name suites, all passing, app builds clean.