Three things got done: Closed out four "is the AI being honest?" decisions. Tap any of these to read the whole thing.
Four loose ends were all about the same theme: when the AI fails or behaves oddly, does the app stay honest about it? A couple were real decisions, not just code edits, so I talked through each one before changing anything.
Each code fix has its own test. The day-label and honest-summary fixes were written test-first: fail, then fix, then pass. The full app test suite passes (237 tests). The two "decision" items (#42 comments, #241 rule) changed no behavior.
A bunch of small things needed tidying up. A few server functions had no tests, so we couldn't be sure they saved data correctly. A past bug, where saving a workout set could silently lose its "intent" and "date", had been fixed but never had a test guarding it. Workout day-labels in one older code path weren't being cleaned up, so they could drift away from your history. And the set-logs table had a fake placeholder date ("1970-01-01") it would quietly fall back on if a real date was ever missing.
All the tests pass locally (the local run is what we trust). Two of the five showed a red mark in the automated cloud checks. I looked at both and confirmed they were unrelated hiccups (a known flaky phone-build job and a one-off Supabase tool-install failure), not real problems with the changes.
When the app sends a workout to the server, each set can carry a "pattern" label, like "horizontal_push" for a bench press. The server keeps a fixed list of valid pattern names. But the old code trusted whatever the app sent, even made-up words. Those junk words ended up in the "what you trained today" list. That list helps the app decide when to quietly clear an old injury or form note, so a junk label could trip that safety check by accident.
The server now only accepts a pattern if it's on the real list of valid names. If the app sends junk, the server ignores it and works out the correct pattern from the exercise name instead. So bad data can't sneak in, and it also can't hide the right answer.
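Here is a rough sketch of that rule in TypeScript, for anyone who wants to see its shape. It is not the real server code. The pattern names other than "horizontal_push", the exercise lookup table, and the function names (isValidPattern, derivePattern, resolvePattern) are all made up for illustration.

```typescript
// Hypothetical allowlist. Only "horizontal_push" comes from the note above;
// the other names are placeholders for the server's real list.
const VALID_PATTERNS = ["horizontal_push", "horizontal_pull", "squat", "hinge"] as const;
type Pattern = (typeof VALID_PATTERNS)[number];

const VALID_PATTERN_SET = new Set<string>(VALID_PATTERNS);

// Hypothetical lookup from exercise name to pattern, standing in for
// however the real server works out a pattern from the exercise name.
const PATTERN_BY_EXERCISE = new Map<string, Pattern>([
  ["bench press", "horizontal_push"],
  ["barbell row", "horizontal_pull"],
  ["back squat", "squat"],
  ["deadlift", "hinge"],
]);

// True only when the value is a string that appears on the allowlist.
function isValidPattern(value: unknown): value is Pattern {
  return typeof value === "string" && VALID_PATTERN_SET.has(value);
}

// Work out the pattern from the exercise name; undefined if the name is unknown.
function derivePattern(exerciseName: string): Pattern | undefined {
  return PATTERN_BY_EXERCISE.get(exerciseName.trim().toLowerCase());
}

// Accept the app's label only if it is on the allowlist.
// Otherwise ignore it and fall back to the exercise name.
function resolvePattern(exerciseName: string, clientPattern: unknown): Pattern | undefined {
  if (isValidPattern(clientPattern)) {
    return clientPattern;
  }
  return derivePattern(exerciseName);
}

// A junk label is ignored and the exercise name decides instead.
console.log(resolvePattern("Bench Press", "made_up_word"));
```

In this sketch a made-up label never reaches the "what you trained today" list, and a bench press still resolves to "horizontal_push" even when the app sends junk alongside it.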
I wrote three tests first. Two of them failed before the fix (which proved the bug was real), then passed after. All the important tests pass (91 out of 91).