Use cases · Fitness & wellness
Push timing per habit pattern.
7 a.m. wins on average. It destroys engagement on a third of your users.
Most fitness apps send the workout reminder at one fixed time. The morning cohort is large, so its lift dominates the global metric and the team ships the morning slot to 100%. The evening cohort, typically a quarter to a third of active users, gets prompted when they are least likely to act, and the engagement loss compounds silently. A per-segment conditional average treatment effect (CATE) makes the loss visible on data that already exists.
Worked audit
Habit · Series-C subscription fitness app · ~$48M ARR · 2.43M push-opted active users
HABIT-2026Q2-AUDIT-003
Projected impact
+$3.8M / yr LTV uplift
1 · What the team reported
Workout-reminder push: 6 p.m. (control) vs 7 a.m. (variant) on 24-hour session-completion. 18.2% → 19.0%, p = 0.04.
Team called it "ship 7 a.m. globally" and rolled out to 100% of push-opted users.
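For readers who want to see how a pooled readout like this is computed, here is a minimal sketch of a two-proportion z-test, which at sample sizes of this scale behaves almost identically to the t-test the team ran. The audit reports rates and a p-value, not arm sizes, so `N_PER_ARM` is a hypothetical value chosen only for illustration, and the function name is ours.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-sided z-test for the difference in proportions (b minus a)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return {
        "abs_lift": p_b - p_a,
        "rel_lift": (p_b - p_a) / p_a,
        "z": z,
        "p_value": p_value,
    }

# HYPOTHETICAL arm size: not stated anywhere in the audit.
N_PER_ARM = 20_000
result = two_proportion_z_test(
    successes_a=round(0.182 * N_PER_ARM), n_a=N_PER_ARM,  # 6 p.m. control
    successes_b=round(0.190 * N_PER_ARM), n_b=N_PER_ARM,  # 7 a.m. variant
)
print(result)
```

The point of the sketch is that a pooled test only ever sees two numbers per arm. Nothing in it can tell you which users produced the lift.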
2 · What our re-analysis found
Doubly-robust re-evaluation keyed on each user's implicit exercise window (derived from session-completion timestamps over the prior 90 days) shows a sharp asymmetry by cohort.
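One simple way to derive an implicit exercise window from logged timestamps is to take the most common window among a user's recent session completions. The sketch below assumes that approach. The window boundaries follow the cohort labels used in this audit, while the `"other"` bucket, the `min_sessions` cutoff, and the function names are our own illustrative assumptions, not the production rule.

```python
from collections import Counter
from datetime import datetime

# (label, start hour inclusive, end hour exclusive), per the cohorts in this audit.
WINDOWS = [
    ("pre-dawn", 5, 7),
    ("morning", 7, 9),
    ("lunchtime", 11, 14),
    ("evening", 17, 20),
    ("late-evening", 20, 24),
]

def window_for_hour(hour):
    for name, start, end in WINDOWS:
        if start <= hour < end:
            return name
    return "other"  # assumption: hours outside the named windows

def implicit_window(completion_times, min_sessions=5):
    """Most common window among a user's session completions.

    completion_times: datetimes in the user's local time, already limited
    to the prior 90 days. Returns None when history is too thin to call.
    """
    if len(completion_times) < min_sessions:
        return None
    counts = Counter(window_for_hour(t.hour) for t in completion_times)
    return counts.most_common(1)[0][0]

# Synthetic example: seven evening sessions and one morning session.
history = [datetime(2026, 4, day, 18, 30) for day in range(1, 8)]
history.append(datetime(2026, 4, 9, 7, 15))
print(implicit_window(history))  # evening
```

Because the window is computed from pre-experiment behaviour only, it is a pre-treatment covariate and safe to condition on.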
Pre-dawn (5–7 a.m.) and morning (7–9 a.m.) cohorts each show clear positive relative lift (+9.7% and +6.4%). Lunchtime is flat. The evening (5–8 p.m.) cohort shows −3.2% relative lift (95% CI [−6.7, −0.1]), a negative effect whose interval excludes zero. That cohort is 31% of the active base. The 7 a.m. push is destroying their engagement.
The positive lift in the pre-dawn and morning cohorts outweighs the evening loss in the pooled estimate, so the global average reads positive. The loss on evening users compounds silently into LTV.
3 · Why the t-test missed it
The implicit-exercise-window covariate isn't part of the assignment scheme; the team treated push timing as a population-level lever. The aggregate session-completion metric is dominated by the morning users who actually respond to the morning push.
CATE keyed on the implicit window separates the populations and makes the negative cohort visible on the same logged data, with no rerun. The late-evening (8 p.m.+) cohort has the lowest effective sample size (ESS 0.34), so we flag it for re-test rather than claim the negative.
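To make the per-cohort estimate concrete, here is a minimal sketch of a doubly-robust (AIPW) effect computed separately for each implicit window. It is not the audit's estimator: the outcome model is just the arm mean within each cohort, the propensity is fixed at 0.5 as in a balanced randomized test, and the rows are synthetic. With those simplifications the estimate equals the within-cohort difference in means; a richer outcome model is where the doubly-robust form earns its keep.

```python
from collections import defaultdict

def fit_arm_means(rows):
    """Outcome model (assumed): mean outcome per (cohort, arm)."""
    total = defaultdict(float)
    count = defaultdict(int)
    for r in rows:
        key = (r["cohort"], r["treated"])
        total[key] += r["outcome"]
        count[key] += 1
    return {key: total[key] / count[key] for key in total}

def aipw_by_cohort(rows, propensity=0.5):
    """Per-cohort AIPW effect. Every cohort must contain both arms."""
    mu = fit_arm_means(rows)
    scores = defaultdict(list)
    for r in rows:
        c, t, y = r["cohort"], r["treated"], r["outcome"]
        mu1, mu0 = mu[(c, 1)], mu[(c, 0)]
        # Model prediction plus the inverse-propensity-weighted residual.
        y1 = mu1 + t * (y - mu1) / propensity
        y0 = mu0 + (1 - t) * (y - mu0) / (1 - propensity)
        scores[c].append((y1, y0))
    out = {}
    for c, pairs in scores.items():
        m1 = sum(p[0] for p in pairs) / len(pairs)
        m0 = sum(p[1] for p in pairs) / len(pairs)
        # Relative effect is undefined if the control mean is zero.
        out[c] = {"abs_effect": m1 - m0, "rel_effect": (m1 - m0) / m0}
    return out

def make_rows(cohort, treated_outcomes, control_outcomes):
    rows = [{"cohort": cohort, "treated": 1, "outcome": y} for y in treated_outcomes]
    rows += [{"cohort": cohort, "treated": 0, "outcome": y} for y in control_outcomes]
    return rows

# SYNTHETIC rows: treated = 7 a.m. push, control = 6 p.m. push.
rows = make_rows("morning", [1, 1, 0, 0], [1, 0, 0, 0])
rows += make_rows("evening", [0, 0, 0, 1], [1, 1, 0, 0])
effects = aipw_by_cohort(rows)
print(effects)
```

In the toy data the two cohorts move in opposite directions, which is exactly the pattern a pooled estimate averages away.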
4 · What we'd recommend
Replace the fixed-time push with a contextual-bandit policy keyed on the user's implicit exercise window. Pre-dawn users get 5 a.m. Morning users get 7 a.m. Evening users get 6 p.m. The bandit handles drift.
Estimated +22% session-completion on positive cohorts · ~$3.8M / yr LTV uplift.
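As an illustration of what such a policy could look like, the sketch below uses Beta-Bernoulli Thompson sampling with one posterior per (cohort, send slot). This is one common contextual-bandit design, not necessarily the one we would deploy. The class name, the `decay` parameter (a simple way to discount old evidence so the policy can follow drift), and the simulated completion rates are all assumptions made for the example.

```python
import random

class WindowBandit:
    """Thompson sampling over send slots, keyed on implicit exercise window."""

    def __init__(self, slots, decay=1.0, seed=None):
        self.slots = list(slots)
        self.decay = decay  # 1.0 keeps all history; below 1.0 forgets slowly
        self.rng = random.Random(seed)
        self.posteriors = {}  # (cohort, slot) -> [alpha, beta]

    def _params(self, cohort, slot):
        return self.posteriors.setdefault((cohort, slot), [1.0, 1.0])

    def choose(self, cohort):
        # Draw one plausible completion rate per slot and send at the best draw.
        draws = {s: self.rng.betavariate(*self._params(cohort, s)) for s in self.slots}
        return max(draws, key=draws.get)

    def update(self, cohort, slot, completed):
        params = self._params(cohort, slot)
        # Shrink accumulated evidence toward the Beta(1, 1) prior, then add the new outcome.
        params[0] = 1.0 + self.decay * (params[0] - 1.0)
        params[1] = 1.0 + self.decay * (params[1] - 1.0)
        params[0 if completed else 1] += 1.0

# SYNTHETIC completion rates for evening-window users, for simulation only.
TRUE_RATE = {"05:00": 0.05, "07:00": 0.05, "18:00": 0.50}
bandit = WindowBandit(slots=TRUE_RATE, decay=0.999, seed=7)
sim = random.Random(11)
picks = []
for _ in range(2000):
    slot = bandit.choose("evening")
    bandit.update("evening", slot, sim.random() < TRUE_RATE[slot])
    picks.append(slot)
share_evening_slot = picks.count("18:00") / len(picks)
print(f"share of sends at 18:00: {share_evening_slot:.2f}")
```

Starting each cohort from the cohort-level estimates in this audit, rather than a flat prior, would shorten the exploration period; that warm start is left out to keep the sketch short.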
Doubly-robust readout · 7 a.m. vs 6 p.m. · bootstrap 1,000 reps
| Cohort (implicit window) | DR estimate | 95% CI | ESS | Verdict |
|---|---|---|---|---|
| All users | +2.1% rel. | [+0.5, +3.7] | 0.62 | small positive, confirms t-test |
| Pre-dawn (5–7 a.m.) | +9.7% rel. | [+5.8, +13.6] | 0.53 | strong positive |
| Morning (7–9 a.m.) | +6.4% rel. | [+3.1, +9.8] | 0.51 | positive |
| Lunchtime (11 a.m. – 2 p.m.) | +0.4% rel. | [−2.8, +3.6] | 0.46 | inconclusive |
| Evening (5–8 p.m.) | −3.2% rel. | [−6.7, −0.1] | 0.42 | clear negative, 7 a.m. wrong for them |
| Late-evening (8 p.m.+) | −5.8% rel. | [−10.1, −1.4] | 0.34 | overlap-limited; re-test |
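Two columns in the table benefit from a concrete definition. The sketch below assumes the ESS column is Kish's effective sample size divided by cohort size, and that the intervals are percentile bootstrap intervals over per-user scores. Both readings are our assumptions for illustration; the function names and the toy inputs are hypothetical.

```python
import random

def ess_ratio(weights):
    """Kish effective sample size as a fraction of n (1.0 = uniform weights)."""
    total = sum(weights)
    return (total * total) / sum(w * w for w in weights) / len(weights)

def bootstrap_ci(values, stat, reps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)]) for _ in range(reps)
    )
    lower = estimates[round((alpha / 2) * reps)]
    upper = estimates[round((1 - alpha / 2) * reps) - 1]
    return lower, upper

def mean(xs):
    return sum(xs) / len(xs)

# Toy inputs: a few heavy weights pull the ESS ratio well below 1.
print(ess_ratio([1.0, 1.0, 1.0, 1.0]))  # 1.0
print(ess_ratio([1.0, 0.0, 0.0, 0.0]))  # 0.25
outcomes = [0] * 50 + [1] * 50
ci = bootstrap_ci(outcomes, mean, reps=1000)
print(ci)
```

A low ESS ratio means a handful of users carry most of the weight, so the interval for that cohort rests on less information than its raw size suggests. That is why the late-evening row is flagged for re-test even though its interval excludes zero.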
Read the full audit, then audit your own test.
We'll send back a report in this same shape on your last A/B test, free, in three business days.