Use cases · Mental health & meditation
Session-completion nudge cadence.
Daily nudges win the A/B by 0.7pt. They push a vulnerable 9% out of the app.
Mental-health products run reminder-cadence A/B tests the same way any other consumer app does: measure completion rate, ship the winner. The asymmetry is that in this category, the cohort the cadence hurts is also the cohort most likely to need the product. Per-segment CATE (conditional average treatment effect) analysis doesn't just protect revenue; it protects users.
Worked audit
Stillpoint · Series-C meditation app · ~$22M ARR · 854,210 push-opted active users
STILLPOINT-2026Q1-AUDIT-004
Projected impact
+$1.7M / yr · zero ethics escalations
1 · What the team reported
Session-reminder cadence test: every-3-day (control) vs daily (variant). Completion 22.1% → 22.8%, p = 0.03. Team called it "winner, ship daily."
Internal note flagged extra user-support tickets in the daily-cadence cohort. The metric was the metric; the cadence shipped.
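The headline gap is quoted in percentage points, while the audit readout is quoted in relative lift. A minimal sketch of the conversion, using only the two completion rates reported above (the variable names are ours, for illustration):

```python
# Completion rates as reported by the team.
control_rate = 0.221   # every-3-day cadence
variant_rate = 0.228   # daily cadence

# Absolute lift in percentage points vs. relative lift in percent.
abs_lift_pt = (variant_rate - control_rate) * 100
rel_lift_pct = (variant_rate / control_rate - 1) * 100

print(f"absolute lift: {abs_lift_pt:.1f} pt")
print(f"relative lift: {rel_lift_pct:.1f} %")
```

The raw relative lift is roughly 3.2%. An adjusted estimate on the same data need not match it exactly.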
2 · What our re-analysis found
A doubly-robust re-evaluation keyed on engagement signal × streak length × recent-low-mood flag shows that the global +3.5% relative lift hides a sharply negative cell.
The recent-low-mood × short-streak cohort (~9% of base) shows a −6.1% relative lift (95% CI [−10.4, −1.8]) on daily cadence. The pattern lines up with the support-ticket bump the team flagged. In the aggregate, the high-engagement / long-streak cohort's +8.2% lift dominated, and the support pattern was attributed to noise.
In a mental-health product this is an ethics-and-trust risk, not just a revenue one.
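A minimal sketch of a per-cohort doubly-robust (AIPW) estimate, run on simulated data. Everything here is an assumption for illustration: the cohort names, the control completion rates, the 50/50 assignment probability, and the deliberately crude outcome model (a cohort-by-arm mean). It is not the audit's actual pipeline.

```python
import numpy as np

def aipw_relative_lift(y, t, mu1, mu0, e):
    """AIPW (doubly-robust) relative lift of variant over control.

    y: outcomes (0/1), t: assignment (1 = daily), mu1/mu0: outcome-model
    predictions under each arm, e: probability of assignment to the variant.
    """
    # Outcome-model prediction plus an inverse-propensity-weighted residual.
    y1 = mu1 + t * (y - mu1) / e
    y0 = mu0 + (1 - t) * (y - mu0) / (1 - e)
    return (y1.mean() - y0.mean()) / y0.mean()

rng = np.random.default_rng(0)

# Hypothetical cohorts: (control completion rate, true relative lift).
cohorts = {
    "high_engagement_long_streak": (0.30, 0.082),
    "low_mood_short_streak": (0.20, -0.061),
}
n, e = 200_000, 0.5  # simulated users per cohort, assumed 50/50 split

estimates = {}
for name, (base, lift) in cohorts.items():
    t = rng.binomial(1, e, size=n)
    p = np.where(t == 1, base * (1 + lift), base)
    y = rng.binomial(1, p)
    # Crude outcome model: the observed mean of each arm within the cohort.
    mu1 = np.full(n, y[t == 1].mean())
    mu0 = np.full(n, y[t == 0].mean())
    estimates[name] = aipw_relative_lift(y, t, mu1, mu0, e)

for name, est in estimates.items():
    print(f"{name}: {est:+.1%}")
```

With a constant per-cohort outcome model the correction term averages to zero, so this reduces to a difference in means inside each cohort. A covariate-based outcome model would slot into `mu1` and `mu0` without changing the estimator.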
3 · Why the t-test missed it
Engagement and streak length are not part of the assignment scheme. The aggregate completion-rate metric is dominated by the high-engagement cohort that responds well to daily nudges. The vulnerable cohort is small enough to disappear into the mean.
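A back-of-envelope check of how a 9% cell vanishes into the mean. It assumes, purely for simplicity, that relative lifts average by population share (that is, equal baseline completion rates across cohorts), which the source does not state.

```python
# Figures from the audit.
vulnerable_share = 0.09      # recent-low-mood x short-streak cohort
vulnerable_lift = -6.1       # % relative lift in that cohort
global_lift = 3.5            # % relative lift, all push-opted users

# Simplifying assumption: global lift is the share-weighted average of cohort lifts.
rest_share = 1 - vulnerable_share
rest_lift = (global_lift - vulnerable_share * vulnerable_lift) / rest_share

print(f"lift needed from the other {rest_share:.0%}: {rest_lift:+.1f} %")
```

Under that assumption the remaining 91% of users only need to average about +4.4% for the aggregate to read +3.5%, so the negative cell moves the topline by less than a point.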
CATE keyed on the vulnerability proxy separates the populations and surfaces the negative cell on the same logged data. The effective-sample-size (ESS) guardrail keeps the recently churned-and-returned cohort flagged for re-test rather than claimed as a result.
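The page does not define its ESS calculation. One common choice is Kish's effective sample size on the estimator's weights, divided by the cohort size. The sketch below assumes that definition, and the 0.40 cutoff is a hypothetical threshold, not the audit's.

```python
import numpy as np

def ess_ratio(weights):
    """Kish effective sample size divided by n: (sum w)^2 / (n * sum w^2)."""
    w = np.asarray(weights, dtype=float)
    return (w.sum() ** 2) / (w ** 2).sum() / w.size

def guardrail(weights, min_ratio=0.40):
    """Hypothetical rule: only claim a cohort result above the ESS cutoff."""
    return "claim" if ess_ratio(weights) >= min_ratio else "re-test"

print(ess_ratio([1, 1, 1, 1]), guardrail([1, 1, 1, 1]))   # even weights
print(ess_ratio([1, 0, 0, 0]), guardrail([1, 0, 0, 0]))   # one user carries everything
```

Even weights give a ratio of 1.0. The more the estimate leans on a few heavily weighted users, the lower the ratio and the less the cohort result should be trusted.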
4 · What we'd recommend
Per-cohort cadence with ESS-guardrailed exclusion of the recent-low-mood × short-streak cohort from daily pushes. The cohort continues on every-3-day cadence (or opt-in only).
Estimated +19% completion on positive segments · +$1.7M / yr ARR · 0 ethics escalations.
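The recommendation can be sketched as a small routing rule. The field names, the 7-day short-streak cutoff, and holding the churned-and-returned cohort on control until re-test are all our assumptions for illustration.

```python
from dataclasses import dataclass

SHORT_STREAK_DAYS = 7  # hypothetical cutoff for a "short streak"

@dataclass(frozen=True)
class UserSignals:
    recent_low_mood: bool
    streak_days: int
    recently_returned: bool = False

def reminder_cadence(u: UserSignals) -> str:
    # Negative cell: keep the vulnerable cohort off daily pushes.
    if u.recent_low_mood and u.streak_days < SHORT_STREAK_DAYS:
        return "every_3_days"
    # Overlap-limited cohort: stay on control until a re-test (assumption).
    if u.recently_returned:
        return "every_3_days"
    return "daily"

print(reminder_cadence(UserSignals(recent_low_mood=True, streak_days=2)))
print(reminder_cadence(UserSignals(recent_low_mood=False, streak_days=40)))
```

An opt-in-only variant would return a third value for the first branch instead of the control cadence.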
Doubly-robust readout · daily vs every-3-day · bootstrap 1,000 reps
| Cohort | DR estimate (relative lift) | 95% CI (% rel.) | ESS (ratio) | Verdict |
|---|---|---|---|---|
| All push-opted users | +3.5% rel. | [+1.4, +5.6] | 0.64 | positive, confirms t-test |
| High engagement · long streaks | +8.2% rel. | [+5.0, +11.4] | 0.59 | strong positive |
| New users (<14d tenure) | +4.1% rel. | [+1.2, +7.0] | 0.51 | positive |
| Recent low-mood × short streak | −6.1% rel. | [−10.4, −1.8] | 0.44 | clear negative, pushed away |
| Recently churned-and-returned | −2.3% rel. | [−5.8, +1.2] | 0.39 | overlap-limited; re-test |
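A minimal sketch of a 1,000-rep percentile bootstrap interval of the kind named in the table caption. For brevity it resamples a plain difference-in-means relative lift rather than the doubly-robust estimate, and the data is simulated with made-up rates.

```python
import numpy as np

def relative_lift(y, t):
    """Relative lift of variant (t == 1) over control (t == 0)."""
    return y[t == 1].mean() / y[t == 0].mean() - 1

def bootstrap_ci(y, t, reps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample users with replacement, re-estimate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(reps)
    for i in range(reps):
        idx = rng.integers(0, n, size=n)
        stats[i] = relative_lift(y[idx], t[idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Simulated cohort with hypothetical rates: 30% control, 36% variant.
rng = np.random.default_rng(1)
t = rng.binomial(1, 0.5, size=20_000)
y = rng.binomial(1, np.where(t == 1, 0.36, 0.30))

lo, hi = bootstrap_ci(y, t)
print(f"relative lift 95% CI: [{lo:+.1%}, {hi:+.1%}]")
```

An interval that straddles zero, like the churned-and-returned row, is what separates "re-test" from a claimed result.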
Read the full audit, then audit your own test.
Same shape we'll send back on your last A/B test — free, in three business days.