Session-completion nudge cadence.

Daily nudges win the A/B by 0.7pt. They push a vulnerable 9% out of the app.

Mental-health products run reminder-cadence A/B tests the same way any other consumer app does: measure completion rate, ship the winner. The asymmetry: in this category, the cohort the cadence hurts is also the cohort most likely to need the product. Per-segment CATE (conditional average treatment effect) doesn't just protect revenue; it protects users.

Worked audit

Stillpoint · Series-C meditation app · ~$22M ARR · 854,210 push-opted active users

STILLPOINT-2026Q1-AUDIT-004

Projected impact

+$1.7M / yr · zero ethics escalations

1 · What the team reported

Session-reminder cadence test: every-3-day (control) vs daily (variant). Completion 22.1% → 22.8% (+0.7pt), p = 0.03. Team called it "winner, ship daily."

Internal note flagged extra user-support tickets in the daily-cadence cohort. The metric was the metric; the cadence shipped.

2 · What our re-analysis found

Doubly-robust re-evaluation keyed on engagement signal × streak length × recent-low-mood flag shows the global +3.5% relative lift hides a sharp negative cell.

Users with a recent low-mood signal and a short streak (~9% of base) show a −6.1% relative lift (95% CI [−10.4, −1.8]) on daily cadence. The pattern correlates with the support-ticket bump the team flagged. In aggregate, the high-engagement / long-streak cohort's +8.2% lift dominated, and the support pattern was attributed to noise.

In a mental-health product this is an ethics-and-trust risk, not just a revenue one.
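
For readers who want to see the mechanics, here is a minimal sketch of a doubly-robust (AIPW) estimate computed per cohort. It is an illustration under stated assumptions, not Stillpoint's or the audit's actual pipeline: the row layout, the function name `aipw_by_cohort`, and the deliberately simple outcome model (per-cohort, per-arm completion rate) are all hypothetical, and every cohort is assumed to contain users from both arms.

```python
from collections import defaultdict

def aipw_by_cohort(rows):
    """rows: list of (cohort, treated, completed, propensity) tuples.

    treated and completed are 0/1; propensity is P(treated = 1) for that user.
    Returns {cohort: (absolute_effect, relative_effect)}.
    Hypothetical layout, for illustration only.
    """
    # Outcome model: per-cohort, per-arm completion rate (kept simple on purpose).
    total = defaultdict(lambda: [0.0, 0.0])
    count = defaultdict(lambda: [0, 0])
    for cohort, t, y, _ in rows:
        total[cohort][t] += y
        count[cohort][t] += 1
    mu = {c: (total[c][0] / count[c][0], total[c][1] / count[c][1]) for c in total}

    # AIPW score: model prediction plus an inverse-propensity-weighted residual.
    y1 = defaultdict(float)
    y0 = defaultdict(float)
    n = defaultdict(int)
    for cohort, t, y, e in rows:
        mu0, mu1 = mu[cohort]
        y1[cohort] += mu1 + t * (y - mu1) / e
        y0[cohort] += mu0 + (1 - t) * (y - mu0) / (1 - e)
        n[cohort] += 1

    # Absolute effect in completion-rate points; relative effect vs. control.
    return {c: ((y1[c] - y0[c]) / n[c], (y1[c] - y0[c]) / y0[c]) for c in n}
```

With a constant 50/50 propensity and this outcome model, the residual terms cancel and the estimate reduces to a per-cohort difference in means. The correction term only starts to matter when the logged propensities vary across users.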

3 · Why the t-test missed it

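Engagement and streak length were not part of the assignment scheme, so nothing in the test design separated users along those lines. The t-test compares one aggregate completion rate per arm, and that aggregate is dominated by the high-engagement cohort that responds well to daily nudges. The vulnerable cohort, at roughly 9% of the base, is small enough to disappear into the mean.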

CATE keyed on the vulnerability proxy separates the populations and surfaces the negative cell on the same logged data. An effective-sample-size (ESS) guardrail keeps the recently churned-and-returned cohort flagged for re-test rather than claimed as a result.
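
One common way to express such a guardrail is Kish's effective sample size, reported as a fraction of the cohort's row count. The sketch below assumes that definition and a hypothetical cutoff of 0.40; the audit does not state which ESS formula or threshold it uses, so both are illustrative choices.

```python
def ess_ratio(weights):
    """Kish effective sample size divided by n: (sum w)^2 / (n * sum w^2)."""
    s = sum(weights)
    s2 = sum(w * w for w in weights)
    return (s * s) / (s2 * len(weights))

def guardrail_verdict(weights, min_ratio=0.40):
    """Return 'claim' or 're-test'. min_ratio is a hypothetical cutoff."""
    return "claim" if ess_ratio(weights) >= min_ratio else "re-test"

# Equal weights keep the full sample; one dominant weight collapses it.
print(ess_ratio([1.0, 1.0, 1.0, 1.0]))          # 1.0
print(guardrail_verdict([1.0, 0.0, 0.0, 0.0]))  # re-test
```

A low ratio means a handful of heavily weighted users drive the estimate, so the interval is less trustworthy than the raw cohort size suggests.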

4 · What we'd recommend

Per-cohort cadence with ESS-guardrailed exclusion of the recent-low-mood × short-streak cohort from daily pushes. The cohort continues on every-3-day cadence (or opt-in only).
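
As a rough sketch, the recommendation amounts to a small routing rule. Everything named here is hypothetical: the field names, the 7-day definition of a "short" streak, and the choice to hold the churned-and-returned cohort on the control cadence until it is re-tested are assumptions for illustration, not details from the audit.

```python
EVERY_3_DAY = "every_3_day"
DAILY = "daily"

def choose_cadence(recent_low_mood, streak_days,
                   churned_and_returned=False, short_streak_max_days=7):
    """Pick a reminder cadence for one user (illustrative policy only)."""
    # Negative cell: recent low-mood signal combined with a short streak.
    if recent_low_mood and streak_days <= short_streak_max_days:
        return EVERY_3_DAY
    # Overlap-limited cohort: effect not established, so stay on control.
    if churned_and_returned:
        return EVERY_3_DAY
    # Everyone else is in a segment with a positive estimated lift.
    return DAILY
```

For example, `choose_cadence(True, 3)` returns `"every_3_day"`, while `choose_cadence(False, 3)` returns `"daily"`.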

Estimated +19% completion on positive segments · +$1.7M / yr ARR · 0 ethics escalations.

Doubly-robust readout · daily vs every-3-day · bootstrap 1,000 reps

Cohort	DR estimate	95% CI	ESS	Verdict
All push-opted users	+3.5% rel.	[+1.4, +5.6]	0.64	positive, confirms t-test
High engagement · long streaks	+8.2% rel.	[+5.0, +11.4]	0.59	strong positive
New users (<14d tenure)	+4.1% rel.	[+1.2, +7.0]	0.51	positive
Recent low-mood × short streak	−6.1% rel.	[−10.4, −1.8]	0.44	clear negative, pushed away
Recently churned-and-returned	−2.3% rel.	[−5.8, +1.2]	0.39	overlap-limited; re-test
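
The intervals in a readout like this can be produced with a percentile bootstrap. The sketch below is a simplified stand-in: it resamples a plain relative lift in completion rate rather than the doubly-robust estimate, and the function name, seed, and input layout (lists of 0/1 completions per arm) are assumptions.

```python
import random

def bootstrap_ci(treated, control, reps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the relative lift in completion rate.

    treated, control: lists of 0/1 completion outcomes, one per user.
    Returns (point_estimate, lower, upper) as fractions, e.g. 0.03 = +3%.
    """
    rng = random.Random(seed)

    def rel_lift(t, c):
        return (sum(t) / len(t)) / (sum(c) / len(c)) - 1.0

    stats = []
    for _ in range(reps):
        # Resample each arm independently, with replacement.
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        if sum(c) == 0:
            continue  # relative lift undefined when control has no completions
        stats.append(rel_lift(t, c))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return rel_lift(treated, control), lower, upper
```

An interval that excludes zero, as in the low-mood × short-streak row, supports a directional claim; one that straddles zero, as in the churned-and-returned row, does not.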

Read the full audit, then audit your own test.

Same shape we'll send back on your last A/B test — free, in three business days.


Related use cases

Premium upsell timing for sensitive users
Subscription tier upgrade prompt timing
Push timing per habit pattern