Mental health & meditation
Decisions in a category where the wrong nudge has a cost.
Holdout-validated lift, every cohort, every readout. No experiment runs without a defensible safety story.
Mental-health apps live with sensitivities most consumer apps never face. Brand risk from experimentation that feels manipulative is real. Ethics review of personalisation is real. Cohort heterogeneity is everything: the average treatment effect is meaningless when the variation across cohorts is the story. Metapolicy is the substrate that makes those constraints workable rather than paralysing.

What agents handle
Workflow
Session-completion nudge
Before
Same prompt to every user who pauses a session
After
CATE-targeted: positive-uplift cohorts only, with frozen-context audit trail
Result
+13% session completion, zero increase in opt-outs
Workflow
Plan downgrade vs. pause selection
Before
Both options shown equally on cancel intent
After
Bandit selects sequence + framing per propensity-to-return
Result
+9% pause (vs. churn), protects LTV without coercion
Workflow
Content-recommendation slate ordering
Before
Editorial-picked, same to all
After
Bandit-ranked per user's history and reported mood state
Result
+21% next-session start within 24 hours
Workflow
Premium upsell timing
Before
Day-7 modal regardless of engagement
After
Bandit picks timing only for users with positive engagement uplift
Result
+15% conversion, half the impressions

The challenges we solve
- Brand sensitivity to anything that feels manipulative
- Ethics review on personalisation requires audit-grade decision history
- Cohort heterogeneity is the story; the average is misleading
- Negative-effect cohorts must be findable and stoppable per arm
- Holdout discipline must survive product-team pressure to ship the variant
Frozen context snapshots
Every decision row carries the context X the policy saw at decision time, embedded immutably. Ethics review can replay any user's experience for any date in their history.
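A frozen context snapshot can be sketched as an immutable log record. This is a minimal illustration, not Metapolicy's schema: `DecisionRow`, `fingerprint`, and `replay` are hypothetical names, and the content hash is an assumed tamper-evidence mechanism. The key idea is that the row stores its own copy of X, so later changes to live features cannot alter what an auditor replays.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date, datetime
from types import MappingProxyType
from typing import Any, Iterable, List, Mapping


@dataclass(frozen=True)
class DecisionRow:
    """One logged decision plus the context X the policy saw when it decided (hypothetical schema)."""
    user_id: str
    decided_at: datetime
    policy_version: str
    arm: str
    propensity: float           # probability the policy gave the chosen arm
    context: Mapping[str, Any]  # snapshot of X at decision time

    def __post_init__(self) -> None:
        # Copy the caller's dict (a shallow copy) and expose it read-only, so later
        # changes to the live feature store cannot rewrite what was logged.
        object.__setattr__(self, "context", MappingProxyType(dict(self.context)))

    def fingerprint(self) -> str:
        """SHA-256 over the whole row; storing it lets an auditor detect edits."""
        payload = {
            "user_id": self.user_id,
            "decided_at": self.decided_at.isoformat(),
            "policy_version": self.policy_version,
            "arm": self.arm,
            "propensity": self.propensity,
            "context": dict(self.context),
        }
        blob = json.dumps(payload, sort_keys=True, default=str).encode()
        return hashlib.sha256(blob).hexdigest()


def replay(log: Iterable[DecisionRow], user_id: str, day: date) -> List[DecisionRow]:
    """Everything one user was served on one date, in the order it was served."""
    return sorted(
        (r for r in log if r.user_id == user_id and r.decided_at.date() == day),
        key=lambda r: r.decided_at,
    )
```

Mutating the dict that was passed in after logging leaves the stored snapshot unchanged, and assigning to any field raises an error, which is the property an ethics replay depends on.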
Per-arm rollouts gated by ESS
Effective sample size below n/10 on any arm triggers a flag: the system refuses to claim lift it cannot defend. Negative-effect cohorts are visible, not hidden inside an average.
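The ESS guardrail can be sketched with the common Kish formula, ESS = (Σw)² / Σw², over per-row importance weights (target-policy probability divided by logging propensity). This assumes n is the number of logged rows on the arm and that the weights are already computed; `kish_ess` and `ess_flags` are hypothetical names, not a documented API.

```python
from typing import Dict, Sequence


def kish_ess(weights: Sequence[float]) -> float:
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    s = sum(weights)
    s2 = sum(w * w for w in weights)
    # An empty or all-zero weight vector carries no information.
    return 0.0 if s2 == 0 else s * s / s2


def ess_flags(weights_by_arm: Dict[str, Sequence[float]], ratio: float = 0.1) -> Dict[str, bool]:
    """True for any arm whose ESS falls below ratio * n (n = rows logged on that arm)."""
    flags = {}
    for arm, w in weights_by_arm.items():
        flags[arm] = kish_ess(w) < ratio * len(w)
    return flags
```

Uniform weights give ESS = n, so the arm passes; when one row's weight dominates, ESS collapses toward 1 and the arm is flagged even though n looks healthy.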
User-level holdout for the cautious cohort
Deterministic hash-based holdout assignment is stable across re-serves. The control population is structurally protected from the policy's drift.
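Deterministic hash-based holdout assignment can be sketched as hashing a per-experiment salt with the user ID into a uniform bucket. The salt, the SHA-256 choice, and the `in_holdout` name are assumptions for illustration. Because the bucket depends only on the salt and user ID, never on policy state, re-serves and policy updates cannot move a user into or out of control.

```python
import hashlib


def in_holdout(user_id: str, salt: str, holdout_pct: float) -> bool:
    """Stable holdout membership: same (salt, user_id) always gives the same answer."""
    if not 0.0 <= holdout_pct <= 1.0:
        raise ValueError("holdout_pct must be in [0, 1]")
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    # First 8 bytes as an unsigned integer, scaled to a uniform value in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < holdout_pct
```

Changing the salt reshuffles assignment for a new experiment; keeping it fixed is what keeps the control population stable across the life of one.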
Worked audit example
What a doubly-robust re-analysis surfaced for Stillpoint.
Series-C meditation app · ~$22M ARR · 854,210 push-opted active users · case ID STILLPOINT-2026Q1-AUDIT-004
What their team reported
Session-reminder cadence: daily vs. every-3-days, measured on session completion. 22.1% → 22.8%, p = 0.03. The team called it "ship daily." Support flagged extra tickets, but the metric was the metric.
What our re-analysis found
Doubly-robust per-cohort estimates: +8.2% on high-engagement / long-streak users, but −6.1% on a recent-low-mood × short-streak cohort (~9% of the base) that daily pushes were driving out of the app. The extra support tickets tracked this cohort. In a mental-health product this is an ethics risk, not just a revenue one.
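A per-cohort doubly-robust estimate can be sketched with the standard AIPW (augmented inverse-propensity weighting) score. This assumes a binary treatment (daily = 1, every-3-days = 0), known propensities, and outcome-model predictions `mu1`/`mu0` fitted elsewhere (ideally with cross-fitting, not shown). The `Obs` record and function names are hypothetical, and the sketch reports absolute effects on the outcome scale rather than percentage lifts. The estimate stays consistent if either the propensity model or the outcome model is correct.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Obs:
    cohort: str
    treated: int       # 1 = daily reminders, 0 = every-3-days (assumed coding)
    outcome: float     # e.g. 1.0 if the session was completed
    propensity: float  # P(treated = 1 | x) under the logging design
    mu1: float         # outcome-model prediction if treated
    mu0: float         # outcome-model prediction if not treated


def aipw_score(o: Obs) -> float:
    """Per-row doubly-robust pseudo-outcome for the treatment effect."""
    e = o.propensity
    if not 0.0 < e < 1.0:
        raise ValueError("propensity must be strictly between 0 and 1")
    # Outcome-model difference, corrected by the weighted residual of the observed arm.
    return (o.mu1 - o.mu0
            + o.treated * (o.outcome - o.mu1) / e
            - (1 - o.treated) * (o.outcome - o.mu0) / (1 - e))


def dr_effect_by_cohort(rows: Iterable[Obs]) -> Dict[str, float]:
    """Average AIPW score within each cohort."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for o in rows:
        sums[o.cohort] += aipw_score(o)
        counts[o.cohort] += 1
    return {c: sums[c] / counts[c] for c in sums}
```

Averaging within cohorts rather than over everyone is what lets a negative cohort surface instead of being absorbed into a small positive average.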
Recommendation · projected annualised impact
Per-cohort cadence with ESS-guardrailed exclusion of the vulnerable cohort from daily pushes. +19% completion on positive segments and zero ethics escalations.
+$1.7M / yr · zero ethics escalations
We'll send a readout in the same shape on your last A/B test, in three business days.
Read the full audit PDF →
Audit your last paywall test, free.
One CSV, one experiment config. Same-shape readout back in three business days.