Industries, Mental health & meditation

Decisions in a category where the wrong nudge has a cost.

Holdout-validated lift, every cohort, every readout. No experiment runs without a defensible safety story.

Mental-health apps live with sensitivities most consumer apps never face. Brand risk on manipulative-feeling experimentation is real. Ethics review on personalisation is real. Cohort heterogeneity is everything: the average treatment effect is meaningless when the variance is the story. Metapolicy is the substrate that makes those constraints workable rather than paralysing.



What agents handle

Workflow

Session-completion nudge

Before

Same prompt to every user who pauses a session

After

CATE-targeted: positive-uplift cohorts only, with frozen-context audit trail

Result

+13% session completion, zero increase in opt-outs

Workflow

Plan downgrade vs. pause selection

Before

Both options shown equally on cancel intent

After

Bandit selects sequence + framing per propensity-to-return

Result

+9% pause (vs. churn), protects LTV without coercion

Workflow

Content-recommendation slate ordering

Before

Editorial-picked, same for all

After

Bandit-ranked per user's history and reported mood state

Result

+21% next-session start within 24 hours

Workflow

Premium upsell timing

Before

Day-7 modal regardless of engagement

After

Bandit picks timing only for users with positive engagement uplift

Result

+15% conversion, half the impressions

The challenges we solve

Brand sensitivity to anything that feels manipulative
Ethics review on personalisation requires audit-grade decision history
Cohort heterogeneity is the story; the average is misleading
Negative-effect cohorts must be findable and stoppable per-arm
Holdout discipline must survive product-team pressure to ship the variant

Frozen context snapshots

Every decision row carries the context X the policy saw at decision time, immutably embedded. Ethics review can replay any user's experience for any date in their history.
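As a rough illustration of the idea, and not a description of the product's actual storage format, a decision row can embed its context as a canonical JSON string plus a hash. The `DecisionRow` and `record_decision` names, the field layout, and the choice of JSON and SHA-256 are all assumptions made for this sketch.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class DecisionRow:
    """Hypothetical decision row; the field names are illustrative only."""
    user_id: str
    decided_at: str       # ISO-8601 timestamp of the decision
    arm: str              # the action the policy chose
    context_json: str     # canonical JSON snapshot of the context X
    context_sha256: str   # hash of the snapshot, for tamper checks

    def context(self) -> dict:
        # Rebuild the context exactly as it was serialised at decision time.
        return json.loads(self.context_json)


def record_decision(user_id: str, decided_at: str, arm: str,
                    context: Mapping[str, Any]) -> DecisionRow:
    # Serialising to a string detaches the snapshot from the live object,
    # so later mutations of `context` cannot leak into the stored row.
    snapshot = json.dumps(context, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(snapshot.encode("utf-8")).hexdigest()
    return DecisionRow(user_id, decided_at, arm, snapshot, digest)
```

Replaying a past decision then amounts to loading the row and calling `row.context()`. The stored hash lets a reviewer confirm that the snapshot has not been altered since it was written.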

Per-arm rollouts gated by ESS

Effective sample size below n/10 on any arm triggers a flag: the system refuses to claim lift it cannot defend. Negative-effect cohorts are visible, not hidden inside an average.
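A minimal sketch of an ESS gate, assuming "effective sample size" means Kish's formula over per-decision importance weights and that n is the number of logged decisions on the arm. The page states neither, so treat both as assumptions; the function names are hypothetical.

```python
from typing import Dict, Sequence


def kish_ess(weights: Sequence[float]) -> float:
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    return (total * total) / sum_sq if sum_sq > 0 else 0.0


def flag_low_ess_arms(weights_by_arm: Dict[str, Sequence[float]],
                      min_fraction: float = 0.1) -> Dict[str, bool]:
    """Return True for each arm whose ESS falls below min_fraction * n."""
    flags = {}
    for arm, weights in weights_by_arm.items():
        n = len(weights)
        # An empty arm has nothing to defend, so it is flagged too.
        flags[arm] = n == 0 or kish_ess(weights) < min_fraction * n
    return flags
```

With equal weights the ESS equals n and the arm passes. When a handful of decisions dominate the weights, the ESS collapses towards 1 and the arm is flagged, whatever the headline lift looks like.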

User-level holdout for the cautious cohort

Deterministic hash-based holdout assignment is stable across re-serves. The control population is structurally protected from the policy's drift.
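One common way to get this property, sketched here under assumptions: hash a salted user ID and compare the resulting bucket to the holdout fraction. The salt string, the 10% default, and the `in_holdout` name are illustrative rather than taken from the product.

```python
import hashlib

BUCKETS = 10_000  # bucket resolution: 0.01% steps


def in_holdout(user_id: str, salt: str = "holdout-v1",
               holdout_fraction: float = 0.10) -> bool:
    """Assign a user to the holdout from a hash of (salt, user_id) only."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer and fold it into a bucket.
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS
    return bucket < round(holdout_fraction * BUCKETS)
```

Because the assignment depends only on the salt and the user ID, the same user gets the same answer on every re-serve with no lookup table to keep in sync. Changing the salt reshuffles the holdout, so in practice it would stay fixed for the life of the experiment.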

Worked audit example

What a doubly-robust re-analysis surfaced for Stillpoint.

Series-C meditation app · ~$22M ARR · 854,210 push-opted active users · case ID STILLPOINT-2026Q1-AUDIT-004

What their team reported

Session-reminder cadence test, daily vs. every-3-day, measured on session completion: 22.1% → 22.8%, p = 0.03. Team called it "ship daily." Support flagged extra tickets; the metric was the metric.

What our re-analysis found

Doubly-robust per-cohort: +8.2% on high-engagement / long-streak users, but −6.1% on a recent-low-mood × short-streak cohort (~9% of base) being pushed out of the app. The support-ticket pattern correlated. In a mental-health product this is an ethics risk, not just a revenue one.
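For readers unfamiliar with the term, a doubly-robust estimate combines an outcome model with a propensity correction and averages the result within each cohort. The sketch below uses the standard AIPW (augmented inverse-propensity-weighted) score; it is not the audit's actual code. The `Row` fields and the assumption that propensities and outcome-model predictions are supplied per row are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Row:
    """Hypothetical per-user record; the field names are illustrative."""
    cohort: str
    treated: int        # 1 = treatment arm, 0 = control arm
    outcome: float      # e.g. 1.0 if the session was completed
    propensity: float   # P(treated = 1 | context), strictly between 0 and 1
    mu1: float          # outcome-model prediction under treatment
    mu0: float          # outcome-model prediction under control


def aipw_by_cohort(rows: Iterable[Row]) -> Dict[str, float]:
    """Average the AIPW score within each cohort."""
    scores = defaultdict(list)
    for r in rows:
        score = (
            r.mu1 - r.mu0
            + r.treated * (r.outcome - r.mu1) / r.propensity
            - (1 - r.treated) * (r.outcome - r.mu0) / (1.0 - r.propensity)
        )
        scores[r.cohort].append(score)
    return {cohort: sum(s) / len(s) for cohort, s in scores.items()}
```

The estimate stays consistent if either the outcome model or the propensity model is right, which is where the "doubly-robust" label comes from. Reporting it per cohort rather than pooled is what lets a negative segment show up next to a positive one.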

Recommendation · projected annualised impact

Per-cohort cadence with ESS-guardrailed exclusion of the vulnerable cohort from daily pushes. +19% completion on positive segments and zero ethics escalations.

+$1.7M / yr · zero ethics escalations

Same shape we'll send back on your last A/B test, in three business days.

Read the full audit PDF →

Audit your last paywall test, free.

One CSV, one experiment config. Same-shape readout back in three business days.

Send my test