The decision layer for AI-first apps. We decide paywalls.
Plug into Statsig, Klaviyo, GrowthBook, Iterable, Braze, Shopify, BigQuery or Snowflake and get a doubly-robust readout on your own data in a day. Upgrade individual experiments to our SDK when you want billable lift. We sit on top of your stack, we don’t replace it.
One CSV. One experiment config. Same-shape readout back in three business days.
One decision engine, many surfaces
Metapolicy controls the parameters behind your website, push, mobile app and lifecycle email. Switch from one-size-fits-all to a contextual bandit that serves each user the experience predicted to convert them — and measures the lift it causes.
● each cable is one user's decision · every surface personalised per segment, thousands in flight



SOC 2 Type II certified. ISO 27001 certified. GDPR compliant. Hosted in the EU. Your data stays where it belongs.
Beyond
A/B tests.
A/B tests report one number: the average lift across your whole base. That average quietly hides the cohort the winner is prompting into churn, and the cohort that would have paid more. Doubly-robust evaluation finds both — on your own logs, without rerunning anything.
- Travel
- Fintech
- Fitness
- Mental Health
- Dating
- EdTech
- Media
Causal, not correlated.
T-tests average lift across the whole base, hiding the cohorts where the winner actually wins (or loses). Our doubly-robust off-policy evaluation surfaces per-segment CATE with bootstrap confidence intervals and an ESS guardrail — the readout your CFO will sign off on.
Outcome-priced. Always.
We are paid a share of the incremental revenue we cause, not a software fee. No lift number that fails the ESS guardrail is invoiced. Your CFO sleeps because the math itself caps how much we can over-claim.
One-day integration.
A single SDK call: decide(experiment, user_id, context). Five native clients, Python, TypeScript, Kotlin, Swift, Flutter, same wire format, same six methods. Your engineers ship the integration in an afternoon.
What the bandit decides on every surface.
Pick your vertical. Each use case is a decision your team is making today with a static rule or a global A/B test. Each one is one SDK call and one logged propensity away from a contextual-bandit policy with a doubly-robust readout.
Booking apps, OTAs, mobility platforms.
ICP · Klook · GetYourGuide · Trainline · Hopper · Booking
Peak-season paywall pricing per route
Hard-coded fare per region, sometimes a seasonal multiplier.
Contextual bandit over (route popularity × seasonality × user LTV decile × device). Propensity logged per assignment.
+8–15% trial-start lift on high-elasticity routes, no margin erosion on low-elasticity ones.
Who feels it · VP Growth, the paywall is their P&L.
We've already re-analysed this exact decision in a worked audit.
Read the worked auditNone of these is hypothetical, each is a live decision your team is already making with a static rule. We just log the propensity and return a doubly-robust readout.
Audit my last 3 A/B tests, freeFive audits we've already handed back.
Aerial's Q4 paywall A/B looked dead on a t-test (p = 0.34). Per-cohort doubly-robust found +9.3% lift on top-quartile-route × peak season, a slice carrying 23% of trial-start volume. Shipped just to that cohort, +$2.1M / yr.
Worked audit
Travel & mobility · AERIAL-2026Q1-AUDIT-001, Aerial · travel
Decide.Log.Reward. Read.
Value before the bandit has learned anything.
The online bandit policy takes 4–8 weeks to converge. The lift readouts work from day zero on data you already have. Propensities are deterministically reconstructible from your experiment configs, no logging upgrade required to get a first answer.
Free 1-hour A/B audit
A 30-minute video walkthrough of one of your recent paywall or onboarding A/B tests, re-analysed through doubly-robust math. Original t-test number, our DR number, ESS, per-segment CATE.
A defensible number for your last A/B test, plus the cohort the average hid, usually worth a recoverable six-figure ARR slice.
90-day historical replay
A 3–6 page report of your entire experiment program re-evaluated under DR + ESS. One row per experiment. Per-segment CATE where overlap is defensible.
A one-page register of which past “wins” hold, which “no effects” actually had a per-segment win, and which decisions to roll back this quarter.
Schema upgrade plan
1-page recommendation to add propensity logging into your existing experimentation framework. Free, even if you never sign with us, every future test becomes replay-able.
An exact column-level diff against your current experiment schema. Ship it once and every future decision becomes auditable, with us or without us.
One bandit, one surface
LinUCB on 3–5 arms for one decision (typically paywall or push). Propensity logged. Sticky-by-user 5% holdout. Weekly doubly-robust readout.
A live, doubly-robust lift number on real users within six weeks, gated by a 5% holdout your CFO can verify against revenue.
CATE-only discovery report
For regulated buyers (fintech, healthcare) who can't move fast on a new SDK. X-learner / T-learner / Causal Forest on your existing experiment data, “your hidden cohort effects.”
A heterogeneity report your growth team can act on internally next sprint, no SDK, no engineering ticket, no procurement cycle in the way.
Online bandit convergence4–8 weeks for paywall-class decisions; 8–16 weeks for retention with long outcome windows.
OPE · CATE · replayDoubly-robust readouts and heterogeneity analysis run on your existing assignment + outcome data.
Propensity from configsA 50/50 split is p = 0.5; a stratified test is the stratum weight. Reconstruct, replay, report.
Send your last paywall test.
We'll send back a doubly-robust readout with ESS, bootstrap CIs, and per-segment CATE in three business days. Free.
Causal,
not correlated.
Every assignment is propensity-logged. Every reward is idempotent. Every lift readout is doubly-robust and ESS-guardrailed. The math itself caps how much we can over-claim.
Works with
your stack.
Five native SDKs with identical wire formats. Plus drop-in adapters for the mobile-growth tools you already run.
Outcome-priced.
DR + ESS gated.
We charge a share of measured incremental revenue. Doubly-robust off-policy evaluation gates every invoice, no DR + ESS guardrail, no bill.
Audit-mode
Connect-in-an-hour. We read your existing Statsig / Klaviyo / GrowthBook / Iterable / Braze / Snowflake stack and produce doubly-robust readouts on your own data. Advisory, never billable.
- Six Tier-1 connectors + CSV upload
- DR + ESS readout on imported experiments
- CATE grid the incumbent doesn't ship
- Disclosed propensity-quality badge
- Cut-over-to-decision-mode CTA per experiment
- Quarterly billing, fixed fee
Floor
Decision-mode SDK with logged propensity. Infrastructure cost; caps customer downside if measured lift is small.
- Full SDK access (5 languages)
- Doubly-robust OPE worker
- ESS-guardrailed readouts
- Per-tenant API key issuance
- Append-only decision logs
- Mailto-grade support
Typical
Series-B/C subscription app, 5–10% of incremental ARR. We grow as you grow.
- Everything in Floor
- Contextual bandits (LinUCB / TS)
- CATE refresh + bandit-prior seeding
- Bootstrap CIs on every readout
- Replayable policy snapshots
- Founder-direct support
Ceiling
Caps our share so the customer's CFO sleeps. Outcome-priced, never more.
- Everything in Typical
- Dedicated OPE windows
- Custom estimator pinning
- Compliance and audit support
- Quarterly causal-readout review
- Direct line to the founder
Every consumer app
needs a decision layer.
Send me your last three A/B tests. I will tell you which ones survive a doubly-robust readout on your own logs, free, no obligation.
No pitch deck. No NDA. A real conversation about the math.


