Policy Gateway overview - abliteration.ai

Policy Gateway is a paid add-on. Every feature described here and in the subsections requires an active Policy Gateway plan. See pricing.

Policy Gateway is abliteration.ai’s governance layer. It sits between your application and the model, evaluates every request and response, and emits a structured policy event.

Plans

Three plans scale by usage volume, not by feature. Every plan gets the full Policy Gateway surface — projects, policies, all rollout modes, every connector, streaming metadata, policy events:

Plan	Volume vs base
Control	6×
Advanced	20×
Enterprise	60×

See the pricing page for current prices.

What a policy decides

Each request resolves to exactly one policy. The policy returns one of five decisions:

Decision	Meaning
`allow`	Pass through unchanged
`rewrite`	Apply a rewrite before calling the model
`summary`	Replace the output with a short summary
`escalate`	Forward to the escalation path (email or URL) for human review
`refuse`	Block the request

The policy-level `enforcement_action` is one of `rewrite` | `block` | `summarize` | `escalate`. When a rule fires, the action maps to a decision: `summarize` → `summary`, `block` → `refuse`, and `rewrite` and `escalate` map to the decisions of the same name. Every decision has a corresponding `reason_code`, which is the decision name in uppercase: `ALLOW`, `REWRITE`, `SUMMARY`, `ESCALATE`, `REFUSE`.
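
The mapping above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not the gateway's implementation. The function name `decide` and the `rule_fired` flag are hypothetical, and the sketch assumes that a request on which no rule fires resolves to `allow`.

```python
# Action-to-decision mapping as described in the text above.
ACTION_TO_DECISION = {
    "rewrite": "rewrite",
    "block": "refuse",
    "summarize": "summary",
    "escalate": "escalate",
}


def decide(rule_fired: bool, enforcement_action: str) -> tuple[str, str]:
    """Return (decision, reason_code) for one evaluated request.

    Assumption: when no rule fires, the decision is "allow".
    """
    if rule_fired:
        decision = ACTION_TO_DECISION[enforcement_action]
    else:
        decision = "allow"
    # The reason_code is the decision name in uppercase.
    return decision, decision.upper()
```

For example, `decide(True, "block")` yields the `refuse` decision with reason code `REFUSE`.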

What a rule looks at

Rules are flat: each rule is a set of top-level fields, with no nested `match:` DSL.

Field	Effect
`allowlist`	If non-empty, the message must contain at least one listed term, otherwise the decision is forced to `refuse`.
`denylist`	Any listed term triggers `enforcement_action`.
`flagged_categories`	OpenAI-moderation categories (`harassment`, `hate`, `sexual`, `illicit`, and child-safety variants). Only evaluated on chat-completions and messages targets.
`redact_pii`	Boolean. Strips PII patterns from the message text before upstream call.
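
A minimal sketch of how the flat fields could interact, under stated assumptions. The `Rule` class, the `evaluate` function, and the email regex are hypothetical; case-insensitive substring matching is an assumption, since the source does not say how terms are matched; and `flagged_categories` is left out because it depends on a moderation model. The allowlist is checked first, matching the documented behavior that a missing allowlist hit forces `refuse`.

```python
import re
from dataclasses import dataclass, field

# Hypothetical PII pattern for illustration only: simple email addresses.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Rule:
    allowlist: list[str] = field(default_factory=list)
    denylist: list[str] = field(default_factory=list)
    redact_pii: bool = False


def evaluate(rule: Rule, message: str) -> tuple[str, str]:
    """Return (outcome, text_for_upstream).

    outcome is "refuse", "enforcement_action", or "allow".
    Assumption: terms match as case-insensitive substrings.
    """
    text = EMAIL_RE.sub("[REDACTED]", message) if rule.redact_pii else message
    lowered = message.lower()
    # A non-empty allowlist with no hit forces refuse, whatever else matches.
    if rule.allowlist and not any(t.lower() in lowered for t in rule.allowlist):
        return "refuse", text
    # Any denylist hit triggers the policy's enforcement_action.
    if any(t.lower() in lowered for t in rule.denylist):
        return "enforcement_action", text
    return "allow", text
```

With `Rule(allowlist=["invoice"], redact_pii=True)`, a message that never mentions "invoice" is refused, while one that does is allowed with any email address redacted before the upstream call.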

Rollout modes

Rollout is per policy, not per rule.

Mode	`enabled`	`percentage`	Behavior
`shadow`	`false`	n/a	Evaluate and log. Never block.
`canary`	`true`	`< 100`	Each request has a `percentage`/100 chance of being enforced.
`enforced`	`true`	`100`	Always enforced.
`rollback`	(derived)	n/a	Auto-rollback fired — policy is temporarily demoted to shadow-like behavior until `cooldown_minutes` elapses.
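
The table above can be read as a small decision function. The sketch below is illustrative only: `rollout_mode`, `should_enforce`, and the `in_rollback` flag are hypothetical names, and it assumes that an active rollback takes precedence over the configured mode and, being shadow-like, never enforces.

```python
import random


def rollout_mode(enabled: bool, percentage: int, in_rollback: bool = False) -> str:
    """Derive the rollout mode from the policy's `enabled` and `percentage`."""
    if in_rollback:  # assumption: rollback overrides the configured mode
        return "rollback"
    if not enabled:
        return "shadow"
    return "enforced" if percentage >= 100 else "canary"


def should_enforce(mode: str, percentage: int, rng: random.Random) -> bool:
    """Decide whether this single request is enforced."""
    if mode == "enforced":
        return True
    if mode == "canary":
        # Each request is sampled independently with probability percentage/100.
        return rng.random() < percentage / 100
    return False  # shadow and rollback: evaluate and log, never block
```

Because canary sampling is independent per request, two identical requests at `percentage=50` can land on different sides of the draw.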

Auto-rollback

Every policy can auto-demote itself if the rate of negative decisions spikes:

Field	Purpose
`threshold_pct`	Percentage (0–100) of decisions in the window matching `rollback_decisions` at which rollback fires
`window_minutes`	Sliding window for the rate calculation
`min_requests`	Minimum sample size before rollback can fire
`cooldown_minutes`	How long to stay in rollback before resuming
`rollback_decisions`	Which decisions count (e.g. `["refuse", "escalate"]`)
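
One way these fields could fit together is a sliding-window monitor. The sketch below is a guess at the mechanics, not the gateway's implementation: the `RollbackMonitor` class is hypothetical, timestamps are plain seconds, the rate is computed as matching decisions divided by all decisions in the window, the threshold comparison is inclusive, and the window is cleared when rollback fires so it does not re-trigger on stale data. None of those details are confirmed by the source.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RollbackMonitor:
    threshold_pct: float
    window_minutes: float
    min_requests: int
    cooldown_minutes: float
    rollback_decisions: frozenset
    _events: deque = field(default_factory=deque)  # (timestamp_seconds, decision)
    _rollback_until: float = 0.0

    def in_rollback(self, now: float) -> bool:
        return now < self._rollback_until

    def record(self, now: float, decision: str) -> None:
        """Record one decision and fire rollback if the rate crosses the threshold."""
        self._events.append((now, decision))
        # Drop events that have slid out of the window.
        cutoff = now - self.window_minutes * 60
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        # Too few samples, or already rolled back: nothing to do.
        if self.in_rollback(now) or len(self._events) < self.min_requests:
            return
        hits = sum(1 for _, d in self._events if d in self.rollback_decisions)
        if 100 * hits / len(self._events) >= self.threshold_pct:
            self._rollback_until = now + self.cooldown_minutes * 60
            self._events.clear()  # assumption: start fresh after firing
```

The `min_requests` guard is what keeps a single refusal on a quiet policy from demoting it: the rate is only trusted once the window holds enough samples.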

Data classification

Every policy carries a classification field: public | internal | confidential | restricted. It doesn’t change behavior — it’s metadata for audit and access reviews.

Caveats

Shadow mode still runs rules. Every allowlist / denylist / category check happens even in shadow — it just doesn’t block. This is what makes dry-run measurement possible.
No per-rule rollout overrides. Rollout mode is a single knob at the policy level. Adding a rule doesn’t let you ramp that rule independently.
Allowlist exclusivity. A non-empty allowlist forces refuse on any message without an allowlist hit, regardless of denylist/category.
Canary is probabilistic. Two identical requests at percentage=50 can have different outcomes.

Next

Onboarding. Create a project, write a policy, attach a key.
Policy endpoints. The /policy/* surface and headers.
Streaming metadata. The policy field on every SSE frame.
Connectors. Stream events to your SIEM, log pipeline, or data lake.
