Routing Policy
Omegon's routing layer selects concrete provider/model combinations for each inference request based on three constraints: capability tier, thinking level, and context class.
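The three constraints can be modeled as a small request record. This is a minimal sketch; the type and field names (`RouteRequest`, `thinking`, `context_floor`) are illustrative, not Omegon's actual identifiers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RouteRequest:
    """The three routing constraints carried by each inference request."""
    tier: int           # minimum capability tier the model must satisfy
    thinking: str       # requested thinking level
    context_floor: int  # minimum context window (tokens) the session needs

req = RouteRequest(tier=2, thinking="high", context_floor=100_000)
```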
Selection Flow
- Filter to authenticated providers — only models with configured API keys or active OAuth sessions are considered
- Match capability tier — filter to models that satisfy the requested tier (higher tiers can serve lower requests)
- Check context floor — exclude models whose context ceiling is below the session's required minimum
- Apply provider preference — order by the operator's provider preference (default: Anthropic first)
- Check cooldowns — exclude providers/models that are in a cooldown period from recent failures
- Select best candidate — first viable match wins
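The flow above can be sketched as a filter-then-order pipeline. This is an illustrative sketch, not Omegon's implementation; the `Candidate` shape, provider names, and default preference tuple are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass(frozen=True)
class Candidate:
    provider: str
    model: str
    tier: int             # capability tier; higher tiers can serve lower requests
    context_ceiling: int  # maximum context window in tokens
    authenticated: bool   # configured API key or active OAuth session

def select_route(candidates: Iterable[Candidate], *, tier: int, context_floor: int,
                 provider_preference=("anthropic",),
                 cooldowns=frozenset()) -> Optional[Candidate]:
    """Filter to viable candidates, order by preference, first match wins."""
    viable = [
        c for c in candidates
        if c.authenticated                       # authenticated providers only
        and c.tier >= tier                       # capability tier satisfied
        and c.context_ceiling >= context_floor   # context floor met
        and c.provider not in cooldowns          # skip cooled-down providers
    ]
    # Order by operator preference; unlisted providers sort last (stable sort
    # preserves the incoming order within each provider).
    rank = {p: i for i, p in enumerate(provider_preference)}
    viable.sort(key=lambda c: rank.get(c.provider, len(provider_preference)))
    return viable[0] if viable else None
```

With the default preference, an Anthropic candidate wins over an equally viable alternative; putting its provider on cooldown fails over to the next viable match.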
Downgrade Classification
When a model switch would change the context class, the harness classifies the transition:
| Classification | Action | Example |
|---|---|---|
| Compatible | Auto-reroute (silent) | Legion → Clan (1-class drop, within floor) |
| Compatible with Compaction | Auto-compact if safe | Floor bridgeable, no pin crossed |
| Degrading | Operator confirmation required | Legion → Squad (3-class drop) |
| Ineligible | Excluded from candidates | Tier mismatch, thinking constraint |
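The table can be read as an ordered decision procedure. The sketch below assumes the rows are checked top to bottom and that "compatible" means a drop of at most one class with the floor already met; the enum values (with an unnamed intermediate class between Squad and Clan, implied by the 3-class Legion → Squad drop) and the rule ordering are assumptions, not confirmed behavior.

```python
from enum import IntEnum

class ContextClass(IntEnum):
    # Ordered context classes. Value 1 is reserved for the intermediate
    # class implied by the 3-class Legion -> Squad drop, not named here.
    SQUAD = 0
    CLAN = 2
    LEGION = 3

def classify_transition(old: ContextClass, new: ContextClass, *,
                        floor_met: bool, floor_bridgeable: bool,
                        pin_crossed: bool, eligible: bool = True) -> str:
    """Classify a model switch that changes the context class."""
    if not eligible:
        return "ineligible"                    # tier mismatch, thinking constraint
    if old - new <= 1 and floor_met:
        return "compatible"                    # auto-reroute silently
    if floor_bridgeable and not pin_crossed:
        return "compatible_with_compaction"    # auto-compact if safe
    return "degrading"                         # operator confirmation required
```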
Failure Recovery
Upstream failures are classified into recovery actions:
- Retryable flake (5xx, timeout) → retry same model once
- Rate limit (429) → cooldown provider, failover to alternative
- Auth failure (401/403) → surface to operator
- Context overflow → handled by compaction system
- User abort → no recovery action (not a failure)
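The mapping above can be sketched as a single classifier. The failure-record shape (a dict with `kind` and `status` keys) and the action names are illustrative assumptions.

```python
from typing import Optional

def recovery_action(failure: dict) -> str:
    """Map an upstream failure to a recovery action (sketch)."""
    status: Optional[int] = failure.get("status")
    if failure.get("kind") == "user_abort":
        return "none"                    # not a failure; no recovery
    if failure.get("kind") == "context_overflow":
        return "compact"                 # handed to the compaction system
    if status == 429:
        return "cooldown_and_failover"   # rate limit: cool down, switch provider
    if status in (401, 403):
        return "surface_to_operator"     # auth failure: needs operator action
    if failure.get("kind") == "timeout" or (status is not None and 500 <= status < 600):
        return "retry_once"              # retryable flake: retry same model once
    return "surface_to_operator"         # unknown: let the operator decide
```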
Route Matrix
A reviewed snapshot of provider/model context ceilings is embedded in the binary at compile time. Runtime routing never trusts live provider claims. The matrix is updated through a scheduled refresh pipeline that detects drift, classifies changes, and requires human review for context decreases or route removals.
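The drift classification might look like the following sketch: live ceilings are compared against the embedded snapshot, decreases and removals are routed to human review, and other changes (assumed here to include increases and new routes) pass through automatically. The matrix keys and routing of additions are assumptions.

```python
def classify_drift(snapshot: dict, live: dict):
    """Compare the embedded route matrix against live provider claims.

    Returns (needs_review, auto): context decreases and route removals
    require human review; increases and additions are assumed auto-safe.
    Keys are (provider, model) tuples; values are context ceilings (tokens).
    """
    needs_review, auto = [], []
    for route, ceiling in snapshot.items():
        if route not in live:
            needs_review.append(("removed", route))
        elif live[route] < ceiling:
            needs_review.append(("context_decrease", route))
        elif live[route] > ceiling:
            auto.append(("context_increase", route))
    for route in live.keys() - snapshot.keys():
        auto.append(("added", route))
    return needs_review, auto
```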