2026 / Position paper
vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models
Signal-driven routing across model pools, safety plugins, privacy policies, and cost-aware selection.
Research
Routing, evaluation, serving, policy, memory, and verification research behind Brain and the open-source vLLM Semantic Router.
15 documents / synced index
Synced from the Agentic Intelligence Lab research index.
2026 / Position paper
Signal-driven routing across model pools, safety plugins, privacy policies, and cost-aware selection.
2026 / Vision paper
A synthesis of routing, fleet planning, multimodal, and governance results into one deployment architecture.
2026 / Security
A defense-oriented treatment of perception failures in computer-use agents and click/action guardrails.
2026 / Tool routing
Latency-constrained learning for tool ranking under single-digit millisecond CPU budgets.
2026 / VLM routing
Estimates action difficulty and routes each computer-use step to the cheapest model that meets reliability targets.
2026 / Latency
Flash attention, prompt compression, and near-streaming reduce routing latency from seconds to tens of milliseconds.
2026 / Fleet planning
A queueing-theory-grounded fleet planner for sizing multi-pool GPU fleets against P99 TTFT targets.
2026 / Fleet planning
An analytical method for deriving minimum-cost two-pool fleets from workload CDFs and P99 TTFT targets.
2026 / Energy
For tokens per watt, context-length routing topology can matter more than upgrading GPU generation alone.
2026 / Policy
A framework for detecting conflicts in routing policy languages, where probabilistic ML predicates can silently co-fire.
2026 / Agent orchestration
A cross-layer extension of the Semantic Router DSL from stateless request routing into multi-step agent workflows.
2026 / Memory
Conversational memory and retrieval-grounded routing recover most of a 235B model's performance while cutting effective inference cost.
2026 / Verification
A real-time verification component for long-document RAG that preserves grounding checks without truncated validation.
2025 / Reasoning routing
A semantic router that classifies queries by reasoning need and selectively applies reasoning only when beneficial.
2025 / Semantic caching
A category-aware semantic caching architecture where similarity thresholds, TTLs, and quotas vary by workload class.