vllm
vLLM plugin for Stained Glass Output Protection.
The plugin consists of several major components, each in its own module. See the individual modules for details.
As of vLLM 0.21.0, the vllm.general_plugins entry point system loads plugins in all vLLM processes, including the main OpenAI-compatible
API server process. Specifically, EngineArgs.create_engine_config (vllm/engine/arg_utils.py:731) calls load_general_plugins() in the
API server process, and EngineCore.__init__ (vllm/v1/engine/core.py:105) calls it again in the engine subprocess.
An alternative entrypoint is still required for launching a vLLM OpenAI-compatible RESTful API server with Stained Glass Output Protection
enabled, because the output-protection patches (FastAPI middleware, ephemeral key rotation, encrypted RequestOutput) target the HTTP layer
and cannot be applied via the vllm.general_plugins mechanism.
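To illustrate what "targeting the HTTP layer" means, here is a minimal sketch of a pure-ASGI middleware (FastAPI applications are ASGI applications) that reads a key from a request header and hands it to the downstream app. This is not the plugin's actual middleware: the header name `x-client-public-key`, the class `PublicKeyHeaderMiddleware`, and the helpers `echo_app` and `call` are all hypothetical, and no encryption or key registration is performed.

```python
import asyncio

# Hypothetical header name, chosen only for this illustration.
PUBLIC_KEY_HEADER = b"x-client-public-key"


class PublicKeyHeaderMiddleware:
    """Pure-ASGI middleware that copies a request header into the ASGI scope."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            # ASGI delivers headers as (name, value) byte pairs with lower-cased names.
            headers = dict(scope.get("headers", []))
            key = headers.get(PUBLIC_KEY_HEADER)
            scope.setdefault("state", {})["client_public_key"] = key
        await self.app(scope, receive, send)


async def echo_app(scope, receive, send):
    """Toy downstream app: reports whether the middleware saw a key."""
    key = scope["state"]["client_public_key"]
    body = b"key present" if key else b"no key"
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": body})


async def call(app, headers):
    """Drive an ASGI app once with a fake GET request and collect what it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/", "headers": headers}
    await app(scope, receive, send)
    return sent


app = PublicKeyHeaderMiddleware(echo_app)
with_key = asyncio.run(call(app, [(PUBLIC_KEY_HEADER, b"abc123")]))
without_key = asyncio.run(call(app, []))
print(with_key[1]["body"], without_key[1]["body"])
```

A wrapper like this has to be installed around the web application object itself, which is why it belongs to a server entrypoint rather than to a hook that runs while engine configuration is being built.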
Warning
vLLM will not run properly if the stainedglass_output_protection package (i.e. the plugin) is installed but vLLM is not launched via the
alternative entrypoint. Likewise, vLLM will not run properly if the alternative entrypoint is invoked but the plugin is not installed.
Warning
You should almost never need to import this package directly. If stainedglass_output_protection is installed and you launch vLLM via
the alternative entrypoint, this module is applied automatically.
Modules:
| Name | Description |
|---|---|
| entrypoint | Alternative entrypoint for launching a vLLM OpenAI-compatible RESTful API server with Stained Glass Output Protection enabled. |
| middleware | Middleware for the vLLM OpenAI-compatible RESTful API server that reads a user-provided public key from the request headers, registers |
| parsers | Disable server-side tool/reasoning parsing under Output Protection (OP) Encryption. |
| registry | User key registry for Stained Glass Output Protection in vLLM, shared across all vLLM processes. |
| request_output | Patched vLLM |
| server_keys | Utilities for managing ephemeral server keys in a FastAPI application. |
| turboquant_plugin | vLLM general plugin that patches prompt-embed loading to decode TurboQuant-compressed payloads. |