vllm

vLLM plugin for Stained Glass Output Protection.

Consists of several major components, each of which has its own module. See each module for more details.

As of vLLM 0.21.0, the vllm.general_plugins entry point system loads plugins in all vLLM processes, including the main OpenAI-compatible API server process. Specifically, EngineArgs.create_engine_config (vllm/engine/arg_utils.py:731) calls load_general_plugins() in the API server process, and EngineCore.__init__ (vllm/v1/engine/core.py:105) calls it again in the engine subprocess.

An alternative entrypoint is still required for launching a vLLM OpenAI-compatible RESTful API server with Stained Glass Output Protection enabled, because the output-protection patches (FastAPI middleware, ephemeral key rotation, encrypted RequestOutput) target the HTTP layer and cannot be applied via the vllm.general_plugins mechanism.

Warning

vLLM will not run properly if the stainedglass_output_protection package is installed (i.e. the plugin is installed), but vLLM is not launched via the alternative entrypoint. Likewise, if the plugin is not installed, but the alternative entrypoint is invoked, vLLM will also not run properly.

Warning

Under almost no circumstances should you need to import this package directly. If stainedglass_output_protection is installed, and you launch vLLM via the alternative entrypoint, this module will be automatically applied.

Modules:

Name	Description
`entrypoint`	Alternative entrypoint for launching a vLLM OpenAI-compatible RESTful API server with Stained Glass Output Protection enabled.
`middleware`	Middleware for the vLLM OpenAI-compatible RESTful API server that reads a user-provided public key from the request headers, registers
`parsers`	Disable server-side tool/reasoning parsing under Output Protection (OP) Encryption.
`registry`	User key registry for Stained Glass Output Protection in vLLM, shared across all vLLM processes.
`request_output`	Patched vLLM `RequestOutput` class with encrypted text fields.
`server_keys`	Utilities for managing ephemeral server keys in a FastAPI application.
`turboquant_plugin`	vLLM general plugin that patches prompt-embed loading to decode TurboQuant-compressed payloads.