vllm

vLLM plugin for Stained Glass Output Protection.

The plugin consists of several major components, each in its own module. See the individual modules for details.

As of vLLM 0.21.0, the vllm.general_plugins entry point system loads plugins in all vLLM processes, including the main OpenAI-compatible API server process. Specifically, EngineArgs.create_engine_config (vllm/engine/arg_utils.py:731) calls load_general_plugins() in the API server process, and EngineCore.__init__ (vllm/v1/engine/core.py:105) calls it again in the engine subprocess.

An alternative entrypoint is still required for launching a vLLM OpenAI-compatible RESTful API server with Stained Glass Output Protection enabled, because the output-protection patches (FastAPI middleware, ephemeral key rotation, encrypted RequestOutput) target the HTTP layer and cannot be applied via the vllm.general_plugins mechanism.

Warning

vLLM will not run properly if the stainedglass_output_protection package (i.e. the plugin) is installed but vLLM is not launched via the alternative entrypoint. Likewise, vLLM will not run properly if the alternative entrypoint is invoked without the plugin installed.

Warning

Under almost no circumstances should you need to import this package directly. If stainedglass_output_protection is installed, and you launch vLLM via the alternative entrypoint, this module will be automatically applied.
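The rule in the warnings above (the plugin and the alternative entrypoint must be used together or not at all) can be expressed as a small pre-launch check. This is a sketch under stated assumptions: the function names `plugin_installed` and `launch_is_consistent` are hypothetical and not part of this package, and the check only tests whether the package can be found, without importing it.

```python
from importlib.util import find_spec

# Package name taken from the warning above.
PLUGIN_PACKAGE = "stainedglass_output_protection"


def plugin_installed(package: str = PLUGIN_PACKAGE) -> bool:
    """Return True if ``package`` can be found, without importing it."""
    return find_spec(package) is not None


def launch_is_consistent(installed: bool, via_alternative_entrypoint: bool) -> bool:
    """Hypothetical check: both conditions must hold, or neither.

    Installed plugin + standard entrypoint, or missing plugin +
    alternative entrypoint, are the two combinations warned against.
    """
    return installed == via_alternative_entrypoint


if __name__ == "__main__":
    installed = plugin_installed()
    print(f"{PLUGIN_PACKAGE} installed: {installed}")
    # Example: report whether a launch through the standard entrypoint
    # would match the installation state of this environment.
    print("standard entrypoint OK:", launch_is_consistent(installed, False))
```

A deployment script could run such a check before starting the server and fail early when the two conditions disagree.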

Modules:

Name: Description
entrypoint: Alternative entrypoint for launching a vLLM OpenAI-compatible RESTful API server with Stained Glass Output Protection enabled.
middleware: Middleware for the vLLM OpenAI-compatible RESTful API server that reads a user-provided public key from the request headers, registers
parsers: Disable server-side tool/reasoning parsing under Output Protection (OP) Encryption.
registry: User key registry for Stained Glass Output Protection in vLLM, shared across all vLLM processes.
request_output: Patched vLLM RequestOutput class with encrypted text fields.
server_keys: Utilities for managing ephemeral server keys in a FastAPI application.
turboquant_plugin: vLLM general plugin that patches prompt-embed loading to decode TurboQuant-compressed payloads.