entrypoint

Alternative entrypoint for launching a vLLM OpenAI-compatible RESTful API server with Stained Glass Output Protection enabled.

This entrypoint applies all of the patches necessary to integrate the Stained Glass Output Protection plugin into vLLM's OpenAI-compatible RESTful API server, which runs in the main process and is not covered by the vLLM plugin system.

This entrypoint can be launched via the command line as follows:

export HUGGING_FACE_HUB_TOKEN=<secret>
export SG_REGISTRY_CONNECTION_SECRET=<secret>
python -m stainedglass_output_protection.vllm.entrypoint \
    --no-enable-chunked-prefill \
    --enable-prompt-embeds \
    --model meta-llama/Meta-Llama-3.1-8B-Instruct

The resulting vLLM server will be available at http://localhost:8000/ and will expose an OpenAI-compatible API that accepts prompt embeds.
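
For example, a standard OpenAI client can then talk to the server. The following is a minimal sketch using the official openai Python package; the model name mirrors the launch command above, and the placeholder API key is an assumption (by default vLLM does not require a real key unless one is configured).

from openai import OpenAI

# Point the official OpenAI client at the locally running vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Issue an ordinary chat completion request against the protected server.
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)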

Any CLI arguments that are valid for vllm serve can be passed to this command, except for --enable-chunked-prefill, which is not compatible with --enable-prompt-embeds.

Warning

This entrypoint must be used to launch a vLLM OpenAI-compatible RESTful API server with Stained Glass Output Protection enabled. If the stainedglass_output_protection package is installed but this entrypoint is not used, vLLM will not run properly.

Functions:

Name                                  Description
launch_vllm_with_output_protection    Register the vLLM plugin, then launch a vLLM OpenAI-compatible RESTful API server.
register_vllm_plugin                  Register the vLLM Plugin that patches in the Middleware and EncryptedRequestOutput.

launch_vllm_with_output_protection

launch_vllm_with_output_protection() -> None

Register the vLLM plugin, then launch a vLLM OpenAI-compatible RESTful API server.
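
The function takes no arguments, so a reasonable assumption is that it picks up the vLLM CLI flags from sys.argv, mirroring the command-line invocation above. A minimal, hypothetical sketch of calling it programmatically:

import sys

from stainedglass_output_protection.vllm import entrypoint

# Assumption: the launcher reads vLLM CLI flags from sys.argv, exactly as
# when the module is run with `python -m` (hypothetical usage; the module
# entrypoint shown above is the documented way to launch the server).
sys.argv = [
    "entrypoint",
    "--no-enable-chunked-prefill",
    "--enable-prompt-embeds",
    "--model", "meta-llama/Meta-Llama-3.1-8B-Instruct",
]
entrypoint.launch_vllm_with_output_protection()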

register_vllm_plugin

register_vllm_plugin() -> None

Register the vLLM Plugin that patches in the Middleware and EncryptedRequestOutput.
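
For illustration only, the registration step can be sketched on its own; the assumption here is that the patches must be applied before the vLLM server handles any requests, and that the entrypoint above performs this call for you:

from stainedglass_output_protection.vllm import entrypoint

# Hypothetical standalone use: apply the Middleware and EncryptedRequestOutput
# patches before any vLLM OpenAI-compatible server code runs. In practice the
# module entrypoint above is the supported way to do this.
entrypoint.register_vllm_plugin()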