entrypoint
Alternative entrypoint for launching a vLLM OpenAI-compatible RESTful API server with Stained Glass Output Protection enabled.
This entrypoint performs all of the patches necessary to integrate the Stained Glass Output Protection plugin into vLLM's OpenAI-compatible RESTful API server, which runs in the main process and is therefore not covered by the vLLM plugin system.
This entrypoint can be launched via the command line as follows:
```bash
export HUGGING_FACE_HUB_TOKEN=<secret>
export SG_REGISTRY_CONNECTION_SECRET=<secret>
python -m stainedglass_output_protection.vllm.entrypoint --no-enable-chunked-prefill --enable-prompt-embeds --model meta-llama/Meta-Llama-3.1-8B-Instruct
```
The resulting vLLM server will be available at http://localhost:8000/ and will expose an OpenAI-compatible API that accepts prompt embeds.

Any CLI arguments that are valid for `vllm serve` can be passed to the container in this command, except for `--enable-chunked-prefill`, which is not compatible with `--enable-prompt-embeds`.
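Once the server is running, a request carrying prompt embeds might look like the following minimal client sketch. It assumes vLLM's prompt-embeds convention of a base64-encoded, `torch.save`-serialized tensor sent to the completions endpoint via `extra_body`; the random tensor below is only a placeholder for embeds that would normally be produced by a Stained Glass transform, which is not shown here.

```python
# Hedged client sketch: assumes the server follows vLLM's prompt-embeds
# convention (base64-encoded, torch.save-serialized tensor passed to the
# completions API via extra_body). The embedding is a stand-in; real prompt
# embeds would come from a Stained Glass transform.
import base64
import io

import torch
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Placeholder prompt embeds with shape (sequence_length, hidden_size).
prompt_embeds = torch.randn(16, 4096, dtype=torch.bfloat16)

# Serialize the tensor and base64-encode it for transport.
buffer = io.BytesIO()
torch.save(prompt_embeds, buffer)
encoded_embeds = base64.b64encode(buffer.getvalue()).decode("utf-8")

completion = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    prompt="",  # placeholder text prompt; the embeds drive generation (assumption)
    max_tokens=32,
    extra_body={"prompt_embeds": encoded_embeds},
)
print(completion.choices[0].text)
```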
Warning

This entrypoint must be used to launch a vLLM OpenAI-compatible RESTful API server with Stained Glass Output Protection enabled. If this entrypoint is used but the `stainedglass_output_protection` package is not installed, vLLM will not run properly.
Functions:
| Name | Description |
|---|---|
| `launch_vllm_with_output_protection` | Register the vLLM plugin, then launch a vLLM OpenAI-compatible RESTful API server. |
| `register_vllm_plugin` | Register the vLLM Plugin that patches in the Middleware and `EncryptedRequestOutput`. |
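For contexts where the module is driven from Python rather than the command line, a rough sketch of programmatic use is shown below. The exact signatures of these functions are not documented on this page, so the argument handling (reading `vllm serve`-style flags from `sys.argv`) is an assumption; the supported path remains `python -m stainedglass_output_protection.vllm.entrypoint`.

```python
# Hedged sketch of programmatic use; signatures and argument handling are
# assumptions, not documented API. Prefer the `python -m ...entrypoint` CLI.
import sys

from stainedglass_output_protection.vllm import entrypoint

if __name__ == "__main__":
    # Per the table above, this registers the vLLM plugin and then launches
    # the OpenAI-compatible RESTful API server. CLI-style arguments are
    # assumed to be read from sys.argv, as with `vllm serve`.
    sys.argv = [
        sys.argv[0],
        "--no-enable-chunked-prefill",
        "--enable-prompt-embeds",
        "--model", "meta-llama/Meta-Llama-3.1-8B-Instruct",
    ]
    entrypoint.launch_vllm_with_output_protection()
```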