Stained Glass Transform Proxy¶
Stained Glass Transform Proxy is a forward proxy service that accepts chat completion requests conforming to the OpenAI API specification, transforms the chat message contents using Stained Glass Transform (SGT), forwards the transformed (obfuscated) input embeddings to a vLLM inference server, and returns the response to the client.
Note
Since the proxy sends transformed embeddings as input to the vLLM inference server, the server must be started with chunked prefill disabled via the --no-enable-chunked-prefill CLI option to enable support for prompt embeddings.
sequenceDiagram
autonumber
participant Client
participant SGP as Stained Glass Transform Proxy
Note right of SGP: Validate request,<br/> Transform message prompts<br/> and return/stream response from LLM API.
participant LLM as LLM API
Client->>SGP: OpenAI API Spec Request
SGP->>LLM: Inference Request
LLM-->>SGP: Inference Response
SGP-->>Client: OpenAI API Spec Response
Deployment¶
- Procure the docker image tag for stainedglass_proxy.
- For kubernetes (k8s), update the kubernetes deployment config with the stainedglass_proxy image tag.
- Set the following environment variables for your container deployment (for k8s, this needs to be done via the manifest file):
  - SGP_INFERENCE_SERVICE_HOST: URL for the inference service.
  - SGP_SGT_PATH: SGT model file path. For the typical docker container, SGP_SGT_PATH should be set to sgt_model.pt.
  - SGP_DEVICE: Device type to match that of the SGT file. Can be either "cpu" or "cuda".
  - SGP_API_USERNAME: Username for the LLM API.
  - SGP_API_PASSWORD: Password for the LLM API.
  - SGP_LOGGING_CONFIG_FILE: Path to your custom YAML file for logging configuration (see logging_configuration.md).
  - SGP_MAX_NEW_TOKENS: The default value for the maximum number of tokens to be generated in the response. This value is ignored if max_tokens is specified in the request.
  - SGP_OUTPUT_DECRYPTION: Set this to True to enable the proxy service to handle (decrypt) encrypted output tokens from the vLLM inference server protected by Stained Glass Output Protection.
    - If set to True, the proxy will use its own keys for output protection and will handle decryption of responses. Users can interact with the standard inference APIs normally in this case, i.e., as they would if there were no output protection.
    - If set to False:
      - If Stained Glass Output Protection is not being used, then users can use the inference APIs as normal OpenAI spec APIs.
      - If Stained Glass Output Protection is being used, then users need to generate their own keys and set the x-client-public-key header value to their public key in all inference requests. Users then need to handle decryption of the responses on their own. More details are in the usage documentation for Stained Glass Output Protection.
  - SGP_EPHEMERAL_KEY_REFRESH_TIME_SECONDS: If SGP_OUTPUT_DECRYPTION is set to True, the proxy generates cryptographic keys for secure LLM output generation and decryption. Set this env var to the periodic interval (in seconds) after which the proxy will refresh its keys during its lifetime.
Warning
Do not set the SGP_API_USERNAME and SGP_API_PASSWORD env vars if you intend to use an API key in your requests for authentication, as the proxy will overwrite the key with a value based on the username and password provided.
Warning
If SGP_OUTPUT_DECRYPTION is set to True when running the proxy in conjunction with an inference service without Stained Glass Output Protection, the proxy will respond to any inference request with an internal server error (500 status code). Please ensure that SGP_OUTPUT_DECRYPTION is set to False if the upstream server isn't protected.
- Deploy the container.
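The environment variables and warnings above imply a few consistency rules that can be checked before the container is deployed. The following is a minimal sketch and not part of the proxy: check_proxy_env is a hypothetical helper, and treating SGP_INFERENCE_SERVICE_HOST and SGP_SGT_PATH as required (and the username/password pair as all-or-nothing) is an assumption made for illustration.

```python
from typing import Mapping

# Hypothetical pre-deployment check; the proxy does not ship this helper.
# Assumption: these two variables are treated as required here.
REQUIRED = ("SGP_INFERENCE_SERVICE_HOST", "SGP_SGT_PATH")


def check_proxy_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable findings for the given environment mapping."""
    findings = []

    for name in REQUIRED:
        if not env.get(name):
            findings.append(f"{name} is not set")

    # The documentation allows only "cpu" or "cuda" for the device.
    device = env.get("SGP_DEVICE")
    if device is not None and device not in ("cpu", "cuda"):
        findings.append(f'SGP_DEVICE must be "cpu" or "cuda", got {device!r}')

    # Assumption: username and password are only meaningful together.
    has_user = bool(env.get("SGP_API_USERNAME"))
    has_password = bool(env.get("SGP_API_PASSWORD"))
    if has_user != has_password:
        findings.append("set both SGP_API_USERNAME and SGP_API_PASSWORD, or neither")
    elif has_user:
        # Documented behavior: the proxy overwrites any API key sent by clients.
        findings.append("API keys in client requests will be overwritten by the proxy")

    return findings
```

Calling check_proxy_env(os.environ) inside the container entrypoint, or against the values from a k8s manifest, would surface these findings before the first request is served.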
Usage¶
The service listens on port 8600. You can access the API endpoint documentation either as a JSON export via the /openapi.json endpoint, or as a web page via the /docs endpoint in your browser.
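The JSON export can also be inspected programmatically. The sketch below uses only the Python standard library; the base URL assumes a proxy running locally on port 8600, and fetch_openapi_spec and list_operations are illustrative names rather than part of any Stained Glass library.

```python
import json
from urllib.request import urlopen

# Assumption: the proxy is reachable locally on its documented port.
BASE_URL = "http://127.0.0.1:8600"

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def fetch_openapi_spec(base_url: str = BASE_URL) -> dict:
    """Download the OpenAPI document served at /openapi.json."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)


def list_operations(spec: dict) -> list[str]:
    """Return sorted "METHOD /path" strings found in an OpenAPI document."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-method keys such as "parameters".
            if key in HTTP_METHODS:
                operations.append(f"{key.upper()} {path}")
    return sorted(operations)
```

With a running proxy, list_operations(fetch_openapi_spec()) returns the endpoints that the deployed version actually exposes.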
Chat Completions Example¶
Tip
If you'd like to self-manage the encryption/decryption of generated output instead of relying on the proxy for it:
- Set SGP_OUTPUT_DECRYPTION to False when running the proxy.
- Please refer to the usage documentation for Stained Glass Output Protection for instructions on how to use the stainedglass_output_protection python library to handle the encryption keys and output decryption.
- The proxy will forward the x-client-public-key header key/value passed in an inference request (/v1/completions, /v1/chat/completions, etc.) to the downstream vLLM inference server with Stained Glass Output Protection.
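As a rough illustration of the last point, the request headers for a self-managed setup might be assembled as follows. This is a sketch under assumptions: inference_headers is a hypothetical helper, and the public key is passed through as an opaque string because its encoding is defined by the stainedglass_output_protection library, which is not shown here.

```python
from typing import Optional


def inference_headers(client_public_key: Optional[str] = None) -> dict[str, str]:
    """Build HTTP headers for an inference request sent to the proxy.

    client_public_key is treated as an opaque, already-encoded string
    (assumption); obtain it as described in the Output Protection docs.
    """
    headers = {"Content-Type": "application/json"}
    if client_public_key is not None:
        # Forwarded by the proxy to the inference server as-is.
        headers["x-client-public-key"] = client_public_key
    return headers
```

When SGP_OUTPUT_DECRYPTION is True, the header is omitted and the proxy handles decryption itself.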
curl
Request¶
curl --location 'http://127.0.0.1:8600/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "mistral-7b-instruct",
        "messages": [
            {
                "role": "user",
                "content": "Write me a poem."
            }
        ],
        "max_tokens": 3000,
        "temperature": 1.7,
        "seed": 123456
    }'
Response¶
{
"id": "9325c716-deb8-46b5-bfca-1a904a7af0c7",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "In the quiet of the twilight, where the sun's last rays reside,\n\nA symphony of colors paint the sky, a breathtaking, wondrous tide.\n\nThe day has ended, and the night awakes, in hues of pink and gold,\n\nA gentle breeze caresses the earth, a whisper soft and bold.\n\nThe stars begin to twinkle, like diamonds strewn across the black,\n\nA canvas painted by the hand of God, a masterpiece to track.\n\nThe moon ascends her throne, a beacon in the night,\n\nA silver glow that bathes the world in soft, ethereal light.\n\nThe crickets sing their lullabies, a chorus of the night,\n\nA melody that soothes the soul, a balm for all that's right.\n\nIn the quiet of the twilight, where the world is still and calm,\n\nA moment of peace, a time to dream, a moment to reclaim.\n\nSo let us bask in the beauty of this twilight hour,\n\nA reminder of the magic that lies within the power,\n\nOf the simple, yet profound, enchantment of the night.",
"tool_calls": null
},
"logprobs": null
}
],
"created": 1724875449,
"model": "mistral-7b-instruct",
"service_tier": null,
"system_fingerprint": null,
"object": "chat.completion",
"usage": null
}
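The same request can be sent from Python. The sketch below mirrors the curl example using only the standard library; it assumes the proxy is running locally on port 8600 with SGP_OUTPUT_DECRYPTION set so that responses arrive as plain text, and the function names are illustrative.

```python
import json
from urllib.request import Request, urlopen

# Assumption: same local address as the curl example.
CHAT_URL = "http://127.0.0.1:8600/v1/chat/completions"


def build_payload(prompt: str, model: str = "mistral-7b-instruct", **options) -> dict:
    """Build a chat completion request body with a single user message."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **options,  # e.g. max_tokens, temperature, seed
    }


def extract_reply(response: dict) -> str:
    """Return the assistant text from the first choice of a response."""
    return response["choices"][0]["message"]["content"]


def chat(prompt: str, url: str = CHAT_URL, **options) -> str:
    """Send the prompt to the proxy and return the generated text."""
    body = json.dumps(build_payload(prompt, **options)).encode("utf-8")
    request = Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=120) as response:
        return extract_reply(json.load(response))
```

For example, chat("Write me a poem.", max_tokens=3000, seed=123456) sends the same fields as the curl request, minus the temperature. If max_tokens is omitted, the proxy falls back to SGP_MAX_NEW_TOKENS.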
Next Steps¶
- API Reference: View detailed descriptions of the endpoints and models supported by Stained Glass Transform Proxy.
- Logging Configuration: Read about Stained Glass Transform Proxy's logging configuration.
- Tutorials: Learn how to create and deploy a Stained Glass Transform.