Stained Glass Transform Proxy

A forward proxy service that accepts chat completion requests conforming to the OpenAI API specification, transforms the chat message contents using a Stained Glass Transform (SGT), forwards the transformed (obfuscated) input embeddings to a vLLM inference server, and returns the server's response to the client.

Note

Since the proxy sends transformed embeddings as input to the vLLM inference server, the server must be started with chunked prefill disabled via the --no-enable-chunked-prefill CLI option to enable support for prompt embeddings.
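As a rough illustration of the note above, the launch command could be assembled as follows. Only the `--no-enable-chunked-prefill` option comes from this page; the `vllm serve` entry point, the `--port` value, and the model identifier are assumptions that depend on your vLLM version and deployment, and your version may need further options.

```python
import shlex


def vllm_launch_command(model: str, port: int = 8000) -> list:
    """Build an argv list that starts vLLM with chunked prefill disabled."""
    return [
        "vllm", "serve", model,         # assumed entry point; check your vLLM version
        "--port", str(port),            # assumed port; any free port works
        "--no-enable-chunked-prefill",  # option required by the note above
    ]


if __name__ == "__main__":
    # "your-org/your-model" is a placeholder, not a real model identifier.
    print(shlex.join(vllm_launch_command("your-org/your-model")))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues.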

sequenceDiagram
    autonumber
    participant Client
    participant SGP as Stained Glass Transform Proxy
    Note right of SGP: Validate request,<br/> Transform message prompts<br/> and return/stream response from LLM API.
    participant LLM as LLM API
    Client->>SGP: OpenAI API Spec Request
    SGP->>LLM: Inference Request
    LLM-->>SGP: Inference Response
    SGP-->>Client: OpenAI API Spec Response

Deployment

Obtain the Docker image tag for stainedglass_proxy.
For Kubernetes (k8s), update the Kubernetes deployment config with the stainedglass_proxy image tag:
```
image: stainedglass_proxy:<your-image-tag-here>
```
Set the following environment variables for your container deployment (for k8s, this needs to be done via the manifest file):
- SGP_INFERENCE_SERVICE_HOST: URL for the inference service.
- SGP_SGT_PATH: SGT model file path. For the typical docker container, SGP_SGT_PATH should be set to sgt_model.pt.
- SGP_DEVICE: Device type to match that of the SGT file. Can be either "cpu" or "cuda".
- SGP_API_USERNAME: Username for the LLM API.
- SGP_API_PASSWORD: Password for the LLM API.
- SGP_LOGGING_CONFIG_FILE: Path to your custom YAML file for logging configuration (see logging_configuration.md).
- SGP_MAX_NEW_TOKENS: The default value for maximum number of tokens to be generated in the response. This value is ignored if max_tokens is specified in the requests.
- SGP_OUTPUT_DECRYPTION: Set this to True to enable the proxy service to handle (decrypt) encrypted output tokens from the vLLM inference server protected by Stained Glass Output Protection.
  - If set to True, the proxy will use its own keys for output protection and will handle decryption of responses. Users can then interact with the standard inference APIs normally, i.e., as they would if there were no output protection.
  - If set to False:
    - If Stained Glass Output Protection is not being used, then users can use the inference APIs as normal OpenAI spec APIs.
    - If Stained Glass Output Protection is being used, then users need to generate their own keys and set the x-client-public-key header to their public key in all inference requests. Users then need to decrypt the responses on their own. See the usage documentation for Stained Glass Output Protection for details.
- SGP_EPHEMERAL_KEY_REFRESH_TIME_SECONDS: If SGP_OUTPUT_DECRYPTION is set to True, the proxy generates cryptographic keys for secure LLM output generation and decryption. Set this env var to the interval (in seconds) at which the proxy refreshes its keys during its lifetime.
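For a Kubernetes manifest, the variables listed above end up as name/value entries under the container's `env:` key. The sketch below generates such entries; every value is a placeholder, and the string form of the boolean (`"False"`) is an assumption about how the proxy parses it.

```python
import json

# All values are placeholders; replace them with your deployment's settings.
SGP_ENV = {
    "SGP_INFERENCE_SERVICE_HOST": "http://vllm.example.internal:8000",
    "SGP_SGT_PATH": "sgt_model.pt",
    "SGP_DEVICE": "cpu",
    "SGP_MAX_NEW_TOKENS": "512",
    "SGP_OUTPUT_DECRYPTION": "False",  # assumed string form of the boolean
}


def to_k8s_env(env: dict) -> list:
    """Convert a mapping into the name/value list used under `env:`."""
    return [{"name": name, "value": str(value)} for name, value in env.items()]


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be pasted under `env:` as-is.
    print(json.dumps(to_k8s_env(SGP_ENV), indent=2))
```

Credentials such as SGP_API_PASSWORD are deliberately left out of the sketch; in practice they are usually injected from a secret rather than written into the manifest.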

Warning

Do not set the SGP_API_USERNAME and SGP_API_PASSWORD env vars if you intend to authenticate with an API key in your requests: the proxy will overwrite the key with a value derived from the provided username and password.

Warning

If SGP_OUTPUT_DECRYPTION is set to True while the proxy runs in front of an inference service without Stained Glass Output Protection, the proxy will respond to every inference request with an internal server error (500 status code). Ensure that SGP_OUTPUT_DECRYPTION is set to False if the upstream server isn't protected.
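The two warnings above can be turned into a small pre-deployment check. This is a minimal sketch, not part of the proxy: the function name `check_sgp_env` is hypothetical, and it assumes the proxy treats the literal string `"True"` as enabled.

```python
import os


def check_sgp_env(env: dict) -> list:
    """Return human-readable warnings for settings this page cautions about."""
    warnings = []
    device = env.get("SGP_DEVICE")
    if device is not None and device not in ("cpu", "cuda"):
        warnings.append(f"SGP_DEVICE should be 'cpu' or 'cuda', got {device!r}")
    if env.get("SGP_API_USERNAME") or env.get("SGP_API_PASSWORD"):
        warnings.append(
            "SGP_API_USERNAME/SGP_API_PASSWORD are set: "
            "an API key sent in requests will be overwritten"
        )
    # Assumption: the proxy treats the literal string "True" as enabled.
    if env.get("SGP_OUTPUT_DECRYPTION") == "True":
        warnings.append(
            "SGP_OUTPUT_DECRYPTION is True: the inference server must use "
            "Stained Glass Output Protection, or requests fail with 500"
        )
    return warnings


if __name__ == "__main__":
    for message in check_sgp_env(dict(os.environ)):
        print("warning:", message)
```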

Deploy the container.

Usage

The service listens on port 8600. You can access the API endpoint documentation either as a JSON export via the /openapi.json endpoint, or as a web page via the /docs endpoint in your browser.
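To get a quick overview of the available endpoints without opening a browser, the JSON export can be read programmatically. The sketch below assumes the proxy is reachable at `http://127.0.0.1:8600` and that /openapi.json returns a standard OpenAPI document with a `paths` object; the function names are illustrative.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_endpoints(openapi_doc: dict) -> list:
    """Return 'METHOD /path' strings from an OpenAPI document's `paths` object."""
    endpoints = []
    for path, operations in sorted(openapi_doc.get("paths", {}).items()):
        for key in operations:
            # Path items may also carry non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                endpoints.append(f"{key.upper()} {path}")
    return endpoints


def fetch_openapi(base_url: str) -> dict:
    """Download and parse the OpenAPI export from a running proxy."""
    with urllib.request.urlopen(f"{base_url}/openapi.json") as response:
        return json.load(response)
```

With a proxy running locally, `list_endpoints(fetch_openapi("http://127.0.0.1:8600"))` returns one entry per documented operation.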

Chat Completions Example

Tip

If you'd like to manage the encryption and decryption of generated output yourself instead of relying on the proxy:

- Set SGP_OUTPUT_DECRYPTION to False when running the proxy.
- Refer to the usage documentation for Stained Glass Output Protection for instructions on using the stainedglass_output_protection Python library to handle the encryption keys and output decryption.
- The proxy forwards the x-client-public-key header passed in an inference request (/v1/completions, /v1/chat/completions, etc.) to the vLLM inference server protected by Stained Glass Output Protection.
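When managing keys yourself, the only change on the request side is the extra header. The sketch below shows where it goes; the helper name is hypothetical, and the value passed as `client_public_key` is an opaque placeholder, since the key format and encoding are defined by the Stained Glass Output Protection documentation rather than by this example.

```python
from typing import Optional


def build_headers(client_public_key: Optional[str] = None) -> dict:
    """Build request headers, adding x-client-public-key when a key is given."""
    headers = {"Content-Type": "application/json"}
    if client_public_key is not None:
        # Header name from this page; the value's encoding is NOT defined here.
        headers["x-client-public-key"] = client_public_key
    return headers


if __name__ == "__main__":
    # "<your-public-key>" is a placeholder, not a valid key.
    print(build_headers("<your-public-key>"))
```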

`curl` Request

curl --location 'http://127.0.0.1:8600/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "mistral-7b-instruct",
    "messages": [
        {
            "role": "user",
            "content": "Write me a poem."
        }
    ],
    "max_tokens": 3000,
    "temperature": 1.7,
    "seed": 123456
}'
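The same request can be sketched in Python using only the standard library. The URL, model name, and sampling parameters mirror the curl command above; the function names are illustrative, and error handling is omitted.

```python
import json
import urllib.request


def build_chat_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl command above."""
    payload = {
        "model": "mistral-7b-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 3000,
        "temperature": 1.7,
        "seed": 123456,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request to a running proxy and parse the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a proxy running locally, `send(build_chat_request("http://127.0.0.1:8600", "Write me a poem."))` returns the parsed completion as a dictionary.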

Response

{
  "id": "9325c716-deb8-46b5-bfca-1a904a7af0c7",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "In the quiet of the twilight, where the sun's last rays reside,\n\nA symphony of colors paint the sky, a breathtaking, wondrous tide.\n\nThe day has ended, and the night awakes, in hues of pink and gold,\n\nA gentle breeze caresses the earth, a whisper soft and bold.\n\nThe stars begin to twinkle, like diamonds strewn across the black,\n\nA canvas painted by the hand of God, a masterpiece to track.\n\nThe moon ascends her throne, a beacon in the night,\n\nA silver glow that bathes the world in soft, ethereal light.\n\nThe crickets sing their lullabies, a chorus of the night,\n\nA melody that soothes the soul, a balm for all that's right.\n\nIn the quiet of the twilight, where the world is still and calm,\n\nA moment of peace, a time to dream, a moment to reclaim.\n\nSo let us bask in the beauty of this twilight hour,\n\nA reminder of the magic that lies within the power,\n\nOf the simple, yet profound, enchantment of the night.",
        "tool_calls": null
      },
      "logprobs": null
    }
  ],
  "created": 1724875449,
  "model": "mistral-7b-instruct",
  "service_tier": null,
  "system_fingerprint": null,
  "object": "chat.completion",
  "usage": null
}
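A client typically needs only the generated text and the finish reason from such a response. The following sketch extracts both from a parsed response with the shape shown above; the helper name is illustrative, and the sample data is abbreviated from the example.

```python
import json


def first_choice(completion: dict) -> tuple:
    """Return (content, finish_reason) of the first choice in a chat completion."""
    choice = completion["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


# Abbreviated version of the response above, used only for illustration.
SAMPLE = json.loads("""
{
  "id": "9325c716-deb8-46b5-bfca-1a904a7af0c7",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {"role": "assistant", "content": "In the quiet of the twilight", "tool_calls": null},
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": null
}
""")

if __name__ == "__main__":
    text, reason = first_choice(SAMPLE)
    print(reason, text)
```

Note that `usage` is null in the example response, so code that reads token counts should handle a missing value.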

Next Steps

- API Reference: View detailed descriptions of the endpoints and models supported by Stained Glass Transform Proxy.
- Logging Configuration: Read about Stained Glass Transform Proxy's logging configuration.
- Tutorials: Learn how to create and deploy a Stained Glass Transform.