Helm Chart Deployment¶
Stained Glass Transform Proxy can be deployed using the provided Helm Chart. The chart deploys both the SGT Proxy and the SGT LLM API (powered by vLLM) in the same namespace, allowing for easy configuration and management in a testing or development environment.
If you are looking to install the SGT Proxy via Helm from AWS Marketplace, please instead refer to the AWS Marketplace Deployment Guide.
Prerequisites¶
Before deploying the Helm Chart, you must first prepare a few things.
Prepare Containers¶
Along with the Helm Chart, you should be provided two containers: one for the SGT Proxy and one for the SGT LLM API. These container images must be stored in a container registry that your Kubernetes cluster can access. For example, if your cluster runs on Amazon Elastic Kubernetes Service (EKS), this may require pushing the images to a private Amazon Elastic Container Registry (ECR) repository. The details of the container registry depend on your Kubernetes cluster configuration.
Because the container images are provided as tar.gz files, you will need to load them and push them to your container registry. The first step is usually to use docker load to load each image into your local Docker daemon. You can then tag the image and push it to your container registry.
docker load -i sgt-proxy.tar.gz
docker tag stained-glass-transform-llama-3.1-8b-instruct-v0.1:0.19.2-c2efa8a <your-registry>/stained-glass-transform-llama-3.1-8b-instruct-v0.1:0.19.2-c2efa8a
docker push <your-registry>/stained-glass-transform-llama-3.1-8b-instruct-v0.1:0.19.2-c2efa8a
docker load -i llm-api.tar.gz
docker tag llm-api-with-vllm:0.1.0 <your-registry>/llm-api-with-vllm:0.1.0
docker push <your-registry>/llm-api-with-vllm:0.1.0
Details may vary depending on your container registry and Kubernetes cluster configuration.
The SGT Proxy container has a name like stained-glass-transform-llama-3.1-8b-instruct-v0.1:0.19.2-c2efa8a, where the bundled Stained Glass Transform model name may vary; the tag (including the version number and commit hash) may also vary.
The SGT LLM API container has a name like llm-api-with-vllm:0.1.0, where the tag (including the version number and commit hash) may vary.
Kubernetes Permissions¶
Ensure that you have the necessary permissions to deploy the Helm Chart to your Kubernetes cluster. Consult your Kubernetes administrator if you are unsure of the necessary permissions.
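If you have kubectl access, you can check the relevant permissions yourself before deploying. A sketch, assuming you substitute your target namespace for <namespace>:

```shell
# Check whether you can create the resource types the chart will install.
kubectl auth can-i create deployments -n <namespace>
kubectl auth can-i create services -n <namespace>
kubectl auth can-i create secrets -n <namespace>
```

Each command prints yes or no; consult your Kubernetes administrator if any answer is no.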
Helm¶
Deploying a Helm Chart requires installing the Helm CLI on your local machine. You can install Helm by following the instructions in the Helm documentation.
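As a sketch, on Linux or macOS you can use the install script from the Helm documentation (review the script before running it):

```shell
# Download and run the official Helm 3 install script.
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh

# Confirm the CLI is available.
helm version
```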
Hugging Face HUB API Token¶
You will need a Hugging Face Hub API token to download the model weights for the Llama 3.1 8B model. For directions on how to obtain a Hugging Face Hub API token, see the Hugging Face Hub documentation for Authentication and User Access Tokens.
After obtaining your Hugging Face Hub API token, you must request access to the Llama 3.1 8B model from the Hugging Face Hub model card. Go to the model page on the Hugging Face Hub and click the "Request access" button. Once you have been granted access to the model, you can use your Hugging Face Hub API token to download the model weights.
After you have obtained your Hugging Face Hub API token, you can pass it to the Helm Chart with --set llmApi.secrets.HF_TOKEN=<your token>. For example:
helm install stained-glass-engine \
709825985650.dkr.ecr.us-east-1.amazonaws.com/protopia-ai/stained-glass-engine:0.1.0 \
--set llmApi.secrets.HF_TOKEN=<your token>
Alternatively, you can set the HF_TOKEN environment variable using Kubernetes Secrets or ConfigMaps. If no HF_TOKEN value is set in Helm, the deployment will attempt to read HF_TOKEN from a Kubernetes Secret.
One way to set the HF_TOKEN value in a Kubernetes Secret:
1. Create the secret: Use the kubectl create secret command to create the secret in Kubernetes.
2. Verify the secret: Ensure the secret has been created successfully.
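The two steps above might look like the following; the secret name hf-token-secret is an assumption — use whatever name your chart configuration references:

```shell
# 1. Create the secret holding the Hugging Face Hub API token.
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN=<your token> \
  -n <namespace>

# 2. Verify the secret exists and contains the expected key.
kubectl get secret hf-token-secret -n <namespace> -o jsonpath='{.data}'
```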
Deploying the Helm Chart¶
Unzip the Helm Chart¶
The Helm Chart is provided as a compressed file. Unzip the Helm Chart to a directory on your local machine. This step is optional if you are not modifying the chart or values files.
Configuration¶
Ensure that the llmApi.image and sgProxy.image values in the values.yaml file are set to the correct container images, as described in the Prepare Containers section.
See the values.yaml file in the unzipped Helm Chart directory for the values that can be modified, including (but not limited to) resources, replicas, environment variables, and image names.
You can override the default values, create a new partial values file, or set individual values using the --values or --set flags when deploying the Helm Chart.
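For example, a partial values file overriding just the image references might look like the following (a sketch — confirm the exact key layout against the chart's own values.yaml):

```yaml
llmApi:
  image: <your-registry>/llm-api-with-vllm:0.1.0
sgProxy:
  image: <your-registry>/stained-glass-transform-llama-3.1-8b-instruct-v0.1:0.19.2-c2efa8a
```

You would then pass this file to helm install with the --values flag.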
Deploying¶
To deploy the Helm Chart, run the following command:
helm install stained-glass-engine ./stained-glass-engine
You can also specify custom values files or individual values using the --values or --set flags. For example:
helm install stained-glass-engine ./stained-glass-engine \
  --set llmApi.image=<your-registry>/llm-api-with-vllm:0.1.0 \
  --set sgProxy.image=<your-registry>/stained-glass-transform-llama-3.1-8b-instruct-v0.1:0.19.2-c2efa8a
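After installation, you can check that the release deployed and its pods started. A sketch, assuming the release name stained-glass-engine from the command above:

```shell
# Check the release status.
helm status stained-glass-engine

# Watch the SGT Proxy and LLM API pods come up.
kubectl get pods -n <namespace> -w
```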
Inference using Stained Glass Proxy¶
Connecting to Stained Glass Proxy¶
The Helm Chart deployment does not include an ingress controller. You must set up an ingress or forward the service port to access the Stained Glass Proxy service.
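For quick local testing, kubectl port-forward can expose the proxy service without an ingress. The service name and port below are placeholders — list the services in the namespace to find the actual values:

```shell
# Find the SGT Proxy service name and port.
kubectl get svc -n <namespace>

# Forward local port 8000 to the service port.
kubectl port-forward svc/<sgt-proxy-service> 8000:<service-port> -n <namespace>
```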
You can test your connection using the built-in Swagger UI at the /docs endpoint.
Interacting with the Stained Glass Proxy API¶
Once you can connect to the Stained Glass Proxy service, you can interact with its REST API to perform inference (see the API Reference for more details). The REST API is OpenAI-compatible, so you can use tools such as OpenAI's client or LangChain to interact with the service. See Tutorials for examples of how to use the service.
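As a minimal sketch, assuming the service is reachable at localhost:8000, you can exercise the OpenAI-compatible endpoints with curl; the served model name is not fixed here, so query the model listing first:

```shell
# List the available models.
curl -s http://localhost:8000/v1/models

# Send a chat completion request; <model-name> comes from the listing above.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "<model-name>",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```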