Inference with Stained Glass Transform (SGT) Proxy and vLLM¶
This notebook demonstrates:
- Several ways to run inference against a vLLM instance serving a Llama base model through OpenAI Chat Completions API compatible clients, while Stained Glass Transform Proxy protects the user's input prompts.
- How to access the input embeddings (both plain and transformed) and the prompt reconstructed from the transformed embeddings.
Inference¶
Prerequisites¶
- A live instance of vLLM (>=v0.9.1) OpenAI-Compatible Server, with prompt embeddings enabled.
- A live instance of SGT Proxy (Please refer to the deployment instructions).
Chat Completions¶
We can run inference against the vLLM instance by calling the SGT Proxy's OpenAI Chat Completions API compatible endpoint from any of the following common client interfaces. Let's walk through each method.
Configuration Required
Update these parameters for your specific setup:
- PROXY_URL: Your proxy server endpoint
- MODEL_NAME: The base model you want to test
- API_KEY: Your authentication key
In [ ]:
# Set proxy access parameters.
PROXY_URL = "http://127.0.0.1:8601/v1"
MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
API_KEY = "<overwrite-with-your-api-key>"
OpenAI Client¶
- Install the openai Python package.
In [2]:
%pip install openai
- Perform inference.
In [3]:
import openai

client = openai.OpenAI(base_url=PROXY_URL, api_key=API_KEY)

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {
            "role": "assistant",
            "content": "The Los Angeles Dodgers won the World Series in 2020.",
        },
        {"role": "user", "content": "Where was it played?"},
    ],
)
print(response.choices[0].message.content)
The 2020 World Series was played at Globe Life Field in Arlington, Texas
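Because the proxy speaks the standard Chat Completions protocol, a multi-turn conversation is just a growing messages list. The pattern above can be factored into a small helper; this is a pure-Python sketch with no network calls, and the helper name is ours, not part of any client library:

```python
def build_messages(system_prompt, history, user_prompt):
    """Assemble a Chat Completions `messages` list from a system prompt,
    prior (user, assistant) turns, and the newest user prompt."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_prompt})
    return messages


msgs = build_messages(
    "You are a helpful assistant.",
    [
        (
            "Who won the world series in 2020?",
            "The Los Angeles Dodgers won the World Series in 2020.",
        )
    ],
    "Where was it played?",
)
# msgs can be passed directly as `messages=` to client.chat.completions.create.
```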
LangChain¶
- Install the langchain-openai Python package.
In [4]:
%pip install langchain-openai
- Perform inference.
In [5]:
import langchain_openai
from langchain_core import output_parsers, prompts

llm = langchain_openai.ChatOpenAI(
    model=MODEL_NAME, base_url=PROXY_URL, api_key=API_KEY
)

prompt = prompts.ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        ("user", "Who won the world series in 2020?"),
        ("assistant", "The Los Angeles Dodgers won the World Series in 2020."),
        ("user", "{input}"),
    ]
)
output_parser = output_parsers.StrOutputParser()

chain = prompt | llm | output_parser
print(chain.invoke({"input": "Where was it played?"}))
The 2020 World Series was played at Globe Life Field in Arlington, Texas
LiteLLM¶
- Install the litellm Python package.
In [6]:
%pip install litellm
- Perform inference.
In [7]:
import litellm

response = litellm.completion(
    model=f"openai/{MODEL_NAME}",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {
            "role": "assistant",
            "content": "The Los Angeles Dodgers won the World Series in 2020.",
        },
        {"role": "user", "content": "Where was it played?"},
    ],
    base_url=PROXY_URL,
    api_key=API_KEY,
)
print(response.choices[0].message.content)
The 2020 World Series was played at Globe Life Field in Arlington, Texas
Magentic¶
- Install the magentic Python package.
In [8]:
%pip install magentic
- Perform inference.
In [9]:
import magentic


@magentic.chatprompt(
    magentic.SystemMessage("You are a helpful assistant."),
    magentic.UserMessage("Who won the world series in 2020?"),
    magentic.AssistantMessage(
        "The Los Angeles Dodgers won the World Series in 2020."
    ),
    magentic.UserMessage("{prompt}"),
)
def get_response(prompt: str) -> str:
    """Use magentic to get a response to the chat history and prompt.

    Magentic will automatically fill in the appropriate OpenAI API calls, which
    is why this function definition is empty.

    Args:
        prompt: The prompt to ask the model as the final user message.

    Returns:
        The response from the model to the prompt and chat history.
    """


with magentic.OpenaiChatModel(MODEL_NAME, api_key=API_KEY, base_url=PROXY_URL):
    response = get_response("Where was it played?")
print(response)
The 2020 World Series was played at Globe Life Field in Arlington, Texas
curl¶
Request
In [ ]:
%%bash
curl --location 'http://127.0.0.1:8601/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ],
    "max_tokens": 3000,
    "temperature": 1.7,
    "seed": 123456
}' | python -m json.tool
{
    "id": "chatcmpl-e24cd5ddfc264b0ab0717885ee46e988",
    "object": "chat.completion",
    "created": 1760031272,
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The 2020 Major League Baseball (MLB) World Series took place from October 20th to October 27th at EMPTY Public gam\u00e9szones.",
                "refusal": null,
                "annotations": null,
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning_content": null
            },
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": null,
            "token_ids": null
        }
    ],
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "prompt_tokens": 79,
        "total_tokens": 112,
        "completion_tokens": 33,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "prompt_token_ids": null,
    "kv_transfer_params": null
}
Embeddings¶
We can hit the /stainedglass endpoint to fetch:
- Plain-text (un-transformed) embeddings
- Transformed embeddings
- The text prompt reconstructed from the transformed embeddings
As a custom endpoint in SGT Proxy, /stainedglass can be accessed in the following ways:
- curl
- Python
Python¶
Send a POST request and write the response to a JSON file.
In [12]:
import json

import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}
INPUT_PROMPT_MESSAGES = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020.",
    },
    {"role": "user", "content": "Where was it played?"},
]
data = {
    "messages": INPUT_PROMPT_MESSAGES,
    "return_plain_text_embeddings": True,
    "return_transformed_embeddings": True,
    "return_reconstructed_prompt": True,
    "skip_special_tokens": True,
}

with requests.post(
    f"{PROXY_URL}/stainedglass",
    headers=headers,
    json=data,
    stream=False,
    timeout=120,
) as response:
    response.raise_for_status()
    with open("response.json", "w") as json_file:
        json.dump(response.json(), json_file)
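Once the request completes, it is worth sanity-checking the shape of what came back before digging into the numbers. The helper below is our own sketch, exercised here against a stand-in payload with the same structure as a real /stainedglass response; with a real response.json you would json.load the file and pass the result in:

```python
def summarize_stainedglass_payload(payload):
    """Summarize a /stainedglass response: token counts, embedding width,
    and the length of the reconstructed prompt."""
    plain = payload.get("plain_text_embeddings") or []
    transformed = payload.get("transformed_embeddings") or []
    return {
        "num_plain_tokens": len(plain),
        "num_transformed_tokens": len(transformed),
        "embedding_dim": len(plain[0]) if plain else 0,
        "reconstructed_chars": len(payload.get("reconstructed_prompt") or ""),
    }


# Stand-in payload mirroring the response structure (values are made up):
sample = {
    "plain_text_embeddings": [[0.1, -0.2, 0.3]],
    "transformed_embeddings": [[0.4, 0.5, -0.6]],
    "reconstructed_prompt": "gibberish",
}
summary = summarize_stainedglass_payload(sample)
print(summary)
```

A useful invariant to check is that the plain and transformed embeddings have the same number of rows (one per prompt token) and the same width (the model's hidden size).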
curl¶
Send a POST request with curl and write the response to a JSON file.
In [ ]:
%%bash
curl -X POST 'http://127.0.0.1:8601/v1/stainedglass' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ],
    "return_plain_text_embeddings": true,
    "return_transformed_embeddings": true,
    "return_reconstructed_prompt": true,
    "skip_special_tokens": true
}' \
-o response.json
The resulting response.json file should look something like the following:
{
    "plain_text_embeddings": [
        [
            0.0010528564453125,
            -0.000888824462890625,
            0.0021514892578125,
            -0.0036773681640625,
            ...
        ]
    ],
    "transformed_embeddings": [
        [
            -0.0005447890143841505,
            0.001484002685174346,
            -0.002132839523255825,
            0.008831249549984932,
            ...
        ]
    ],
    "reconstructed_prompt": "},\r();\r gepubliceFilters тогоess',\r\x0c]);\r',\r',\r //\r});\r },\r];\r',\r];\r });\r},\r\x1d\x85od';\r};\r //\r\x1c));\r //\r});\r },\r];\r');\r];\r>?[<',\r},\r //\r\x1d });\r"
}
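One quick way to see that the transform really changed the representation is to compare a plain embedding row with its transformed counterpart, for example via cosine similarity. The sketch below is self-contained and uses only the first four components shown above; a real check would load the full vectors from response.json:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# First four components of the sample vectors shown above.
plain = [0.0010528564453125, -0.000888824462890625,
         0.0021514892578125, -0.0036773681640625]
transformed = [-0.0005447890143841505, 0.001484002685174346,
               -0.002132839523255825, 0.008831249549984932]

sim = cosine_similarity(plain, transformed)
print(f"cosine similarity: {sim:.3f}")  # far from 1.0: the vectors diverge
```

Together with the unreadable reconstructed_prompt above, this illustrates the point of the transform: the embeddings the server receives are not the plain-text embeddings.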
Conclusion¶
- Stained Glass Transform Proxy can be used with a wide array of OpenAI Chat Completions API compatible clients.
- The /stainedglass endpoint offers insight into SGT Proxy's protection mechanisms by providing access to:
  - Plain (un-transformed) LLM embeddings.
  - Transformed LLM embeddings.
  - Reconstructed text from the transformed embeddings.