Mistral Inference with Stained Glass Transform (SGT) Proxy and LLM API¶
This notebook demonstrates:
- Various use-cases of running inference against a Stained Glass Transform LLM API instance serving a Mistral base model via OpenAI Chat Completions API-compatible clients, while using Stained Glass Transform Proxy to protect user input prompts.
- Accessing the input embeddings (both plain and transformed) and the prompt reconstructed from the transformed embeddings.
Inference¶
Pre-requisites¶
- A live instance of SGT LLM API.
- A live instance of SGT Proxy (Please refer to the deployment instructions).
Chat Completions¶
We can perform inference on the SGT LLM API instance by hitting the SGT Proxy's OpenAI Chat Completions API-compatible endpoint from the following common client interfaces. Let's walk through each of them.
In [1]:
# Set proxy access parameters.
PROXY_URL = "http://127.0.0.1:8601/v1"
MODEL_NAME = "mistral-7b-instruct"
API_KEY = "<overwrite-with-your-api-key>"
OpenAI Client¶
- Install the openai Python package.
In [ ]:
%pip install openai
- Perform inference.
In [3]:
import openai

client = openai.OpenAI(base_url=PROXY_URL, api_key=API_KEY)

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {
            "role": "assistant",
            "content": "The Los Angeles Dodgers won the World Series in 2020.",
        },
        {"role": "user", "content": "Where was it played?"},
    ],
)
print(response.choices[0].message.content)
The 2020 World Series was played at Globe Life Field in Arlington, Texas. Due to the COVID-19 pandemic, all games were played at this one location, with no travel between ballparks. This was the first time since 1944 that all World Series games were held in the same city.
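The same client can also stream the response token-by-token. This is a minimal sketch, assuming the proxy supports the Chat Completions streaming protocol:

# Request a streamed completion and print deltas as they arrive.
stream = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[{"role": "user", "content": "Where was the 2020 World Series played?"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")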
LangChain¶
- Install the langchain-openai Python package.
In [ ]:
%pip install langchain-openai
- Perform inference.
In [7]:
import langchain_openai
from langchain_core import output_parsers, prompts

llm = langchain_openai.ChatOpenAI(
    model=MODEL_NAME, base_url=PROXY_URL, api_key=API_KEY
)
prompt = prompts.ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        ("user", "Who won the world series in 2020?"),
        ("assistant", "The Los Angeles Dodgers won the World Series in 2020."),
        ("user", "{input}"),
    ]
)
output_parser = output_parsers.StrOutputParser()
chain = prompt | llm | output_parser
print(chain.invoke({"input": "Where was it played?"}))
The 2020 World Series was played at Globe Life Field in Arlington, Texas. Due to the COVID-19 pandemic, all games were played at this one location, with no travel between ballparks. This was the first time since 1944 that all World Series games were held in the same city.
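The same chain can also stream its output through LangChain's runnable interface. A minimal sketch, assuming the proxy supports streaming Chat Completions responses:

# Stream the chain's output token-by-token as it is generated.
for token in chain.stream({"input": "Where was it played?"}):
    print(token, end="")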
LiteLLM¶
- Install the litellm Python package.
In [ ]:
%pip install litellm
- Perform inference.
In [5]:
import litellm

response = litellm.completion(
    model=f"openai/{MODEL_NAME}",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {
            "role": "assistant",
            "content": "The Los Angeles Dodgers won the World Series in 2020.",
        },
        {"role": "user", "content": "Where was it played?"},
    ],
    base_url=PROXY_URL,
    api_key=API_KEY,
)
print(response.choices[0].message.content)
The 2020 World Series was played at Globe Life Field in Arlington, Texas. Due to the COVID-19 pandemic, all games were played at this one location, with no travel between ballparks. This was the first time since 1944 that all World Series games were held in the same city.
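LiteLLM also provides an async client, which is useful when issuing many requests through the proxy concurrently. A minimal sketch using litellm.acompletion (top-level await works in a notebook; wrap the call in asyncio.run in a script):

# Async variant of the same request; awaitable, so many prompts can be
# issued concurrently through the proxy.
response = await litellm.acompletion(
    model=f"openai/{MODEL_NAME}",
    messages=[{"role": "user", "content": "Where was the 2020 World Series played?"}],
    base_url=PROXY_URL,
    api_key=API_KEY,
)
print(response.choices[0].message.content)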
Magentic¶
- Install the magentic Python package.
In [ ]:
%pip install magentic
- Perform inference.
In [9]:
import magentic


@magentic.chatprompt(
    magentic.SystemMessage("You are a helpful assistant."),
    magentic.UserMessage("Who won the world series in 2020?"),
    magentic.AssistantMessage(
        "The Los Angeles Dodgers won the World Series in 2020."
    ),
    magentic.UserMessage("{prompt}"),
)
def get_response(prompt: str) -> str:
    """Use magentic to get a response to the chat history and prompt.

    Magentic will automatically fill in the appropriate OpenAI API calls, which
    is why this function definition is empty.

    Args:
        prompt: The prompt to ask the model as the final user message.

    Returns:
        The response from the model to the prompt and chat history.
    """


with magentic.OpenaiChatModel(MODEL_NAME, api_key=API_KEY, base_url=PROXY_URL):
    response = get_response("Where was it played?")

print(response)
The 2020 World Series was played at Globe Life Field in Arlington, Texas. Due to the COVID-19 pandemic, all games were played at this one location, with no travel between ballparks. This was the first time since 1944 that all World Series games were held in the same city.
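Magentic can also stream the reply by changing the function's return annotation to magentic.StreamedStr. A minimal sketch, assuming the proxy supports streaming responses:

@magentic.chatprompt(
    magentic.SystemMessage("You are a helpful assistant."),
    magentic.UserMessage("{prompt}"),
)
def get_streamed_response(prompt: str) -> magentic.StreamedStr:
    """Stream the model's reply chunk-by-chunk instead of returning a str."""


with magentic.OpenaiChatModel(MODEL_NAME, api_key=API_KEY, base_url=PROXY_URL):
    for chunk in get_streamed_response("Where was the 2020 World Series played?"):
        print(chunk, end="")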
curl¶
Request
In [13]:
%%bash
curl --location 'http://127.0.0.1:8601/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "mistral-7b-instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"},
            {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
            {"role": "user", "content": "Where was it played?"}
        ],
        "max_tokens": 3000,
        "temperature": 1.7,
        "seed": 123456
    }' | python -m json.tool
{ "id": "f9da7709-94d8-4cbc-8334-38cd2efe5d89", "choices": [ { "finish_reason": "stop", "index": 0, "message": { "role": "assistant", "content": "The 2020 World Series was played at Globe Life Field in Arlington, Texas. Due to the COVID-19 pandemic, all games were played at this one location, with no travel between ballparks. This was the first time since 1944 that all World Series games were held in the same city.", "tool_calls": null }, "logprobs": null } ], "created": 1726595700, "model": "mistral-7b-instruct", "service_tier": null, "system_fingerprint": null, "object": "chat.completion", "usage": null }
Embeddings¶
We can hit the /stainedglass endpoint to fetch:
- Plain-text (un-transformed) embeddings
- Transformed embeddings
- Text prompt reconstructed from the transformed embeddings
As a custom endpoint in SGT Proxy, /stainedglass can be accessed in the following ways:
- curl
- Python
Python¶
Send a POST request and write the response to a JSON file.
In [11]:
import json

import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}
INPUT_PROMPT_MESSAGES = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020.",
    },
    {"role": "user", "content": "Where was it played?"},
]
data = {
    "messages": INPUT_PROMPT_MESSAGES,
    "return_plain_text_embeddings": True,
    "return_transformed_embeddings": True,
    "return_reconstructed_prompt": True,
    "skip_special_tokens": True,
}

with requests.post(
    f"{PROXY_URL}/stainedglass",
    headers=headers,
    json=data,
    stream=False,
    timeout=120,
) as response:
    response.raise_for_status()
    with open("response.json", "w") as json_file:
        json.dump(response.json(), json_file)
curl¶
Send a POST request with curl and write the response to a JSON file.
In [12]:
%%bash
curl -X POST 'http://127.0.0.1:8601/v1/stainedglass' \
    --header 'Content-Type: application/json' \
    --data '{
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"},
            {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
            {"role": "user", "content": "Where was it played?"}
        ],
        "return_plain_text_embeddings": true,
        "return_transformed_embeddings": true,
        "return_reconstructed_prompt": true,
        "skip_special_tokens": true
    }' \
    -o response.json
The contents of response.json should look something like the following:
{
    "plain_text_embeddings": [
        [
            0.0010528564453125,
            -0.000888824462890625,
            0.0021514892578125,
            -0.0036773681640625,
            ...
        ]
    ],
    "transformed_embeddings": [
        [
            -0.0005447890143841505,
            0.001484002685174346,
            -0.002132839523255825,
            0.008831249549984932,
            ...
        ]
    ],
    "reconstructed_prompt": "},\r();\r gepubliceFilters тогоess',\r\x0c]);\r',\r',\r //\r});\r },\r];\r',\r];\r });\r},\r\x1d\x85od';\r};\r //\r\x1c));\r //\r});\r },\r];\r');\r];\r>?[<',\r},\r //\r\x1d });\r"
}
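Note that the reconstructed prompt bears no resemblance to the original messages, which illustrates how the transformed embeddings protect the prompt text. To inspect the result further, you can load the saved file and compare the plain and transformed embeddings. A minimal sketch using numpy (an assumption; any array library works), which assumes both embedding arrays have the same token-by-dimension shape:

import json

import numpy as np

with open("response.json") as json_file:
    result = json.load(json_file)

plain = np.array(result["plain_text_embeddings"])
transformed = np.array(result["transformed_embeddings"])
print(f"Embeddings shape: {plain.shape}")

# Per-token cosine similarity between the plain and transformed embeddings;
# low values indicate how strongly the transform obfuscates the input.
similarity = np.sum(plain * transformed, axis=-1) / (
    np.linalg.norm(plain, axis=-1) * np.linalg.norm(transformed, axis=-1)
)
print(f"Mean cosine similarity: {similarity.mean():.4f}")
print(f"Reconstructed prompt: {result['reconstructed_prompt']!r}")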
Conclusion¶
- Stained Glass Transform Proxy can be used with a wide array of OpenAI Chat Completions API-compatible clients.
- The /stainedglass endpoint offers insight into the SGT Proxy's protection mechanisms by providing access to:
    - Plain (un-transformed) LLM embeddings.
    - Transformed LLM embeddings.
    - Reconstructed text from transformed embeddings.