Mistral Inference with Stained Glass Transform (SGT) Proxy and LLM API¶
This notebook demonstrates:
- Various use-cases of running inference against a Stained Glass Transform LLM API instance serving a Mistral base model via OpenAI Chat Completions API-compatible clients, while using Stained Glass Transform Proxy to protect user input prompts.
- Accessing the input embeddings (both plain and transformed) and the prompt reconstructed from the transformed embeddings.
Inference¶
Pre-requisites¶
- A live instance of SGT LLM API.
- A live instance of SGT Proxy (Please refer to the deployment instructions).
Chat Completions¶
We can perform inference on the SGT LLM API instance by hitting the SGT Proxy's OpenAI Chat Completions API-compatible endpoint from the following common client interfaces. Let's walk through each of them.
In [1]:
# Set proxy access parameters.
PROXY_URL = "http://127.0.0.1:8601/v1"
MODEL_NAME = "mistral-7b-instruct"
API_KEY = "<overwrite-with-your-api-key>"
OpenAI Client¶
- Install the openai Python package.
In [ ]:
%pip install openai
- Perform inference.
In [3]:
import openai

client = openai.OpenAI(base_url=PROXY_URL, api_key=API_KEY)

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {
            "role": "assistant",
            "content": "The Los Angeles Dodgers won the World Series in 2020.",
        },
        {"role": "user", "content": "Where was it played?"},
    ],
)
print(response.choices[0].message.content)
The 2020 World Series was played at Globe Life Field in Arlington, Texas. Due to the COVID-19 pandemic, all games were played at this one location, with no travel between ballparks. This was the first time since 1944 that all World Series games were held in the same city.
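The same client can also stream the response token-by-token. This is a minimal sketch, assuming the proxy supports the Chat Completions streaming protocol:

# Request a streamed completion and print deltas as they arrive.
stream = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[{"role": "user", "content": "Where was the 2020 World Series played?"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")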
LangChain¶
- Install the langchain-openai Python package.
In [ ]:
%pip install langchain-openai
- Perform inference.
In [7]:
import langchain_openai
from langchain_core import output_parsers, prompts

llm = langchain_openai.ChatOpenAI(
    model=MODEL_NAME, base_url=PROXY_URL, api_key=API_KEY
)
prompt = prompts.ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        ("user", "Who won the world series in 2020?"),
        ("assistant", "The Los Angeles Dodgers won the World Series in 2020."),
        ("user", "{input}"),
    ]
)
output_parser = output_parsers.StrOutputParser()
chain = prompt | llm | output_parser
print(chain.invoke({"input": "Where was it played?"}))
The 2020 World Series was played at Globe Life Field in Arlington, Texas. Due to the COVID-19 pandemic, all games were played at this one location, with no travel between ballparks. This was the first time since 1944 that all World Series games were held in the same city.
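The same chain can also stream its output through LangChain's runnable interface. A minimal sketch, assuming the proxy supports streaming Chat Completions responses:

# Stream the chain's output token-by-token as it is generated.
for token in chain.stream({"input": "Where was it played?"}):
    print(token, end="")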
LiteLLM¶
- Install the litellm Python package.
In [ ]:
%pip install litellm
- Perform inference.
In [5]:
import litellm

response = litellm.completion(
    model=f"openai/{MODEL_NAME}",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {
            "role": "assistant",
            "content": "The Los Angeles Dodgers won the World Series in 2020.",
        },
        {"role": "user", "content": "Where was it played?"},
    ],
    base_url=PROXY_URL,
    api_key=API_KEY,
)
print(response.choices[0].message.content)
The 2020 World Series was played at Globe Life Field in Arlington, Texas. Due to the COVID-19 pandemic, all games were played at this one location, with no travel between ballparks. This was the first time since 1944 that all World Series games were held in the same city.
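LiteLLM also provides an async client, which is useful when issuing many requests through the proxy concurrently. A minimal sketch using litellm.acompletion (top-level await works in a notebook; wrap the call in asyncio.run in a script):

# Async variant of the same request; awaitable, so many prompts can be
# issued concurrently through the proxy.
response = await litellm.acompletion(
    model=f"openai/{MODEL_NAME}",
    messages=[{"role": "user", "content": "Where was the 2020 World Series played?"}],
    base_url=PROXY_URL,
    api_key=API_KEY,
)
print(response.choices[0].message.content)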
Magentic¶
- Install the magentic Python package.
In [ ]:
%pip install magentic
- Perform inference.
In [9]:
import magentic


@magentic.chatprompt(
    magentic.SystemMessage("You are a helpful assistant."),
    magentic.UserMessage("Who won the world series in 2020?"),
    magentic.AssistantMessage(
        "The Los Angeles Dodgers won the World Series in 2020."
    ),
    magentic.UserMessage("{prompt}"),
)
def get_response(prompt: str) -> str:
    """Use magentic to get a response to the chat history and prompt.

    Magentic will automatically fill in the appropriate OpenAI API calls, which
    is why this function definition is empty.

    Args:
        prompt: The prompt to ask the model as the final user message.

    Returns:
        The response from the model to the prompt and chat history.
    """


with magentic.OpenaiChatModel(MODEL_NAME, api_key=API_KEY, base_url=PROXY_URL):
    response = get_response("Where was it played?")

print(response)
The 2020 World Series was played at Globe Life Field in Arlington, Texas. Due to the COVID-19 pandemic, all games were played at this one location, with no travel between ballparks. This was the first time since 1944 that all World Series games were held in the same city.
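Magentic can also stream the reply by changing the function's return annotation to magentic.StreamedStr. A minimal sketch, assuming the proxy supports streaming responses:

@magentic.chatprompt(
    magentic.SystemMessage("You are a helpful assistant."),
    magentic.UserMessage("{prompt}"),
)
def get_streamed_response(prompt: str) -> magentic.StreamedStr:
    """Stream the model's reply chunk-by-chunk instead of returning a str."""


with magentic.OpenaiChatModel(MODEL_NAME, api_key=API_KEY, base_url=PROXY_URL):
    for chunk in get_streamed_response("Where was the 2020 World Series played?"):
        print(chunk, end="")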
curl¶
Request
In [13]:
%%bash
curl --location 'http://127.0.0.1:8601/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "mistral-7b-instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"},
            {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
            {"role": "user", "content": "Where was it played?"}
        ],
        "max_tokens": 3000,
        "temperature": 1.7,
        "seed": 123456
    }' | python -m json.tool
{ "id": "f9da7709-94d8-4cbc-8334-38cd2efe5d89", "choices": [ { "finish_reason": "stop", "index": 0, "message": { "role": "assistant", "content": "The 2020 World Series was played at Globe Life Field in Arlington, Texas. Due to the COVID-19 pandemic, all games were played at this one location, with no travel between ballparks. This was the first time since 1944 that all World Series games were held in the same city.", "tool_calls": null }, "logprobs": null } ], "created": 1726595700, "model": "mistral-7b-instruct", "service_tier": null, "system_fingerprint": null, "object": "chat.completion", "usage": null }
Embeddings¶
We can hit the /stainedglass endpoint to fetch:
- Plain-text (un-transformed) embeddings
- Transformed embeddings
- Text prompt reconstructed from the transformed embeddings
As a custom endpoint in SGT Proxy, /stainedglass can be accessed in the following ways:
- curl
- Python
Python¶
Send a POST request and write the response to a JSON file.
In [11]:
import json

import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}
INPUT_PROMPT_MESSAGES = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020.",
    },
    {"role": "user", "content": "Where was it played?"},
]
data = {
    "messages": INPUT_PROMPT_MESSAGES,
    "return_plain_text_embeddings": True,
    "return_transformed_embeddings": True,
    "return_reconstructed_prompt": True,
    "skip_special_tokens": True,
}

with requests.post(
    f"{PROXY_URL}/stainedglass",
    headers=headers,
    json=data,
    stream=False,
    timeout=120,
) as response:
    response.raise_for_status()
    with open("response.json", "w") as json_file:
        json.dump(response.json(), json_file)
curl¶
Send a POST request with curl and write the response to a JSON file.
In [12]:
%%bash
curl -X POST 'http://127.0.0.1:8601/v1/stainedglass' \
    --header 'Content-Type: application/json' \
    --data '{
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"},
            {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
            {"role": "user", "content": "Where was it played?"}
        ],
        "return_plain_text_embeddings": true,
        "return_transformed_embeddings": true,
        "return_reconstructed_prompt": true,
        "skip_special_tokens": true
    }' \
    -o response.json
The contents of response.json should look something like the following:
{
    "plain_text_embeddings": [
        [
            0.0010528564453125,
            -0.000888824462890625,
            0.0021514892578125,
            -0.0036773681640625,
            ...
        ]
    ],
    "transformed_embeddings": [
        [
            -0.0005447890143841505,
            0.001484002685174346,
            -0.002132839523255825,
            0.008831249549984932,
            ...
        ]
    ],
    "reconstructed_prompt": "},\r();\r gepubliceFilters тогоess',\r\x0c]);\r',\r',\r //\r});\r },\r];\r',\r];\r });\r},\r\x1d\x85od';\r};\r //\r\x1c));\r //\r});\r },\r];\r');\r];\r>?[<',\r},\r //\r\x1d });\r"
}
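Note that the reconstructed prompt bears no resemblance to the original messages, which illustrates how the transformed embeddings protect the prompt text. To inspect the result further, you can load the saved file and compare the plain and transformed embeddings. A minimal sketch using numpy (an assumption; any array library works), which assumes both embedding arrays have the same token-by-dimension shape:

import json

import numpy as np

with open("response.json") as json_file:
    result = json.load(json_file)

plain = np.array(result["plain_text_embeddings"])
transformed = np.array(result["transformed_embeddings"])
print(f"Embeddings shape: {plain.shape}")

# Per-token cosine similarity between the plain and transformed embeddings;
# low values indicate how strongly the transform obfuscates the input.
similarity = np.sum(plain * transformed, axis=-1) / (
    np.linalg.norm(plain, axis=-1) * np.linalg.norm(transformed, axis=-1)
)
print(f"Mean cosine similarity: {similarity.mean():.4f}")
print(f"Reconstructed prompt: {result['reconstructed_prompt']!r}")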
Conclusion¶
- Stained Glass Transform Proxy can be used with a wide array of OpenAI Chat Completions API-compatible clients.
- The /stainedglass endpoint offers insight into the SGT Proxy's protection mechanisms by providing access to:
    - Plain (un-transformed) LLM embeddings.
    - Transformed LLM embeddings.
    - Reconstructed text from transformed embeddings.