request_output

Patched vLLM RequestOutput class with encrypted text fields.

Warning

You should almost never need to import this module directly. If stainedglass_output_protection is installed and you launch vLLM via the alternative entrypoint, this module is applied automatically.

Classes:

| Name | Description |
| --- | --- |
| `EncryptedRequestOutput` | `RequestOutput` with user-facing text encrypted upon instantiation. |

EncryptedRequestOutput

Bases: RequestOutput

RequestOutput with user-facing text encrypted upon instantiation.

Methods:

| Name | Description |
| --- | --- |
| `__init__` | Encrypt the string fields of a `RequestOutput` before initializing it. |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `key_registry` | `DictProxy[str, bytes \| None]` | Reference to the class's shared key registry. |

key_registry property

```python
key_registry: DictProxy[str, bytes | None]
```

Reference to the class's shared key registry.

Opening a connection is expensive (on the order of 170 milliseconds), but maintaining one is cheap. This class therefore maintains a single connection to the shared registry as a class variable (i.e. all `EncryptedRequestOutput` instances in a process share one connection), instead of re-establishing a connection every time one is needed.
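The class-variable caching described above can be sketched as follows. This is a minimal illustration using `multiprocessing.Manager`; `SharedRegistryClient` and its attributes are hypothetical stand-ins, not the actual API of this module.

```python
# Sketch of the "one registry connection per process" pattern.
# SharedRegistryClient is illustrative, not the real class.
from multiprocessing import Manager


class SharedRegistryClient:
    """Caches the registry connection on the class, so every instance in
    this process reuses one connection instead of paying the ~170 ms
    connection cost each time."""

    _manager = None   # keeps the manager (and its server process) alive
    _registry = None  # class-level cache: the shared DictProxy

    @property
    def key_registry(self):
        cls = type(self)
        if cls._registry is None:
            # Expensive step, done at most once per process.
            cls._manager = Manager()
            cls._registry = cls._manager.dict()
        return cls._registry
```

Because the cache lives on the class rather than the instance, every instance's `key_registry` property resolves to the same `DictProxy` object.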

__init__

```python
__init__(
    request_id: str,
    prompt: str | None,
    prompt_token_ids: list[int] | None,
    prompt_logprobs: PromptLogprobs | None,
    outputs: list[CompletionOutput],
    finished: bool,
    metrics: RequestMetrics | None = None,
    lora_request: LoRARequest | None = None,
    encoder_prompt: str | None = None,
    encoder_prompt_token_ids: list[int] | None = None,
    num_cached_tokens: int | None = None,
    *,
    multi_modal_placeholders: MultiModalPlaceholderDict | None = None,
    kv_transfer_params: dict[str, Any] | None = None,
    **kwargs: Any,
) -> None
```

Encrypt the string fields of a RequestOutput before initializing it.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `request_id` | `str` | The unique ID of the request. | *required* |
| `prompt` | `str \| None` | The prompt string of the request. For encoder/decoder models, this is the decoder input prompt. | *required* |
| `prompt_token_ids` | `list[int] \| None` | The token IDs of the prompt. For encoder/decoder models, these are the decoder input prompt token IDs. | *required* |
| `prompt_logprobs` | `PromptLogprobs \| None` | The log probabilities to return per prompt token. | *required* |
| `outputs` | `list[CompletionOutput]` | The output sequences of the request. | *required* |
| `finished` | `bool` | Whether the whole request is finished. | *required* |
| `metrics` | `RequestMetrics \| None` | Metrics associated with the request. | `None` |
| `lora_request` | `LoRARequest \| None` | The LoRA request that was used to generate the output. | `None` |
| `encoder_prompt` | `str \| None` | The encoder prompt string of the request. `None` if decoder-only. | `None` |
| `encoder_prompt_token_ids` | `list[int] \| None` | The token IDs of the encoder prompt. `None` if decoder-only. | `None` |
| `num_cached_tokens` | `int \| None` | The number of tokens with prefix cache hit. | `None` |
| `multi_modal_placeholders` | `MultiModalPlaceholderDict \| None` | Placeholder data for multimodal requests. | `None` |
| `kv_transfer_params` | `dict[str, Any] \| None` | The params for remote K/V transfer. | `None` |
| `**kwargs` | `Any` | Keyword arguments for `RequestOutput`. | *required* |

Raises:

| Type | Description |
| --- | --- |
| `NotImplementedError` | If `encoder_prompt` is not `None`. Encoder models are not currently supported. |
| `NotImplementedError` | If `encoder_prompt_token_ids` is not `None`. Encoder models are not currently supported. |
| `NotImplementedError` | If `prompt_logprobs` is not `None`. Lists of log probabilities cannot currently be encrypted. |
| `NotImplementedError` | If `multi_modal_placeholders` is not `None`. Multimodal models are not currently supported. |
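The encrypt-before-initialize pattern and the unsupported-argument checks above can be sketched as follows. The stub base class and the toy XOR cipher are hypothetical stand-ins, assumed only for illustration; they are not vLLM's `RequestOutput` or the real cipher keyed from the shared key registry.

```python
# Illustrative sketch of "encrypt the string fields, then initialize the
# base class". Everything here is a stand-in for the real module.
class RequestOutputStub:
    """Minimal stand-in for vLLM's RequestOutput."""

    def __init__(self, request_id, prompt, finished):
        self.request_id = request_id
        self.prompt = prompt
        self.finished = finished


def toy_encrypt(text, key=0x42):
    # Placeholder cipher for the sketch; the real implementation would
    # use a proper cipher keyed from the shared key registry.
    return bytes(b ^ key for b in text.encode())


class EncryptedRequestOutputStub(RequestOutputStub):
    def __init__(self, request_id, prompt, finished, *, encoder_prompt=None):
        if encoder_prompt is not None:
            # Mirrors the documented behavior: encoder models are rejected.
            raise NotImplementedError("Encoder models are not currently supported.")
        # Encrypt user-facing text first, then initialize the base class
        # with the encrypted value.
        encrypted = toy_encrypt(prompt) if prompt is not None else None
        super().__init__(request_id, encrypted, finished)
```

After construction, the instance behaves like the base class but its `prompt` field holds ciphertext rather than the original user-facing string.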