
huggingface

Classes:

| Name | Description |
| ---- | ----------- |
| `SGHFLM` | Eval harness model class for evaluating Stained Glass Transforms. The Stained Glass Transform can be loaded from a `StainedGlassTransformForText` or from a lightning module checkpoint. |

SGHFLM

Bases: HFLM

Eval harness model class for evaluating Stained Glass Transforms. The Stained Glass Transform can be loaded from a `StainedGlassTransformForText` or from a lightning module checkpoint.

Note

The `StainedGlassTransformForText` takes precedence when both `transform_model_path` and `lightning_checkpoint_path` are provided.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `apply_stainedglass` | `bool` | Whether to apply the Stained Glass Transform. | *required* |
| `transform_model_path` | `str \| Path \| None` | The path to the Stained Glass Transform. | `None` |
| `lightning_checkpoint_path` | `PathLike \| None` | The path to the lightning module checkpoint. | `None` |
| `debug_clean_embeds` | `bool` | Pass clean embeddings to the model; for debugging purposes only. | `False` |
| `seed` | `int \| None` | The seed to use for the Stained Glass Transform. | `1234` |
| `chat_template_path` | `str \| None` | The path to the chat template to use for the evaluation. Useful when the chat template is not part of the `tokenizer_config.json` file. | `None` |
| `kwargs` | `Any` | Additional keyword arguments to pass to the parent class. | *required* |

Methods:

| Name | Description |
| ---- | ----------- |
| `__init__` | |
| `apply_chat_template` | Return the JSON string of the chat history, which will be processed by the `_apply_noise_tokenizer_mapper` method. |
| `tok_batch_encode` | Encode a batch of strings into input ids and attention masks. |

Attributes:

| Name | Type | Description |
| ---- | ---- | ----------- |
| `input_ids` | `Tensor` | Cached input ids to be used during generation to return the original prompt together with the response. |

input_ids property writable

input_ids: Tensor

Cached input ids to be used during generation to return the original prompt together with the response.

Returns:

| Type | Description |
| ---- | ----------- |
| `Tensor` | The cached input ids. |

__init__

__init__(
    apply_stainedglass: bool,
    transform_model_path: str | Path | None = None,
    lightning_checkpoint_path: PathLike | None = None,
    debug_clean_embeds: bool = False,
    seed: int | None = 1234,
    chat_template_path: str | None = None,
    **kwargs: Any,
)

Changed in version v2.11.0.

apply_chat_template

apply_chat_template(
    chat_history: list[dict[str, str]],
    add_generation_prompt: bool = True,
) -> str

Return the JSON string of the chat history, which will be processed by the `_apply_noise_tokenizer_mapper` method.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `chat_history` | `list[dict[str, str]]` | The chat history. | *required* |
| `add_generation_prompt` | `bool` | If set, a prompt with the token(s) that indicate the start of an assistant message will be appended to the formatted output. | `True` |

Raises:

| Type | Description |
| ---- | ----------- |
| `ValueError` | If `apply_stainedglass` is `True` but `add_generation_prompt` is `False`. |

Returns:

| Type | Description |
| ---- | ----------- |
| `str` | The JSON string of the chat history. |
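The contract described above, deferring chat templating by returning the raw history as JSON, and rejecting `add_generation_prompt=False` when the transform is active, can be sketched in a few lines. This is a simplified stand-in, not the library's implementation; the function name and the `apply_stainedglass` parameter here mimic the instance attribute of the same name.

```python
import json


def apply_chat_template_sketch(
    chat_history: list[dict[str, str]],
    add_generation_prompt: bool = True,
    apply_stainedglass: bool = True,
) -> str:
    # The documented ValueError: the Stained Glass Transform pipeline needs
    # the generation prompt, so it cannot be disabled while the transform
    # is applied.
    if apply_stainedglass and not add_generation_prompt:
        raise ValueError(
            "add_generation_prompt must be True when apply_stainedglass is True."
        )
    # Defer real templating: serialize the chat history to a JSON string for
    # the downstream _apply_noise_tokenizer_mapper method to consume.
    return json.dumps(chat_history)
```

Because the method returns JSON rather than a rendered prompt, the actual chat-template formatting happens later, inside the tokenizer-mapping step.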

tok_batch_encode

tok_batch_encode(
    strings: list[str],
    padding_side: str = "left",
    left_truncate_len: int | None = None,
    truncation: bool = False,
) -> tuple[torch.Tensor, torch.Tensor]

Encode a batch of strings into input ids and attention masks.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `strings` | `list[str]` | The list of strings to encode. | *required* |
| `padding_side` | `str` | The side on which to pad the sequences. | `'left'` |
| `left_truncate_len` | `int \| None` | The length to which the sequences are truncated from the left. | `None` |
| `truncation` | `bool` | Whether to truncate the sequences. | `False` |

Returns:

| Type | Description |
| ---- | ----------- |
| `tuple[torch.Tensor, torch.Tensor]` | A tuple of input embeddings and attention masks if `apply_stainedglass` is `True`; otherwise, the input ids and attention masks. |