# huggingface

## SGHFLM

Bases: `HFLM`

Eval harness model class for evaluating Stained Glass Transforms.

### `__init__`
```python
__init__(
    transform_model_path: str,
    base_model_dir: str,
    device: device | str,
    batch_size: int | str,
    max_length: int | str,
    apply_stainedglass: bool = True,
    quantization_type: Literal["bf16", "int8", "nf4", "fp4"] | None = None,
    trust_remote_code: bool = False,
) -> None
```
Load the transform and base model from the given paths.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `transform_model_path` | `str` | The path to the saved StainedGlassTransformForText. | *required* |
| `base_model_dir` | `str` | The path to the model directory. Can be a string: the model id of a pretrained model configuration hosted in a model repo on huggingface.co, a path to a directory containing a configuration file saved using the `save_pretrained()` method, or a path or URL to a saved configuration JSON file. | *required* |
| `device` | `device \| str` | The device to load the models onto. | *required* |
| `batch_size` | `int \| str` | The batch size to use for inference. | *required* |
| `max_length` | `int \| str` | The maximum length of the context used for evaluation. | *required* |
| `apply_stainedglass` | `bool` | Whether to apply the Stained Glass Transform noise layer; applied (`True`) by default. Set to `False` to disable the noise layer. | `True` |
| `quantization_type` | `Literal['bf16', 'int8', 'nf4', 'fp4'] \| None` | The type of quantization to apply to model parameters. | `None` |
| `trust_remote_code` | `bool` | The Hugging Face option controlling whether to trust remote code when loading the model. | `False` |
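As a quick orientation, here is a hedged construction sketch. The import path, checkpoint path, and model id below are illustrative assumptions, not part of the documented API:

```python
# Hypothetical usage sketch: the import path, checkpoint path, and model id
# are illustrative assumptions, not part of the documented API.
from stainedglass_eval.huggingface import SGHFLM  # hypothetical import path

lm = SGHFLM(
    transform_model_path="checkpoints/sg_transform",  # hypothetical path
    base_model_dir="meta-llama/Llama-2-7b-hf",        # hypothetical model id
    device="cuda",
    batch_size=8,
    max_length=2048,
    apply_stainedglass=True,    # evaluate with the transform applied
    quantization_type="bf16",   # or "int8", "nf4", "fp4", or None
    trust_remote_code=False,
)
```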
### `get_inputs_batch`

```python
get_inputs_batch(
    chunk: list[tuple[int, int, Tensor, Tensor]]
) -> tuple[
    list[Tensor],
    list[Tensor],
    list[Tensor],
    list[int],
    int | None,
]
```
Pad the `input_ids` and `noise_mask` and create the attention mask, applying a one-token shift to align with the ground-truth labels.

How this all works:

```
            CTX      CONT
inp       0 1 2 3|4 5 6 7 8 9   <- last token is deleted by inp[:, :-1]
logits      1 2 3|4 5 6 7 8 9   <- the ctx half gets tossed out by the
                                   [:, -len(continuation_enc):, :self.vocab_size] slice
cont_toks         4 5 6 7 8 9
```

- `inp` is the input to the model for the text completion task.
- `inp` is truncated from the left side of the `input_ids` for longer examples: `[' This example is way too long ']` becomes `['is way too long']` if `max_length` is 4.
- The extra `-1` in the indices accounts for the last token, since it can't be compared with the generation output.
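A minimal standalone sketch of the left truncation and one-token shift described above, in plain PyTorch; this illustrates the indexing only and is not the method's actual implementation:

```python
import torch

max_length = 4  # assumed context budget, matching the example above
input_ids = torch.tensor([9, 1, 2, 3, 4, 5])  # context + continuation tokens

# Keep only the most recent max_length + 1 tokens (left truncation); the
# extra token exists solely to produce the shifted labels below.
window = input_ids[-(max_length + 1):]

# One-token shift: the model's logits at position t predict labels[t].
inp = window[:-1]    # last token deleted, fed to the model
labels = window[1:]  # ground-truth targets aligned with the logits

print(inp.tolist(), labels.tolist())  # [1, 2, 3, 4] [2, 3, 4, 5]
```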
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `chunk` | `list[tuple[int, int, Tensor, Tensor]]` | A list of tuples containing the index, `continuation_len`, `input_ids`, and `noise_mask`. | *required* |
Returns:

| Type | Description |
|---|---|
| `tuple[list[Tensor], list[Tensor], list[Tensor], list[int], int \| None]` | A tuple of input ids, noise masks, attention masks, input lengths, and padding length. |
### `get_log_probability_and_exact_match`

```python
get_log_probability_and_exact_match(
    continuation_len: int,
    logits: Tensor,
    inp: Tensor,
    input_length: int,
    padding_length: int,
) -> tuple[float, bool]
```
Obtain the log probability at each corresponding continuation token and calculate whether the model's output logits exactly match the continuation tokens.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `continuation_len` | `int` | Length of the continuation tokens. | *required* |
| `logits` | `Tensor` | Output logits from the model. | *required* |
| `inp` | `Tensor` | Input ids tensor with padding. | *required* |
| `input_length` | `int` | Length of `inp`. | *required* |
| `padding_length` | `int` | Total max length of `inp` for a given batch. | *required* |
Returns:

| Type | Description |
|---|---|
| `tuple[float, bool]` | A tuple of the sum of log probabilities and the exact match (boolean). |
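A hedged sketch of this computation in plain PyTorch, assuming right padding and that an exact match means the greedy (argmax) prediction equals every continuation token; the real method may differ in details:

```python
import torch
import torch.nn.functional as F

def log_prob_and_exact_match_sketch(
    continuation_len: int,
    logits: torch.Tensor,  # (seq_len, vocab_size), already shifted by one token
    inp: torch.Tensor,     # (seq_len,) padded input ids
    input_length: int,
    padding_length: int,   # batch max length; unused in this per-example sketch
) -> tuple[float, bool]:
    # Positions whose logits predict the continuation tokens.
    start = input_length - continuation_len
    cont_logits = logits[start:input_length]
    cont_toks = inp[start:input_length]

    # Exact match: greedy prediction equals every continuation token.
    exact = bool((cont_logits.argmax(dim=-1) == cont_toks).all())

    # Sum of log probabilities taken at the continuation token ids.
    log_probs = F.log_softmax(cont_logits.float(), dim=-1)
    total = log_probs.gather(1, cont_toks.unsqueeze(-1)).sum().item()
    return total, exact
```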
### `load_base_model`

```python
load_base_model(
    base_model_dir: str,
    quantization_type: Literal["bf16", "int8", "nf4", "fp4"] | None,
    device: device | str,
) -> Module
```
Load the base model from the provided `base_model_dir`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `base_model_dir` | `str` | The path to the Hugging Face model directory. | *required* |
| `quantization_type` | `Literal['bf16', 'int8', 'nf4', 'fp4'] \| None` | The type of quantization to apply to model parameters. | *required* |
| `device` | `device \| str` | The device to load the models onto. | *required* |
Raises:

| Type | Description |
|---|---|
| `NotImplementedError` | For unsupported `quantization_type` values. |
Returns:

| Type | Description |
|---|---|
| `Module` | A base language model. |
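A minimal sketch of how the four quantization types could map onto standard transformers/bitsandbytes loading options. This mapping is an assumption about the method's behavior, not its confirmed implementation:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def load_base_model_sketch(base_model_dir: str, quantization_type, device):
    # Assumed mapping: bf16 -> dtype cast; int8/nf4/fp4 -> bitsandbytes.
    if quantization_type is None:
        return AutoModelForCausalLM.from_pretrained(base_model_dir).to(device)
    if quantization_type == "bf16":
        model = AutoModelForCausalLM.from_pretrained(
            base_model_dir, torch_dtype=torch.bfloat16
        )
        return model.to(device)
    if quantization_type == "int8":
        config = BitsAndBytesConfig(load_in_8bit=True)
    elif quantization_type in ("nf4", "fp4"):
        config = BitsAndBytesConfig(
            load_in_4bit=True, bnb_4bit_quant_type=quantization_type
        )
    else:
        raise NotImplementedError(f"Unsupported quantization_type: {quantization_type!r}")
    # bitsandbytes-quantized weights are placed via device_map at load time.
    return AutoModelForCausalLM.from_pretrained(
        base_model_dir, quantization_config=config, device_map=str(device)
    )
```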
### `load_noisy_model_and_tokenizer_wrapper`

```python
load_noisy_model_and_tokenizer_wrapper(
    transform_model_path: str, device: device | str
) -> tuple[NoiseMaskedNoisyTransformerModel, TokenizerWrapper]
```
Load the StainedGlassTransformForText from the given path to obtain the noisy model and tokenizer wrapper.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `transform_model_path` | `str` | The path to the saved StainedGlassTransformForText. | *required* |
| `device` | `device \| str` | The device to map the data to. | *required* |
Returns:

| Type | Description |
|---|---|
| `tuple[NoiseMaskedNoisyTransformerModel, TokenizerWrapper]` | A `NoiseMaskedNoisyTransformerModel` without the `base_model` weights, and the tokenizer wrapper. |
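A call sketch using only the documented signature; the checkpoint path is hypothetical:

```python
# The path below is hypothetical; only the documented signature is used.
noisy_model, tokenizer_wrapper = lm.load_noisy_model_and_tokenizer_wrapper(
    transform_model_path="checkpoints/sg_transform",
    device="cuda",
)
# Per the return description, the NoiseMaskedNoisyTransformerModel carries
# the transform weights only; the base model is loaded via load_base_model().
```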
### `loglikelihood`

```python
loglikelihood(
    requests: Sequence[Instance], disable_tqdm: bool = False
) -> list[tuple[float, bool]]
```
Compute the log-likelihood of the continuation given the context.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `requests` | `Sequence[Instance]` | A list of requests to evaluate. Each request is a tuple of the context and continuation to evaluate. | *required* |
| `disable_tqdm` | `bool` | Whether to disable the tqdm progress bar. | `False` |
Raises:

| Type | Description |
|---|---|
| `ValueError` | If the context is empty in the requests. |
Returns:

| Type | Description |
|---|---|
| `list[tuple[float, bool]]` | A list of tuples, where each tuple is the log-likelihood of the continuation given the context and whether the model exactly matched the continuation. |
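A hedged usage sketch, assuming `requests` are built as standard lm-eval-harness `Instance` objects (the harness's loglikelihood request format); the prompt text is illustrative:

```python
from lm_eval.api.instance import Instance

# Each loglikelihood request carries (context, continuation) as arguments.
requests = [
    Instance(
        request_type="loglikelihood",
        doc={},
        arguments=("The capital of France is", " Paris"),
        idx=0,
    ),
]

results = lm.loglikelihood(requests)  # `lm` is the SGHFLM built earlier
for log_prob, exact in results:
    print(f"log-likelihood={log_prob:.3f}, exact match={exact}")
```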