noisy_transformer_masking_model

Classes:

Name | Description
---|---
NoiseMaskedNoisyTransformerModel | A [NoisyTransformerModel][stainedglass_core.model.NoisyTransformerModel] that adds noise to a portion of the inputs, excluding any special tokens.

NoiseMaskedNoisyTransformerModel

Bases: NoisyModel[CausalModelT, NoiseLayerP, NoiseLayerT]

A [NoisyTransformerModel][stainedglass_core.model.NoisyTransformerModel] that adds noise to a portion of the inputs, excluding any special tokens.
Parameters:

Name | Type | Description | Default
---|---|---|---
noise_layer_class | Callable[NoiseLayerP, NoiseLayerT] | The type of noise that is added to the given model. | required
base_model | CausalModelT | The model to add noise to. | required
target_layer | str \| None | Name of the layer whose output noise will be added to. A submodule of the model may be specified by providing its fully qualified name, e.g. 'model.embed_tokens'. | None
target_parameter | str \| None | If the target layer is the input, the keyword parameter to which noise is added. By default, noise is added to the first positional parameter of the model's forward method. | None
truncated_layer_index | int \| None | The layer index to truncate the model at. | None
args | args | Positional arguments to the noise layer constructor. | required
kwargs | kwargs | Keyword arguments to the noise layer constructor. | required
Methods:

Name | Description
---|---
__init__ |
distillation_context | Prepare the base model to facilitate distillation training by applying losses over the transformed and non-transformed activations.
forward | Call the base_model, applying the noise_layer to the target_parameter or target_layer output.
generate | Generate sequences of token ids using transformed embeddings.
reconstruct_ids_from_embeddings | Reconstruct token ids from embeddings using L2 similarity search on the input embedding layer.
reset_parameters | Reinitialize parameters and buffers.
restore_and_load | Restore the final decoder layers and final normalization layer and move them back to their original devices.
sample_transformed_embeddings | Sample transformed embeddings for the given input token ids.
truncate_and_offload | Remove the decoder layers after truncated_layer_index and the final normalization layer from the model and move them to the CPU.
Attributes:

Name | Type | Description
---|---|---
is_truncated_and_offloaded | bool | Whether the model decoder layers are currently truncated.
single_precision_input_embeddings | Embedding | A single-precision copy of input embeddings.
target_layer | Module | The base_model submodule whose output Tensor to transform.
target_parameter | str \| None | The name of the base_model input Tensor argument to transform when target_layer is None.
target_parameter_index | int | The index of the base_model input Tensor argument to transform when target_layer is None.
is_truncated_and_offloaded property

is_truncated_and_offloaded: bool

Whether the model decoder layers are currently truncated.

single_precision_input_embeddings cached property

single_precision_input_embeddings: Embedding

A single-precision copy of input embeddings.

target_layer property

target_layer: Module

The base_model submodule whose output Tensor to transform.

Raises:

Type | Description
---|---
ValueError | If no submodule of base_model matches the given target_layer name.

target_parameter property

target_parameter: str | None

The name of the base_model input Tensor argument to transform when target_layer is None.

target_parameter_index cached property

target_parameter_index: int

The index of the base_model input Tensor argument to transform when target_layer is None.
__init__

__init__(
    noise_layer_class: Callable[NoiseLayerP, NoiseLayerT],
    base_model: CausalModelT,
    truncated_layer_index: int | None = None,
    *args: NoiseLayerP.args,
    target_layer: str | None = None,
    target_parameter: str | None = None,
    **kwargs: NoiseLayerP.kwargs,
) -> None

Changed in version 0.74.0: The `noise_token_mask` parameter was renamed to `noise_mask` to create a uniform interface everywhere.
distillation_context

Prepare the base model to facilitate distillation training by applying losses over the transformed and non-transformed activations.

Returns:

Type | Description
---|---
contextlib.ExitStack | A context manager that detaches the hooks when exited.

Added in version 0.55.0.
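A minimal usage sketch (assuming noisy_model, input_ids, and noise_mask constructed as in the generate example below; the losses attached inside the context are the library's, not shown here):

>>> with noisy_model.distillation_context():
...     outputs = noisy_model(input_ids, noise_mask=noise_mask)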
forward

Call the base_model, applying the noise_layer to the target_parameter or target_layer output.

Parameters:

Name | Type | Description | Default
---|---|---|---
args | Any | Positional arguments to base_model. | required
noise_mask | Tensor | A mask that selects the elements of the target Tensor to which noise is added. | required
kwargs | Any | Keyword arguments to base_model. | required

Returns:

Type | Description
---|---
Any | The result of calling base_model.
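A minimal forward-pass sketch (reusing noisy_model, input_ids, attention_mask, and noise_mask as constructed in the generate example below; treating attention_mask as a pass-through keyword argument to base_model is an assumption):

>>> outputs = noisy_model(
...     input_ids, noise_mask=noise_mask, attention_mask=attention_mask
... )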
generate

generate(
    inputs: Tensor,
    *args: Any,
    noise_mask: Tensor,
    return_transformed_embeddings: bool = False,
    **kwargs: Any,
) -> torch.Tensor | tuple[torch.Tensor, torch.Tensor]
Generate sequences of token ids using transformed embeddings.
Parameters:

Name | Type | Description | Default
---|---|---|---
inputs | Tensor | The sequences of input tokens to use as a prompt for generation. | required
args | Any | Additional positional arguments to the base model's generate method. | required
noise_mask | Tensor | The mask that selects the elements of inputs to transform. | required
return_transformed_embeddings | bool | Whether to return the transformed embeddings. Transformed embeddings can be used with reconstruct_ids_from_embeddings to recover token ids. | False
kwargs | Any | Additional keyword arguments to the base model's generate method. | required
Returns:

Type | Description
---|---
torch.Tensor \| tuple[torch.Tensor, torch.Tensor] | The generated token ids and optionally the transformed embeddings.
Examples:

>>> import torch
>>> import transformers
>>> from stainedglass_core import metrics as sg_metrics
>>> from stainedglass_core.huggingface import generation as sg_generation
>>> # transformer_cloak (providing TransformerCloak) is assumed importable from stainedglass_core.
>>> pretrained_model_name_or_path = (
... "tests/resources/tokenizers/mini-Mistral-7B-Instruct-v0.2"
... )
>>> tokenizer = transformers.AutoTokenizer.from_pretrained(
... pretrained_model_name_or_path
... )
>>> config = transformers.AutoConfig.from_pretrained(pretrained_model_name_or_path)
>>> base_model = transformers.AutoModelForCausalLM.from_pretrained(
... pretrained_model_name_or_path,
... torch_dtype=torch.bfloat16,
... )
>>> noisy_model = NoiseMaskedNoisyTransformerModel(
... transformer_cloak.TransformerCloak,
... base_model,
... target_layer="model.embed_tokens",
... scale=(1e-8, 1.0),
... transformer_type=transformers.MistralModel,
... config_path=pretrained_model_name_or_path,
... )
>>> batch_size, seq_length = 1, 10
>>> input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_length))
>>> attention_mask = torch.hstack(
... [
... torch.zeros((batch_size, 2), dtype=torch.bool),
... torch.ones((batch_size, seq_length - 2), dtype=torch.bool),
... ]
... )
>>> noise_mask = torch.randint(0, 2, (batch_size, seq_length, 1), dtype=torch.bool)
>>> generation_config = sg_generation.StainedGlassGenerationConfig.from_tokenizer(
... tokenizer, max_length=seq_length + 1
... )
Generation without Stained Glass Transform:
>>> generated_ids = noisy_model.base_model.generate(
... inputs=input_ids,
... generation_config=generation_config,
... attention_mask=attention_mask,
... use_cache=True,
... )
Generation with Stained Glass Transform:
>>> generated_ids_from_transformed_embeddings = noisy_model.generate(
... inputs=input_ids,
... generation_config=generation_config,
... attention_mask=attention_mask,
... use_cache=True,
... noise_mask=noise_mask,
... )
Decoding the generated ids into text:
>>> generated_text_from_transformed_embeddings = tokenizer.batch_decode(
...     generated_ids_from_transformed_embeddings[:, input_ids.shape[-1] :],
...     skip_special_tokens=True,
... )
Using return_transformed_embeddings=True to compare the reconstructed input ids with the original input ids:
>>> generated_ids_from_transformed_embeddings, transformed_embeddings = (
... noisy_model.generate(
... inputs=input_ids,
... generation_config=generation_config,
... attention_mask=attention_mask,
... use_cache=True,
... noise_mask=noise_mask,
... return_transformed_embeddings=True,
... )
... )
>>> reconstructed_input_ids = noisy_model.reconstruct_ids_from_embeddings(
... transformed_embeddings
... )
>>> reconstructed_input_text = tokenizer.batch_decode(
...     reconstructed_input_ids, skip_special_tokens=True
... )
>>> percentage_changed_input_ids = sg_metrics.percentage_changed_ids(
... input_ids, reconstructed_input_ids, noise_mask
... )
Added in version 0.86.0 to support generation with noisy models.
reconstruct_ids_from_embeddings

reconstruct_ids_from_embeddings(
    embeddings: Tensor,
    max_batch_size: int | None = None,
    max_sequence_length: int | None = None,
    max_num_embeddings: int | None = None,
) -> torch.Tensor
Reconstruct token ids from embeddings using L2 similarity search on the input embedding layer.
Smaller values of max_batch_size, max_sequence_length, and max_num_embeddings require less memory to store the intermediate distance calculations but have longer runtimes.
Parameters:

Name | Type | Description | Default
---|---|---|---
embeddings | Tensor | The embeddings of shape (batch_size, sequence_length, embedding_dim). | required
max_batch_size | int \| None | The maximum number of batch elements over which to calculate distances. | None
max_sequence_length | int \| None | The maximum number of sequence positions over which to calculate distances. | None
max_num_embeddings | int \| None | The maximum number of embeddings over which to calculate distances. The results from each split are recursively merged together. | None

Returns:

Type | Description
---|---
torch.Tensor | The token ids of shape (batch_size, sequence_length).
Examples:

>>> import torch
>>> import transformers
>>> pretrained_model_name_or_path = (
...     "tests/resources/tokenizers/mini-Mistral-7B-Instruct-v0.2"
... )
>>> config = transformers.AutoConfig.from_pretrained(pretrained_model_name_or_path)
>>> base_model = transformers.AutoModelForCausalLM.from_pretrained(
... pretrained_model_name_or_path,
... torch_dtype=torch.bfloat16,
... )
>>> noisy_model = NoiseMaskedNoisyTransformerModel(
... transformer_cloak.TransformerCloak,
... base_model,
... target_layer="model.embed_tokens",
... scale=(1e-8, 1.0),
... transformer_type=transformers.MistralModel,
... config_path=pretrained_model_name_or_path,
... )
>>> batch_size, seq_length = 1, 10
>>> input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_length))
>>> attention_mask = torch.hstack(
... [
... torch.zeros((batch_size, 2), dtype=torch.bool),
... torch.ones((batch_size, seq_length - 2), dtype=torch.bool),
... ]
... )
>>> noise_mask = torch.randint(0, 2, (batch_size, seq_length, 1), dtype=torch.bool)
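The example above stops at constructing the inputs; a minimal sketch of the reconstruction round trip follows (drawing the embeddings via sample_transformed_embeddings and the max_num_embeddings value of 1024 are illustrative assumptions):

>>> transformed_embeddings = noisy_model.sample_transformed_embeddings(
...     input_ids, noise_mask=noise_mask
... )
>>> reconstructed_ids = noisy_model.reconstruct_ids_from_embeddings(
...     transformed_embeddings, max_num_embeddings=1024
... )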
reset_parameters

Reinitialize parameters and buffers.

This method is useful for initializing tensors created on the meta device.
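A minimal sketch of that meta-device workflow (assuming noisy_model was constructed under torch.device("meta"); to_empty is the standard torch.nn.Module method for materializing meta tensors):

>>> noisy_model.to_empty(device="cpu")  # allocate real storage for meta tensors
>>> noisy_model.reset_parameters()  # then reinitialize parameters and buffers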
restore_and_load

Restore the final decoder layers and final normalization layer and move them back to their original devices.

Raises:

Type | Description
---|---
ValueError | If the model has not been truncated and offloaded.
sample_transformed_embeddings

sample_transformed_embeddings(
    input_ids: Tensor, noise_mask: Tensor, **kwargs: Any
) -> torch.Tensor

Sample transformed embeddings for the given input token ids.

Parameters:

Name | Type | Description | Default
---|---|---|---
input_ids | Tensor | The sequences of input tokens to transform. | required
noise_mask | Tensor | The mask that selects the elements of input_ids to transform. | required
kwargs | Any | Additional keyword arguments to the noise layer's forward method. | required
Returns:

Type | Description
---|---
torch.Tensor | Sampled transformed embeddings.
Examples:

>>> import torch
>>> import transformers
>>> pretrained_model_name_or_path = (
...     "tests/resources/tokenizers/mini-Mistral-7B-Instruct-v0.2"
... )
>>> config = transformers.AutoConfig.from_pretrained(pretrained_model_name_or_path)
>>> base_model = transformers.AutoModelForCausalLM.from_pretrained(
... pretrained_model_name_or_path,
... torch_dtype=torch.bfloat16,
... )
>>> noisy_model = NoiseMaskedNoisyTransformerModel(
... transformer_cloak.TransformerCloak,
... base_model,
... target_layer="model.embed_tokens",
... scale=(1e-8, 1.0),
... transformer_type=transformers.MistralModel,
... config_path=pretrained_model_name_or_path,
... )
>>> batch_size, seq_length = 1, 10
>>> input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_length))
>>> attention_mask = torch.hstack(
... [
... torch.zeros((batch_size, 2), dtype=torch.bool),
... torch.ones((batch_size, seq_length - 2), dtype=torch.bool),
... ]
... )
>>> noise_mask = torch.randint(0, 2, (batch_size, seq_length, 1), dtype=torch.bool)
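With the setup above, a sample can then be drawn (a minimal sketch; the shape check assumes the transform preserves the batch and sequence dimensions):

>>> transformed_embeddings = noisy_model.sample_transformed_embeddings(
...     input_ids, noise_mask=noise_mask
... )
>>> transformed_embeddings.shape[:2] == (batch_size, seq_length)
True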
truncate_and_offload

Remove the decoder layers after truncated_layer_index and the final normalization layer from the model and move them to the CPU.

Decoder layer truncation improves runtime performance and lowers memory usage, but it removes access to the logits layer and thereby sacrifices the computability of metrics such as perplexity.
Raises:

Type | Description
---|---
ValueError | If the model is already truncated and offloaded.
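A minimal sketch of the truncate/restore round trip (the call sequence and the expected property values are assumptions based on the descriptions above):

>>> noisy_model.truncate_and_offload()
>>> noisy_model.is_truncated_and_offloaded
True
>>> transformed_embeddings = noisy_model.sample_transformed_embeddings(
...     input_ids, noise_mask=noise_mask
... )
>>> noisy_model.restore_and_load()
>>> noisy_model.is_truncated_and_offloaded
False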