noisy_transformer_masking_model

Classes:

Name Description
NoiseMaskedNoisyTransformerModel

A [NoisyTransformerModel][stainedglass_core.model.NoisyTransformerModel] that adds noise to a portion of the inputs, excluding any special tokens.

NoiseMaskedNoisyTransformerModel

Bases: NoisyModel[CausalModelT, NoiseLayerP, NoiseLayerT]

A [NoisyTransformerModel][stainedglass_core.model.NoisyTransformerModel] that adds noise to a portion of the inputs, excluding any special tokens.

Parameters:

Name Type Description Default

noise_layer_class

Callable[NoiseLayerP, NoiseLayerT]

The type of noise that is added to the given model.

required

base_model

CausalModelT

The model to add noise to.

required

target_layer

str | None

Name of the layer to whose output noise is added. A submodule of the model may be specified by its .-delimited name, e.g. features.0.conv.1.2. If None (the default), noise is added to the model input.

None

target_parameter

str | None

If the target layer is the input, the keyword parameter to which noise is added (default: None). By default, noise is added to the first positional parameter of the model's forward method.

None

truncated_layer_index

int | None

The layer index to truncate the model at.

None

*args

args

Positional arguments to the noise_layer_class.

required

**kwargs

kwargs

Keyword arguments to the noise_layer_class.

required

Methods:

Name Description
__init__
distillation_context

Prepare the base model to facilitate distillation training by applying losses over the transformed and non-transformed activations.

forward

Call the base_model, applying the noise_layer to the target_parameter or target_layer output.

generate

Generate sequences of token ids using transformed embeddings.

reconstruct_ids_from_embeddings

Reconstruct token ids from embeddings using L2 similarity search on the input embedding layer.

reset_parameters

Reinitialize parameters and buffers.

restore_and_load

Restore the final decoder layers and final normalization layer and move them back to their original devices.

sample_transformed_embeddings

Sample transformed embeddings for the given input token ids.

truncate_and_offload

Remove the decoder layers after truncated_layer_index and the final normalization layer from the model and move them to the CPU.

Attributes:

Name Type Description
is_truncated_and_offloaded bool

Whether the model decoder layers are currently truncated.

single_precision_input_embeddings Embedding

A single-precision copy of input embeddings.

target_layer Module

The base_model submodule whose output Tensor to transform.

target_parameter str | None

The name of the base_model input Tensor argument to transform when target_layer is None.

target_parameter_index int

The index of the base_model input Tensor argument to transform when target_layer is None.

is_truncated_and_offloaded property

is_truncated_and_offloaded: bool

Whether the model decoder layers are currently truncated.

single_precision_input_embeddings cached property

single_precision_input_embeddings: Embedding

A single-precision copy of input embeddings.

target_layer property

target_layer: Module

The base_model submodule whose output Tensor to transform.

Raises:

Type Description
ValueError

If _target_layer cannot be found as a submodule of base_model.
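The lookup behind target_layer can be pictured as walking a .-delimited name one component at a time. The sketch below is a framework-free illustration and not the library's code: `Node` and `resolve_submodule` are hypothetical names, and a real `torch.nn.Module` tree would be traversed with its own submodule lookup.

```python
from __future__ import annotations


class Node:
    """Hypothetical stand-in for a module that owns named children."""

    def __init__(self, **children: "Node") -> None:
        self.children = dict(children)


def resolve_submodule(root: Node, dotted_name: str) -> Node:
    """Walk a '.'-delimited name, raising ValueError if a component is missing."""
    current = root
    for part in dotted_name.split("."):
        if part not in current.children:
            raise ValueError(f"{dotted_name!r} is not a submodule (missing {part!r})")
        current = current.children[part]
    return current


# Build a tiny tree shaped like base_model.model.embed_tokens.
embed_tokens = Node()
base_model = Node(model=Node(embed_tokens=embed_tokens))

found = resolve_submodule(base_model, "model.embed_tokens")

# An unknown name surfaces as ValueError, mirroring the documented behavior.
try:
    resolve_submodule(base_model, "model.missing_layer")
    lookup_failed = False
except ValueError:
    lookup_failed = True
```

Raising at lookup time, rather than at the first forward call, reports a misspelled layer name as early as possible.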

target_parameter property

target_parameter: str | None

The name of the base_model input Tensor argument to transform when target_layer is None.

target_parameter_index cached property

target_parameter_index: int

The index of the base_model input Tensor argument to transform when target_layer is None.
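One plausible way to derive such an index is to inspect the signature of the forward callable. The following is a minimal sketch under that assumption; `parameter_index` and the toy `forward` function are hypothetical and are not part of stainedglass_core.

```python
import inspect


def parameter_index(func, name=None):
    """Return the positional index of `name` in the signature of `func`.

    Assumption: when `name` is None, the first positional parameter is used.
    """
    if name is None:
        return 0
    names = list(inspect.signature(func).parameters)
    return names.index(name)


def forward(input_ids, attention_mask=None, inputs_embeds=None):
    """Hypothetical forward signature used only for illustration."""


first_index = parameter_index(forward)
embeds_index = parameter_index(forward, "inputs_embeds")
```

Knowing the index as well as the name lets a wrapper find the argument whether the caller passes it positionally or by keyword.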

__init__

__init__(
    noise_layer_class: Callable[NoiseLayerP, NoiseLayerT],
    base_model: CausalModelT,
    truncated_layer_index: int | None = None,
    *args: args,
    target_layer: str | None = None,
    target_parameter: str | None = None,
    **kwargs: kwargs,
) -> None

Changed in version 0.74.0: The `noise_token_mask` was renamed to `noise_mask` to create a uniform interface everywhere.

distillation_context

distillation_context() -> contextlib.ExitStack

Prepare the base model to facilitate distillation training by applying losses over the transformed and non-transformed activations.

Returns:

Type Description
contextlib.ExitStack

A context manager that detaches the hooks when exited.

Added in version 0.55.0.
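The pattern of returning a `contextlib.ExitStack` that detaches hooks on exit can be sketched with the standard library alone. Everything below is illustrative: `registered_hooks`, `attach_hook`, and `hypothetical_distillation_context` are made-up names, and a plain list stands in for the hooks a real model would register.

```python
import contextlib

registered_hooks = []  # hypothetical registry standing in for module hooks


def attach_hook(name):
    """Register a hook and return a callable that removes it again."""
    registered_hooks.append(name)
    return lambda: registered_hooks.remove(name)


def hypothetical_distillation_context():
    """Attach hooks now; detach all of them when the returned stack exits."""
    stack = contextlib.ExitStack()
    for name in ("transformed_activations", "clean_activations"):
        # Each removal callback runs when the stack is closed.
        stack.callback(attach_hook(name))
    return stack


with hypothetical_distillation_context():
    hooks_inside = list(registered_hooks)

hooks_after = list(registered_hooks)
```

Because the stack runs its callbacks even when the body raises, hooks are not left attached after a failed training step.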

forward

forward(
    *args: Any, noise_mask: Tensor, **kwargs: Any
) -> Any

Call the base_model, applying the noise_layer to the target_parameter or target_layer output.

Parameters:

Name Type Description Default

*args

Any

Positional arguments to base_model.

required

noise_mask

Tensor

A mask that selects the elements of the target_layer output to transform. Where the mask is False, the original values of the target are used.

required

**kwargs

Any

Keyword arguments to base_model.

required

Returns:

Type Description
Any

The result of base_model with the noise_layer applied to the target_parameter or target_layer output.
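The masking rule described for noise_mask can be illustrated without any tensor library. In this sketch nested Python lists stand in for tensors, the mask has the (batch, sequence, 1) layout used in the examples on this page, and `apply_noise_mask` is a hypothetical helper rather than a library function.

```python
def apply_noise_mask(original, transformed, noise_mask):
    """Pick transformed vectors where the mask is True, original ones elsewhere.

    All arguments are nested lists indexed as [batch][position]; each mask
    entry is a one-element list that applies to the whole embedding vector.
    """
    return [
        [
            t_vec if m[0] else o_vec
            for o_vec, t_vec, m in zip(o_seq, t_seq, m_seq)
        ]
        for o_seq, t_seq, m_seq in zip(original, transformed, noise_mask)
    ]


original = [[[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]]
transformed = [[[1.5, 0.5], [2.5, 1.5], [3.5, 2.5]]]
noise_mask = [[[False], [True], [True]]]  # first position is left untouched

mixed = apply_noise_mask(original, transformed, noise_mask)
```

With tensors, the same selection is commonly written as an element-wise `where` over a broadcast mask.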

generate

generate(
    inputs: Tensor,
    *args: Any,
    noise_mask: Tensor,
    return_transformed_embeddings: bool = False,
    **kwargs: Any,
) -> torch.Tensor | tuple[torch.Tensor, torch.Tensor]

Generate sequences of token ids using transformed embeddings.

Parameters:

Name Type Description Default

inputs

Tensor

The sequences of input tokens to use as a prompt for generation.

required

*args

Any

Additional positional arguments to the base model's generate method.

required

noise_mask

Tensor

The mask that selects the elements of inputs to transform. Where the mask is False, the values of inputs are passed through to the base model.

required

return_transformed_embeddings

bool

Whether to return the transformed embeddings. Transformed embeddings can be used with stainedglass_core.model.noisy_transformer_masking_model.NoiseMaskedNoisyTransformerModel.reconstruct_ids_from_embeddings and compared against inputs to estimate transformation strength.

False

**kwargs

Any

Additional keyword arguments to the base model's generate method.

required

Returns:

Type Description
torch.Tensor | tuple[torch.Tensor, torch.Tensor]

The generated token ids and optionally the transformed embeddings.

Examples:

>>> import torch
>>> import transformers
>>> from stainedglass_core import metrics as sg_metrics
>>> from stainedglass_core.huggingface import generation as sg_generation
>>> pretrained_model_name_or_path = (
...     "tests/resources/tokenizers/mini-Mistral-7B-Instruct-v0.2"
... )
>>> tokenizer = transformers.AutoTokenizer.from_pretrained(
...     pretrained_model_name_or_path
... )
>>> config = transformers.AutoConfig.from_pretrained(pretrained_model_name_or_path)
>>> base_model = transformers.AutoModelForCausalLM.from_pretrained(
...     pretrained_model_name_or_path,
...     torch_dtype=torch.bfloat16,
... )
>>> noisy_model = NoiseMaskedNoisyTransformerModel(
...     transformer_cloak.TransformerCloak,
...     base_model,
...     target_layer="model.embed_tokens",
...     scale=(1e-8, 1.0),
...     transformer_type=transformers.MistralModel,
...     config_path=pretrained_model_name_or_path,
... )
>>> batch_size, seq_length = 1, 10
>>> input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_length))
>>> attention_mask = torch.hstack(
...     [
...         torch.zeros((batch_size, 2), dtype=torch.bool),
...         torch.ones((batch_size, seq_length - 2), dtype=torch.bool),
...     ]
... )
>>> noise_mask = torch.randint(0, 2, (batch_size, seq_length, 1), dtype=torch.bool)
>>> generation_config = sg_generation.StainedGlassGenerationConfig.from_tokenizer(
...     tokenizer, max_length=seq_length + 1
... )

Generation without Stained Glass Transform:

>>> generated_ids = noisy_model.base_model.generate(
...     inputs=input_ids,
...     generation_config=generation_config,
...     attention_mask=attention_mask,
...     use_cache=True,
... )

Generation with Stained Glass Transform:

>>> generated_ids_from_transformed_embeddings = noisy_model.generate(
...     inputs=input_ids,
...     generation_config=generation_config,
...     attention_mask=attention_mask,
...     use_cache=True,
...     noise_mask=noise_mask,
... )

Decoding the generated ids into text:

>>> generated_text_from_transformed_embeddings = tokenizer.batch_decode(
...     generated_ids_from_transformed_embeddings[:, input_ids.shape[-1] :],
...     skip_special_tokens=True,
... )

Using return_transformed_embeddings=True to compare the reconstructed input ids with the original input ids:

>>> generated_ids_from_transformed_embeddings, transformed_embeddings = (
...     noisy_model.generate(
...         inputs=input_ids,
...         generation_config=generation_config,
...         attention_mask=attention_mask,
...         use_cache=True,
...         noise_mask=noise_mask,
...         return_transformed_embeddings=True,
...     )
... )
>>> reconstructed_input_ids = noisy_model.reconstruct_ids_from_embeddings(
...     transformed_embeddings
... )
>>> reconstructed_input_text = tokenizer.batch_decode(
...     reconstructed_input_ids, skip_special_tokens=True
... )
>>> percentage_changed_input_ids = sg_metrics.percentage_changed_ids(
...     input_ids, reconstructed_input_ids, noise_mask
... )

Added in version 0.86.0: To support generation with noisy models.

reconstruct_ids_from_embeddings

reconstruct_ids_from_embeddings(
    embeddings: Tensor,
    max_batch_size: int | None = None,
    max_sequence_length: int | None = None,
    max_num_embeddings: int | None = None,
) -> torch.Tensor

Reconstruct token ids from embeddings using L2 similarity search on the input embedding layer.

Smaller values of max_batch_size, max_sequence_length, and max_num_embeddings require less memory to store the intermediate distance calculations but have longer runtimes.

Parameters:

Name Type Description Default

embeddings

Tensor

The embeddings of shape (batch_size, sequence_length, hidden_size) to reconstruct.

required

max_batch_size

int | None

The maximum number of batch elements over which to calculate distances.

None

max_sequence_length

int | None

The maximum number of sequence positions over which to calculate distances.

None

max_num_embeddings

int | None

The maximum number of embeddings over which to calculate distances. The results from each split are recursively merged together.

None

Returns:

Type Description
torch.Tensor

The token ids of shape (batch_size, sequence_length) of the closest embeddings in the input embedding layer to embeddings.
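The nearest-neighbor idea, and the effect of limiting how many embeddings are compared at once, can be sketched in plain Python. This is a simplified illustration under stated assumptions: `nearest_ids` is a hypothetical helper, lists stand in for tensors, only the max_num_embeddings split is modeled, and chunk results are merged iteratively rather than recursively.

```python
import math


def nearest_ids(queries, table, max_num_embeddings=None):
    """Return, for each query vector, the index of the closest row of `table` (L2).

    The table is scanned in chunks of at most `max_num_embeddings` rows and the
    best match so far is carried across chunks, which bounds how many distances
    are held at once. Assumes `table` is non-empty.
    """
    chunk = max_num_embeddings or len(table)
    best = [(math.inf, -1)] * len(queries)
    for start in range(0, len(table), chunk):
        rows = table[start : start + chunk]
        for q_index, query in enumerate(queries):
            distances = [math.dist(query, row) for row in rows]
            local = min(range(len(rows)), key=distances.__getitem__)
            candidate = (distances[local], start + local)
            # Keep whichever of the running best and this chunk's best is closer.
            best[q_index] = min(best[q_index], candidate)
    return [index for _, index in best]


embedding_table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
noisy_embeddings = [[0.9, 0.1], [0.1, 0.8], [0.2, 0.1]]

ids_full = nearest_ids(noisy_embeddings, embedding_table)
ids_chunked = nearest_ids(noisy_embeddings, embedding_table, max_num_embeddings=2)
```

Both calls return the same ids; the chunked call holds fewer distances at a time at the cost of more loop iterations, which is the trade-off described above.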

Examples:

>>> import torch
>>> import transformers
>>> pretrained_model_name_or_path = (
...     "tests/resources/tokenizers/mini-Mistral-7B-Instruct-v0.2"
... )
>>> config = transformers.AutoConfig.from_pretrained(pretrained_model_name_or_path)
>>> base_model = transformers.AutoModelForCausalLM.from_pretrained(
...     pretrained_model_name_or_path,
...     torch_dtype=torch.bfloat16,
... )
>>> noisy_model = NoiseMaskedNoisyTransformerModel(
...     transformer_cloak.TransformerCloak,
...     base_model,
...     target_layer="model.embed_tokens",
...     scale=(1e-8, 1.0),
...     transformer_type=transformers.MistralModel,
...     config_path=pretrained_model_name_or_path,
... )
>>> batch_size, seq_length = 1, 10
>>> input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_length))
>>> attention_mask = torch.hstack(
...     [
...         torch.zeros((batch_size, 2), dtype=torch.bool),
...         torch.ones((batch_size, seq_length - 2), dtype=torch.bool),
...     ]
... )
>>> noise_mask = torch.randint(0, 2, (batch_size, seq_length, 1), dtype=torch.bool)
>>> transformed_embeddings = noisy_model.sample_transformed_embeddings(
...     input_ids, noise_mask, attention_mask=attention_mask, use_cache=True
... )
>>> reconstructed_ids = noisy_model.reconstruct_ids_from_embeddings(
...     transformed_embeddings
... )

reset_parameters

reset_parameters() -> None

Reinitialize parameters and buffers.

This method is useful for initializing tensors created on the meta device.

restore_and_load

restore_and_load() -> None

Restore the final decoder layers and final normalization layer and move them back to their original devices.

Raises:

Type Description
ValueError

If truncated_layer_index is None.

sample_transformed_embeddings

sample_transformed_embeddings(
    input_ids: Tensor, noise_mask: Tensor, **kwargs: Any
) -> torch.Tensor

Sample transformed embeddings for the given input token ids.

Parameters:

Name Type Description Default

input_ids

Tensor

The sequences of input tokens to transform.

required

noise_mask

Tensor

The mask that selects the elements of input_ids to transform. Where the mask is False, the values of input_ids are passed through.

required

**kwargs

Any

Additional keyword arguments to the noise layer's forward method.

required

Returns:

Type Description
torch.Tensor

Sampled transformed embeddings.

Examples:

>>> import torch
>>> import transformers
>>> pretrained_model_name_or_path = (
...     "tests/resources/tokenizers/mini-Mistral-7B-Instruct-v0.2"
... )
>>> config = transformers.AutoConfig.from_pretrained(pretrained_model_name_or_path)
>>> base_model = transformers.AutoModelForCausalLM.from_pretrained(
...     pretrained_model_name_or_path,
...     torch_dtype=torch.bfloat16,
... )
>>> noisy_model = NoiseMaskedNoisyTransformerModel(
...     transformer_cloak.TransformerCloak,
...     base_model,
...     target_layer="model.embed_tokens",
...     scale=(1e-8, 1.0),
...     transformer_type=transformers.MistralModel,
...     config_path=pretrained_model_name_or_path,
... )
>>> batch_size, seq_length = 1, 10
>>> input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_length))
>>> attention_mask = torch.hstack(
...     [
...         torch.zeros((batch_size, 2), dtype=torch.bool),
...         torch.ones((batch_size, seq_length - 2), dtype=torch.bool),
...     ]
... )
>>> noise_mask = torch.randint(0, 2, (batch_size, seq_length, 1), dtype=torch.bool)
>>> transformed_embeddings = noisy_model.sample_transformed_embeddings(
...     input_ids, noise_mask, attention_mask=attention_mask, use_cache=True
... )

truncate_and_offload

truncate_and_offload() -> None

Remove the decoder layers after truncated_layer_index and the final normalization layer from the model and move them to the CPU.

Decoder layer truncation improves runtime performance and lowers memory usage, but removes access to the logits layer and thereby sacrifices the computability of metrics such as perplexity.
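The truncate-then-restore cycle can be pictured with a toy object. The sketch below is not the library's implementation: `ToyDecoder` is hypothetical, strings stand in for layers, and it assumes that the layer at truncated_layer_index itself is kept.

```python
class ToyDecoder:
    """Hypothetical stand-in for a decoder stack; not the library's class."""

    def __init__(self, num_layers, truncated_layer_index=None):
        self.layers = [f"layer_{i}" for i in range(num_layers)]
        self.norm = "final_norm"
        self.truncated_layer_index = truncated_layer_index
        self._offloaded = None  # holds the removed pieces while truncated

    @property
    def is_truncated_and_offloaded(self):
        return self._offloaded is not None

    def truncate_and_offload(self):
        if self.truncated_layer_index is None:
            raise ValueError("truncated_layer_index is None")
        if self.is_truncated_and_offloaded:
            return
        keep = self.truncated_layer_index + 1  # assumption: index is inclusive
        # A real model would also move the removed pieces to the CPU here.
        self._offloaded = (self.layers[keep:], self.norm)
        self.layers, self.norm = self.layers[:keep], None

    def restore_and_load(self):
        if self.truncated_layer_index is None:
            raise ValueError("truncated_layer_index is None")
        if not self.is_truncated_and_offloaded:
            return
        tail, norm = self._offloaded
        self.layers, self.norm = self.layers + tail, norm
        self._offloaded = None


decoder = ToyDecoder(num_layers=4, truncated_layer_index=1)
decoder.truncate_and_offload()
layers_while_truncated = list(decoder.layers)
decoder.restore_and_load()
layers_after_restore = list(decoder.layers)
```

Keeping the removed pieces on the object is what makes the operation reversible, so the later layers can be restored when logits are needed again.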

Raises:

Type Description
ValueError

If truncated_layer_index is None.