noisy_transformer_model

NoisyTransformerModel

Bases: NoisyModel[PreTrainedModelT, NLP, NL]

Overloads NoisyModel methods to correctly add noise to tensors batched with sequences, specifically for Transformers models.
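
A minimal construction sketch (hedged): the checkpoint name is a placeholder, and the constructor arguments are assumed to mirror the NoisyModel examples shown under noise_loss_wrapper below; consult the class signature for the exact parameters.

>>> from transformers import AutoModelForCausalLM
>>> from stainedglass_core import model as sg_model, noise_layer as sg_noise_layer
>>> base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder checkpoint
>>> noisy_model = sg_model.NoisyTransformerModel(  # constructor assumed to mirror NoisyModel
...     sg_noise_layer.CloakNoiseLayer1,
...     base_model,
... )  # an input_shape argument may also be required, as in the NoisyModel examples below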

config property

Return the config of the base model.

Returns:

Type Description
PretrainedConfig

The config of the base model.

input_shape property

input_shape: tuple[int, ...]

The expected input shape to the base model.

target_layer property

target_layer: Module

The base_model layer to which noise is added.

target_parameter property

target_parameter: str | None

The base_model.forward parameter to which noise is added.

target_parameter_index cached property

target_parameter_index: int

The index of the base_model.forward parameter to which noise is added.

forward

forward(*args: Any, **kwargs: Any) -> NoisyModelOutput[Any]

Delegate calls to the base model.

Parameters:

Name Type Description Default
args Any

Inputs to the base model.

required
kwargs Any

Keyword arguments to the base model.

required

Returns:

Type Description
NoisyModelOutput[Any]

The result of the underlying model with noise added to the output of the base model's target layer.
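
Continuing the construction sketch above (hedged; the tokenizer checkpoint is a placeholder): keyword arguments are forwarded unchanged to the wrapped Hugging Face model, and the call returns a NoisyModelOutput that can be handed to a criterion wrapped with noise_loss_wrapper.

>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
>>> batch = tokenizer(["an example prompt"], return_tensors="pt")
>>> output = noisy_model(**batch)  # NoisyModelOutput wrapping the base model's output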

from_pretrained classmethod

from_pretrained(save_directory: str | Path, base_model_directory: str | Path | None = None, **kwargs: Any) -> Self

Load the model from a save_pretrained directory, and optionally load the base model from a different directory.

Mirrors the from_pretrained method of Hugging Face transformers models so as to be compatible with their API calls.

Parameters:

Name Type Description Default
save_directory str | Path

The path to the saved model.

required
base_model_directory str | Path | None

The path to the saved base model, if not the same as save_directory.

None
**kwargs Any

Keyword arguments to pass to the base model's from_pretrained method.

{}

Returns:

Type Description
Self

The loaded model.
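
A minimal loading sketch using the documented signature (directory paths are placeholders; NoisyTransformerModel is assumed to be exposed under stainedglass_core.model alongside NoisyModel):

>>> from stainedglass_core import model as sg_model
>>> noisy_model = sg_model.NoisyTransformerModel.from_pretrained("path/to/noisy_model")
>>> # If the base model was saved separately (e.g. with only_noise_layer=True):
>>> noisy_model = sg_model.NoisyTransformerModel.from_pretrained(
...     "path/to/noisy_model", base_model_directory="path/to/base_model"
... )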

get_extra_state

get_extra_state() -> NoisyTransformerModelExtraState[PreTrainedModelT, noisy_model.NLP, noisy_model.NL]

Return the extra state of the model.

Returns:

Type Description
NoisyTransformerModelExtraState[PreTrainedModelT, noisy_model.NLP, noisy_model.NL]

The extra state of the model.

gradient_checkpointing_enable

gradient_checkpointing_enable() -> None

Enable gradient checkpointing on the base model.

noise_loss_wrapper

noise_loss_wrapper(criterion: Callable[Concatenate[T, CriterionP], Tensor | dict[str, Tensor]], alpha: float | None, grad_scaler: GradScaler | None = None, backward_wrapper: BackwardWrapper | None = None) -> Callable[Concatenate[NoisyModelOutput[T], CriterionP], dict[str, torch.Tensor]]

Wrap the given criterion with a criterion that optimizes the noise layer.

This method has two modes:
  1. If alpha is a float between 0.0 and 1.0, the returned criterion interpolates between the original criterion and a noise loss term, with 0.0 reducing to the original criterion and 1.0 reducing to the noise loss term.
  2. If alpha is None, the returned criterion adaptively calculates the noise layer parameter gradient update using the gradients of the original criterion and the noise loss term, optimizing whichever is larger and using only the components of the larger gradient tensor that are orthogonal to the smaller gradient tensor. The returned loss is the original criterion loss, which is differentiable but detached from the graph, since the wrapped criterion calls backward() itself.
Note

criterion must return either a torch.Tensor or a dict of torch.Tensor values; if a dict is returned, it must include the key 'model_loss'.

Note

The noise layer must return a loss tensor in order to optimize the noise layer.

Parameters:

Name Type Description Default
criterion Callable[Concatenate[T, CriterionP], Tensor | dict[str, Tensor]]

The original loss function.

required
alpha float | None

Interpolation factor between the original criterion (0.0) and the noise loss term (1.0). Higher values mean that noise is learned more quickly and that more noise can be added. This hyperparameter depends on the model, task, loss function, and so on, and in practice can fall anywhere from 0.0001 to 0.9999. Without prior knowledge, you will need to perform a grid search over different alphas to find the best one for your model and task. Alternatively, if None, the original criterion loss and the noise loss term are adaptively optimized.

required
grad_scaler GradScaler | None

A GradScaler object to use to scale the alphaless loss gradients when using automatic mixed precision (AMP).

None
backward_wrapper BackwardWrapper | None

An object that manages a GradScaler, such as accelerate.Accelerator or lightning.fabric.fabric.Fabric, used to scale the alphaless loss gradients when using automatic mixed precision (AMP).

None

Returns:

Type Description
Callable[Concatenate[NoisyModelOutput[T], CriterionP], dict[str, torch.Tensor]]

A criterion that optimizes the noise layer using the wrapped criterion and the noise layer loss.

Raises:

Type Description
ValueError

If grad_scaler and backward_wrapper are both specified.

ValueError

If alpha is not None and is not strictly between 0.0 and 1.0.

Examples:

>>> import torch
>>> from torch import nn
>>> from stainedglass_core import model as sg_model, noise_layer as sg_noise_layer
>>> model = nn.Linear(2, 2)
>>> model1 = sg_model.NoisyModel(
...     sg_noise_layer.CloakNoiseLayer1, model, input_shape=(-1, 2)
... )
>>> model2 = sg_model.NoisyModel(
...     sg_noise_layer.CloakNoiseLayer2,
...     model,
...     input_shape=(-1, 2),
...     percent_to_mask=0.42,
... )
>>> criterion = nn.functional.mse_loss
>>> input = torch.rand(2, 2)
>>> labels = torch.randint(0, 2, (2, 2), dtype=torch.float32)

Alpha

>>> stainedglass_loss = model1.noise_loss_wrapper(criterion, alpha=0.8)
>>> losses = stainedglass_loss(model1(input), labels)
>>> losses
{'model_loss': tensor(...), 'noise_loss': tensor(...), 'composite_loss': tensor(...)}
>>> losses["composite_loss"].backward()
>>> stainedglass_loss = model2.noise_loss_wrapper(criterion, alpha=0.8)
>>> losses = stainedglass_loss(model2(input), labels)
>>> losses
{'model_loss': tensor(...), 'noise_loss': tensor(...), 'composite_loss': tensor(...)}
>>> losses["composite_loss"].backward()

Alphaless

>>> stainedglass_loss = model1.noise_loss_wrapper(criterion, alpha=None)
>>> losses = stainedglass_loss(model1(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...), 'alpha (std_estimator.module.weight)': tensor(...), 'scaling factor (std_estimator.module.weight)': tensor(...)}
>>> losses["composite_loss"].backward()
>>> stainedglass_loss = model2.noise_loss_wrapper(criterion, alpha=None)
>>> losses = stainedglass_loss(model2(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...)}
>>> losses["composite_loss"].backward()

Alphaless with AMP

>>> import torch.cuda.amp
>>> grad_scaler = torch.cuda.amp.GradScaler()
>>> stainedglass_loss = model1.noise_loss_wrapper(
...     criterion, alpha=None, grad_scaler=grad_scaler
... )
>>> losses = stainedglass_loss(model1(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...), 'alpha (std_estimator.module.weight)': tensor(...), 'scaling factor (std_estimator.module.weight)': tensor(...)}
>>> losses["composite_loss"].backward()
>>> stainedglass_loss = model2.noise_loss_wrapper(
...     criterion, alpha=None, grad_scaler=grad_scaler
... )
>>> losses = stainedglass_loss(model2(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...)}
>>> losses["composite_loss"].backward()

Changed in version 0.76.1: Added `composite_loss` key to the returned losses dictionary when specifying `alpha=None` to maintain a consistent interface between alpha and alphaless training.
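
A hedged end-to-end sketch of how the wrapped criterion fits into a training step, following the alpha-mode example above; the optimizer choice, learning rate, and synthetic data are placeholders.

>>> import torch
>>> from torch import nn
>>> from stainedglass_core import model as sg_model, noise_layer as sg_noise_layer
>>> base_model = nn.Linear(2, 2)
>>> noisy_model = sg_model.NoisyModel(
...     sg_noise_layer.CloakNoiseLayer1, base_model, input_shape=(-1, 2)
... )
>>> stainedglass_loss = noisy_model.noise_loss_wrapper(
...     nn.functional.mse_loss, alpha=0.8
... )
>>> optimizer = torch.optim.AdamW(noisy_model.parameters(), lr=1e-3)  # placeholder optimizer
>>> for _ in range(3):  # placeholder loop over batches
...     input = torch.rand(4, 2)
...     labels = torch.rand(4, 2)
...     optimizer.zero_grad()
...     losses = stainedglass_loss(noisy_model(input), labels)
...     losses["composite_loss"].backward()
...     optimizer.step()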

save_pretrained

save_pretrained(save_directory: str | Path, only_noise_layer: bool = False, **kwargs: Any) -> None

Save the model to a directory.

Mirrors the save_pretrained method of Hugging Face transformers models so as to be compatible with their API calls.

Parameters:

Name Type Description Default
save_directory str | Path

The directory to save the model to.

required
only_noise_layer bool

Whether to save only the noise layer, or the base model as well.

False
**kwargs Any

Keyword arguments to pass to the base model's save_pretrained method.

{}
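
A minimal saving sketch using the documented arguments, assuming a NoisyTransformerModel instance named noisy_model (directory paths are placeholders):

>>> noisy_model.save_pretrained("path/to/noisy_model")
>>> # Save just the noise layer, e.g. when the base model weights are stored elsewhere:
>>> noisy_model.save_pretrained("path/to/noise_layer_only", only_noise_layer=True)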

set_extra_state

set_extra_state(state: NoisyTransformerModelExtraState[PreTrainedModelT, NLP, NL]) -> None

Set the extra state contained in the loaded state_dict.

Parameters:

Name Type Description Default
state NoisyTransformerModelExtraState[PreTrainedModelT, NLP, NL]

The extra state, returned by get_extra_state.

required

NoisyTransformerModelExtraState

Bases: TypedDict, Generic[PreTrainedModelT, NLP, NL]

Extra state for NoisyTransformerModel.

Holds the information necessary to reconstruct the model when using from_pretrained (to mirror the transformers API).

Attributes:

Name Type Description
base_model_class type[PreTrainedModelT]

The class of the base transformers model.

base_model_config PretrainedConfig

The config of the base transformers model.

noise_layer_class NoiseLayerConstructor[NLP, NL]

The class of the noise layer.

noise_layer_constructor_args tuple[Any, ...]

The positional arguments to the noise layer constructor.

noise_layer_constructor_kwargs dict[str, Any]

The keyword arguments to the noise layer constructor.

input_shape tuple[int, ...]

The expected input shape to the base model.

target_layer str

The name of the target layer.
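
This state is produced by get_extra_state and consumed by set_extra_state during save/load; a short inspection sketch, assuming a NoisyTransformerModel instance named noisy_model (keys follow the attributes listed above):

>>> extra_state = noisy_model.get_extra_state()
>>> noise_layer_class = extra_state["noise_layer_class"]  # constructor of the noise layer
>>> input_shape = extra_state["input_shape"]  # expected input shape to the base model
>>> target_layer_name = extra_state["target_layer"]  # name of the layer noise is added to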

find_model_inputs

find_model_inputs(model_class: type[Module]) -> list[str]

Find the expected inputs to a model by extracting the forward signature parameters, excluding self and kwargs.

Parameters:

Name Type Description Default
model_class type[Module]

The model class to find inputs for.

required

Returns:

Type Description
list[str]

The names of the expected inputs to the model.
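
An illustrative stand-alone sketch of the signature inspection described above (not the library's implementation); it collects the named forward parameters of a module class, skipping self and variadic parameters:

>>> import inspect
>>> from torch import nn
>>> def forward_parameter_names(model_class: type[nn.Module]) -> list[str]:
...     signature = inspect.signature(model_class.forward)
...     return [
...         name
...         for name, parameter in signature.parameters.items()
...         if name != "self"
...         and parameter.kind
...         not in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
...     ]
>>> forward_parameter_names(nn.Linear)
['input']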