# noisy_transformer_model

## NoisyTransformerModel

Bases: `NoisyModel[PreTrainedModelT, NLP, NL]`

Overloads `NoisyModel` methods to enable adding noise correctly to tensors batched with sequences, specifically Transformers.
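For orientation, the snippet below sketches one way a Hugging Face transformer might be wrapped. The constructor arguments, the `sg_model.NoisyTransformerModel` location, and the choice of noise layer are assumptions modeled on the `NoisyModel` examples under `noise_loss_wrapper` below, not details confirmed by this page.

```python
# Hedged sketch only: the NoisyTransformerModel constructor arguments below are
# assumptions modeled on the NoisyModel examples later on this page.
from transformers import AutoModelForCausalLM, AutoTokenizer
from stainedglass_core import model as sg_model, noise_layer as sg_noise_layer

base_model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Wrap the transformer so noise is injected at a target layer while forward
# calls and config access are delegated to the base model.
noisy_model = sg_model.NoisyTransformerModel(sg_noise_layer.CloakNoiseLayer1, base_model)

print(noisy_model.config.model_type)  # the base model's PretrainedConfig
print(noisy_model.target_layer)       # the module whose output receives noise
```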
### config *(property)*

    config: PretrainedConfig

Return the config of the base model.

Returns:

| Type | Description |
|---|---|
| `PretrainedConfig` | The config of the base model. |
### target_layer *(property)*

    target_layer: Module

The `base_model` layer to which noise is added.

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the target layer cannot be found as a submodule of the base model. |
### target_parameter *(property)*

    target_parameter: str | None

The `base_model.forward` parameter to which noise is added.
### target_parameter_index *(cached property)*

    target_parameter_index: int

The index of the `base_model.forward` parameter to which noise is added.
### forward

Delegate calls to the base model.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `args` | `Any` | Positional arguments to the base model. | *required* |
| `kwargs` | `Any` | Keyword arguments to the base model. | *required* |

Returns:

| Type | Description |
|---|---|
| `NoisyModelOutput[Any]` | The result of the underlying model with noise added to the output of the base model's target layer. |
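Because `forward` only delegates, a wrapped transformer is called with the usual Hugging Face arguments; the sketch below reuses the assumed `noisy_model` and `tokenizer` names from the earlier sketch.

```python
# Hedged sketch: `noisy_model` and `tokenizer` are the assumed names from the
# earlier sketch; positional and keyword arguments pass straight through to the
# base model's forward.
batch = tokenizer(["an example prompt"], return_tensors="pt")
output = noisy_model(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
)
# `output` is a NoisyModelOutput wrapping the base model's result, with noise
# applied to the output of the target layer.
```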
### gradient_checkpointing_enable

Enable gradient checkpointing on the base model.
### noise_loss_wrapper

    noise_loss_wrapper(
        criterion: Callable[Concatenate[T, CriterionP], Tensor | dict[str, Tensor]],
        alpha: float | None,
        grad_scaler: GradScaler | None = None,
        backward_wrapper: BackwardWrapper | None = None,
    ) -> Callable[Concatenate[NoisyModelOutput[T], CriterionP], dict[str, torch.Tensor]]

Wrap the given criterion with a criterion that optimizes the noise layer.

This method has two modes (see the sketch after this list):

- If `alpha` is a `float` between `0.0` and `1.0`, the returned criterion interpolates between the original criterion and a noise loss term, with `0.0` devolving to the original criterion and `1.0` devolving to the noise loss term.
- If `alpha` is `None`, the returned criterion adaptively calculates the noise layer parameter gradient update using the gradients of the original criterion and the noise loss term, optimizing whichever is larger and using only the components of the larger gradient tensor that are orthogonal to the smaller gradient tensor. The returned loss is the original criterion loss, differentiable but detached from the graph, since the wrapped criterion calls `backward()` itself.
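To make the `alpha` mode concrete, here is a minimal sketch of a linear interpolation between the two loss terms. The exact combination used internally is not stated on this page, so treat the formula as an illustrative assumption rather than the library's implementation.

```python
import torch

def composite_loss_sketch(
    model_loss: torch.Tensor, noise_loss: torch.Tensor, alpha: float
) -> torch.Tensor:
    # Illustrative assumption: a straight linear interpolation, so alpha=0.0
    # devolves to the original criterion and alpha=1.0 to the noise loss term.
    return (1.0 - alpha) * model_loss + alpha * noise_loss

print(composite_loss_sketch(torch.tensor(1.0), torch.tensor(2.0), alpha=0.8))
# tensor(1.8000)
```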
Note:

`criterion` must either return a `torch.Tensor` or a `dict` of `torch.Tensor`s that includes the key `'model_loss'`.

Note:

The noise layer must return a loss tensor in order to optimize the noise layer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `criterion` | `Callable[Concatenate[T, CriterionP], Tensor \| dict[str, Tensor]]` | The original loss function. | *required* |
| `alpha` | `float \| None` | Interpolation factor between the original criterion (`0.0`) and the noise loss term (`1.0`). Higher means that noise is learned more quickly and that more noise can be added. This is a model-, task-, and loss-function-dependent hyperparameter that, in practice, really does range from 0.0001 to 0.9999. Without prior knowledge, you will need to perform a grid search over different alphas to find the best one for your model and task. Alternatively, if … | *required* |
| `grad_scaler` | `GradScaler \| None` | A … | `None` |
| `backward_wrapper` | `BackwardWrapper \| None` | A managed … | `None` |

Returns:

| Type | Description |
|---|---|
| `Callable[Concatenate[NoisyModelOutput[T], CriterionP], dict[str, torch.Tensor]]` | A criterion that optimizes the noise layer using the wrapped criterion and the noise layer loss. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If … |
| `ValueError` | If … |
Examples:

>>> import torch
>>> from torch import nn
>>> from stainedglass_core import model as sg_model, noise_layer as sg_noise_layer
>>> model = nn.Linear(2, 2)
>>> model1 = sg_model.NoisyModel(
... sg_noise_layer.CloakNoiseLayer1, model, input_shape=(-1, 2)
... )
>>> model2 = sg_model.NoisyModel(
... sg_noise_layer.CloakNoiseLayer2,
... model,
... input_shape=(-1, 2),
... percent_to_mask=0.42,
... )
>>> criterion = nn.functional.mse_loss
>>> input = torch.rand(2, 2)
>>> labels = torch.randint(0, 2, (2, 2), dtype=torch.float32)
Alpha
>>> stainedglass_loss = model1.noise_loss_wrapper(criterion, alpha=0.8)
>>> losses = stainedglass_loss(model1(input), labels)
>>> losses
{'model_loss': tensor(...), 'noise_loss': tensor(...), 'composite_loss': tensor(...)}
>>> losses["composite_loss"].backward()
>>> stainedglass_loss = model2.noise_loss_wrapper(criterion, alpha=0.8)
>>> losses = stainedglass_loss(model2(input), labels)
>>> losses
{'model_loss': tensor(...), 'noise_loss': tensor(...), 'composite_loss': tensor(...)}
>>> losses["composite_loss"].backward()
Alphaless
>>> stainedglass_loss = model1.noise_loss_wrapper(criterion, alpha=None)
>>> losses = stainedglass_loss(model1(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...), 'alpha (std_estimator.module.weight)': tensor(...), 'scaling factor (std_estimator.module.weight)': tensor(...)}
>>> losses["composite_loss"].backward()
>>> stainedglass_loss = model2.noise_loss_wrapper(criterion, alpha=None)
>>> losses = stainedglass_loss(model2(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...)}
>>> losses["composite_loss"].backward()
Alphaless with AMP
>>> import torch.cuda.amp
>>> grad_scaler = torch.cuda.amp.GradScaler()
>>> stainedglass_loss = model1.noise_loss_wrapper(
... criterion, alpha=None, grad_scaler=grad_scaler
... )
>>> losses = stainedglass_loss(model1(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...), 'alpha (std_estimator.module.weight)': tensor(...), 'scaling factor (std_estimator.module.weight)': tensor(...)}
>>> losses["composite_loss"].backward()
>>> stainedglass_loss = model2.noise_loss_wrapper(
... criterion, alpha=None, grad_scaler=grad_scaler
... )
>>> losses = stainedglass_loss(model2(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...)}
>>> losses["composite_loss"].backward()
Changed in version 0.76.1: Added `composite_loss` key to the returned losses dictionary when specifying `alpha=None` to maintain a consistent interface between alpha and alphaless training.
### reset_parameters

Reinitialize parameters and buffers.

This method is useful for initializing tensors created on the meta device.
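As a hedged illustration of the meta-device workflow (using a plain `nn.Linear` stand-in rather than an actual `NoisyTransformerModel`): construct on the meta device, materialize storage, then reinitialize.

```python
import torch
from torch import nn

# Sketch with a plain nn.Linear stand-in: build on the meta device (no real
# storage), materialize uninitialized storage on a concrete device, then
# reinitialize parameters and buffers.
with torch.device("meta"):
    module = nn.Linear(2, 2)

module = module.to_empty(device="cpu")  # allocate uninitialized storage
module.reset_parameters()               # reinitialize parameters and buffers
```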
### set_extra_state

Set the extra state contained in the loaded `state_dict`.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `state` | `None` | The extra state, returned by … | *required* |
### find_model_inputs

Find the expected inputs to a model by extracting `forward` signature parameters, excluding `self` and `kwargs`.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_class` | `type[Module]` | The model class to find inputs for. | *required* |

Returns:

| Type | Description |
|---|---|
| `list[str]` | The names of the expected inputs to the model. |
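The described behavior can be approximated with `inspect`; the sketch below shows equivalent logic, not the library's actual implementation.

```python
import inspect
from torch import nn

def find_model_inputs_sketch(model_class: type[nn.Module]) -> list[str]:
    # Hypothetical equivalent: collect forward() parameter names, dropping
    # `self` and any catch-all **kwargs parameter.
    signature = inspect.signature(model_class.forward)
    return [
        name
        for name, parameter in signature.parameters.items()
        if name != "self" and parameter.kind is not inspect.Parameter.VAR_KEYWORD
    ]

print(find_model_inputs_sketch(nn.Linear))  # ['input']
```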