
noisy_model

BackwardWrapper

Bases: Protocol

Interface for objects that manage the backward pass and gradient scaling, like accelerate.Accelerator or lightning.fabric.fabric.Fabric.

backward abstractmethod

backward(loss: Tensor, /, **kwargs: Any) -> None

Perform a backward pass with the given loss tensor.

Parameters:

- `loss` (`Tensor`, required): The loss tensor to backpropagate.
- `**kwargs` (`Any`): Keyword arguments to the backward pass.
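Any object with a matching backward method satisfies the protocol. A minimal sketch, assuming plain autograd with no gradient scaling (the SimpleBackwardWrapper name is hypothetical):

    from typing import Any

    import torch

    class SimpleBackwardWrapper:
        """Satisfies BackwardWrapper by delegating straight to autograd."""

        def backward(self, loss: torch.Tensor, /, **kwargs: Any) -> None:
            # A managed wrapper such as accelerate.Accelerator would also
            # scale/unscale gradients here when AMP is enabled.
            loss.backward(**kwargs)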

NoisyModel

Bases: SGModel[M], Generic[M, NLP, NL]

Wrapper class that adds noise to the output of an arbitrary layer of the base model.

Parameters:

- `noise_layer_class` (`NoiseLayerConstructor[NLP, NL]`, required): The constructor of the noise layer to add to the given model.
- `base_model` (`M`, required): The model to add noise to.
- `input_shape` (`tuple[int, ...]`, required): The shape of the model input; used to infer the shape of the noise layer.
- `target_layer` (`str`, default `'input'`): Name of the layer whose output the noise is added to. A submodule of the model may be specified by providing its `.`-delimited name, e.g. `features.0.conv.1.2`.
- `target_parameter` (`str | None`, default `None`): If the target layer is the input, the keyword parameter to which noise is added. By default, noise is added to the first positional parameter of the model's `forward` method.
- `*args` (default `()`): Positional arguments to the `noise_layer_class`.
- `**kwargs` (default `{}`): Keyword arguments to the `noise_layer_class`.

Raises:

- `AttributeError`: If the `target_layer` does not exist, or if the target layer already has a `noise_layer` attribute.
- `ValueError`: If the `target_layer` is not called from `model.forward()` and its size cannot be determined.
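A construction sketch, using names from the doctests further down this page (sg_model.NoisyModel with sg_noise_layer.CloakNoiseLayer1); the base model here is an arbitrary stand-in:

    import torch
    from torch import nn
    from stainedglass_core import model as sg_model, noise_layer as sg_noise_layer

    base_model = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 2))
    noisy = sg_model.NoisyModel(
        sg_noise_layer.CloakNoiseLayer1,
        base_model,
        input_shape=(-1, 2),
        target_layer="input",  # the default; e.g. "0" would target the first Linear
    )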

input_shape property

input_shape: tuple[int, ...]

The expected shape of the input to the base model.

target_layer property

target_layer: Module

The base_model layer to which noise is added.

target_parameter property

target_parameter: str | None

The base_model.forward parameter to which noise is added.

target_parameter_index cached property

target_parameter_index: int

The index of the base_model.forward parameter to which noise is added.

forward

forward(*args: Any, **kwargs: Any) -> NoisyModelOutput[Any]

Delegate calls to the base model.

Parameters:

- `*args` (`Any`): Positional inputs to the base model.
- `**kwargs` (`Any`): Keyword arguments to the base model.

Returns:

- `NoisyModelOutput[Any]`: The output of the base model, with noise added to the output of the target layer.
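A call sketch, continuing the construction example above; the base_model_output attribute is the one referenced by append_noise_loss_wrapper at the bottom of this page:

    output = noisy(torch.rand(2, 2))       # NoisyModelOutput[Any]
    prediction = output.base_model_output  # the wrapped model's own output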

noise_loss_wrapper

noise_loss_wrapper(criterion: Callable[Concatenate[T, CriterionP], Tensor | dict[str, Tensor]], alpha: float | None, grad_scaler: GradScaler | None = None, backward_wrapper: BackwardWrapper | None = None) -> Callable[Concatenate[NoisyModelOutput[T], CriterionP], dict[str, torch.Tensor]]

Wrap the given criterion with a criterion that optimizes the noise layer.

This method has two modes:
  1. If alpha is a float between 0.0 and 1.0, the returned criterion interpolates between the original criterion and a noise loss term, with 0.0 recovering the original criterion and 1.0 recovering the pure noise loss term.
  2. If alpha is None, the returned criterion adaptively computes the noise layer parameter gradient update from the gradients of the original criterion and the noise loss term: it optimizes whichever gradient is larger, using only the components of the larger gradient tensor that are orthogonal to the smaller gradient tensor (see the sketch below). The returned loss is the value of the original criterion loss, detached from the graph, since the wrapped criterion calls backward() itself.
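A rough sketch of the two modes follows; composite_loss and orthogonal_update are hypothetical helpers illustrating the presumed math, not library functions:

    import torch

    def composite_loss(model_loss: torch.Tensor, noise_loss: torch.Tensor, alpha: float) -> torch.Tensor:
        # Mode 1: interpolate; alpha=0.0 recovers the original criterion,
        # alpha=1.0 recovers the pure noise loss term.
        return (1.0 - alpha) * model_loss + alpha * noise_loss

    def orthogonal_update(model_grad: torch.Tensor, noise_grad: torch.Tensor) -> torch.Tensor:
        # Mode 2: keep the larger gradient, minus its projection onto the
        # smaller one, i.e. only the components orthogonal to the smaller.
        if model_grad.norm() >= noise_grad.norm():
            large, small = model_grad, noise_grad
        else:
            large, small = noise_grad, model_grad
        coeff = (large.flatten() @ small.flatten()) / small.flatten().dot(small.flatten()).clamp_min(1e-12)
        return large - coeff * small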
Note

criterion must return either a torch.Tensor or a dict of torch.Tensor values, which must include the key 'model_loss'.
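For instance, a dict-returning criterion might look like this sketch (only the 'model_loss' key is required; the 'regularization' key is a hypothetical extra):

    import torch
    from torch import nn

    def dict_criterion(output: torch.Tensor, target: torch.Tensor) -> dict[str, torch.Tensor]:
        return {
            "model_loss": nn.functional.mse_loss(output, target),  # required key
            "regularization": 0.01 * output.pow(2).mean(),         # optional extra
        }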

Note

The noise layer must return a loss tensor in order for the noise layer to be optimized.

Parameters:

- `criterion` (`Callable[Concatenate[T, CriterionP], Tensor | dict[str, Tensor]]`, required): The original loss function.
- `alpha` (`float | None`, required): Interpolation factor between the original criterion (0.0) and the noise loss term (1.0). Higher values mean that noise is learned more quickly and that more noise can be added. This hyperparameter depends on the model, task, loss function, etc., and in practice useful values range anywhere from 0.0001 to 0.9999; without prior knowledge, you will need to perform a grid search over different alphas to find the best one for your model and task. Alternatively, if `None`, the original criterion loss and the noise loss term are optimized adaptively.
- `grad_scaler` (`GradScaler | None`, default `None`): A `GradScaler` used to scale the alphaless loss gradients when using automatic mixed precision (AMP).
- `backward_wrapper` (`BackwardWrapper | None`, default `None`): An object that manages the backward pass and gradient scaling, like `accelerate.Accelerator` or `lightning.fabric.fabric.Fabric`, used to scale the alphaless loss gradients when using AMP.

Returns:

- `Callable[Concatenate[NoisyModelOutput[T], CriterionP], dict[str, torch.Tensor]]`: A criterion that optimizes the noise layer using the wrapped criterion and the noise layer loss.

Raises:

- `ValueError`: If `grad_scaler` and `backward_wrapper` are both specified.
- `ValueError`: If `alpha` is not `None` and is not strictly between 0.0 and 1.0.

Examples:

>>> import torch
>>> from torch import nn
>>> from stainedglass_core import model as sg_model, noise_layer as sg_noise_layer
>>> model = nn.Linear(2, 2)
>>> model1 = sg_model.NoisyModel(
...     sg_noise_layer.CloakNoiseLayer1, model, input_shape=(-1, 2)
... )
>>> model2 = sg_model.NoisyModel(
...     sg_noise_layer.CloakNoiseLayer2,
...     model,
...     input_shape=(-1, 2),
...     percent_to_mask=0.42,
... )
>>> criterion = nn.functional.mse_loss
>>> input = torch.rand(2, 2)
>>> labels = torch.randint(0, 2, (2, 2), dtype=torch.float32)

Alpha

>>> stainedglass_loss = model1.noise_loss_wrapper(criterion, alpha=0.8)
>>> losses = stainedglass_loss(model1(input), labels)
>>> losses
{'model_loss': tensor(...), 'noise_loss': tensor(...), 'composite_loss': tensor(...)}
>>> losses["composite_loss"].backward()
>>> stainedglass_loss = model2.noise_loss_wrapper(criterion, alpha=0.8)
>>> losses = stainedglass_loss(model2(input), labels)
>>> losses
{'model_loss': tensor(...), 'noise_loss': tensor(...), 'composite_loss': tensor(...)}
>>> losses["composite_loss"].backward()

Alphaless

>>> stainedglass_loss = model1.noise_loss_wrapper(criterion, alpha=None)
>>> losses = stainedglass_loss(model1(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...), 'alpha (std_estimator.module.weight)': tensor(...), 'scaling factor (std_estimator.module.weight)': tensor(...)}
>>> losses["composite_loss"].backward()
>>> stainedglass_loss = model2.noise_loss_wrapper(criterion, alpha=None)
>>> losses = stainedglass_loss(model2(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...)}
>>> losses["composite_loss"].backward()

Alphaless with AMP

>>> import torch.cuda.amp
>>> grad_scaler = torch.cuda.amp.GradScaler()
>>> stainedglass_loss = model1.noise_loss_wrapper(
...     criterion, alpha=None, grad_scaler=grad_scaler
... )
>>> losses = stainedglass_loss(model1(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...), 'alpha (std_estimator.module.weight)': tensor(...), 'scaling factor (std_estimator.module.weight)': tensor(...)}
>>> losses["composite_loss"].backward()
>>> stainedglass_loss = model2.noise_loss_wrapper(
...     criterion, alpha=None, grad_scaler=grad_scaler
... )
>>> losses = stainedglass_loss(model2(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...)}
>>> losses["composite_loss"].backward()

Changed in version 0.76.1: Added `composite_loss` key to the returned losses dictionary when specifying `alpha=None` to maintain a consistent interface between alpha and alphaless training.

NoisyModelOutput dataclass

Bases: SGModelOutput[T]

The output of NoisyModel.forward().

__init_subclass__

__init_subclass__() -> None

Register subclasses as pytree nodes.

This is necessary to synchronize gradients when using torch.nn.parallel.DistributedDataParallel(static_graph=True) with modules that output ModelOutput subclasses.

See: https://github.com/pytorch/pytorch/issues/106690.
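For illustration, registering a custom dataclass as a pytree node with torch's private pytree utilities looks roughly like the following; the Example class is hypothetical and this is not the library's actual registration code (the private _register_pytree_node API varies across torch versions):

    import dataclasses
    from typing import Any

    import torch.utils._pytree as pytree

    @dataclasses.dataclass
    class Example:
        a: Any
        b: Any

    # Tell torch how to flatten an Example into leaves (+ context) and rebuild it,
    # so utilities like DistributedDataParallel can traverse module outputs.
    pytree._register_pytree_node(
        Example,
        lambda obj: ([obj.a, obj.b], None),
        lambda values, context: Example(*values),
    )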

to_tuple

to_tuple() -> tuple[Any, ...]

Convert self to a tuple containing all the attributes/keys that are not None.

Returns:

Type Description
tuple[Any, ...]

A tuple of all attributes/keys that are not None.

append_noise_loss_wrapper

append_noise_loss_wrapper(criterion: Callable[Concatenate[T, CriterionP], Tensor | dict[str, Tensor]]) -> Callable[Concatenate[NoisyModelOutput[T], CriterionP], dict[str, torch.Tensor]]

Wrap a loss function to accept a NoisyModelOutput as its first argument.

Note

criterion must return either a torch.Tensor or a dict of torch.Tensor values, which must include the key 'model_loss'.

Parameters:

- `criterion` (`Callable[Concatenate[T, CriterionP], Tensor | dict[str, Tensor]]`, required): The loss function to wrap.

Returns:

- `Callable[Concatenate[NoisyModelOutput[T], CriterionP], dict[str, torch.Tensor]]`: A function that accepts a `NoisyModelOutput` as its first argument, passes the `base_model_output` to the wrapped loss function, and adds the `noise_layer_loss` to the result.
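A usage sketch, reusing model1, criterion, input, and labels from the doctests above; the exact keys of the returned dict beyond 'model_loss' are not specified here:

    wrapped = model1.append_noise_loss_wrapper(criterion)
    losses = wrapped(model1(input), labels)  # dict containing at least 'model_loss'
    losses["model_loss"].backward()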