# noisy_transformer_model

## NoisyTransformerModel

Bases: `NoisyModel[PreTrainedModelT, NLP, NL]`

Overloads `NoisyModel` methods to enable adding noise correctly to tensors batched with sequences, specifically Transformers.
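For orientation, the snippet below sketches one way a Hugging Face transformer might be wrapped. The constructor arguments, the `sg_model.NoisyTransformerModel` location, and the choice of noise layer are assumptions modeled on the `NoisyModel` examples under `noise_loss_wrapper` below, not details confirmed by this page.

```python
# Hedged sketch only: the NoisyTransformerModel constructor arguments below are
# assumptions modeled on the NoisyModel examples later on this page.
from transformers import AutoModelForCausalLM, AutoTokenizer
from stainedglass_core import model as sg_model, noise_layer as sg_noise_layer

base_model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Wrap the transformer so noise is injected at a target layer while forward
# calls and config access are delegated to the base model.
noisy_model = sg_model.NoisyTransformerModel(sg_noise_layer.CloakNoiseLayer1, base_model)

print(noisy_model.config.model_type)  # the base model's PretrainedConfig
print(noisy_model.target_layer)       # the module whose output receives noise
```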
### config *(property)*

    config: PretrainedConfig

Return the config of the base model.

Returns:

| Type | Description |
|---|---|
| `PretrainedConfig` | The config of the base model. |
### target_layer *(property)*

    target_layer: Module

The `base_model` layer to which noise is added.

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the target layer cannot be found as a submodule of the base model. |
### target_parameter *(property)*

    target_parameter: str | None

The `base_model.forward` parameter to which noise is added.
### target_parameter_index *(cached property)*

    target_parameter_index: int

The index of the `base_model.forward` parameter to which noise is added.
### forward

Delegate calls to the base model.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `args` | `Any` | Positional arguments to the base model. | *required* |
| `kwargs` | `Any` | Keyword arguments to the base model. | *required* |

Returns:

| Type | Description |
|---|---|
| `NoisyModelOutput[Any]` | The result of the underlying model with noise added to the output of the base model's target layer. |
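Because `forward` only delegates, a wrapped transformer is called with the usual Hugging Face arguments; the sketch below reuses the assumed `noisy_model` and `tokenizer` names from the earlier sketch.

```python
# Hedged sketch: `noisy_model` and `tokenizer` are the assumed names from the
# earlier sketch; positional and keyword arguments pass straight through to the
# base model's forward.
batch = tokenizer(["an example prompt"], return_tensors="pt")
output = noisy_model(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
)
# `output` is a NoisyModelOutput wrapping the base model's result, with noise
# applied to the output of the target layer.
```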
### gradient_checkpointing_enable

Enable gradient checkpointing on the base model.
### noise_loss_wrapper

    noise_loss_wrapper(
        criterion: Callable[Concatenate[T, CriterionP], Tensor | dict[str, Tensor]],
        alpha: float | None,
        grad_scaler: GradScaler | None = None,
        backward_wrapper: BackwardWrapper | None = None,
    ) -> Callable[Concatenate[NoisyModelOutput[T], CriterionP], dict[str, torch.Tensor]]

Wrap the given criterion with a criterion that optimizes the noise layer.

This method has two modes (see the sketch after this list):

- If `alpha` is a `float` between `0.0` and `1.0`, the returned criterion interpolates between the original criterion and a noise loss term, with `0.0` devolving to the original criterion and `1.0` devolving to the noise loss term.
- If `alpha` is `None`, the returned criterion adaptively calculates the noise layer parameter gradient update using the gradients of the original criterion and the noise loss term, optimizing whichever is larger and using only the components of the larger gradient tensor that are orthogonal to the smaller gradient tensor. The returned loss is the original criterion loss, differentiable but detached from the graph, since the wrapped criterion calls `backward()` itself.
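To make the `alpha` mode concrete, here is a minimal sketch of a linear interpolation between the two loss terms. The exact combination used internally is not stated on this page, so treat the formula as an illustrative assumption rather than the library's implementation.

```python
import torch

def composite_loss_sketch(
    model_loss: torch.Tensor, noise_loss: torch.Tensor, alpha: float
) -> torch.Tensor:
    # Illustrative assumption: a straight linear interpolation, so alpha=0.0
    # devolves to the original criterion and alpha=1.0 to the noise loss term.
    return (1.0 - alpha) * model_loss + alpha * noise_loss

print(composite_loss_sketch(torch.tensor(1.0), torch.tensor(2.0), alpha=0.8))
# tensor(1.8000)
```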
Note:

`criterion` must either return a `torch.Tensor` or a `dict` of `torch.Tensor`s that includes the key `'model_loss'`.

Note:

The noise layer must return a loss tensor in order to optimize the noise layer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `criterion` | `Callable[Concatenate[T, CriterionP], Tensor \| dict[str, Tensor]]` | The original loss function. | *required* |
| `alpha` | `float \| None` | Interpolation factor between the original criterion (`0.0`) and the noise loss term (`1.0`). Higher means that noise is learned more quickly and that more noise can be added. This is a model-, task-, and loss-function-dependent hyperparameter that, in practice, really does range from 0.0001 to 0.9999. Without prior knowledge, you will need to perform a grid search over different alphas to find the best one for your model and task. Alternatively, if … | *required* |
| `grad_scaler` | `GradScaler \| None` | A … | `None` |
| `backward_wrapper` | `BackwardWrapper \| None` | A managed … | `None` |

Returns:

| Type | Description |
|---|---|
| `Callable[Concatenate[NoisyModelOutput[T], CriterionP], dict[str, torch.Tensor]]` | A criterion that optimizes the noise layer using the wrapped criterion and the noise layer loss. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If … |
| `ValueError` | If … |
Examples:

>>> import torch
>>> from torch import nn
>>> from stainedglass_core import model as sg_model, noise_layer as sg_noise_layer
>>> model = nn.Linear(2, 2)
>>> model1 = sg_model.NoisyModel(
... sg_noise_layer.CloakNoiseLayer1, model, input_shape=(-1, 2)
... )
>>> model2 = sg_model.NoisyModel(
... sg_noise_layer.CloakNoiseLayer2,
... model,
... input_shape=(-1, 2),
... percent_to_mask=0.42,
... )
>>> criterion = nn.functional.mse_loss
>>> input = torch.rand(2, 2)
>>> labels = torch.randint(0, 2, (2, 2), dtype=torch.float32)
Alpha
>>> stainedglass_loss = model1.noise_loss_wrapper(criterion, alpha=0.8)
>>> losses = stainedglass_loss(model1(input), labels)
>>> losses
{'model_loss': tensor(...), 'noise_loss': tensor(...), 'composite_loss': tensor(...)}
>>> losses["composite_loss"].backward()
>>> stainedglass_loss = model2.noise_loss_wrapper(criterion, alpha=0.8)
>>> losses = stainedglass_loss(model2(input), labels)
>>> losses
{'model_loss': tensor(...), 'noise_loss': tensor(...), 'composite_loss': tensor(...)}
>>> losses["composite_loss"].backward()
Alphaless
>>> stainedglass_loss = model1.noise_loss_wrapper(criterion, alpha=None)
>>> losses = stainedglass_loss(model1(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...), 'alpha (std_estimator.module.weight)': tensor(...), 'scaling factor (std_estimator.module.weight)': tensor(...)}
>>> losses["composite_loss"].backward()
>>> stainedglass_loss = model2.noise_loss_wrapper(criterion, alpha=None)
>>> losses = stainedglass_loss(model2(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...)}
>>> losses["composite_loss"].backward()
Alphaless with AMP
>>> import torch.cuda.amp
>>> grad_scaler = torch.cuda.amp.GradScaler()
>>> stainedglass_loss = model1.noise_loss_wrapper(
... criterion, alpha=None, grad_scaler=grad_scaler
... )
>>> losses = stainedglass_loss(model1(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...), 'alpha (std_estimator.module.weight)': tensor(...), 'scaling factor (std_estimator.module.weight)': tensor(...)}
>>> losses["composite_loss"].backward()
>>> stainedglass_loss = model2.noise_loss_wrapper(
... criterion, alpha=None, grad_scaler=grad_scaler
... )
>>> losses = stainedglass_loss(model2(input), labels)
>>> losses
{'model_loss': tensor(...), 'composite_loss': tensor(...), 'noise_loss': tensor(...)}
>>> losses["composite_loss"].backward()
Changed in version 0.76.1: Added `composite_loss` key to the returned losses dictionary when specifying `alpha=None` to maintain a consistent interface between alpha and alphaless training.
### reset_parameters

Reinitialize parameters and buffers.

This method is useful for initializing tensors created on the meta device.
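As a hedged illustration of the meta-device workflow (using a plain `nn.Linear` stand-in rather than an actual `NoisyTransformerModel`): construct on the meta device, materialize storage, then reinitialize.

```python
import torch
from torch import nn

# Sketch with a plain nn.Linear stand-in: build on the meta device (no real
# storage), materialize uninitialized storage on a concrete device, then
# reinitialize parameters and buffers.
with torch.device("meta"):
    module = nn.Linear(2, 2)

module = module.to_empty(device="cpu")  # allocate uninitialized storage
module.reset_parameters()               # reinitialize parameters and buffers
```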
### set_extra_state

Set the extra state contained in the loaded `state_dict`.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `state` | `None` | The extra state, returned by … | *required* |
### find_model_inputs

Find the expected inputs to a model by extracting `forward` signature parameters, excluding `self` and `kwargs`.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_class` | `type[Module]` | The model class to find inputs for. | *required* |

Returns:

| Type | Description |
|---|---|
| `list[str]` | The names of the expected inputs to the model. |
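The described behavior can be approximated with `inspect`; the sketch below shows equivalent logic, not the library's actual implementation.

```python
import inspect
from torch import nn

def find_model_inputs_sketch(model_class: type[nn.Module]) -> list[str]:
    # Hypothetical equivalent: collect forward() parameter names, dropping
    # `self` and any catch-all **kwargs parameter.
    signature = inspect.signature(model_class.forward)
    return [
        name
        for name, parameter in signature.parameters.items()
        if name != "self" and parameter.kind is not inspect.Parameter.VAR_KEYWORD
    ]

print(find_model_inputs_sketch(nn.Linear))  # ['input']
```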