transformer_cloak
Classes:
Name | Description |
---|---|
TransformerCloak | Applies a stochastic transformation to a causal language model embedding |
TransformerBlockEstimator
Bases: Module, Generic[TransformerT]
Estimates components of sequence dependent noise using a single layer transformer model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transformer_type | type[TransformerT] | The type of transformer model to build a single layer estimator of, e.g. | required |
config_path | str | The path to the transformers config. | required |
initial | float | Initial value of the final | 0.0 |
dropout | float | Dropout probability of the transformer model output. | 0.1 |
use_causal_mask | bool | Whether to use a causal or a non-causal attention mask. | True |
initialization_scale | float | The scale factor to multiply the initial values of | 0.05 |
**kwargs | Any | Additional keyword arguments to [ | required |
Raises:
Type | Description |
---|---|
ValueError | If the PyTorch version is <2.0.0 and |
Methods:
Name | Description |
---|---|
__init__ | |
forward | Compose the transformer block with a dropout and a linear adapter layer. |
reset_parameters | Reinitialize parameters and buffers. |
__init__
__init__(
transformer_type: type[TransformerT],
config_path: str,
initial: float = 0.0,
dropout: float = 0.1,
use_causal_mask: bool = True,
initialization_scale: float = 0.05,
**kwargs: Any,
) -> None
Changed in version 0.85.0: Passing param path to transformer cloak was highly error prone and unreasonable for the typical user.
forward
Compose the transformer block with a dropout and a linear adapter layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*args | Any | Positional arguments to the transformer model. | required |
**kwargs | Any | Keyword arguments to the transformer model. | required |
Returns:
Type | Description |
---|---|
torch.Tensor | The output of the transformer parameter model. |
Changed in version 0.75.1: The noise mask should always be non-None when using TransformerCloak.
reset_parameters
Reinitialize parameters and buffers.
This method is useful for initializing tensors created on the meta device.
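The meta-device workflow mentioned above can be sketched with a plain `torch.nn.Linear` standing in for the estimator. This is only an illustration under that assumption: it uses a stock PyTorch module, not `TransformerBlockEstimator` itself, and shows the usual sequence of building without storage, allocating storage, and then reinitializing.

```python
import torch
from torch import nn

# Build the module on the meta device: shapes and dtypes exist, but no storage is allocated.
# nn.Linear is a stand-in here, not the estimator documented on this page.
layer = nn.Linear(4, 2, device="meta")
was_meta = layer.weight.is_meta

# Allocate real (uninitialized) storage on the CPU, then give the
# parameters proper initial values.
layer = layer.to_empty(device="cpu")
layer.reset_parameters()
```

Without the final `reset_parameters()` call, the parameters would hold whatever values happened to be in the freshly allocated memory.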
TransformerCloak
Bases: BaseNoiseLayer[TransformerBlockEstimator[TransformerT], Union[CloakStandardDeviationParameterization, DirectStandardDeviationParameterization], Optional[PercentMasker]]
Applies a stochastic transformation to a causal language model embedding Tensor using TransformerBlockEstimator, with standard deviations parameterized by either CloakStandardDeviationParameterization or DirectStandardDeviationParameterization, and optional standard deviation-based input masking using PercentMasker.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scale | tuple[float, float] | The range of standard deviations of the noise. | required |
transformer_type | type[TransformerT] | The type of the transformer to build a single layer estimator from. | required |
config_path | str | Path to transformer config. | required |
percent_to_mask | float \| None | The percentage of the input to mask. | None |
shallow | float | A fixed temperature like parameter which alters the scale of the standard deviation of the noise. | 1.0 |
seed | int \| None | Seed for the random number generator used to generate noise. | None |
rho_init | float | Initial values for rhos. | -3.0 |
std_dropout | float | Dropout ratio for std parameter model. | 0.0 |
mean_dropout | float | Dropout ratio for mean parameter model. | 0.0 |
directly_learn_stds | bool | Whether or not the rhos estimator is used to learn rhos (values in R) or standard deviations directly (values in R^+). | False |
mean_num_experts | int | The number of experts to use for the multilayer perceptron after the attention layer for mean_estimator. The value zero corresponds to not using mixture of experts. | 0 |
std_num_experts | int | The number of experts to use for the multilayer perceptron after the attention layer for std_estimator. The value zero corresponds to not using mixture of experts. | 0 |
use_causal_mask | bool | Whether to use a causal or a non-causal attention mask in the llama estimator. | True |
**kwargs | Any | Keyword arguments used to define the transformer parameter models. | required |
Raises:
Type | Description |
---|---|
ValueError | If |
ValueError | If |
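To make the relationship between `scale`, `rho_init`, and `directly_learn_stds` more concrete, the sketch below shows one common way an unconstrained rho (a value in R) can be squashed into a bounded standard deviation. The tanh-based mapping and the helper name `std_from_rho` are assumptions made for illustration; this page does not state the formula the library actually uses.

```python
import math


def std_from_rho(rho: float, scale: tuple[float, float]) -> float:
    """Map an unconstrained rho into the (min, max) standard deviation range.

    Hypothetical helper: the tanh squashing is an assumed parameterization,
    not one confirmed by this documentation.
    """
    low, high = scale
    # tanh maps R onto (-1, 1); shift and rescale that interval to (low, high).
    return low + (high - low) * (1.0 + math.tanh(rho)) / 2.0


scale = (0.1, 2.0)
# With the default rho_init of -3.0, the initial standard deviation
# sits just above the lower bound of the range.
initial_std = std_from_rho(-3.0, scale)
```

Under this assumed mapping, a strongly negative initial rho starts training with almost the minimum amount of noise, and larger rhos move the standard deviation toward the upper end of `scale`.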
Methods:
Name | Description |
---|---|
__call__ | Transform the input data. |
__getstate__ | Prepare a serializable copy of |
__init__ | |
__init_subclass__ | Set the default dtype to |
__setstate__ | Restore from a serialized copy of |
forward | Transform the input data. |
get_applied_transform_components_factory | Create a function that returns the elements of the transform components ( |
get_transformed_output_factory | Create a function that returns the transformed output from the most recent forward pass. |
initial_seed | Return the initial seed of the CPU device's random number generator. |
manual_seed | Seed each of the random number generators. |
reset_parameters | Reinitialize parameters and buffers. |
seed | Seed each of the random number generators using a non-deterministic random number. |
__call__
Transform the input data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input | Tensor | The input to transform. | required |
noise_mask | Tensor \| None | An optional mask that selects the elements of | None |
**kwargs | Any | Additional keyword arguments to the estimator modules. | required |
__init__
__init__(
scale: tuple[float, float],
transformer_type: type[TransformerT],
config_path: str,
percent_to_mask: float | None = None,
shallow: float = 1.0,
seed: int | None = None,
rho_init: float = -3.0,
std_dropout: float = 0.0,
mean_dropout: float = 0.0,
directly_learn_stds: bool = False,
mean_num_experts: int = 0,
std_num_experts: int = 0,
use_causal_mask: bool = True,
**kwargs: Any,
) -> None
Changed in version 0.85.0: Passing param path to transformer cloak was highly error prone and unreasonable for the typical user.
Changed in version 0.105.0: The std_loss_type argument is deprecated and no longer has any effect.
__init_subclass__
Set the default dtype to torch.float32 inside all subclass __init__ methods.
__setstate__
Restore from a serialized copy of self.__dict__.
forward
Transform the input data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input | Tensor | The input to transform. | required |
noise_mask | Tensor \| None | A mask that selects the elements of | None |
**kwargs | Any | Additional keyword arguments to the estimator modules. | required |
Returns:
Type | Description |
---|---|
torch.Tensor | The transformed input data. |
Raises:
Type | Description |
---|---|
ValueError | If the |
Changed in version 0.74.0: The `noise_token_mask` was renamed to `noise_mask` to create a uniform interface everywhere.
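The role of the noise mask and the seed can be illustrated with a small, library-free sketch. It assumes an additive transform of the form `input + mean + std * noise` applied only where the mask is true; that form, and the helper name `apply_masked_noise`, are assumptions for illustration rather than the layer's documented behavior.

```python
import random


def apply_masked_noise(values, means, stds, noise_mask, seed=None):
    """Add Gaussian noise only to the elements selected by noise_mask.

    Hypothetical helper operating on plain lists; the additive form
    value + mean + std * noise is an assumption, not the library's code.
    """
    rng = random.Random(seed)
    transformed = []
    for value, mean, std, selected in zip(values, means, stds, noise_mask):
        if selected:
            transformed.append(value + mean + std * rng.gauss(0.0, 1.0))
        else:
            # Unselected elements pass through unchanged.
            transformed.append(value)
    return transformed


values = [1.0] * 20
noise_mask = 5 * [False] + 15 * [True]
output = apply_masked_noise(values, [0.0] * 20, [0.5] * 20, noise_mask, seed=0)
```

Here the first five elements are left untouched, and repeating the call with the same seed reproduces the same noisy values.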
get_applied_transform_components_factory
Create a function that returns the elements of the transform components ('mean' and 'std') applied during the most recent forward pass.
Specifically, the applied elements are those selected by the noise mask (if supplied) and standard deviation mask (if std_estimator.masker is not None). If no masks are used, all elements are returned.
The applied transform components are returned flattened.
This function is intended to be used to log histograms of the transform components.
Returns:
Type | Description |
---|---|
Callable[[], dict[str, torch.Tensor]] | A function that returns the elements of the transform components applied during the most recent forward pass. |
Examples:
>>> import torch
>>> from torch import nn
>>> from stainedglass_core import model as sg_model, noise_layer as sg_noise_layer
>>> base_model = nn.Linear(20, 2)
>>> noisy_model = sg_model.NoisyModel(
... sg_noise_layer.CloakNoiseLayer1,
... base_model,
... target_parameter="input",
... )
>>> get_applied_transform_components = (
... noisy_model.noise_layer.get_applied_transform_components_factory()
... )
>>> input = torch.ones(1, 20)
>>> noise_mask = torch.tensor(5 * [False] + 15 * [True])
>>> output = noisy_model(input, noise_mask=noise_mask)
>>> applied_transform_components = get_applied_transform_components()
>>> applied_transform_components
{'mean': tensor(...), 'std': tensor(...)}
>>> {
... component_name: component.shape
... for component_name, component in applied_transform_components.items()
... }
{'mean': torch.Size([15]), 'std': torch.Size([15])}
get_transformed_output_factory
Create a function that returns the transformed output from the most recent forward pass.
If super batching is active, only the transformed half of the super batch output is returned.
Returns:
Type | Description |
---|---|
Callable[[], torch.Tensor] | A function that returns the transformed output from the most recent forward pass. |
Examples:
>>> import torch
>>> from stainedglass_core import noise_layer as sg_noise_layer
>>> noise_layer = sg_noise_layer.CloakNoiseLayer1()
>>> get_transformed_output = noise_layer.get_transformed_output_factory()
>>> input = torch.ones(2, 3, 32, 32)
>>> output = noise_layer(input)
>>> transformed_output = get_transformed_output()
>>> assert output.equal(transformed_output)
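The factory pattern used here, where a getter is created once and later reads whatever the most recent forward pass produced, can be sketched without the library. `RecordingLayer` and its placeholder transform are hypothetical and exist only to show why the getter can be created before any forward pass has run.

```python
from typing import Callable, Optional


class RecordingLayer:
    """Hypothetical stand-in for a noise layer that remembers its last output."""

    def __init__(self) -> None:
        self._last_output: Optional[list[float]] = None

    def __call__(self, values: list[float]) -> list[float]:
        # Placeholder transform; a real noise layer would add learned noise here.
        self._last_output = [value + 1.0 for value in values]
        return self._last_output

    def get_transformed_output_factory(self) -> Callable[[], list[float]]:
        def get_transformed_output() -> list[float]:
            # The lookup happens at call time, so the getter always sees
            # the output of the most recent forward pass.
            if self._last_output is None:
                raise RuntimeError("No forward pass has been run yet.")
            return self._last_output

        return get_transformed_output


layer = RecordingLayer()
get_transformed_output = layer.get_transformed_output_factory()
first = layer([1.0, 2.0])
second = layer([3.0])
latest = get_transformed_output()
```

Because the closure reads the stored output lazily, `latest` reflects the second call rather than the first, which is the behavior a logging hook would typically want.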
initial_seed
Return the initial seed of the CPU device's random number generator.
manual_seed
Seed each of the random number generators.
reset_parameters
Reinitialize parameters and buffers.
This method is useful for initializing tensors created on the meta device.
seed
Seed each of the random number generators using a non-deterministic random number.
transformer_parameter_model
transformer_parameter_model(
transformer_type: type[TransformerT],
config_path: str,
**kwargs: Any,
) -> TransformerT
Create a single block of a transformers.PreTrainedModel and load the weights from the parameter path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transformer_type | type[TransformerT] | The type of the transformer to use to construct the transformer parameter model. | required |
config_path | str | Path to transformer config. | required |
**kwargs | Any | The keyword arguments to pass to transformers.PreTrainedModel.from_pretrained. | required |
Returns:
Type | Description |
---|---|
TransformerT | A transformer that can be used to estimate rhos/locs. |
Changed in version 0.85.0: Passing param path to transformer cloak was highly error prone and unreasonable for the typical user.