# transformer_cloak

Module for Transformer Cloak noise layers.
Classes:

| Name | Description |
|---|---|
| `TransformerCloak` | Applies a stochastic transformation to a causal language model embedding |
## TransformerBlockEstimator

Bases: `Module`, `Generic[TransformerT]`

Estimates components of sequence-dependent noise using a single-layer transformer model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| | `type[TransformerT]` | The type of transformer model to build a single layer estimator of, e.g. | required |
| | `PretrainedConfig` | Transformers config. | required |
| | `float` | Initial value of the final | `0.0` |
| | `float` | Dropout probability of the transformer model output. | `0.1` |
| | `float` | The scale factor to multiply the initial values of | `FIVE_PERCENT` |
| | `dtype` | The torch dtype to initialize the transformer block estimator with. | `float32` |
| | `int` | The number of hidden layers to use in the transformer of the | `1` |
| | `SupportedAttentionImplementationsType` | The attention implementation to use. Supported values are | `None` |
| | `bool` | Whether to trust remote code when loading from HuggingFace Hub. | `False` |
Changed in version v1.3.0: Added support for multilayer estimators.
Changed in version v1.13.0: Added support for SGT4Text explicitly setting attention implementation.
Changed in version v3.1.0: Added `trust_remote_code` parameter to allow deserialization of third party tokenizers and models.
Methods:

| Name | Description |
|---|---|
| `forward` | Compose the transformer block with a dropout and a linear adapter layer. |
| `reset_parameters` | Reinitialize parameters and buffers. |
| `tensor_parallel` | Tensor parallelize the model across the given device mesh. |
### forward

Compose the transformer block with a dropout and a linear adapter layer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `*args` | `Any` | Positional arguments to the transformer model. | required |
| `**kwargs` | `Any` | Keyword arguments to the transformer model. | required |

Returns:

| Type | Description |
|---|---|
| `torch.Tensor` | The output of the transformer parameter model. |
### reset_parameters

Reinitialize parameters and buffers.

This method is useful for initializing tensors created on the meta device.
### tensor_parallel

```python
tensor_parallel(mesh: DeviceMesh) -> None
```

Tensor parallelize the model across the given device mesh.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `mesh` | `DeviceMesh` | The tensor parallel device mesh. | required |

Raises:

| Type | Description |
|---|---|
| `NotImplementedError` | If the transformer does not support tensor parallelism. |
## TransformerCloak

Bases: `BaseNoiseLayer[TransformerBlockEstimator[TransformerT], CloakStandardDeviationParameterization | DirectStandardDeviationParameterization, PercentMasker]`

Applies a stochastic transformation to a causal language model embedding Tensor using `TransformerBlockEstimator`, with standard deviations parameterized by either `CloakStandardDeviationParameterization` or `DirectStandardDeviationParameterization`, and optional standard deviation-based input masking using `PercentMasker`.
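The transform described above can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the per-element mean/rho estimator outputs, the softplus mapping from rho to a positive standard deviation, and the additive noise form are all assumptions.

```python
import math
import random


def cloak_transform(x, means, rhos, noise_mask=None, seed=None):
    """Illustrative sketch of a cloak-style stochastic transform.

    x, means, and rhos are equal-length lists of floats; means/rhos stand in
    for the per-element outputs of the mean and rho estimator models. The
    softplus mapping from rho to std and the additive form x + mean + std*eps
    are assumptions for illustration only.
    """
    rng = random.Random(seed)
    out = []
    for i, (xi, mu, rho) in enumerate(zip(x, means, rhos)):
        if noise_mask is not None and not noise_mask[i]:
            out.append(xi)  # elements excluded by the noise mask pass through
            continue
        std = math.log1p(math.exp(rho))  # softplus keeps the std positive
        out.append(xi + mu + std * rng.gauss(0.0, 1.0))
    return out


transformed = cloak_transform(
    [1.0, 1.0, 1.0],
    [0.1, 0.1, 0.1],
    [-3.0, -3.0, -3.0],
    noise_mask=[False, True, True],
    seed=0,
)
```

Note how a small `rho` (e.g. the `rho_init` default of `-3.0`) yields a small standard deviation, so the initial transform perturbs the embedding only slightly.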
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `scale` | `tuple[float, float]` | The range of standard deviations of the noise. | required |
| `transformer_type` | `type[TransformerT]` | The type of the transformer to build a single layer estimator from. | required |
| `config` | `PretrainedConfig \| str \| None` | A | `None` |
| `config_path` | `str \| None` | A filepath that can be loaded via | `None` |
| `percent_to_mask` | `float \| None` | The percentage of the input to mask. | `None` |
| `shallow` | `float` | A fixed temperature-like parameter which alters the scale of the standard deviation of the noise. | `1.0` |
| `seed` | `int \| None` | Seed for the random number generator used to generate noise. | `None` |
| `rho_init` | `float` | Initial values for rhos. | `-3.0` |
| `std_dropout` | `float` | Dropout ratio for the std parameter model. | `0.0` |
| `mean_dropout` | `float` | Dropout ratio for the mean parameter model. | `0.0` |
| `directly_learn_stds` | `bool` | Whether the estimator learns rhos (values in R) or standard deviations directly (values in R^+). | `False` |
| `noise_layer_dtype` | `dtype \| None` | The dtype of the noise layer. | `None` |
| `num_hidden_layers` | `int` | The number of hidden layers to use in the transformer model of the | `1` |
| `noise_layer_attention` | `SupportedAttentionImplementationsType` | The attention implementation to use. Supported values are | `None` |
| `trust_remote_code` | `bool` | Whether to trust remote code when loading from HuggingFace Hub. | `False` |
| `**kwargs` | `Any` | Keyword arguments used to define the transformer parameter models. Ignored if | required |
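The `directly_learn_stds` distinction can be sketched as follows. The softplus mapping below is an assumption for illustration; the library's actual rho-to-std parameterization may differ.

```python
import math


def std_from_estimator_output(value, directly_learn_stds=False):
    """Map an estimator output to a positive standard deviation.

    When rhos are learned, an unconstrained value in R is squashed into R^+
    (softplus assumed here for illustration). When stds are learned directly,
    the estimator output is taken as the std itself.
    """
    if directly_learn_stds:
        return value  # estimator already outputs values in R^+
    return math.log1p(math.exp(value))  # softplus: R -> R^+


# rho_init = -3.0 yields a small initial standard deviation
initial_std = std_from_estimator_output(-3.0)
```

This is why `rho_init` defaults to a negative value: it keeps the noise small at the start of training while leaving the estimator free to move in all of R.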
Raises:

| Type | Description |
|---|---|
| `ValueError` | If |
| `ValueError` | If |
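The `percent_to_mask` mechanic behind `PercentMasker` might look like the sketch below. Which extreme of the standard deviations is masked is an assumption made purely for illustration; only the fraction-based thresholding mechanic is the point.

```python
def percent_mask(stds, percent_to_mask):
    """Return a boolean mask selecting the given fraction of elements,
    chosen here as those with the largest estimated standard deviations.

    The direction of selection (largest vs. smallest) is an assumption for
    illustration only.
    """
    n_mask = int(len(stds) * percent_to_mask)
    if n_mask == 0:
        return [False] * len(stds)
    threshold = sorted(stds, reverse=True)[n_mask - 1]
    return [s >= threshold for s in stds]


mask = percent_mask([0.1, 0.5, 0.3, 0.9], 0.5)  # selects the two largest stds
```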
Methods:

| Name | Description |
|---|---|
| `__call__` | Transform the input data. |
| `__getstate__` | Prepare a JSON-serializable copy of the noise layer's state. |
| `__init__` | |
| `__setstate__` | Set the state of the object. |
| `forward` | Transform the input data. |
| `get_applied_transform_components_factory` | Create a function that returns the elements of the transform components ( |
| `get_transformed_output_factory` | Create a function that returns the transformed output from the most recent forward pass. |
| `initial_seed` | Return the initial seed of the CPU device's random number generator. |
| `manual_seed` | Seed each of the random number generators. |
| `reset_parameters` | Reinitialize parameters and buffers. |
| `seed` | Seed each of the random number generators using a non-deterministic random number. |
| `tensor_parallel` | Tensor parallelize the model across the given device mesh. |
### `__call__`

Transform the input data.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input` | `Tensor` | The input to transform. | required |
| `noise_mask` | `Tensor \| None` | An optional mask that selects the elements of | `None` |
| `**kwargs` | `Any` | Additional keyword arguments to the estimator modules. | required |
### `__getstate__`

Prepare a JSON-serializable copy of the noise layer's state.

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | A dictionary containing the configuration of the noise layer, including its type string, the state dict, and the generator states if they exist. |
Changed in version v1.13.0: Added support for SGT4Text explicitly setting attention implementation.
Changed in version v3.15.0: Added serialization support for all noise layers.
### `__init__`

```python
__init__(
    scale: tuple[float, float],
    transformer_type: type[TransformerT],
    config: PretrainedConfig | str | None = None,
    config_path: str | None = None,
    percent_to_mask: float | None = None,
    shallow: float = 1.0,
    seed: int | None = None,
    rho_init: float = -3.0,
    std_dropout: float = 0.0,
    mean_dropout: float = 0.0,
    directly_learn_stds: bool = False,
    noise_layer_dtype: dtype | None = None,
    num_hidden_layers: int = 1,
    noise_layer_attention: SupportedAttentionImplementationsType = None,
    trust_remote_code: bool = False,
    **kwargs: Any,
) -> None
```
Changed in version v1.3.0: Added support for multilayer estimators.
Changed in version v1.13.0: Added support for SGT4Text explicitly setting attention implementation.
Changed in version v3.1.0: Added `trust_remote_code` parameter to allow deserialization of third party tokenizers and models.
### `__setstate__`

```python
__setstate__(
    state: dict[str, Any],
    trust_remote_code: bool = False,
    third_party_model_path: str | PathLike[str] | None = None,
) -> None
```

Set the state of the object.

`state_dict` and `_generators` are both optional keys, and will be restored if they exist in the state.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `state` | `dict[str, Any]` | The state to set. | required |
| `trust_remote_code` | `bool` | Whether to trust remote code when loading from HuggingFace Hub. | `False` |
| `third_party_model_path` | `str \| PathLike[str] \| None` | The path or Hugging Face reference to a third-party model to load. This is useful when loading SGTs whose internal structure depends on transformers which are not importable directly through transformers, but are present on the Hugging Face Hub. | `None` |
Changed in version v3.15.0: Added serialization support for all noise layers.
### forward

Transform the input data.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input` | `Tensor` | The input to transform. | required |
| `noise_mask` | `Tensor \| None` | A mask that selects the elements of | `None` |
| `**kwargs` | `Any` | Additional keyword arguments to the estimator modules. | required |
Returns:

| Type | Description |
|---|---|
| `torch.Tensor` | The transformed input data. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the |
### get_applied_transform_components_factory

Create a function that returns the elements of the transform components (`'mean'` and `'std'`) applied during the most recent forward pass.

Specifically, the applied elements are those selected by the noise mask (if supplied) and the standard deviation mask (if `std_estimator.masker` is not None). If no masks are used, all elements are returned.

The applied transform components are returned flattened.

This function is intended to be used to log histograms of the transform components.

Returns:

| Type | Description |
|---|---|
| `Callable[[], dict[str, torch.Tensor]]` | A function that returns the elements of the transform components applied during the most recent forward pass. |
Examples:
>>> import torch
>>> from torch import nn
>>> from stainedglass_core import model as sg_model, noise_layer as sg_noise_layer
>>> base_model = nn.Linear(20, 2)
>>> noisy_model = sg_model.NoisyModel(
... sg_noise_layer.CloakNoiseLayer1,
... base_model,
... target_parameter="input",
... )
>>> get_applied_transform_components = (
... noisy_model.noise_layer.get_applied_transform_components_factory()
... )
>>> input = torch.ones(1, 20)
>>> noise_mask = torch.tensor(5 * [False] + 15 * [True])
>>> output = noisy_model(input, noise_mask=noise_mask)
>>> applied_transform_components = get_applied_transform_components()
>>> applied_transform_components
{'mean': tensor(...), 'std': tensor(...)}
>>> {
... component_name: component.shape
... for component_name, component in applied_transform_components.items()
... }
{'mean': torch.Size([15]), 'std': torch.Size([15])}
### get_transformed_output_factory

Create a function that returns the transformed output from the most recent forward pass.

If super batching is active, only the transformed half of the super batch output is returned.

Returns:

| Type | Description |
|---|---|
| `Callable[[], torch.Tensor]` | A function that returns the transformed output from the most recent forward pass. |
Examples:
>>> import torch
>>> from stainedglass_core import noise_layer as sg_noise_layer
>>> noise_layer = sg_noise_layer.CloakNoiseLayer1()
>>> get_transformed_output = noise_layer.get_transformed_output_factory()
>>> input = torch.ones(2, 3, 32, 32)
>>> output = noise_layer(input)
>>> transformed_output = get_transformed_output()
>>> assert output.equal(transformed_output)
### initial_seed

Return the initial seed of the CPU device's random number generator.
### manual_seed

```python
manual_seed(seed: int | None, rank_dependent: bool = True) -> None
```

Seed each of the random number generators.

Setting `seed` to None will destroy any existing generators.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `seed` | `int \| None` | The seed to set. | required |
| `rank_dependent` | `bool` | Whether to add the distributed rank to the seed to ensure that each process samples different noise. | `True` |
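Rank-dependent seeding can be sketched as below. This is an illustrative sketch using the stdlib RNG; the library manages torch generators, and its actual handling is more involved.

```python
import random


def make_generator(seed, rank=0, rank_dependent=True):
    """Build a per-process RNG.

    Adding the distributed rank to the seed ensures each process samples
    different noise; a None seed means no deterministic generator is kept.
    """
    if seed is None:
        return None  # mirrors "destroy any existing generators"
    effective_seed = seed + rank if rank_dependent else seed
    return random.Random(effective_seed)


g0 = make_generator(42, rank=0)
g1 = make_generator(42, rank=1)  # different stream on rank 1
```

With `rank_dependent=False`, every rank would draw the identical noise stream, which is rarely what you want in data-parallel training.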
### reset_parameters

Reinitialize parameters and buffers.

This method is useful for initializing tensors created on the meta device.
### seed

Seed each of the random number generators using a non-deterministic random number.
### tensor_parallel

```python
tensor_parallel(mesh: DeviceMesh) -> None
```

Tensor parallelize the model across the given device mesh.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `mesh` | `DeviceMesh` | The tensor parallel device mesh. | required |
### transformer_parameter_model

```python
transformer_parameter_model(
    transformer_type: type[TransformerT],
    config: PretrainedConfig,
    num_hidden_layers: int = 1,
    attn_implementation: SupportedAttentionImplementationsType = None,
    trust_remote_code: bool = False,
) -> TransformerT
```

Create a single block of a transformers.PreTrainedModel and load the weights from the parameter path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `transformer_type` | `type[TransformerT]` | The type of the transformer to use to construct the transformer parameter model. | required |
| `config` | `PretrainedConfig` | Transformer config. | required |
| `num_hidden_layers` | `int` | The number of hidden layers to use in the transformer model. | `1` |
| `attn_implementation` | `SupportedAttentionImplementationsType` | The attention implementation to use. Supported values are | `None` |
| `trust_remote_code` | `bool` | Whether to trust remote code when loading from HuggingFace Hub. | `False` |
Returns:

| Type | Description |
|---|---|
| `TransformerT` | A transformer that can be used to estimate rhos/locs. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the attention implementation is not supported. |
| `TypeError` | If the transformer type does not match the loaded config. |
Changed in version v1.3.0: Added support for multilayer estimators.
Changed in version v1.10.0: Remove non-causal mask support via `use_causal_mask`.
Changed in version v1.13.0: Added support for SGT4Text explicitly setting attention implementation.
Changed in version v2.22.0: Removed deepspeed mixture of experts support from transformer cloak.
Changed in version v3.1.0: Added `trust_remote_code` parameter to allow deserialization of third party tokenizers and models.