transformer_cloak

SupportedStdLossType ¶

Bases: Enum

The types of noise losses that can be computed.

Attributes:

Name	Type	Description
`LOG_MEAN_TOKEN_NORMALIZED`		Log of the mean of the standard deviations, normalized by the number of tokens.
`LOG_MEAN_SAMPLE_NORMALIZED`		Log of the mean of the standard deviations, normalized by the number of samples.
`MEAN_LOG_SAMPLE_NORMALIZED`		Mean of log with equal sample weights.
`MEAN_LOG_TOKEN_NORMALIZED`		Mean of log with equal token weights.

Deprecated since version 0.105.0. The std_loss_type argument is deprecated and no longer has any effect. This class only still exists for backwards compatibility with older model checkpoints and will be removed in a future version.
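
The four names differ along two axes: whether the logarithm is taken after or before averaging, and whether every token or every sample carries equal weight. The sketch below is one reading of the descriptions in the table, not the library's implementation. The nested-list layout (one list of per-token standard deviations per sample) and all function names are assumptions made for illustration.

```python
import math


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def log_mean_token_normalized(stds_per_sample):
    # Pool every token from every sample, average, then take the log.
    return math.log(_mean(s for sample in stds_per_sample for s in sample))


def log_mean_sample_normalized(stds_per_sample):
    # Average within each sample first, so each sample has equal weight.
    return math.log(_mean(_mean(sample) for sample in stds_per_sample))


def mean_log_token_normalized(stds_per_sample):
    # Take the log of each std, then average with equal token weights.
    return _mean(math.log(s) for sample in stds_per_sample for s in sample)


def mean_log_sample_normalized(stds_per_sample):
    # Take the log of each std, average per sample, then across samples.
    return _mean(
        _mean(math.log(s) for s in sample) for sample in stds_per_sample
    )


# Hypothetical data: a three-token sample and a one-token sample.
stds = [[1.0, 1.0, 1.0], [4.0]]
```

With samples of unequal length, as in `stds` above, the token-normalized and sample-normalized variants give different values because the short sample counts for a quarter of the tokens but half of the samples.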

TransformerBlockEstimator ¶

Bases: Module, Generic[TransformerT]

Transformer-based parameter model that uses a single transformer layer followed by a linear layer with dropout.

__init__ ¶

__init__(transformer_type: type[TransformerT], config_path: str, initial: float = 0.0, dropout: float = 0.1, directly_learn_stds: bool = False, use_causal_mask: bool = True, **kwargs: Any) -> None

Initialize the transformer parameter model.

Parameters:

Name	Type	Description	Default
`transformer_type`	`type[TransformerT]`	The type of the transformer to build a single layer estimator from.	required
`config_path`	`str`	The path to the transformers config.	required
`initial`	`float`	Initial value that the model needs to output.	`0.0`
`dropout`	`float`	Dropout ratio.	`0.1`
`directly_learn_stds`	`bool`	Whether the rhos estimator is used to learn rhos (values in R) or standard deviations directly (values in R^+).	`False`
`use_causal_mask`	`bool`	Whether to use a causal or a non-causal attention mask in the llama estimator.	`True`
`kwargs`	`Any`	The keyword arguments to pass to the transformers parameter model.	required

Raises:

Type	Description
`ValueError`	If a non-causal mask is requested and the installed PyTorch version is older than 2.

Changed in version 0.85.0: Passing param path to transformer cloak was highly error prone and unreasonable for the typical user.
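
To illustrate what `use_causal_mask` toggles, here is a minimal sketch of the general difference between a causal and a non-causal attention mask. How the estimator builds its mask internally is not shown on this page. The function name `attention_mask` and the nested-list representation are hypothetical.

```python
def attention_mask(num_tokens, use_causal_mask=True):
    """Return a boolean matrix; entry [i][j] says whether token i may attend to token j."""
    if use_causal_mask:
        # Causal: each token sees itself and earlier tokens only.
        return [[j <= i for j in range(num_tokens)] for i in range(num_tokens)]
    # Non-causal: every token sees every other token.
    return [[True] * num_tokens for _ in range(num_tokens)]


causal = attention_mask(3)
full = attention_mask(3, use_causal_mask=False)
```

With a causal mask, the estimate for a token depends only on that token and the ones before it. With a non-causal mask it can also depend on later tokens.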

forward ¶

forward(*args: Any, **kwargs: Any) -> torch.Tensor

Compose the transformer block with a dropout and a linear adapter layer.

Parameters:

Name	Type	Description	Default
`*args`	`Any`	Positional arguments to the transformer model.	required
`**kwargs`	`Any`	Keyword arguments to the transformer model.	required

Returns:

Type	Description
`torch.Tensor`	The output of the transformer parameter model.

Changed in version 0.75.1: The noise mask should always be non-None when using TransformerCloak.

TransformerCloak ¶

Bases: BaseNoiseLayer[TransformerBlockEstimator[TransformerT], Union[CloakStandardDeviationParameterization, DirectStandardDeviationParameterization], Optional[PercentMasker]]

Stained Glass Transform that uses a single Transformer layer to estimate features of the input.

input_shape `property` ¶

input_shape: tuple[int, ...]

The shape of the expected input including its batch dimension.

mask `property` `writable` ¶

mask: Tensor | None

The mask to apply calculated from parameters of the stochastic transformation computed during the most recent call to forward.

mean `property` `writable` ¶

mean: Tensor

The means of the stochastic transformation computed during the most recent call to forward.

std `property` `writable` ¶

std: Tensor

The standard deviations of the stochastic transformation computed during the most recent call to forward.

__call__ ¶

__call__(input: Tensor, noise_mask: Tensor | None = None, **kwargs: Any) -> NoiseLayerOutput

Stochastically transform the input.

Parameters:

Name	Type	Description	Default
`input`	`Tensor`	The input to transform.	required
`noise_mask`	`Tensor \| None`	An optional mask that selects the elements of `input` to transform. Where the mask is `False`, the original `input` value is returned. Also used to select the elements of the sampled standard deviations to use to mask the `input`. If `None`, the entire `input` is transformed.	`None`
`**kwargs`	`Any`	Additional keyword arguments to the estimator modules.	required
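
The `noise_mask` behaviour described in the table can be sketched in plain Python. This is an illustration under stated assumptions, not the layer's implementation: the additive form `value + mean + std * noise`, the flat-list layout, and the name `transform_with_noise_mask` are all hypothetical.

```python
import random


def transform_with_noise_mask(values, means, stds, noise_mask=None, seed=None):
    rng = random.Random(seed)
    if noise_mask is None:
        # No mask supplied: every element is transformed.
        noise_mask = [True] * len(values)
    transformed = []
    for value, mean, std, selected in zip(values, means, stds, noise_mask):
        if selected:
            # ASSUMPTION: additive Gaussian transform.
            transformed.append(value + mean + std * rng.gauss(0.0, 1.0))
        else:
            # Where the mask is False, the original value passes through.
            transformed.append(value)
    return transformed


values = [1.0, 1.0, 1.0, 1.0]
output = transform_with_noise_mask(
    values,
    means=[0.5] * 4,
    stds=[0.1] * 4,
    noise_mask=[False, False, True, True],
    seed=0,
)
```

Here the first two elements of `output` are returned unchanged, and only the last two are perturbed.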

__getstate__ ¶

__getstate__() -> dict[str, Any]

Prepare a serializable copy of self.__dict__.

__init__ ¶

__init__(input_shape: tuple[int, ...], scale: tuple[float, float], transformer_type: type[TransformerT], config_path: str, percent_to_mask: float | Tensor | None = None, shallow: float = 1.0, seed: int | None = None, rho_init: float = -3.0, std_dropout: float = 0.0, mean_dropout: float = 0.0, directly_learn_stds: bool = False, mean_num_experts: int = 0, std_num_experts: int = 0, use_causal_mask: bool = True, **kwargs: Any) -> None

Initialize the layer.

Parameters:

Name	Type	Description	Default
`input_shape`	`tuple[int, ...]`	The shape of the input tensor.	required
`scale`	`tuple[float, float]`	The range of standard deviations of the noise.	required
`transformer_type`	`type[TransformerT]`	The type of the transformer to build a single layer estimator from.	required
`config_path`	`str`	Path to transformer config.	required
`percent_to_mask`	`float \| Tensor \| None`	The percentage of the input to mask.	`None`
`shallow`	`float`	A fixed, temperature-like parameter that alters the scale of the standard deviation of the noise.	`1.0`
`seed`	`int \| None`	Seed for the random number generator used to generate noise.	`None`
`rho_init`	`float`	Initial values for rhos.	`-3.0`
`std_dropout`	`float`	Dropout ratio for std parameter model.	`0.0`
`mean_dropout`	`float`	Dropout ratio for mean parameter model.	`0.0`
`directly_learn_stds`	`bool`	Whether or not the rhos estimator is used to learn rhos (values in R) or standard deviations directly (values in R^+).	`False`
`mean_num_experts`	`int`	The number of experts to use for the multilayer perceptron after the attention layer for mean_estimator. The value zero corresponds to not using mixture of experts.	`0`
`std_num_experts`	`int`	The number of experts to use for the multilayer perceptron after the attention layer for std_estimator. The value zero corresponds to not using mixture of experts.	`0`
`use_causal_mask`	`bool`	Whether to use a causal or a non-causal attention mask in the llama estimator.	`True`
`kwargs`	`Any`	Keyword arguments used to define the transformer parameter models.	required

Raises:

Type	Description
`ValueError`	If `shallow` is not `1.0` when `directly_learn_stds` is `True`.
`ValueError`	If `rho_init` is not `0.0` when `directly_learn_stds` is `True`.

Changed in version 0.85.0: Passing param path to transformer cloak was highly error prone and unreasonable for the typical user.

Changed in version 0.105.0: The std_loss_type argument is deprecated and no longer has any effect.
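
The interaction between `directly_learn_stds`, `shallow`, `rho_init`, and `scale` can be sketched as follows. `check_std_options` mirrors the two `ValueError` conditions listed under Raises. `rho_to_std` is an assumption: this page does not state how a rho is mapped into the `scale` range, so a tanh squashing is used here purely for illustration. Both function names are hypothetical.

```python
import math


def check_std_options(directly_learn_stds, shallow=1.0, rho_init=-3.0):
    # Mirrors the documented ValueError conditions.
    if directly_learn_stds and shallow != 1.0:
        raise ValueError("shallow must be 1.0 when directly_learn_stds is True")
    if directly_learn_stds and rho_init != 0.0:
        raise ValueError("rho_init must be 0.0 when directly_learn_stds is True")


def rho_to_std(rho, scale, shallow=1.0):
    # ASSUMPTION: squash an unconstrained rho (a value in R) into
    # [min_std, max_std] with tanh; the library may use a different mapping.
    min_std, max_std = scale
    return min_std + (max_std - min_std) * (1.0 + math.tanh(shallow * rho)) / 2.0


check_std_options(directly_learn_stds=False)                # defaults are accepted
check_std_options(directly_learn_stds=True, rho_init=0.0)   # explicit rho_init required
mid_std = rho_to_std(0.0, scale=(0.0, 2.0))
```

Because the default `rho_init` is `-3.0`, enabling `directly_learn_stds` with the documented signature also requires passing `rho_init=0.0` explicitly.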

__init_subclass__ ¶

__init_subclass__() -> None

Set the default dtype to torch.float32 inside all subclass __init__ methods.

__setstate__ ¶

__setstate__(state: dict[str, Any]) -> None

Restore from a serialized copy of self.__dict__.
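
`__getstate__` and `__setstate__` follow Python's standard pickling protocol. The sketch below shows the general pattern on a hypothetical class, `ExampleLayer`: copy `self.__dict__`, drop whatever cannot be serialized, and rebuild it on restore. Which attributes `TransformerCloak` actually drops or rebuilds is not stated on this page.

```python
import pickle


class ExampleLayer:
    """Hypothetical stand-in with one attribute that cannot be pickled."""

    def __init__(self, scale):
        self.scale = scale
        self._hook = lambda x: x * self.scale  # lambdas cannot be pickled

    def __getstate__(self):
        # Work on a copy so the live object keeps its hook.
        state = self.__dict__.copy()
        del state["_hook"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Rebuild the attribute that was dropped in __getstate__.
        self._hook = lambda x: x * self.scale


layer = ExampleLayer(2.0)
state = layer.__getstate__()
payload = pickle.dumps(state)  # the state dict itself is serializable

restored = ExampleLayer.__new__(ExampleLayer)
restored.__setstate__(pickle.loads(payload))
```

Calling the two methods directly, as above, makes the round trip explicit. `pickle.dumps(layer)` would invoke the same hooks.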

forward ¶

forward(input: Tensor, noise_mask: Tensor | None = None, **kwargs: Any) -> base.NoiseLayerOutput

Transform the input data.

Parameters:

Name	Type	Description	Default
`input`	`Tensor`	The input to transform.	required
`noise_mask`	`Tensor \| None`	An optional mask that selects the elements of `input` to transform. Where the mask is `0`, the original `input` value is returned. Also used to select the elements of the sampled standard deviations to use to mask the `input`. If `None`, the entire `input` is transformed.	`None`
`**kwargs`	`Any`	Additional keyword arguments to the estimator modules.	required

Returns:

Type	Description
`base.NoiseLayerOutput`	The transformed input data.

Raises:

Type	Description
`ValueError`	If the noise mask is `None`.

Changed in version 0.74.0: The `noise_token_mask` was renamed to `noise_mask` to create a uniform interface everywhere.

get_applied_transform_components_factory ¶

get_applied_transform_components_factory() -> Callable[[], dict[str, torch.Tensor]]

Create a function that returns the elements of the transform components ('mean' and 'std') applied during the most recent forward pass.

Specifically, the applied elements are those selected by the noise mask (if supplied) and standard deviation mask (if std_estimator.masker is not None). If no masks are used, all elements are returned.

The applied transform components are returned flattened.

This function is intended to be used to log histograms of the transform components.

Returns:

Type	Description
`Callable[[], dict[str, torch.Tensor]]`	A function that returns the elements of the transform components applied during the most recent forward pass.

Examples:

>>> import torch
>>> from torch import nn
>>> from stainedglass_core import model as sg_model, noise_layer as sg_noise_layer
>>> base_model = nn.Linear(20, 2)
>>> noisy_model = sg_model.NoisyModel(
...     sg_noise_layer.CloakNoiseLayer1,
...     base_model,
...     input_shape=(-1, 20),
... )
>>> get_applied_transform_components = (
...     noisy_model.noise_layer.get_applied_transform_components_factory()
... )
>>> input = torch.ones(1, 20)
>>> noise_mask = torch.tensor(5 * [False] + 15 * [True])
>>> output = noisy_model(input, noise_mask=noise_mask)
>>> applied_transform_components = get_applied_transform_components()
>>> applied_transform_components
{'mean': tensor(...), 'std': tensor(...)}
>>> {
...     component_name: component.shape
...     for component_name, component in applied_transform_components.items()
... }
{'mean': torch.Size([15]), 'std': torch.Size([15])}

get_transformed_output_factory ¶

get_transformed_output_factory() -> Callable[[], torch.Tensor]

Create a function that returns the transformed output from the most recent forward pass.

If super batching is active, only the transformed half of the super batch output is returned.

Returns:

Type	Description
`Callable[[], torch.Tensor]`	A function that returns the transformed output from the most recent forward pass.

Examples:

>>> import torch
>>> from stainedglass_core import noise_layer as sg_noise_layer
>>> noise_layer = sg_noise_layer.CloakNoiseLayer1(input_shape=(-1, 3, 32, 32))
>>> get_transformed_output = noise_layer.get_transformed_output_factory()
>>> input = torch.ones(2, 3, 32, 32)
>>> output = noise_layer(input)
>>> transformed_output = get_transformed_output()
>>> assert output.output.equal(transformed_output)

initial_seed ¶

initial_seed() -> int

Return the initial seed of the CPU device's random number generator.

manual_seed ¶

manual_seed(seed: int) -> None

Seed each of the random number generators.

Parameters:

Name	Type	Description	Default
`seed`	`int`	The seed to set.	required

masked_select ¶

masked_select(noise_mask: Tensor, cloak_mask: Tensor | None) -> None

Compute masked std and mean values for proper logging.

Note

The std and mean are flattened by this operation.

Parameters:

Name	Type	Description	Default
`noise_mask`	`Tensor`	The mask to apply.	required
`cloak_mask`	`Tensor \| None`	The mask on the feature embeddings.	required

Changed in version 0.74.0: The `noise_token_mask` was renamed to `noise_mask` to create a uniform interface everywhere.
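
The selection and flattening that `masked_select` performs can be sketched in plain Python. Several things here are assumptions for illustration: values are laid out as `[tokens][features]`, `noise_mask` has one entry per token, `cloak_mask` has one entry per feature element with `True` meaning "keep", and the two masks combine with a logical AND. The documented method returns `None`, whereas this hypothetical helper returns the selected values so they can be inspected.

```python
def masked_select_sketch(values, noise_mask, cloak_mask=None):
    selected = []
    for t, row in enumerate(values):
        if not noise_mask[t]:
            # Token excluded by the noise mask: skip all of its features.
            continue
        for f, value in enumerate(row):
            if cloak_mask is None or cloak_mask[t][f]:
                selected.append(value)
    # The result is a flat list, mirroring the flattening noted above.
    return selected


std = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
noise_mask = [False, True, True]
cloak_mask = [[True, True], [True, False], [False, True]]

noise_only = masked_select_sketch(std, noise_mask)
both_masks = masked_select_sketch(std, noise_mask, cloak_mask)
```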

seed ¶

seed() -> None

Seed each of the random number generators using a non-deterministic random number.

transformer_parameter_model ¶

transformer_parameter_model(transformer_type: type[TransformerT], config_path: str, **kwargs: Any) -> TransformerT

Create a single block of a transformers.PreTrainedModel and load the weights from the parameter path.

Parameters:

Name	Type	Description	Default
`transformer_type`	`type[TransformerT]`	The type of the transformer to use to construct the transformer parameter model.	required
`config_path`	`str`	Path to transformer config.	required
`**kwargs`	`Any`	The keyword arguments to pass to transformers.PreTrainedModel.from_pretrained.	required

Returns:

Type	Description
`TransformerT`	A transformer that can be used to estimate rhos/locs.

Changed in version 0.85.0: Passing param path to transformer cloak was highly error prone and unreasonable for the typical user.
