
transformer_cloak

SupportedStdLossType

Bases: Enum

The types of noise losses that can be computed.

Attributes:

Name Type Description
LOG_MEAN_TOKEN_NORMALIZED

Log of the mean of the standard deviations, normalized by the number of tokens.

LOG_MEAN_SAMPLE_NORMALIZED

Log of the mean of the standard deviations, normalized by the number of samples.

MEAN_LOG_SAMPLE_NORMALIZED

Mean of the log of the standard deviations, with equal sample weights.

MEAN_LOG_TOKEN_NORMALIZED

Mean of the log of the standard deviations, with equal token weights.

Deprecated since version 0.105.0: The std_loss_type argument is deprecated and no longer has any effect. This class exists only for backwards compatibility with older model checkpoints and will be removed in a future version.
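Although these loss types no longer have any effect, the difference between token and sample weighting is easiest to see with samples of different lengths. The sketch below is one plausible reading of the two MEAN_LOG_* variants, assuming per-token standard deviations grouped by sample. The function names are hypothetical and this is not the library's (now inert) computation.

```python
import math


def mean_log_token_normalized(stds_per_sample: list[list[float]]) -> float:
    """Mean of log(std) in which every token carries equal weight."""
    logs = [math.log(std) for sample in stds_per_sample for std in sample]
    return sum(logs) / len(logs)


def mean_log_sample_normalized(stds_per_sample: list[list[float]]) -> float:
    """Mean of log(std) in which every sample carries equal weight."""
    # Average within each sample first, then across samples.
    per_sample = [
        sum(math.log(std) for std in sample) / len(sample) for sample in stds_per_sample
    ]
    return sum(per_sample) / len(per_sample)


# One short sample (1 token) and one long sample (3 tokens).
stds = [[math.e], [1.0, 1.0, 1.0]]
print(mean_log_token_normalized(stds))   # about 0.25: the long sample dominates
print(mean_log_sample_normalized(stds))  # about 0.5: both samples count equally
```

With token weighting, longer sequences contribute proportionally more to the average; with sample weighting, each sequence contributes the same amount regardless of its length.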

TransformerBlockEstimator

Bases: Module, Generic[TransformerT]

Transformer-based parameter model that uses a single transformer layer, followed by a linear layer with dropout.

__init__

__init__(transformer_type: type[TransformerT], config_path: str, initial: float = 0.0, dropout: float = 0.1, directly_learn_stds: bool = False, use_causal_mask: bool = True, **kwargs: Any) -> None

Initialize the transformer parameter model.

Parameters:

Name Type Description Default
transformer_type type[TransformerT]

The type of the transformer to build a single layer estimator from.

required
config_path str

The path to the transformer config.

required
initial float

The initial value that the model should output.

0.0
dropout float

Dropout ratio.

0.1
directly_learn_stds bool

Whether the rhos estimator learns rhos (values in R) or standard deviations directly (values in R^+).

False
use_causal_mask bool

Whether to use a causal or a non-causal attention mask in the llama estimator.

True
kwargs Any

The keyword arguments to pass to the transformer parameter model.

required

Raises:

Type Description
ValueError

If a non-causal mask is requested and the installed PyTorch version is older than 2.0.

Changed in version 0.85.0: Passing a parameter path to the transformer cloak was highly error-prone and unreasonable for the typical user.
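The use_causal_mask option chooses between the two conventional attention mask shapes. The sketch below illustrates them with plain Python booleans (True means "may attend") and restates the documented ValueError for non-causal masks on PyTorch older than 2.0. Both helpers are hypothetical; the estimator's real masks are tensors and may use a different encoding (for example additive float masks).

```python
def build_attention_mask(seq_len: int, causal: bool) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j."""
    # Causal: lower-triangular, so no position can see tokens that come after it.
    # Non-causal: every position can see every other position.
    return [[(not causal) or (j <= i) for j in range(seq_len)] for i in range(seq_len)]


def check_non_causal_support(torch_version: str, use_causal_mask: bool) -> None:
    """Hypothetical mirror of the documented ValueError for PyTorch < 2."""
    major = int(torch_version.split(".")[0])
    if not use_causal_mask and major < 2:
        raise ValueError(
            f"Non-causal masks require PyTorch 2 or greater, found {torch_version}."
        )


for row in build_attention_mask(3, causal=True):
    print(row)
# [True, False, False]
# [True, True, False]
# [True, True, True]
```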

forward

forward(*args: Any, **kwargs: Any) -> torch.Tensor

Compose the transformer block with dropout and a linear adapter layer.

Parameters:

Name Type Description Default
*args Any

Positional arguments to the transformer model.

required
**kwargs Any

Keyword arguments to the transformer model.

required

Returns:

Type Description
torch.Tensor

The output of the transformer parameter model.

Changed in version 0.75.1: The noise mask should always be non-None when using TransformerCloak.
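To make the composition concrete, the sketch below models only the part that follows the transformer block: dropout on each token's hidden state, then a linear layer mapping it to one scalar. It assumes dropout is applied before the linear layer and that the `initial` output is achieved by zeroing the weights and setting the bias to `initial`; both are assumptions, and the class name DropoutLinearHead is hypothetical.

```python
import random


class DropoutLinearHead:
    """Hypothetical dropout + linear adapter applied to per-token hidden states."""

    def __init__(self, hidden_size: int, initial: float = 0.0, dropout: float = 0.1, seed: int = 0) -> None:
        # Assumption: zero weights and bias == initial, so a freshly built head
        # outputs `initial` for every token regardless of the hidden states.
        self.weight = [0.0] * hidden_size
        self.bias = initial
        self.dropout = dropout
        self.training = True
        self._rng = random.Random(seed)

    def _apply_dropout(self, hidden: list[float]) -> list[float]:
        if not self.training or self.dropout == 0.0:
            return hidden
        keep_prob = 1.0 - self.dropout
        # Inverted dropout: zero each element with probability `dropout`
        # and rescale the survivors so the expected value is unchanged.
        return [h / keep_prob if self._rng.random() < keep_prob else 0.0 for h in hidden]

    def __call__(self, hidden_states: list[list[float]]) -> list[float]:
        """Map each token's hidden state (from the transformer block) to one scalar."""
        outputs = []
        for token_hidden in hidden_states:
            dropped = self._apply_dropout(token_hidden)
            outputs.append(sum(w * h for w, h in zip(self.weight, dropped)) + self.bias)
        return outputs


head = DropoutLinearHead(hidden_size=4, initial=-3.0)
print(head([[0.5, 1.0, -2.0, 3.0], [1.0, 1.0, 1.0, 1.0]]))  # [-3.0, -3.0]
```

Starting from a constant output lets training begin from a known parameter value (for example a small rho) instead of whatever a randomly initialized head would produce.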

TransformerCloak

Bases: BaseNoiseLayer[TransformerBlockEstimator[TransformerT], Union[CloakStandardDeviationParameterization, DirectStandardDeviationParameterization], Optional[PercentMasker]]

Stained Glass Transform that uses a single Transformer layer to estimate features of the input.

input_shape property

input_shape: tuple[int, ...]

The shape of the expected input including its batch dimension.

mask property writable

mask: Tensor | None

The mask to apply, calculated from the parameters of the stochastic transformation computed during the most recent call to forward.

mean property writable

mean: Tensor

The means of the stochastic transformation computed during the most recent call to forward.

std property writable

std: Tensor

The standard deviations of the stochastic transformation computed during the most recent call to forward.

__call__

__call__(input: Tensor, noise_mask: Tensor | None = None, **kwargs: Any) -> NoiseLayerOutput

Stochastically transform the input.

Parameters:

Name Type Description Default
input Tensor

The input to transform.

required
noise_mask Tensor | None

An optional mask that selects the elements of input to transform. Where the mask is False, the original input value is returned. The mask also selects which elements of the sampled standard deviations are used to mask the input. If None, the entire input is transformed.

None
**kwargs Any

Additional keyword arguments to the estimator modules.

required
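The noise_mask semantics above can be sketched element-wise on flat lists: selected elements are transformed, unselected ones pass through unchanged, and None selects everything. The transform itself (input + mean + std * eps with Gaussian eps) is an assumption made only for illustration, and transform_with_noise_mask is a hypothetical name.

```python
import random
from typing import Optional


def transform_with_noise_mask(
    inputs: list[float],
    means: list[float],
    stds: list[float],
    noise_mask: Optional[list[bool]] = None,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Transform only the elements selected by noise_mask (flat lists for simplicity)."""
    rng = rng if rng is not None else random.Random(0)
    if noise_mask is None:
        # Documented behaviour: with no mask, the entire input is transformed.
        noise_mask = [True] * len(inputs)
    outputs = []
    for x, mean, std, selected in zip(inputs, means, stds, noise_mask):
        if selected:
            # ASSUMPTION: a reparameterized Gaussian transform, x + mean + std * eps.
            outputs.append(x + mean + std * rng.gauss(0.0, 1.0))
        else:
            # Documented behaviour: where the mask is False, the original value is returned.
            outputs.append(x)
    return outputs


print(transform_with_noise_mask([1.0, 2.0, 3.0], [0.5] * 3, [0.0] * 3, [False, True, False]))
# [1.0, 2.5, 3.0]
```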

__getstate__

__getstate__() -> dict[str, Any]

Prepare a serializable copy of self.__dict__.

__init__

__init__(input_shape: tuple[int, ...], scale: tuple[float, float], transformer_type: type[TransformerT], config_path: str, percent_to_mask: float | Tensor | None = None, shallow: float = 1.0, seed: int | None = None, rho_init: float = -3.0, std_dropout: float = 0.0, mean_dropout: float = 0.0, directly_learn_stds: bool = False, mean_num_experts: int = 0, std_num_experts: int = 0, use_causal_mask: bool = True, **kwargs: Any) -> None

Initialize the layer.

Parameters:

Name Type Description Default
input_shape tuple[int, ...]

The shape of the input tensor.

required
scale tuple[float, float]

The range of standard deviations of the noise.

required
transformer_type type[TransformerT]

The type of the transformer to build a single layer estimator from.

required
config_path str

Path to transformer config.

required
percent_to_mask float | Tensor | None

The percentage of the input to mask.

None
shallow float

A fixed, temperature-like parameter that alters the scale of the noise standard deviation.

1.0
seed int | None

Seed for the random number generator used to generate noise.

None
rho_init float

Initial values for rhos.

-3.0
std_dropout float

Dropout ratio for the std parameter model.

0.0
mean_dropout float

Dropout ratio for the mean parameter model.

0.0
directly_learn_stds bool

Whether the rhos estimator learns rhos (values in R) or standard deviations directly (values in R^+).

False
mean_num_experts int

The number of experts to use in the multilayer perceptron following the attention layer of the mean_estimator. A value of zero disables mixture of experts.

0
std_num_experts int

The number of experts to use in the multilayer perceptron following the attention layer of the std_estimator. A value of zero disables mixture of experts.

0
use_causal_mask bool

Whether to use a causal or a non-causal attention mask in the llama estimator.

True
kwargs Any

Keyword arguments used to define the transformer parameter models.

required

Raises:

Type Description
ValueError

If shallow is not 1.0 when directly_learn_stds is True.

ValueError

If rho_init is not 0.0 when directly_learn_stds is True.

Changed in version 0.85.0: Passing a parameter path to the transformer cloak was highly error-prone and unreasonable for the typical user.

Changed in version 0.105.0: The std_loss_type argument is deprecated and no longer has any effect.
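The two documented ValueError conditions can be restated as a small standalone check. The helper below is hypothetical and only mirrors the conditions listed under Raises; the library may check them in a different order or with different messages.

```python
def validate_cloak_init_args(
    shallow: float = 1.0, rho_init: float = -3.0, directly_learn_stds: bool = False
) -> None:
    """Hypothetical re-statement of the two documented ValueError conditions."""
    if directly_learn_stds and shallow != 1.0:
        raise ValueError(f"shallow must be 1.0 when directly_learn_stds is True, got {shallow}.")
    if directly_learn_stds and rho_init != 0.0:
        raise ValueError(f"rho_init must be 0.0 when directly_learn_stds is True, got {rho_init}.")


validate_cloak_init_args()  # defaults pass
validate_cloak_init_args(rho_init=0.0, directly_learn_stds=True)  # passes
```

Because the default rho_init is -3.0, setting directly_learn_stds=True on its own trips the second check; rho_init=0.0 has to be passed explicitly as well.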

__init_subclass__

__init_subclass__() -> None

Set the default dtype to torch.float32 inside all subclass __init__ methods.

__setstate__

__setstate__(state: dict[str, Any]) -> None

Restore from a serialized copy of self.__dict__.
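Custom __getstate__ and __setstate__ hooks are typically used when some attributes should not be serialized as-is, such as RNG objects or values cached from the last forward pass. The sketch below shows that pattern with the standard library, storing a random.Random's state as a plain tuple and dropping a transient cache. The class StatefulLayer and its attributes are hypothetical and do not reflect which attributes TransformerCloak actually handles.

```python
import pickle
import random
from typing import Any, Optional


class StatefulLayer:
    """Hypothetical layer holding an RNG and a per-forward cache."""

    def __init__(self, seed: int) -> None:
        self.seed = seed
        self.generator = random.Random(seed)
        self.last_std: Optional[list[float]] = None  # recomputed on every forward pass

    def __getstate__(self) -> dict[str, Any]:
        # Work on a copy so the live object is not modified.
        state = self.__dict__.copy()
        state["generator"] = self.generator.getstate()  # plain tuple instead of the RNG object
        state["last_std"] = None  # transient cache, not worth serializing
        return state

    def __setstate__(self, state: dict[str, Any]) -> None:
        generator_state = state.pop("generator")
        self.__dict__.update(state)
        self.generator = random.Random()
        self.generator.setstate(generator_state)


layer = StatefulLayer(seed=7)
layer.last_std = [0.1, 0.2]
restored = pickle.loads(pickle.dumps(layer))
print(restored.last_std)  # None
print(restored.generator.random() == layer.generator.random())  # True: same RNG state
```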

forward

forward(input: Tensor, noise_mask: Tensor | None = None, **kwargs: Any) -> base.NoiseLayerOutput

Transform the input data.

Parameters:

Name Type Description Default
input Tensor

The input to transform.

required
noise_mask Tensor | None

An optional mask that selects the elements of input to transform. Where the mask is 0, the original input value is returned. The mask also selects which elements of the sampled standard deviations are used to mask the input. If None, the entire input is transformed.

None
**kwargs Any

Additional keyword arguments to the estimator modules.

required

Returns:

Type Description
base.NoiseLayerOutput

The transformed input data.

Raises:

Type Description
ValueError

If the noise mask is None.

Changed in version 0.74.0: The `noise_token_mask` was renamed to `noise_mask` to create a uniform interface everywhere.

get_applied_transform_components_factory

get_applied_transform_components_factory() -> Callable[[], dict[str, torch.Tensor]]

Create a function that returns the elements of the transform components ('mean' and 'std') applied during the most recent forward pass.

Specifically, the applied elements are those selected by the noise mask (if supplied) and standard deviation mask (if std_estimator.masker is not None). If no masks are used, all elements are returned.

The applied transform components are returned flattened.

This function is intended to be used to log histograms of the transform components.

Returns:

Type Description
Callable[[], dict[str, torch.Tensor]]

A function that returns the elements of the transform components applied during the most recent forward pass.

Examples:

>>> import torch
>>> from torch import nn
>>> from stainedglass_core import model as sg_model, noise_layer as sg_noise_layer
>>> base_model = nn.Linear(20, 2)
>>> noisy_model = sg_model.NoisyModel(
...     sg_noise_layer.CloakNoiseLayer1,
...     base_model,
...     input_shape=(-1, 20),
... )
>>> get_applied_transform_components = (
...     noisy_model.noise_layer.get_applied_transform_components_factory()
... )
>>> input = torch.ones(1, 20)
>>> noise_mask = torch.tensor(5 * [False] + 15 * [True])
>>> output = noisy_model(input, noise_mask=noise_mask)
>>> applied_transform_components = get_applied_transform_components()
>>> applied_transform_components
{'mean': tensor(...), 'std': tensor(...)}
>>> {
...     component_name: component.shape
...     for component_name, component in applied_transform_components.items()
... }
{'mean': torch.Size([15]), 'std': torch.Size([15])}
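The factory returns a closure rather than the components themselves, so it can be created once (for example when setting up logging) and called after each forward pass to read whatever was cached most recently. The sketch below illustrates that deferred-read design with a hypothetical ToyNoiseLayer on flat Python lists; its placeholder mean and std values are not estimated from the input.

```python
from typing import Callable, Optional


class ToyNoiseLayer:
    """Hypothetical stand-in that caches per-element transform components from its last forward pass."""

    def __init__(self) -> None:
        self.mean: list[float] = []
        self.std: list[float] = []
        self.noise_mask: Optional[list[bool]] = None

    def forward(self, inputs: list[float], noise_mask: Optional[list[bool]] = None) -> list[float]:
        # Placeholder components; a real layer would estimate these from `inputs`.
        self.mean = [0.0] * len(inputs)
        self.std = [0.1 * (i + 1) for i in range(len(inputs))]
        self.noise_mask = noise_mask
        return inputs

    def get_applied_transform_components_factory(self) -> Callable[[], dict[str, list[float]]]:
        def get_applied_transform_components() -> dict[str, list[float]]:
            # Read the cached state when called, not when the factory was created.
            mask = self.noise_mask if self.noise_mask is not None else [True] * len(self.std)
            return {
                "mean": [m for m, keep in zip(self.mean, mask) if keep],
                "std": [s for s, keep in zip(self.std, mask) if keep],
            }

        return get_applied_transform_components


layer = ToyNoiseLayer()
get_components = layer.get_applied_transform_components_factory()  # created before any forward pass
layer.forward([1.0] * 5, noise_mask=[False, False, True, True, True])
print({name: len(values) for name, values in get_components().items()})  # {'mean': 3, 'std': 3}
```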

get_transformed_output_factory

get_transformed_output_factory() -> Callable[[], torch.Tensor]

Create a function that returns the transformed output from the most recent forward pass.

If super batching is active, only the transformed half of the super batch output is returned.

Returns:

Type Description
Callable[[], torch.Tensor]

A function that returns the transformed output from the most recent forward pass.

Examples:

>>> import torch
>>> from stainedglass_core import noise_layer as sg_noise_layer
>>> noise_layer = sg_noise_layer.CloakNoiseLayer1(input_shape=(-1, 3, 32, 32))
>>> get_transformed_output = noise_layer.get_transformed_output_factory()
>>> input = torch.ones(2, 3, 32, 32)
>>> output = noise_layer(input)
>>> transformed_output = get_transformed_output()
>>> assert output.output.equal(transformed_output)

initial_seed

initial_seed() -> int

Return the initial seed of the CPU device's random number generator.

manual_seed

manual_seed(seed: int) -> None

Seed each of the random number generators.

Parameters:

Name Type Description Default
seed int

The seed to set.

required

masked_select

masked_select(noise_mask: Tensor, cloak_mask: Tensor | None) -> None

Compute the masked std and mean values for logging.

Note

The std and mean are flattened by this operation.

Parameters:

Name Type Description Default
noise_mask Tensor

The mask to apply.

required
cloak_mask Tensor | None

The mask on the feature embeddings.

required

Changed in version 0.74.0: The `noise_token_mask` was renamed to `noise_mask` to create a uniform interface everywhere.
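The selection described above can be sketched on a (batch, sequence) grid: an element is kept only if the noise mask selects it and, when a cloak mask is given, the cloak mask selects it too, and the kept elements are returned flattened. This assumes both masks have the same shape as the values; real tensor masks may instead be broadcast, and masked_select_flat is a hypothetical name.

```python
from typing import Optional


def masked_select_flat(
    values: list[list[float]],
    noise_mask: list[list[bool]],
    cloak_mask: Optional[list[list[bool]]] = None,
) -> list[float]:
    """Return the elements of a (batch, sequence) grid that both masks select, flattened."""
    selected = []
    for b, row in enumerate(values):
        for t, value in enumerate(row):
            # An element counts as "applied" only if the noise mask selects it and,
            # when present, the cloak (std) mask selects it as well.
            if noise_mask[b][t] and (cloak_mask is None or cloak_mask[b][t]):
                selected.append(value)
    return selected


values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
noise_mask = [[True, True, False], [False, True, True]]
cloak_mask = [[True, False, True], [True, True, False]]
print(masked_select_flat(values, noise_mask))              # [1.0, 2.0, 5.0, 6.0]
print(masked_select_flat(values, noise_mask, cloak_mask))  # [1.0, 5.0]
```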

seed

seed() -> None

Seed each of the random number generators using a non-deterministic random number.
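The three seeding methods follow the naming of torch.Generator's manual_seed, seed, and initial_seed. The sketch below shows one way a layer holding one generator per device could implement them, using random.Random as a stand-in for device generators. The class name, the device keys, and the choice to share a single non-deterministic seed across all generators in seed() are assumptions.

```python
import random
from typing import Iterable, Optional


class MultiDeviceSeeder:
    """Hypothetical holder of one RNG per device name (must include "cpu")."""

    def __init__(self, devices: Iterable[str] = ("cpu",), seed: Optional[int] = None) -> None:
        self.generators = {device: random.Random() for device in devices}
        self._initial_seeds: dict[str, int] = {}
        if seed is None:
            self.seed()
        else:
            self.manual_seed(seed)

    def manual_seed(self, seed: int) -> None:
        """Seed each of the random number generators with the same seed."""
        for device, generator in self.generators.items():
            generator.seed(seed)
            self._initial_seeds[device] = seed

    def seed(self) -> None:
        """Seed every generator from a non-deterministic source (OS entropy)."""
        self.manual_seed(random.SystemRandom().randrange(2**63))

    def initial_seed(self) -> int:
        """Return the seed last applied to the CPU generator."""
        return self._initial_seeds["cpu"]


seeder = MultiDeviceSeeder(devices=("cpu", "cuda:0"), seed=123)
print(seeder.initial_seed())  # 123
```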

transformer_parameter_model

transformer_parameter_model(transformer_type: type[TransformerT], config_path: str, **kwargs: Any) -> TransformerT

Create a single block of a transformers.PreTrainedModel and load the weights from the parameter path.

Parameters:

Name Type Description Default
transformer_type type[TransformerT]

The type of the transformer to use to construct the transformer parameter model.

required
config_path str

Path to transformer config.

required
**kwargs Any

The keyword arguments to pass to transformers.PreTrainedModel.from_pretrained.

required

Returns:

Type Description
TransformerT

A transformer that can be used to estimate rhos/locs.

Changed in version 0.85.0: Passing a parameter path to the transformer cloak was highly error-prone and unreasonable for the typical user.
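One way to reduce a full model configuration to a single transformer block is to load the config and override its layer count before building the model. The sketch below does this on a Hugging Face-style config.json using only the standard library. The helper name and the `layer_key` parameter are hypothetical, and the field name varies by architecture (Llama configs use num_hidden_layers, GPT-2 configs use n_layer). With the transformers library itself, a config's from_pretrained method also accepts keyword overrides for config attributes.

```python
import json
from pathlib import Path
from typing import Any


def single_block_config(config_path: str, layer_key: str = "num_hidden_layers") -> dict[str, Any]:
    """Load a Hugging Face-style config.json and shrink it to a single transformer block.

    Hypothetical helper: `layer_key` names the field that stores the layer count.
    """
    config = json.loads(Path(config_path).read_text())
    if layer_key not in config:
        raise KeyError(f"{layer_key!r} not found in {config_path}")
    config[layer_key] = 1  # keep every other hyperparameter unchanged
    return config
```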