data_collator

Classes:

Name Description
DataCollatorForStainedGlassSeq2Seq

Collates batches of sequences for training a sequence-to-sequence model with StainedGlass.

DataCollatorForStainedGlassSeq2Seq dataclass

Collates batches of sequences for training a sequence-to-sequence model with StainedGlass.

Added in version 0.84.0.

Methods:

Name Description
pad

Pack a list or tuple of variable-length tensors into a single 2D tensor, padding with the given value.

Attributes:

Name Type Description
max_length int | None

The length to truncate sequences to. If None, pads to the length of the longest sequence. If pad_to_multiple_of is set, must be a multiple of that value.

pad_to_multiple_of int | None

If set, will pad the sequences to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware.

tokenizer PreTrainedTokenizerBase

The tokenizer used to configure padding and truncation. Important attributes are pad_token_id, padding_side, and truncation_side.

max_length class-attribute instance-attribute

max_length: int | None = None

The length to truncate sequences to. If None, pads to the length of the longest sequence in the batch. If pad_to_multiple_of is set, max_length must be a multiple of that value.

pad_to_multiple_of class-attribute instance-attribute

pad_to_multiple_of: int | None = None

If set, will pad the sequences to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.0 (Volta).

Workloads must use mixed precision to take advantage of Tensor Cores. Due to their design, Tensor Cores have shape constraints on their inputs. In practice, for mixed precision training, NVIDIA's recommendations are:

  1. Choose mini-batch to be a multiple of 8
  2. Choose linear layer dimensions to be a multiple of 8
  3. Choose convolution layer channel counts to be a multiple of 8
  4. For classification problems, pad vocabulary to be a multiple of 8
  5. For sequence problems, pad the sequence length to be a multiple of 8

See: https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html#tensor-core-shape.
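The rounding implied by pad_to_multiple_of can be sketched with a small helper (the function name is illustrative, not part of this API):

```python
def round_up_to_multiple(length: int, multiple: int) -> int:
    """Round a sequence length up to the nearest multiple (e.g. 8 for Tensor Cores)."""
    return ((length + multiple - 1) // multiple) * multiple

# A batch whose longest sequence is 13 tokens, collated with
# pad_to_multiple_of=8, is padded out to 16 columns rather than 13.
print(round_up_to_multiple(13, 8))  # → 16
```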

tokenizer instance-attribute

The tokenizer used to configure padding and truncation. Important attributes are pad_token_id, padding_side, and truncation_side.

pad

pad(
    sequences: list[Tensor] | tuple[Tensor, ...],
    padding_value: float,
) -> torch.Tensor

Pack a list or tuple of variable-length tensors into a single 2D tensor, padding with the given value.

Parameters:

Name Type Description Default

sequences

list[Tensor] | tuple[Tensor, ...]

The sequences to pad.

required

padding_value

float

The value to use for padding.

required

Returns:

Type Description
torch.Tensor

A 2D tensor of padded sequences.
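The packing semantics can be sketched without torch, using plain lists in place of tensors. This is an illustrative re-implementation, not the library's actual code; it mirrors what `torch.nn.utils.rnn.pad_sequence(..., batch_first=True)` does for the right-padding case:

```python
def pad_lists(sequences, padding_value):
    """Pack variable-length sequences into one rectangular 2D structure,
    right-padding shorter rows with padding_value."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [padding_value] * (max_len - len(seq)) for seq in sequences]

batch = [[1, 2, 3], [4, 5], [6]]
print(pad_lists(batch, 0))  # [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
```

The real collator additionally honors the tokenizer's padding_side, truncation via max_length, and pad_to_multiple_of; the sketch shows only the core packing step.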

TestInputWithAttentionMask

Bases: TestInput[ContainerT]

Input for LlamaForCausalLM testing with attention_mask.

Attributes:

Name Type Description
attention_mask ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids ContainerT

The input token ids.

labels ContainerT

The expected model response to the input_ids. When pretraining, the input_ids are used as the labels.

attention_mask instance-attribute

attention_mask: ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

labels instance-attribute

labels: ContainerT

The expected model response to the input_ids. When pretraining, the input_ids are used as the labels.
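The note that pretraining reuses input_ids as labels can be illustrated with a plain dict standing in for the input container (the helper function and token values are hypothetical):

```python
def make_pretraining_input(input_ids, attention_mask):
    """Build a causal-LM pretraining example: the model learns to predict
    the input sequence itself, so labels are a copy of input_ids."""
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": list(input_ids),  # pretraining: labels mirror input_ids
    }

example = make_pretraining_input([101, 7592, 102], [1, 1, 1])
print(example["labels"] == example["input_ids"])  # True
```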

TrainInputWithAttentionMask

Bases: TrainInput[ContainerT]

Input for transformers.PreTrainedModel training with attention_mask.

Attributes:

Name Type Description
attention_mask ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids ContainerT

The input token ids.

attention_mask instance-attribute

attention_mask: ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

TransformLayerTestInputWithAttentionMask

Bases: TransformLayerTestInput[ContainerT]

Input for InstructionTransformLayer testing with attention_mask.

Attributes:

Name Type Description
attention_mask ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids ContainerT

The input token ids.

labels ContainerT

The expected model response to the input_ids. When pretraining, the input_ids are used as the labels.

noise_mask ContainerT

The mask that dictates which tokens in input_ids to obfuscate.

attention_mask instance-attribute

attention_mask: ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

labels instance-attribute

labels: ContainerT

The expected model response to the input_ids. When pretraining, the input_ids are used as the labels.

noise_mask instance-attribute

noise_mask: ContainerT

The mask that dictates which tokens in input_ids to obfuscate.

TransformLayerTrainInputWithAttentionMask

Bases: TransformLayerTrainInput[ContainerT]

Input for TransformLayer training with attention_mask.

Attributes:

Name Type Description
attention_mask ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids ContainerT

The input token ids.

loss_mask ContainerT

The mask that dictates which tokens in input_ids to use to calculate the loss.

noise_mask ContainerT

The mask that dictates which tokens in input_ids to obfuscate.

attention_mask instance-attribute

attention_mask: ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

loss_mask instance-attribute

loss_mask: ContainerT

The mask that dictates which tokens in input_ids to use to calculate the loss.

noise_mask instance-attribute

noise_mask: ContainerT

The mask that dictates which tokens in input_ids to obfuscate.
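To make the roles of the four masks concrete, a small sketch with plain lists (a hypothetical example, not library code): each position is independently attended to, optionally obfuscated, and optionally counted toward the loss.

```python
# One sequence of 5 tokens: a 2-token instruction followed by a 3-token response.
train_input = {
    "input_ids":      [11, 12, 13, 14, 15],
    "attention_mask": [1, 1, 1, 1, 1],  # attend to every token
    "noise_mask":     [1, 1, 0, 0, 0],  # obfuscate only the instruction tokens
    "loss_mask":      [0, 0, 1, 1, 1],  # compute loss only on the response tokens
}

# Tokens that contribute to the loss:
loss_tokens = [t for t, m in zip(train_input["input_ids"], train_input["loss_mask"]) if m]
print(loss_tokens)  # [13, 14, 15]
```

Which spans are obfuscated versus scored is a modeling choice; the split above is only one plausible configuration.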