data_collator

Classes:

Name Description
DataCollatorForStainedGlassSeq2Seq

Collates batches of sequences for training a sequence-to-sequence model with StainedGlass.

DataCollatorForStainedGlassSeq2Seq dataclass

Collates batches of sequences for training a sequence-to-sequence model with StainedGlass.

Added in version 0.84.0.

Methods:

Name Description
pad

Pack a list or tuple of variable-length tensors into a single 2D tensor, padding with the given value.

Attributes:

Name Type Description
max_length int | None

The length to truncate sequences to. If None, pads to the length of the longest sequence. If pad_to_multiple_of is set, must be a multiple of that value.

pad_to_multiple_of int | None

If set, will pad the sequences to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware.

tokenizer PreTrainedTokenizerBase

The tokenizer used to configure padding and truncation. Important attributes are pad_token_id, padding_side, and truncation_side.

max_length class-attribute instance-attribute

max_length: int | None = None

The length to truncate sequences to. If None, pads to the length of the longest sequence in the batch. If pad_to_multiple_of is set, max_length must be a multiple of that value.

pad_to_multiple_of class-attribute instance-attribute

pad_to_multiple_of: int | None = None

If set, will pad the sequences to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.0 (Volta).

Workloads must use mixed precision to take advantage of Tensor Cores. Due to their design, Tensor Cores have shape constraints on their inputs. In practice, for mixed precision training, NVIDIA's recommendations are:

  1. Choose mini-batch to be a multiple of 8
  2. Choose linear layer dimensions to be a multiple of 8
  3. Choose convolution layer channel counts to be a multiple of 8
  4. For classification problems, pad vocabulary to be a multiple of 8
  5. For sequence problems, pad the sequence length to be a multiple of 8

See: https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html#tensor-core-shape.
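The rounding implied by pad_to_multiple_of can be sketched with a small helper (the function name is illustrative, not part of this API):

```python
def round_up_to_multiple(length: int, multiple: int) -> int:
    """Round a sequence length up to the nearest multiple (e.g. 8 for Tensor Cores)."""
    return ((length + multiple - 1) // multiple) * multiple

# A batch whose longest sequence is 13 tokens, collated with
# pad_to_multiple_of=8, is padded out to 16 columns rather than 13.
print(round_up_to_multiple(13, 8))  # → 16
```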

tokenizer instance-attribute

The tokenizer used to configure padding and truncation. Important attributes are pad_token_id, padding_side, and truncation_side.

pad

pad(
    sequences: list[Tensor] | tuple[Tensor, ...],
    padding_value: float,
) -> torch.Tensor

Pack a list or tuple of variable-length tensors into a single 2D tensor, padding with the given value.

Parameters:

Name Type Description Default

sequences

list[Tensor] | tuple[Tensor, ...]

The sequences to pad.

required

padding_value

float

The value to use for padding.

required

Returns:

Type Description
torch.Tensor

A 2D tensor of padded sequences.
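The packing semantics can be sketched without torch, using plain lists in place of tensors. This is an illustrative re-implementation, not the library's actual code; it mirrors what `torch.nn.utils.rnn.pad_sequence(..., batch_first=True)` does for the right-padding case:

```python
def pad_lists(sequences, padding_value):
    """Pack variable-length sequences into one rectangular 2D structure,
    right-padding shorter rows with padding_value."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [padding_value] * (max_len - len(seq)) for seq in sequences]

batch = [[1, 2, 3], [4, 5], [6]]
print(pad_lists(batch, 0))  # [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
```

The real collator additionally honors the tokenizer's padding_side, truncation via max_length, and pad_to_multiple_of; the sketch shows only the core packing step.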

TestInputWithAttentionMask

Bases: TestInput[ContainerT]

Input for LlamaForCausalLM testing with attention_mask.

Attributes:

Name Type Description
attention_mask ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids ContainerT

The input token ids.

labels ContainerT

The expected model response to the input_ids. When pretraining, the input_ids are used as the labels.

attention_mask instance-attribute

attention_mask: ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

labels instance-attribute

labels: ContainerT

The expected model response to the input_ids. When pretraining, the input_ids are used as the labels.
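The note that pretraining reuses input_ids as labels can be illustrated with a plain dict standing in for the input container (the helper function and token values are hypothetical):

```python
def make_pretraining_input(input_ids, attention_mask):
    """Build a causal-LM pretraining example: the model learns to predict
    the input sequence itself, so labels are a copy of input_ids."""
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": list(input_ids),  # pretraining: labels mirror input_ids
    }

example = make_pretraining_input([101, 7592, 102], [1, 1, 1])
print(example["labels"] == example["input_ids"])  # True
```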

TrainInputWithAttentionMask

Bases: TrainInput[ContainerT]

Input for transformers.PreTrainedModel training with attention_mask.

Attributes:

Name Type Description
attention_mask ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids ContainerT

The input token ids.

attention_mask instance-attribute

attention_mask: ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

TransformLayerTestInputWithAttentionMask

Bases: TransformLayerTestInput[ContainerT]

Input for InstructionTransformLayer testing with attention_mask.

Attributes:

Name Type Description
attention_mask ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids ContainerT

The input token ids.

labels ContainerT

The expected model response to the input_ids. When pretraining, the input_ids are used as the labels.

noise_mask ContainerT

The mask that dictates which tokens in input_ids to obfuscate.

attention_mask instance-attribute

attention_mask: ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

labels instance-attribute

labels: ContainerT

The expected model response to the input_ids. When pretraining, the input_ids are used as the labels.

noise_mask instance-attribute

noise_mask: ContainerT

The mask that dictates which tokens in input_ids to obfuscate.

TransformLayerTrainInputWithAttentionMask

Bases: TransformLayerTrainInput[ContainerT]

Input for TransformLayer training with attention_mask.

Attributes:

Name Type Description
attention_mask ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids ContainerT

The input token ids.

loss_mask ContainerT

The mask that dictates which tokens in input_ids to use to calculate the loss.

noise_mask ContainerT

The mask that dictates which tokens in input_ids to obfuscate.

attention_mask instance-attribute

attention_mask: ContainerT

The mask that dictates which tokens in input_ids to attend to.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

loss_mask instance-attribute

loss_mask: ContainerT

The mask that dictates which tokens in input_ids to use to calculate the loss.

noise_mask instance-attribute

noise_mask: ContainerT

The mask that dictates which tokens in input_ids to obfuscate.
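To make the roles of the four masks concrete, a small sketch with plain lists (a hypothetical example, not library code): each position is independently attended to, optionally obfuscated, and optionally counted toward the loss.

```python
# One sequence of 5 tokens: a 2-token instruction followed by a 3-token response.
train_input = {
    "input_ids":      [11, 12, 13, 14, 15],
    "attention_mask": [1, 1, 1, 1, 1],  # attend to every token
    "noise_mask":     [1, 1, 0, 0, 0],  # obfuscate only the instruction tokens
    "loss_mask":      [0, 0, 1, 1, 1],  # compute loss only on the response tokens
}

# Tokens that contribute to the loss:
loss_tokens = [t for t, m in zip(train_input["input_ids"], train_input["loss_mask"]) if m]
print(loss_tokens)  # [13, 14, 15]
```

Which spans are obfuscated versus scored is a modeling choice; the split above is only one plausible configuration.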