# data_collator

Classes:

| Name | Description |
|---|---|
| `DataCollatorForStainedGlassSeq2Seq` | Collates batches of sequences for training a sequence-to-sequence model with StainedGlass. |
## DataCollatorForStainedGlassSeq2Seq `dataclass`

Collates batches of sequences for training a sequence-to-sequence model with StainedGlass.

*Added in version 0.84.0.*
Methods:

| Name | Description |
|---|---|
| `pad` | Pack a list or tuple of variable length tensors into a single 2D tensor, padding with the given value. |
Attributes:

| Name | Type | Description |
|---|---|---|
| `max_length` | `int \| None` | The length to truncate the sequences to. If `None`, will pad to the maximum length sequence. |
| `pad_to_multiple_of` | `int \| None` | If set, will pad the sequences to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware. |
| `tokenizer` | `PreTrainedTokenizerBase` | The tokenizer to use to configure padding and truncation. Important attributes are `pad_token_id`, `padding_side`, and `truncation_side`. |
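A minimal construction sketch using only the documented dataclass fields. The import path and the `gpt2` checkpoint are assumptions for illustration; any `PreTrainedTokenizerBase` works.

```python
from transformers import AutoTokenizer

# Assumed import path for illustration; use the actual package path of this module.
from data_collator import DataCollatorForStainedGlassSeq2Seq

# Any PreTrainedTokenizerBase works; "gpt2" is only an example checkpoint.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

collator = DataCollatorForStainedGlassSeq2Seq(
    tokenizer=tokenizer,
    max_length=512,        # must be a multiple of pad_to_multiple_of when both are set
    pad_to_multiple_of=8,  # align padded lengths for Tensor Core-friendly shapes
)
```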
### max_length `class-attribute` `instance-attribute`

`max_length: int | None = None`

The length to truncate the sequences to. If `None`, will pad to the maximum length sequence. If `pad_to_multiple_of` is set, must be a multiple of that value.
### pad_to_multiple_of `class-attribute` `instance-attribute`

`pad_to_multiple_of: int | None = None`
If set, will pad the sequences to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta).
Workloads must use mixed precision to take advantage of Tensor Cores. Due to their design, Tensor Cores have shape constraints on their inputs. In practice, for mixed precision training, NVIDIA's recommendations are:
- Choose mini-batch to be a multiple of 8
- Choose linear layer dimensions to be a multiple of 8
- Choose convolution layer channel counts to be a multiple of 8
- For classification problems, pad vocabulary to be a multiple of 8
- For sequence problems, pad the sequence length to be a multiple of 8
See: https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html#tensor-core-shape.
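As a quick illustration of what padding to a multiple means in practice, a sequence length is simply rounded up to the nearest multiple of the configured value. The helper below is not part of the library; it is a sketch of the arithmetic only.

```python
import math


def round_up_to_multiple(seq_len: int, multiple: int) -> int:
    """Illustrative helper: round a sequence length up to the nearest multiple."""
    return math.ceil(seq_len / multiple) * multiple


print(round_up_to_multiple(509, 8))  # 512 -> three padding tokens are appended
print(round_up_to_multiple(512, 8))  # 512 -> already aligned, no extra padding
```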
### tokenizer `instance-attribute`

`tokenizer: PreTrainedTokenizerBase`

The tokenizer to use to configure padding and truncation. Important attributes are `pad_token_id`, `padding_side`, and `truncation_side`.
### pad

Pack a list or tuple of variable length tensors into a single 2D tensor, padding with the given value.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
|  | `list[Tensor] \| tuple[Tensor, ...]` | The sequences to pad. | *required* |
|  | `float` | The value to use for padding. | *required* |

Returns:

| Type | Description |
|---|---|
| `torch.Tensor` | A 2D tensor of padded sequences. |
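The documented behavior can be pictured with plain PyTorch. The sketch below uses `torch.nn.utils.rnn.pad_sequence` only to show the shape of the result; it is not the library's implementation of `pad`.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three variable-length 1D tensors, e.g. token id sequences of different lengths.
sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]

# Pack into a single 2D (batch, max_len) tensor, filling short rows with the pad value.
padded = pad_sequence(sequences, batch_first=True, padding_value=0)
print(padded)
# tensor([[1, 2, 3],
#         [4, 5, 0],
#         [6, 0, 0]])
```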
## TestInputWithAttentionMask

Bases: `TestInput[ContainerT]`

Input for LlamaForCausalLM testing with attention_mask.

Attributes:

| Name | Type | Description |
|---|---|---|
| `attention_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to attend to. |
| `input_ids` | `ContainerT` | The input token ids. |
| `labels` | `ContainerT` | The expected model response to the `input_ids`. |
## TrainInputWithAttentionMask

Bases: `TrainInput[ContainerT]`

Input for transformers.PreTrainedModel training with attention_mask.

Attributes:

| Name | Type | Description |
|---|---|---|
| `attention_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to attend to. |
| `input_ids` | `ContainerT` | The input token ids. |
## TransformLayerTestInputWithAttentionMask

Bases: `TransformLayerTestInput[ContainerT]`

Input for InstructionTransformLayer testing with attention_mask.

Attributes:

| Name | Type | Description |
|---|---|---|
| `attention_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to attend to. |
| `input_ids` | `ContainerT` | The input token ids. |
| `labels` | `ContainerT` | The expected model response to the `input_ids`. |
| `noise_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to obfuscate. |
### attention_mask `instance-attribute`

The mask that dictates which tokens in `input_ids` to attend to.

### labels `instance-attribute`

The expected model response to the `input_ids`. When pretraining, the `input_ids` are used as the labels.

### noise_mask `instance-attribute`

The mask that dictates which tokens in `input_ids` to obfuscate.
## TransformLayerTrainInputWithAttentionMask

Bases: `TransformLayerTrainInput[ContainerT]`

Input for TransformLayer training with attention_mask.

Attributes:

| Name | Type | Description |
|---|---|---|
| `attention_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to attend to. |
| `input_ids` | `ContainerT` | The input token ids. |
| `loss_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to use to calculate the loss. |
| `noise_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to obfuscate. |
### attention_mask `instance-attribute`

The mask that dictates which tokens in `input_ids` to attend to.

### loss_mask `instance-attribute`

The mask that dictates which tokens in `input_ids` to use to calculate the loss.

### noise_mask `instance-attribute`

The mask that dictates which tokens in `input_ids` to obfuscate.
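To make the mask semantics concrete, here is a hedged sketch of a single training example shaped like `TransformLayerTrainInputWithAttentionMask`, using plain tensors as the container type. The field names match the documentation above; the plain-dict construction and the specific mask values are only illustrative.

```python
import torch

# One token id sequence, padded to length 8 (the last two positions are padding).
input_ids = torch.tensor([101, 2054, 2003, 1996, 3007, 102, 0, 0])

example = {
    "input_ids": input_ids,
    # Attend to the real tokens; ignore the two padding positions.
    "attention_mask": torch.tensor([1, 1, 1, 1, 1, 1, 0, 0]),
    # Mark the tokens to obfuscate (an arbitrary illustrative span).
    "noise_mask": torch.tensor([0, 1, 1, 1, 1, 0, 0, 0]),
    # Compute the loss only on the non-padding tokens.
    "loss_mask": torch.tensor([1, 1, 1, 1, 1, 1, 0, 0]),
}
```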