# data_collator

Classes:

| Name | Description |
| --- | --- |
| `DataCollatorForStainedGlassSeq2Seq` | Collates batches of sequences for training a sequence-to-sequence model with StainedGlass. |
## DataCollatorForStainedGlassSeq2Seq (dataclass)

Collates batches of sequences for training a sequence-to-sequence model with StainedGlass.

Added in version 0.84.0.
Methods:

| Name | Description |
| --- | --- |
| `pad` | Pack a list or tuple of variable length tensors into a single 2D tensor, padding with the given value. |
Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `max_length` | `int \| None` | The length to truncate the sequences to. If `None`, will pad to the maximum length sequence. |
| `pad_to_multiple_of` | `int \| None` | If set, will pad the sequences to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware. |
| `tokenizer` | `PreTrainedTokenizerBase` | The tokenizer to use to configure padding and truncation. Important attributes are `pad_token_id`, `padding_side`, and `truncation_side`. |
### max_length (class-attribute, instance-attribute)

`max_length: int | None = None`

The length to truncate the sequences to. If `None`, will pad to the maximum length sequence. If `pad_to_multiple_of` is set, must be a multiple of that value.
### pad_to_multiple_of (class-attribute, instance-attribute)

`pad_to_multiple_of: int | None = None`
If set, will pad the sequences to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta).
Workloads must use mixed precision to take advantage of Tensor Cores. Due to their design, Tensor Cores have shape constraints on their inputs. In practice, for mixed precision training, NVIDIA's recommendations are:
- Choose mini-batch to be a multiple of 8
- Choose linear layer dimensions to be a multiple of 8
- Choose convolution layer channel counts to be a multiple of 8
- For classification problems, pad vocabulary to be a multiple of 8
- For sequence problems, pad the sequence length to be a multiple of 8
See: https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html#tensor-core-shape.
### tokenizer (instance-attribute)

`tokenizer: PreTrainedTokenizerBase`

The tokenizer to use to configure padding and truncation. Important attributes are `pad_token_id`, `padding_side`, and `truncation_side`.
### pad

Pack a list or tuple of variable length tensors into a single 2D tensor, padding with the given value.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| | `list[Tensor] \| tuple[Tensor, ...]` | The sequences to pad. | *required* |
| | `float` | The value to use for padding. | *required* |
Returns:

| Type | Description |
| --- | --- |
| `torch.Tensor` | A 2D tensor of padded sequences. |
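The packing behavior described above can be illustrated with plain Python lists standing in for tensors. This is a sketch of the padding logic only, under the assumption of right-padding; the real method operates on `torch.Tensor`s and is configured by the tokenizer's `padding_side`:

```python
def pad_sequences(sequences, padding_value):
    """Right-pad variable-length rows with padding_value so that every
    row has the length of the longest one, forming a 2D structure."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [padding_value] * (max_len - len(seq))
            for seq in sequences]

batch = [[1, 2, 3], [4, 5], [6]]
print(pad_sequences(batch, 0))
# -> [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
```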
## TestInputWithAttentionMask

Bases: `TestInput[ContainerT]`

Input for `LlamaForCausalLM` testing with `attention_mask`.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `attention_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to attend to. |
| `input_ids` | `ContainerT` | The input token ids. |
| `labels` | `ContainerT` | The expected model response to the `input_ids`. |
## TrainInputWithAttentionMask

Bases: `TrainInput[ContainerT]`

Input for `transformers.PreTrainedModel` training with `attention_mask`.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `attention_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to attend to. |
| `input_ids` | `ContainerT` | The input token ids. |
## TransformLayerTestInputWithAttentionMask

Bases: `TransformLayerTestInput[ContainerT]`

Input for `InstructionTransformLayer` testing with `attention_mask`.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `attention_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to attend to. |
| `input_ids` | `ContainerT` | The input token ids. |
| `labels` | `ContainerT` | The expected model response to the `input_ids`. |
| `noise_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to obfuscate. |
### attention_mask (instance-attribute)

The mask that dictates which tokens in `input_ids` to attend to.

### labels (instance-attribute)

The expected model response to the `input_ids`. When pretraining, the `input_ids` are used as the labels.

### noise_mask (instance-attribute)

The mask that dictates which tokens in `input_ids` to obfuscate.
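How these masks line up over a single sequence can be sketched with 0/1 lists. The token values and the choice of obfuscating only the prompt portion are illustrative assumptions, not code or conventions taken from the library:

```python
# Hypothetical 6-token sequence: 3 prompt tokens then 3 response tokens.
input_ids      = [101, 7592, 2088, 2003, 2307, 102]
attention_mask = [1, 1, 1, 1, 1, 1]  # attend to every real (non-pad) token
noise_mask     = [1, 1, 1, 0, 0, 0]  # obfuscate only the prompt tokens

# Tokens selected for obfuscation are those where noise_mask is 1:
obfuscated = [tok for tok, m in zip(input_ids, noise_mask) if m]
print(obfuscated)  # -> [101, 7592, 2088]
```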
## TransformLayerTrainInputWithAttentionMask

Bases: `TransformLayerTrainInput[ContainerT]`

Input for `TransformLayer` training with `attention_mask`.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `attention_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to attend to. |
| `input_ids` | `ContainerT` | The input token ids. |
| `loss_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to use to calculate the loss. |
| `noise_mask` | `ContainerT` | The mask that dictates which tokens in `input_ids` to obfuscate. |
### attention_mask (instance-attribute)

The mask that dictates which tokens in `input_ids` to attend to.

### loss_mask (instance-attribute)

The mask that dictates which tokens in `input_ids` to use to calculate the loss.

### noise_mask (instance-attribute)

The mask that dictates which tokens in `input_ids` to obfuscate.
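Using a loss mask typically means averaging the per-token loss over only the masked-in positions. A minimal sketch with plain floats, assuming a 1-marks-contributing convention; this is not the library's training loop:

```python
def masked_mean_loss(per_token_losses, loss_mask):
    """Average per-token losses over positions where loss_mask is 1,
    ignoring masked-out positions entirely."""
    total = sum(loss * m for loss, m in zip(per_token_losses, loss_mask))
    count = sum(loss_mask)
    return total / count

losses = [0.5, 1.5, 2.0, 4.0]
mask = [0, 1, 1, 0]  # only the middle two tokens contribute
print(masked_mean_loss(losses, mask))  # -> 1.75
```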