universal

Model-agnostic Mapper classes (designed to be compatible with datasets.Dataset.map) for building LLM prompts for Stained Glass Transform training and testing.
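For example, a schema mapper can be applied over a datasets.Dataset. The following is an illustrative sketch (the column names and data are hypothetical) using the InstructionSchemaMapper documented below:

>>> import datasets
>>> dataset = datasets.Dataset.from_dict(
...     {
...         "question": ["What is the capital of France?"],
...         "response": ["Paris"],
...         "system_prompt": ["Answer the following question:"],
...     }
... )
>>> mapper = InstructionSchemaMapper(
...     instruction_key="question",
...     response_key="response",
...     system_prompt_key="system_prompt",
...     context_key=None,
... )
>>> mapped_dataset = dataset.map(mapper)  # adds the universal schema columns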

ChatFormatMapper dataclass

Builds the tensor components of the transformers.PreTrainedModel chat prompt.

Added in version 0.77.0.

ChatRoleStrings dataclass

Role strings of a chat prompt.

Added in version 0.77.0.

ASSISTANT_ROLE class-attribute instance-attribute

ASSISTANT_ROLE: Final[str] = 'assistant'

The assistant role.

SYSTEM_ROLE class-attribute instance-attribute

SYSTEM_ROLE: Final[str] = 'system'

The system role.

USER_ROLE class-attribute instance-attribute

USER_ROLE: Final[str] = 'user'

The user role.
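For example, the documented role constants can be used to assemble a chat message list in the universal schema (the message contents here are illustrative):

>>> messages = [
...     {"role": ChatRoleStrings.SYSTEM_ROLE, "content": "Answer the following question:"},
...     {"role": ChatRoleStrings.USER_ROLE, "content": "What is the capital of France?"},
...     {"role": ChatRoleStrings.ASSISTANT_ROLE, "content": "Paris"},
... ]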

ChatSchemaMapper dataclass

Bases: SchemaMapper

Maps samples from an arbitrary dataset to a universal schema for building an LLM chat prompt.

Either define a subclass for easier reuse (see the sketch after the example), or use this class directly.

Examples:

>>> sample = {
...     "question": "What is the capital of France?",
...     "response": "Paris",
...     "system_prompt": "Answer the following question:",
... }
>>> mapper = ChatSchemaMapper(
...     instruction_key="question",
...     response_key="response",
...     system_prompt_key="system_prompt",
... )
>>> mapped_sample = mapper(sample)
>>> mapped_sample
[{'role': 'system', 'content': 'Answer the following question:'}, {'role': 'user', 'content': 'What is the capital of France?'}, {'role': 'assistant', 'content': 'Paris'}]

Added in version 0.77.0.
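A minimal subclass sketch for reuse with a fixed dataset layout; the subclass name and default keys are hypothetical, assuming ordinary dataclass subclassing:

>>> import dataclasses
>>> @dataclasses.dataclass
... class TriviaChatSchemaMapper(ChatSchemaMapper):
...     instruction_key: str = "question"
...     response_key: str | None = "response"
...     system_prompt_key: str | None = "system_prompt"
>>> mapper = TriviaChatSchemaMapper()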

instruction_key instance-attribute

instruction_key: str

The dataset key/column corresponding to the input.

response_key instance-attribute

response_key: str | None

An optional dataset key/column corresponding to the expected model response to the instruction.

system_prompt_key instance-attribute

system_prompt_key: str | None

An optional dataset key/column corresponding to the system prompt for the model.

Schema

Bases: Schema

Universal schema for building an LLM chat prompt.

Added in version 0.77.0.

content instance-attribute

content: str

The content of the message.

role instance-attribute

role: str

The role of the message.

ChatSpecialStrings dataclass

Special string components of a chat prompt.

An instance of this class is expected to be defined for each model to dictate the structure of its prompt.

Added in version 0.77.0.

MESSAGE_END instance-attribute

MESSAGE_END: Final[str]

The end of a message.

ROLES instance-attribute

The role strings of a chat prompt.

ROLE_HEADER_END instance-attribute

ROLE_HEADER_END: Final[str]

The end of a role header.

ROLE_HEADER_START instance-attribute

ROLE_HEADER_START: Final[str]

The start of a role header.
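An illustrative instance using hypothetical Llama-3-style strings (the exact strings are not taken from this library, and keyword construction is an assumption):

>>> llama3_like_strings = ChatSpecialStrings(
...     ROLE_HEADER_START="<|start_header_id|>",
...     ROLE_HEADER_END="<|end_header_id|>",
...     MESSAGE_END="<|eot_id|>",
...     ROLES=ChatRoleStrings(),
... )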

ChatTokenizerMapper dataclass

Bases: TokenizerMapper, ABC

Tokenizes and builds the intermediate tensor components of a chat prompt.

Added in version 0.77.0.

special_strings class-attribute instance-attribute

special_strings: ChatSpecialStrings = field(init=False)

The special prompt strings to use.

special_tokens class-attribute instance-attribute

special_tokens: SpecialTokens = field(init=False)

The tokenized special prompt strings.

tokenizer instance-attribute

The LLM tokenizer to use.

PromptTokens

Bases: TypedDict

Collection of all tokenized components of the prompt.

schema_tokens instance-attribute

schema_tokens: list[SchemaTokens]

The tokenized schema components of the prompt.

special_tokens instance-attribute

special_tokens: SpecialTokens

The tokenized special components of the prompt.

SchemaTokens

Bases: TypedDict

Tokenized intermediate prompt schema.

content instance-attribute

content: Tensor

The content of the message.

role instance-attribute

role: Tensor

The role of the message.

SpecialTokens

Bases: TypedDict

Tokenized special components of the prompt.

assistant_role instance-attribute

assistant_role: Tensor

The assistant role.

bos instance-attribute

bos: Tensor

The beginning of string token.

message_end instance-attribute

message_end: Tensor

The end of a message.

role_header_end instance-attribute

role_header_end: Tensor

The end of the role header.

role_header_start instance-attribute

role_header_start: Tensor

The start of the role header.

system_role instance-attribute

system_role: Tensor

The system role.

user_role instance-attribute

user_role: Tensor

The user role.

tokenize

tokenize(text: str) -> torch.Tensor

Tokenize the text.

Parameters:

    text (str): The text to tokenize. Required.

Returns:

    torch.Tensor: An int64 tensor of token ids.

InstructionFormatMapper dataclass

Builds the tensor components of the transformers.PreTrainedModel instruction prompt.

PromptIndices

Bases: TypedDict

Indices of the prompt components in the input_ids tensor.

Can be used to extract the prompt components from the input_ids tensor by slicing along the sequence dimension.

Examples:

Using the PromptIndices to extract the instruction from the input_ids tensor:

>>> mapper = InstructionFormatMapper()
>>> sample: universal.InstructionTokenizerMapper.PromptTokens = {
...     "special_tokens": {
...         "bos": torch.tensor([[1]]),
...         "instruction_start": torch.tensor([[2]]),
...         "system_prompt_start": torch.tensor([[3]]),
...         "system_prompt_end": torch.tensor([[4]]),
...         "context_start": torch.tensor([[5]]),
...         "instruction_end": torch.tensor([[6]]),
...         "eos": torch.tensor([[7]]),
...     },
...     "schema_tokens": {
...         "instruction": torch.tensor([[8, 9, 10, 11, 12]]),
...         "response": torch.tensor([[13, 14, 15]]),
...         "system_prompt": torch.tensor([[16, 17, 18, 19]]),
...         "context": torch.tensor([[20, 21, 22]]),
...     },
... }
>>> formatted_sample = mapper(sample)
>>> torch.testing.assert_close(
...     sample["schema_tokens"]["instruction"],
...     formatted_sample["input_ids"][:, mapper.prompt_indices["instruction"]],
... )

context instance-attribute

context: slice

The slice of input_ids containing the context.

instruction instance-attribute

instruction: slice

The slice of input_ids containing the instruction.

system_prompt instance-attribute

system_prompt: slice

The slice of input_ids containing the system prompt.

InstructionSchemaMapper dataclass

Bases: SchemaMapper

Maps samples from an arbitrary dataset to a universal schema for building an LLM instruction prompt.

Either define a subclass for easier reuse, or use this class directly.

Examples:

>>> sample = {
...     "question": "What is the capital of France?",
...     "response": "Paris",
...     "system_prompt": "Answer the following question:",
... }
>>> mapper = InstructionSchemaMapper(
...     instruction_key="question",
...     response_key="response",
...     system_prompt_key="system_prompt",
...     context_key=None,
... )
>>> mapped_sample = mapper(sample)
>>> mapped_sample
{'instruction': 'What is the capital of France?', 'response': 'Paris', 'context': '', 'system_prompt': 'Answer the following question:'}

context_key instance-attribute

context_key: str | None

An optional dataset key/column corresponding to context to append to the instruction.

instruction_key instance-attribute

instruction_key: str

The dataset key/column corresponding to the input.

response_key instance-attribute

response_key: str | None

An optional dataset key/column corresponding to the expected model response to the instruction.

system_prompt_key instance-attribute

system_prompt_key: str | None

An optional dataset key/column corresponding to the system prompt for the model.

Schema

Bases: Schema

Universal schema for building an LLM instruction prompt.

Added in version 0.77.0. Renamed `InstructionSchema` to `InstructionSchemaMapper.Schema`.

context instance-attribute

context: str

An optional context to append to the instruction.

instruction instance-attribute

instruction: str

The input to the model.

response instance-attribute

response: str

The optional expected model response to the instruction.

system_prompt instance-attribute

system_prompt: str

An optional system prompt for the model.
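Because the schema is a TypedDict, an instance is a plain dict; this example mirrors the mapper example above:

>>> schema: InstructionSchemaMapper.Schema = {
...     "instruction": "What is the capital of France?",
...     "response": "Paris",
...     "context": "",
...     "system_prompt": "Answer the following question:",
... }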

InstructionSpecialStrings dataclass

Special string components of an instruction-tuning prompt.

An instance of this class is expected to be defined for each model to dictate the structure of its prompt.

Added in version 0.77.0. Renamed `SpecialStrings` to `InstructionSpecialStrings`.

CONTEXT_START instance-attribute

CONTEXT_START: Final[str]

The delimiter between the instruction and the context.

INSTRUCTION_END instance-attribute

INSTRUCTION_END: Final[str]

The end of the instruction tag. The model is highly sensitive to this tag.

INSTRUCTION_START instance-attribute

INSTRUCTION_START: Final[str]

The start of the instruction. The model is highly sensitive to this tag.

SYSTEM_PROMPT_END instance-attribute

SYSTEM_PROMPT_END: Final[str]

The end of the system prompt.

SYSTEM_PROMPT_START instance-attribute

SYSTEM_PROMPT_START: Final[str]

The start of the system prompt.
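An illustrative instance using hypothetical Llama-2-style tags (the exact strings are not taken from this library, and keyword construction is an assumption):

>>> llama2_like_strings = InstructionSpecialStrings(
...     INSTRUCTION_START="[INST]",
...     INSTRUCTION_END="[/INST]",
...     SYSTEM_PROMPT_START="<<SYS>>\n",
...     SYSTEM_PROMPT_END="\n<</SYS>>\n\n",
...     CONTEXT_START="\nContext:\n",
... )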

InstructionTokenizerMapper dataclass

Bases: TokenizerMapper, ABC

Tokenizes and builds the intermediate tensor components of an instruction prompt.

always_include_context class-attribute instance-attribute

always_include_context: bool = False

Whether to always include the start-of-context tokens in the prompt, even if no context is provided.

special_strings class-attribute instance-attribute

special_strings: InstructionSpecialStrings = field(
    init=False
)

The special prompt strings to use.

special_tokens class-attribute instance-attribute

special_tokens: SpecialTokens = field(init=False)

The tokenized special prompt strings.

tokenizer instance-attribute

The LLM tokenizer to use.

PromptTokens

Bases: TypedDict

Collection of all tokenized components of the prompt.

schema_tokens instance-attribute

schema_tokens: SchemaTokens

The tokenized schema components of the prompt.

special_tokens instance-attribute

special_tokens: SpecialTokens

The tokenized special components of the prompt.

SchemaTokens

Bases: TypedDict

Tokenized intermediate prompt schema.

context instance-attribute

context: Tensor

An optional context to append to the instruction.

instruction instance-attribute

instruction: Tensor

The input to the model.

response instance-attribute

response: Tensor

The expected model response to the instruction.

system_prompt instance-attribute

system_prompt: Tensor

An optional system prompt for the model.

SpecialTokens

Bases: TypedDict

Tokenized special components of the prompt.

bos instance-attribute

bos: Tensor

The beginning of string token.

context_start instance-attribute

context_start: Tensor

The delimiter between the instruction and the context.

eos instance-attribute

eos: Tensor

The end of string token.

instruction_end instance-attribute

instruction_end: Tensor

The end of the instruction tag.

instruction_start instance-attribute

instruction_start: Tensor

The start of the instruction tag.

system_prompt_end instance-attribute

system_prompt_end: Tensor

The end of the system prompt.

system_prompt_start instance-attribute

system_prompt_start: Tensor

The start of the system prompt.

tokenize

tokenize(text: str) -> torch.Tensor

Tokenize the text.

Parameters:

    text (str): The text to tokenize. Required.

Returns:

    torch.Tensor: An int64 tensor of token ids.

PreTrainFormatMapper dataclass

Builds the tensor components of the transformers.PreTrainedModel pretraining prompt.

Added in version 0.77.0. Added support for pretraining which does not use a prompt template.

PreTrainSchemaMapper dataclass

Bases: SchemaMapper

Maps samples from an arbitrary dataset to a universal schema for building an LLM pretraining input.

Either define a subclass for easier reuse, or use this class directly.

Examples:

>>> sample = {
...     "question": "What is the capital of France?",
... }
>>> mapper = PreTrainSchemaMapper(
...     instruction_key="question",
... )
>>> mapped_sample = mapper(sample)
>>> mapped_sample
{'text': 'What is the capital of France?'}

Added in version 0.77.0. Added support for pretraining which does not use a prompt template.

instruction_key instance-attribute

instruction_key: str

The dataset key/column corresponding to the input.

Schema

Bases: Schema

Universal schema for building an LLM pretraining input.

text instance-attribute

text: str

The input to the model.

PreTrainTokenizerMapper dataclass

Bases: TokenizerMapper

Tokenizes and builds the intermediate tensor components of a pretraining input which does not have a prompt.

Added in version 0.77.0. Added support for pretraining which does not use a prompt template.

special_tokens class-attribute instance-attribute

special_tokens: SpecialTokens = field(init=False)

The tokenized special prompt strings.

tokenizer instance-attribute

The LLM tokenizer to use.

PromptTokens

Bases: TypedDict

Collection of all tokenized components of the prompt.

schema_tokens instance-attribute

schema_tokens: SchemaTokens

The tokenized schema components of the prompt.

special_tokens instance-attribute

special_tokens: SpecialTokens

The tokenized special components of the prompt.

SchemaTokens

Bases: TypedDict

Tokenized intermediate prompt schema.

text instance-attribute

text: Tensor

The input to the model.

SpecialTokens

Bases: TypedDict

Tokenized special components of the prompt.

bos instance-attribute

bos: Tensor

The beginning of string token.

eos instance-attribute

eos: Tensor

The end of string token.

tokenize

tokenize(text: str) -> torch.Tensor

Tokenize the text.

Parameters:

    text (str): The text to tokenize. Required.

Returns:

    torch.Tensor: An int64 tensor of token ids.

SchemaMapper dataclass

Bases: ABC

Maps samples from an arbitrary dataset to a universal schema for building an LLM prompt.

Added in version 0.77.0. Base class for `InstructionSchemaMapper` and `ChatSchemaMapper`.

instruction_key instance-attribute

instruction_key: str

The dataset key/column corresponding to the input.

Schema

Bases: TypedDict

Base schema for building an LLM prompt.

TensorToListMapper dataclass

Maps a dictionary of int64 tensors to a dictionary of lists of int.
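A sketch of the expected behavior, assuming the mapper is called directly on a sample dictionary:

>>> mapper = TensorToListMapper()
>>> mapper({"input_ids": torch.tensor([1, 2, 3])})
{'input_ids': [1, 2, 3]}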

TestMapper dataclass

Formats the undifferentiated LlamaForCausalLM input for testing.

Added in version 0.77.0. Renamed `InstructionTestMapper` to `TestMapper`.

TestInput

Bases: TypedDict, Generic[ContainerT]

Input for LlamaForCausalLM testing.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

labels instance-attribute

labels: ContainerT

The expected model response to the input_ids. When pretraining, the input_ids are used as the labels.

TokenizerMapper dataclass

Bases: ABC

Tokenizes and builds the intermediate tensor components of a prompt.

Added in version 0.77.0. Base class for `InstructionTokenizerMapper` and `ChatTokenizerMapper`.

tokenizer instance-attribute

The LLM tokenizer to use.

tokenize

tokenize(text: str) -> torch.Tensor

Tokenize the text.

Parameters:

    text (str): The text to tokenize. Required.

Returns:

    torch.Tensor: An int64 tensor of token ids.
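A sketch of typical use on a concrete subclass instance (the exact token ids and length depend on the tokenizer):

>>> token_ids = mapper.tokenize("What is the capital of France?")
>>> token_ids.dtype
torch.int64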

TrainMapper dataclass

Formats the undifferentiated transformers.PreTrainedModel input for training.

Added in version 0.77.0. Renamed `InstructionTrainMapper` to `TrainMapper`.

TrainInput

Bases: TypedDict, Generic[ContainerT]

Input for transformers.PreTrainedModel training.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

TransformLayerChatFormatMapper dataclass

Bases: TransformLayerFormatMapper, ChatFormatMapper

Builds the noise token mask for a chat prompt, which is required for training a TransformLayer.

Added in version 0.77.0.

Changed in version 0.100.0: Removed the option of passing `obfuscate_system_prompt` to the TokenizerWrapper.

TransformLayerFormatMapper dataclass

Base class for building noise token mask.

Parameters:

    transform_all_tokens (bool): Whether to transform all of the tokens, or only the instruction, context, and possibly the system prompt. Defaults to False.

Added in version 0.77.0. Base class for `TransformLayerInstructionFormatMapper` and `TransformLayerChatFormatMapper`.

Changed in version 0.100.0: Removed the option of passing `obfuscate_system_prompt` to the TokenizerWrapper.

TransformLayerInstructionFormatMapper dataclass

Bases: TransformLayerFormatMapper, InstructionFormatMapper

Builds the noise token mask for an instruction prompt, which is required for training a TransformLayer.

PromptIndices

Bases: TypedDict

Indices of the prompt components in the input_ids tensor.

Can be used to extract the prompt components from the input_ids tensor by slicing along the sequence dimension.

Examples:

Using the PromptIndices to extract the instruction from the input_ids tensor:

>>> mapper = InstructionFormatMapper()
>>> sample: universal.InstructionTokenizerMapper.PromptTokens = {
...     "special_tokens": {
...         "bos": torch.tensor([[1]]),
...         "instruction_start": torch.tensor([[2]]),
...         "system_prompt_start": torch.tensor([[3]]),
...         "system_prompt_end": torch.tensor([[4]]),
...         "context_start": torch.tensor([[5]]),
...         "instruction_end": torch.tensor([[6]]),
...         "eos": torch.tensor([[7]]),
...     },
...     "schema_tokens": {
...         "instruction": torch.tensor([[8, 9, 10, 11, 12]]),
...         "response": torch.tensor([[13, 14, 15]]),
...         "system_prompt": torch.tensor([[16, 17, 18, 19]]),
...         "context": torch.tensor([[20, 21, 22]]),
...     },
... }
>>> formatted_sample = mapper(sample)
>>> torch.testing.assert_close(
...     sample["schema_tokens"]["instruction"],
...     formatted_sample["input_ids"][:, mapper.prompt_indices["instruction"]],
... )

context instance-attribute

context: slice

The slice of input_ids containing the context.

instruction instance-attribute

instruction: slice

The slice of input_ids containing the instruction.

system_prompt instance-attribute

system_prompt: slice

The slice of input_ids containing the system prompt.

__call__

__call__(
    sample: PromptTokens,
) -> UndifferentiatedTransformLayerInput

Changed in version 0.74.0: The `noise_token_mask` was renamed to `noise_mask` to create a uniform interface everywhere.

Changed in version 0.100.0: Removed the option of passing `obfuscate_system_prompt` to the TokenizerWrapper.

TransformLayerPreTrainFormatMapper dataclass

Bases: TransformLayerFormatMapper, PreTrainFormatMapper

Builds the noise token mask, which is required for training a TransformLayer, for a pretraining scenario that does not use a templated prompt.

Added in version 0.77.0. Added support for pretraining which does not use a prompt template.

TransformLayerTestMapper dataclass

Bases: TestMapper

Formats the undifferentiated InstructionTransformLayer input for testing.

Added in version 0.77.0. Renamed `TransformLayerInstructionTestMapper` to `TransformLayerTestMapper`.

TestInput

Bases: TypedDict, Generic[ContainerT]

Input for LlamaForCausalLM testing.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

labels instance-attribute

labels: ContainerT

The expected model response to the input_ids. When pretraining, the input_ids are used as the labels.

TransformLayerTestInput

Bases: TestInput[ContainerT]

Input for InstructionTransformLayer testing.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

labels instance-attribute

labels: ContainerT

The expected model response to the input_ids. When pretraining, the input_ids are used as the labels.

noise_mask instance-attribute

noise_mask: ContainerT

The mask that dictates which tokens in input_ids to obfuscate.
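A sketch relating the mask to the ids, assuming 1-D tensors of equal length and hypothetical values:

>>> test_input: TransformLayerTestInput[torch.Tensor] = {
...     "input_ids": torch.tensor([1, 8, 9, 10, 6]),
...     "labels": torch.tensor([13, 14, 15]),
...     "noise_mask": torch.tensor([False, True, True, True, False]),
... }
>>> test_input["input_ids"][test_input["noise_mask"]]  # tokens to obfuscate
tensor([ 8,  9, 10])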

TransformLayerTrainMapper dataclass

Bases: TrainMapper

Formats the undifferentiated InstructionTransformLayer input for training.

Added in version 0.77.0. Renamed `TransformLayerInstructionTrainMapper` to `TransformLayerTrainMapper`.

ignore_prompt_loss class-attribute instance-attribute

ignore_prompt_loss: bool = True

Whether to ignore the loss on the prompt tokens.

TrainInput

Bases: TypedDict, Generic[ContainerT]

Input for transformers.PreTrainedModel training.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

TransformLayerTrainInput

Bases: TrainInput[ContainerT]

Input for TransformLayer training.

input_ids instance-attribute

input_ids: ContainerT

The input token ids.

loss_mask instance-attribute

loss_mask: ContainerT

The mask that dictates which tokens in input_ids to use to calculate the loss.

noise_mask instance-attribute

noise_mask: ContainerT

The mask that dictates which tokens in input_ids to obfuscate.
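A sketch of one way loss_mask can be applied when building labels, assuming the conventional cross-entropy ignore index of -100 and hypothetical values; the library's actual trainer integration may differ:

>>> train_input: TransformLayerTrainInput[torch.Tensor] = {
...     "input_ids": torch.tensor([1, 8, 9, 10, 13, 14, 15]),
...     "noise_mask": torch.tensor([False, True, True, True, False, False, False]),
...     "loss_mask": torch.tensor([False, False, False, False, True, True, True]),
... }
>>> labels = train_input["input_ids"].clone()
>>> labels[~train_input["loss_mask"]] = -100  # ignored by cross-entropy losses
>>> labels
tensor([-100, -100, -100, -100,   13,   14,   15])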

__call__

__call__(
    sample: UndifferentiatedTransformLayerInput,
) -> TransformLayerTrainInput[torch.Tensor]

Changed in version 0.74.0: The `noise_token_mask` was renamed to `noise_mask` to create a uniform interface everywhere.

UndifferentiatedInput

Bases: TypedDict

Formatted input for the transformers.PreTrainedModel that must be further formatted into either training or testing input, depending on whether the model is being trained or evaluated.

Added in version 0.77.0. Renamed `InstructionFormatMapper.UndifferentiatedInstructionInput` to `UndifferentiatedInput`.

input_ids instance-attribute

input_ids: Tensor

The input token ids.

response instance-attribute

response: NotRequired[Tensor]

The expected model response to the input_ids.

UndifferentiatedTransformLayerInput

Bases: UndifferentiatedInput

Formatted input for the TransformLayer that must be further formatted into either training or testing input, depending on whether the model is being trained or evaluated.

Changed in version 0.74.0: The `noise_token_mask` was renamed to `noise_mask` to create a uniform interface everywhere.

Added in version 0.77.0. Renamed `TransformLayerInstructionFormatMapper.UndifferentiatedTransformLayerInstructionInput` to `UndifferentiatedTransformLayerInput`.

input_ids instance-attribute

input_ids: Tensor

The input token ids.

noise_mask instance-attribute

noise_mask: Tensor

The mask that dictates which tokens in input_ids to obfuscate.

response instance-attribute

response: NotRequired[Tensor]

The expected model response to the input_ids.