mistral
Mapper classes (designed to be compatible with datasets.Dataset.map) useful for building Mistral prompts for Stained Glass Transform training and testing.
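As a rough illustration of what "compatible with datasets.Dataset.map" means, the sketch below shows a callable object that takes one sample and returns a dict of new columns. The class WrapInstructionMapper and the column names are hypothetical and are not part of this module. A plain loop stands in for Dataset.map so the sketch has no third-party dependencies.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class WrapInstructionMapper:
    """Hypothetical mapper: a callable that takes one sample and returns new columns."""

    field_name: str = "instruction"

    def __call__(self, sample: Dict[str, Any]) -> Dict[str, Any]:
        # Return only the columns to add or overwrite, as Dataset.map expects.
        return {"prompt": f"[INST] {sample[self.field_name]} [/INST]"}


mapper = WrapInstructionMapper()
samples: List[Dict[str, Any]] = [
    {"instruction": "Summarize the text."},
    {"instruction": "Translate to French."},
]

# With Hugging Face datasets this would be `dataset.map(mapper)`.
# Here each returned dict is merged into its sample by hand.
mapped = [{**sample, **mapper(sample)} for sample in samples]
```

Because the mapper is an ordinary callable, any configuration it needs (here, field_name) lives on the instance rather than in the function signature.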
MISTRAL_SPECIAL_STRINGS
module-attribute
MISTRAL_SPECIAL_STRINGS: Final[InstructionSpecialStrings] = InstructionSpecialStrings(INSTRUCTION_START='[INST]', SYSTEM_PROMPT_START='', SYSTEM_PROMPT_END='', CONTEXT_START='###', INSTRUCTION_END='[/INST]')
Special string components of the Mistral prompt.
Based on: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2#instruction-format.
The prompt is structured as follows
Mistral makes no distinction between system prompts, prompts, or bodies/context in its instruction tuning format.
Characters like ### and <<>> are used to specify boundaries between sections of text.
https://docs.mistral.ai/guides/prompting-capabilities/
https://www.promptingguide.ai/models/mistral-7b#mistral-7b-instruct
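One way the special strings could be combined into a single-turn prompt is sketched below. The field values are taken from MISTRAL_SPECIAL_STRINGS above. The SpecialStrings class is a hypothetical stand-in for InstructionSpecialStrings, and the ordering and whitespace in build_prompt are assumptions made for illustration, not the library's confirmed layout.

```python
from typing import NamedTuple


class SpecialStrings(NamedTuple):
    """Hypothetical stand-in mirroring the fields of InstructionSpecialStrings."""

    INSTRUCTION_START: str
    SYSTEM_PROMPT_START: str
    SYSTEM_PROMPT_END: str
    CONTEXT_START: str
    INSTRUCTION_END: str


MISTRAL = SpecialStrings(
    INSTRUCTION_START="[INST]",
    SYSTEM_PROMPT_START="",
    SYSTEM_PROMPT_END="",
    CONTEXT_START="###",
    INSTRUCTION_END="[/INST]",
)


def build_prompt(instruction: str, context: str, system_prompt: str = "") -> str:
    # ASSUMPTION: section order and single-space separators are illustrative only.
    parts = [
        MISTRAL.INSTRUCTION_START,
        MISTRAL.SYSTEM_PROMPT_START + system_prompt + MISTRAL.SYSTEM_PROMPT_END,
        instruction,
        MISTRAL.CONTEXT_START,
        context,
        MISTRAL.INSTRUCTION_END,
    ]
    # Empty parts (for example, a missing system prompt) are skipped.
    return " ".join(part for part in parts if part)


prompt = build_prompt("Summarize the passage.", "Stained glass is colored glass.")
```

Since the system prompt delimiters are empty strings for Mistral, a system prompt in this sketch simply becomes more text inside the instruction block.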
MistralInstructionTokenizerMapper
dataclass
Bases: InstructionTokenizerMapper
Tokenizes and builds the intermediate tensor components of a prompt.
always_include_context
class-attribute
instance-attribute
always_include_context: bool = False
Whether to always include the start of context tokens in the prompt, even if no context is provided.
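The effect of this flag can be pictured with a small string-level sketch. The real mapper works on tokens rather than strings, and the helper context_section below is hypothetical. It only illustrates the described behavior: the context marker is emitted without any context when the flag is set.

```python
from typing import Optional


def context_section(
    context: Optional[str],
    always_include_context: bool = False,
    context_start: str = "###",
) -> str:
    """Hypothetical helper: render the context part of a prompt as a string."""
    if context:
        # Context provided: the marker always precedes it.
        return f"{context_start} {context}"
    # No context: keep the bare marker only when the flag asks for it.
    return context_start if always_include_context else ""


without_flag = context_section(None)
with_flag = context_section(None, always_include_context=True)
with_context = context_section("Some background text.")
```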
special_tokens
class-attribute
instance-attribute
special_tokens: SpecialTokens = field(init=False)
The tokenized special prompt strings.
PromptTokens
Bases: TypedDict
Collection of all tokenized components of the prompt.
schema_tokens
instance-attribute
schema_tokens: SchemaTokens
The tokenized schema components of the prompt.
special_tokens
instance-attribute
special_tokens: SpecialTokens
The tokenized special components of the prompt.
SchemaTokens
Bases: TypedDict
Tokenized intermediate prompt schema.
MistralMultiturnTransformLayerMapper
Based on https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2#instruction-format.
The prompt for multiturn chats is structured as follows
Added in version 0.87.0.
Changed in version 0.97.0: Messages can optionally start with a system prompt.
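For orientation, the multiturn layout shown on the linked model card can be sketched as plain string formatting. The Message type with role and content keys, and the choice to fold an optional leading system prompt into the first user turn, are assumptions made for illustration. They do not describe how this mapper represents messages or handles system prompts internally.

```python
from typing import Sequence, TypedDict


class Message(TypedDict):
    """Hypothetical message schema used only in this sketch."""

    role: str
    content: str


def format_multiturn(messages: Sequence[Message]) -> str:
    remaining = list(messages)
    system_prefix = ""
    if remaining and remaining[0]["role"] == "system":
        # ASSUMPTION: the system prompt is prepended to the first user turn.
        system_prefix = remaining[0]["content"] + "\n\n"
        remaining = remaining[1:]

    prompt = "<s>"
    for index, message in enumerate(remaining):
        if message["role"] == "user":
            prefix = system_prefix if index == 0 else ""
            prompt += f"[INST] {prefix}{message['content']} [/INST]"
        elif message["role"] == "assistant":
            # Each model answer is closed with the end-of-sequence marker.
            prompt += f" {message['content']}</s>"
        else:
            raise ValueError(f"Unexpected role: {message['role']!r}")
    return prompt


chat = format_multiturn(
    [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Bye"},
    ]
)
```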
__call__
Tokenizes and builds the intermediate tensor components of a multiturn prompt.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample | Sequence[Schema] | A sequence of messages in the conversation. This argument name is used for consistency with other | required |
Returns:
Type | Description |
---|---|
universal.UndifferentiatedTransformLayerInput | The intermediate tensor components of the multiturn prompt. |
__init__
__init__(tokenizer: PreTrainedTokenizerBase) -> None
Tokenizes and builds tensors for TransformLayer from multiturn messages.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tokenizer | PreTrainedTokenizerBase | The tokenizer to use for tokenization. | required |