hermes
Mapper classes (designed to be compatible with datasets.Dataset.map) useful for building Hermes prompts for Stained Glass Transform training and testing.
Classes:
Name | Description |
---|---|
HermesChatTokenizerMapper | Tokenizes and builds the intermediate tensor components of a prompt. |
Attributes:
Name | Type | Description |
---|---|---|
HERMES_SPECIAL_STRINGS | Final[ChatSpecialStrings] | Special string components of the Hermes prompt. |
HERMES_SPECIAL_STRINGS
module-attribute
HERMES_SPECIAL_STRINGS: Final[ChatSpecialStrings] = (
    ChatSpecialStrings(
        ROLES=ChatRoleStrings(
            SYSTEM_ROLE="system",
            USER_ROLE="user",
            ASSISTANT_ROLE="assistant",
        ),
        ROLE_HEADER_START="<|im_start|>",
        ROLE_HEADER_END="\n",
        MESSAGE_END="<|im_end|>\n",
    )
)
Special string components of the Hermes prompt.
Based on the Hugging Face Hub chat template for 'NousResearch/Hermes-3-Llama-3.1-8B'.
The prompt is structured according to the following Jinja chat template:
{{bos_token}}{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system
You are a helpful assistant.<|im_end|>
' }}{% endif %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
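The template above can be mirrored in plain Python to see how the special strings combine. This is a minimal sketch, not the library's implementation: `render_hermes_prompt` and `DEFAULT_SYSTEM_PROMPT` are hypothetical names, the constants are copied from `HERMES_SPECIAL_STRINGS`, and the `bos_token` default is an assumption about the tokenizer in use.

```python
# Constants copied from HERMES_SPECIAL_STRINGS above.
ROLE_HEADER_START = "<|im_start|>"
ROLE_HEADER_END = "\n"
MESSAGE_END = "<|im_end|>\n"
# Default system message taken from the chat template.
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."


def render_hermes_prompt(
    messages: list[dict[str, str]],
    bos_token: str = "<|begin_of_text|>",  # assumption: depends on the tokenizer
    add_generation_prompt: bool = False,
) -> str:
    """Hypothetical helper that follows the Jinja template step by step."""
    parts = [bos_token]
    # The template injects a default system message when the first message is not one.
    if messages and messages[0]["role"] != "system":
        parts.append(
            f"{ROLE_HEADER_START}system{ROLE_HEADER_END}"
            f"{DEFAULT_SYSTEM_PROMPT}{MESSAGE_END}"
        )
    for message in messages:
        parts.append(
            f"{ROLE_HEADER_START}{message['role']}{ROLE_HEADER_END}"
            f"{message['content']}{MESSAGE_END}"
        )
    # Optionally open an assistant turn for the model to complete.
    if add_generation_prompt:
        parts.append(f"{ROLE_HEADER_START}assistant{ROLE_HEADER_END}")
    return "".join(parts)


prompt = render_hermes_prompt(
    [{"role": "user", "content": "Hi"}],
    bos_token="<s>",
    add_generation_prompt=True,
)
print(prompt)
```

Each message is wrapped as role header start, role name, role header end, content, and message end, which is the same decomposition the `ChatSpecialStrings` fields describe.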
HermesChatTokenizerMapper
dataclass
Bases: ChatTokenizerMapper
Tokenizes and builds the intermediate tensor components of a prompt.
Added in version 0.104.0.
Classes:
Name | Description |
---|---|
PromptTokens | Collection of all tokenized components of the prompt. |
SchemaTokens | Tokenized intermediate prompt schema. |
SpecialTokens | Tokenized special components of the prompt. |
Methods:
Name | Description |
---|---|
tokenize | Tokenize the text. |
Attributes:
Name | Type | Description |
---|---|---|
special_tokens | SpecialTokens | The tokenized special prompt strings. |
tokenizer | PreTrainedTokenizerBase | The LLM tokenizer to use. |
special_tokens
class-attribute
instance-attribute
special_tokens: SpecialTokens = field(init=False)
The tokenized special prompt strings.
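Because the field is declared with `field(init=False)`, it is not a constructor argument and must be derived after construction. The sketch below shows that general dataclass pattern under stated assumptions: `ToyMapper` and `toy_tokenizer` are hypothetical stand-ins, plain lists of integers replace tensors, and populating the field in `__post_init__` is an assumption about how such a class could work, not a description of the library's code.

```python
from dataclasses import dataclass, field
from typing import Callable


def toy_tokenizer(text: str) -> list[int]:
    """Hypothetical character-level tokenizer standing in for a real LLM tokenizer."""
    return [ord(ch) for ch in text]


@dataclass
class ToyMapper:
    """Hypothetical stand-in, not the library class."""

    tokenizer: Callable[[str], list[int]]
    # init=False keeps this field out of __init__; it is derived from the tokenizer.
    special_tokens: dict[str, list[int]] = field(init=False)

    def __post_init__(self) -> None:
        # Tokenize the special strings once, so they can be reused for every prompt.
        self.special_tokens = {
            "role_header_start": self.tokenizer("<|im_start|>"),
            "message_end": self.tokenizer("<|im_end|>\n"),
        }


mapper = ToyMapper(tokenizer=toy_tokenizer)
print(sorted(mapper.special_tokens))
```

Passing `special_tokens` to the constructor of such a dataclass raises a `TypeError`, which keeps the cached tokens consistent with the tokenizer that produced them.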
PromptTokens
Bases: TypedDict
Collection of all tokenized components of the prompt.
Attributes:
Name | Type | Description |
---|---|---|
schema_tokens | list[SchemaTokens] | The tokenized schema components of the prompt. |
special_tokens | SpecialTokens | The tokenized special components of the prompt. |
schema_tokens
instance-attribute
schema_tokens: list[SchemaTokens]
The tokenized schema components of the prompt.
special_tokens
instance-attribute
special_tokens: SpecialTokens
The tokenized special components of the prompt.
SchemaTokens
SpecialTokens
Bases: TypedDict
Tokenized special components of the prompt.
Attributes:
Name | Type | Description |
---|---|---|
assistant_role | Tensor | The assistant role. |
bos | Tensor | The beginning of string token. |
message_end | Tensor | The end of a message. |
role_header_end | Tensor | The end of the role header. |
role_header_start | Tensor | The start of the role header. |
system_role | Tensor | The system role. |
user_role | Tensor | The user role. |
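To illustrate how these keys relate to the Hermes special strings, here is a minimal sketch that builds a dictionary with the same seven keys and concatenates three of them into a role header. It rests on several assumptions: `ToySpecialTokens`, `toy_encode`, and `build_special_tokens` are hypothetical names, lists of integers stand in for `Tensor` values, and a character-level encoding replaces the real tokenizer.

```python
from typing import TypedDict


class ToySpecialTokens(TypedDict):
    """Hypothetical mirror of the SpecialTokens keys; list[int] stands in for Tensor."""

    bos: list[int]
    role_header_start: list[int]
    role_header_end: list[int]
    message_end: list[int]
    system_role: list[int]
    user_role: list[int]
    assistant_role: list[int]


def toy_encode(text: str) -> list[int]:
    """Hypothetical character-level encoding in place of a real tokenizer."""
    return [ord(ch) for ch in text]


def build_special_tokens(bos_token: str) -> ToySpecialTokens:
    # String values are the ones listed in HERMES_SPECIAL_STRINGS.
    return ToySpecialTokens(
        bos=toy_encode(bos_token),
        role_header_start=toy_encode("<|im_start|>"),
        role_header_end=toy_encode("\n"),
        message_end=toy_encode("<|im_end|>\n"),
        system_role=toy_encode("system"),
        user_role=toy_encode("user"),
        assistant_role=toy_encode("assistant"),
    )


special = build_special_tokens(bos_token="<s>")  # bos_token value is an assumption
# A user role header is the concatenation of three pre-tokenized pieces.
user_header = (
    special["role_header_start"] + special["user_role"] + special["role_header_end"]
)
print("".join(chr(i) for i in user_header))
```

Keeping each piece tokenized separately means a prompt can be assembled by concatenation without re-tokenizing the fixed parts for every example.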