hermes
Mapper classes (designed to be compatible with datasets.Dataset.map) useful for building Hermes prompts for Stained Glass Transform training and testing.
Modules:
Name | Description |
---|---|
universal | Model-agnostic |
version | Constants storing the version numbers for major changes to the codebase. |
Classes:
Name | Description |
---|---|
HermesChatTokenizerMapper | Tokenizes and builds the intermediate tensor components of a prompt. |
Attributes:
Name | Type | Description |
---|---|---|
HERMES_SPECIAL_STRINGS | Final[ChatSpecialStrings] | Special string components of the Hermes prompt. |
HERMES_SPECIAL_STRINGS
module-attribute
HERMES_SPECIAL_STRINGS: Final[ChatSpecialStrings] = ChatSpecialStrings(ROLES=ChatRoleStrings(SYSTEM_ROLE='system', USER_ROLE='user', ASSISTANT_ROLE='assistant'), ROLE_HEADER_START='<|im_start|>', ROLE_HEADER_END='\n', MESSAGE_END='<|im_end|>\n')
Special string components of the Hermes prompt.
Based on the Hugging Face Hub chat template for 'NousResearch/Hermes-3-Llama-3.1-8B'.
The prompt is structured as follows:
{{bos_token}}{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system
You are a helpful assistant.<|im_end|>
' }}{% endif %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
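As a rough illustration of what this template produces, the sketch below re-implements it in plain Python. The string constants are copied from HERMES_SPECIAL_STRINGS above; the function name `render_hermes_prompt` and its parameters are hypothetical and are not part of this module.

```python
# String values copied from HERMES_SPECIAL_STRINGS; the helper below is a
# hypothetical illustration, not an API of this package.
ROLE_HEADER_START = "<|im_start|>"
ROLE_HEADER_END = "\n"
MESSAGE_END = "<|im_end|>\n"
DEFAULT_SYSTEM_MESSAGE = "You are a helpful assistant."


def render_hermes_prompt(messages, bos_token="", add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts the way the template does."""
    parts = [bos_token]
    # The template injects a default system message when the first message
    # is not already a system message.
    if messages and messages[0]["role"] != "system":
        parts.append(
            f"{ROLE_HEADER_START}system{ROLE_HEADER_END}"
            f"{DEFAULT_SYSTEM_MESSAGE}{MESSAGE_END}"
        )
    for message in messages:
        parts.append(
            f"{ROLE_HEADER_START}{message['role']}{ROLE_HEADER_END}"
            f"{message['content']}{MESSAGE_END}"
        )
    # An open assistant header asks the model to generate the next turn.
    if add_generation_prompt:
        parts.append(f"{ROLE_HEADER_START}assistant{ROLE_HEADER_END}")
    return "".join(parts)


prompt = render_hermes_prompt(
    [{"role": "user", "content": "Hi"}], add_generation_prompt=True
)
```

Here `prompt` begins with the default system turn, followed by the user turn and an open assistant header. In practice the tokenizer's own chat template would normally be used; this sketch only makes the structure explicit.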
HermesChatTokenizerMapper
dataclass
Bases: ChatTokenizerMapper
Tokenizes and builds the intermediate tensor components of a prompt.
Added in version 0.104.0.
Classes:
Name | Description |
---|---|
PromptTokens | Collection of all tokenized components of the prompt. |
SchemaTokens | Tokenized intermediate prompt schema. |
SpecialTokens | Tokenized special components of the prompt. |
Methods:
Name | Description |
---|---|
tokenize | Tokenize the text. |
Attributes:
Name | Type | Description |
---|---|---|
special_tokens | SpecialTokens | The tokenized special prompt strings. |
tokenizer | PreTrainedTokenizerBase | The LLM tokenizer to use. |
special_tokens
class-attribute
instance-attribute
special_tokens: SpecialTokens = field(init=False)
The tokenized special prompt strings.
PromptTokens
Bases: TypedDict
Collection of all tokenized components of the prompt.
Attributes:
Name | Type | Description |
---|---|---|
schema_tokens | list[SchemaTokens] | The tokenized schema components of the prompt. |
special_tokens | SpecialTokens | The tokenized special components of the prompt. |
schema_tokens
instance-attribute
schema_tokens: list[SchemaTokens]
The tokenized schema components of the prompt.
special_tokens
instance-attribute
special_tokens: SpecialTokens
The tokenized special components of the prompt.
SchemaTokens
SpecialTokens
Bases: TypedDict
Tokenized special components of the prompt.
Attributes:
Name | Type | Description |
---|---|---|
assistant_role | Tensor | The assistant role. |
bos | Tensor | The beginning of string token. |
message_end | Tensor | The end of a message. |
role_header_end | Tensor | The end of the role header. |
role_header_start | Tensor | The start of the role header. |
system_role | Tensor | The system role. |
user_role | Tensor | The user role. |
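To make the shape of SpecialTokens concrete, the sketch below builds a dictionary with the same seven keys from the strings in HERMES_SPECIAL_STRINGS. It rests on several assumptions: `toy_encode` is a hypothetical stand-in that emits one id per character (not a real tokenizer), plain `list[int]` values replace the `Tensor` values documented above, the `"<s>"` BOS text is a placeholder, and `build_special_tokens` is an illustrative name rather than a function of this module.

```python
# Strings copied from HERMES_SPECIAL_STRINGS; the key names mirror the
# SpecialTokens attributes. Everything else is a hypothetical stand-in.
SPECIAL_STRINGS = {
    "system_role": "system",
    "user_role": "user",
    "assistant_role": "assistant",
    "role_header_start": "<|im_start|>",
    "role_header_end": "\n",
    "message_end": "<|im_end|>\n",
}


def toy_encode(text: str) -> list[int]:
    """Toy encoder: one id per character. NOT a real tokenizer."""
    return [ord(ch) for ch in text]


def build_special_tokens(encode, bos_text: str = "") -> dict[str, list[int]]:
    """Encode each special string; lists stand in for the documented Tensors."""
    tokens = {name: encode(text) for name, text in SPECIAL_STRINGS.items()}
    # The BOS token comes from the tokenizer rather than the special strings,
    # so it is passed in separately here.
    tokens["bos"] = encode(bos_text)
    return tokens


special_tokens = build_special_tokens(toy_encode, bos_text="<s>")
```

With a real tokenizer, each value would instead be a tensor of that tokenizer's token ids, so that the pieces can be concatenated into a full prompt without re-tokenizing the special strings for every example.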