hermes

Mapper classes (designed to be compatible with `datasets.Dataset.map`) for building Hermes prompts for Stained Glass Transform training and testing.

Modules:

Name	Description
`universal`	Model-agnostic `Mapper` classes (designed to be compatible with `datasets.Dataset.map`) for building LLM prompts for Stained Glass Transform training and testing.
`version`	Constants storing the version numbers for major changes to the codebase.

Classes:

Name	Description
`HermesChatTokenizerMapper`	Tokenizes and builds the intermediate tensor components of a prompt.

Attributes:

Name	Type	Description
`HERMES_SPECIAL_STRINGS`	`Final[ChatSpecialStrings]`	Special string components of the Hermes prompt.

HERMES_SPECIAL_STRINGS `module-attribute`

HERMES_SPECIAL_STRINGS: Final[ChatSpecialStrings] = ChatSpecialStrings(
    ROLES=ChatRoleStrings(
        SYSTEM_ROLE='system',
        USER_ROLE='user',
        ASSISTANT_ROLE='assistant',
    ),
    ROLE_HEADER_START='<|im_start|>',
    ROLE_HEADER_END='\n',
    MESSAGE_END='<|im_end|>\n',
)

Special string components of the Hermes prompt.

Based on the Hugging Face Hub chat template for 'NousResearch/Hermes-3-Llama-3.1-8B'.

The prompt follows this Jinja chat template:

{{bos_token}}{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system
You are a helpful assistant.<|im_end|>
' }}{% endif %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
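
As a rough illustration, the template above can be reproduced in plain Python. This is only a sketch: the function name `render_hermes_prompt` is hypothetical and not part of this module, and the default `bos_token` value is an assumption (in practice the BOS string comes from the tokenizer). The three string constants mirror the values listed in `HERMES_SPECIAL_STRINGS`.

```python
# Values copied from HERMES_SPECIAL_STRINGS as documented above.
ROLE_HEADER_START = "<|im_start|>"
ROLE_HEADER_END = "\n"
MESSAGE_END = "<|im_end|>\n"

# Default system message taken from the chat template.
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."


def render_hermes_prompt(
    messages: list[dict[str, str]],
    bos_token: str = "<|begin_of_text|>",  # assumption: the real value comes from the tokenizer
    add_generation_prompt: bool = False,
) -> str:
    """Hypothetical helper that mirrors the Jinja chat template in plain Python."""
    parts = [bos_token]
    # The template injects a default system message when the first message is not one.
    if messages and messages[0]["role"] != "system":
        parts.append(
            ROLE_HEADER_START + "system" + ROLE_HEADER_END
            + DEFAULT_SYSTEM_PROMPT + MESSAGE_END
        )
    for message in messages:
        parts.append(
            ROLE_HEADER_START + message["role"] + ROLE_HEADER_END
            + message["content"] + MESSAGE_END
        )
    # An open assistant header asks the model to generate the next turn.
    if add_generation_prompt:
        parts.append(ROLE_HEADER_START + "assistant" + ROLE_HEADER_END)
    return "".join(parts)


rendered = render_hermes_prompt(
    [{"role": "user", "content": "Hi"}],
    add_generation_prompt=True,
)
print(rendered)
```

Each message is wrapped as role header start, role, role header end, content, message end, which is the same decomposition that `ChatSpecialStrings` names.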

HermesChatTokenizerMapper `dataclass`

Bases: ChatTokenizerMapper

Tokenizes and builds the intermediate tensor components of a prompt.

Added in version 0.104.0.

Classes:

Name	Description
`PromptTokens`	Collection of all tokenized components of the prompt.
`SchemaTokens`	Tokenized intermediate prompt schema.
`SpecialTokens`	Tokenized special components of the prompt.

Methods:

Name	Description
`tokenize`	Tokenize the text.

Attributes:

Name	Type	Description
`special_tokens`	`SpecialTokens`	The tokenized special prompt strings.
`tokenizer`	`PreTrainedTokenizerBase`	The LLM tokenizer to use.

special_tokens `class-attribute` `instance-attribute`

special_tokens: SpecialTokens = field(init=False)

The tokenized special prompt strings.

tokenizer `instance-attribute`

tokenizer: PreTrainedTokenizerBase

The LLM tokenizer to use.

PromptTokens

Bases: TypedDict

Collection of all tokenized components of the prompt.

Attributes:

Name	Type	Description
`schema_tokens`	`list[SchemaTokens]`	The tokenized schema components of the prompt.
`special_tokens`	`SpecialTokens`	The tokenized special components of the prompt.
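
To make the nesting concrete, the following sketch re-declares the three `TypedDict` shapes with plain `list[int]` standing in for `Tensor`, so that it runs without PyTorch. The `TokenIds` alias and the `toy_tokenize` helper (one id per character) are hypothetical stand-ins, and treating `role` as the tokenized role string is an assumption based on the attribute descriptions. The real classes are nested inside `HermesChatTokenizerMapper`.

```python
from typing import TypedDict

# Assumption: list[int] stands in for torch.Tensor in this sketch.
TokenIds = list[int]


class SchemaTokens(TypedDict):
    role: TokenIds
    content: TokenIds


class SpecialTokens(TypedDict):
    assistant_role: TokenIds
    bos: TokenIds
    message_end: TokenIds
    role_header_end: TokenIds
    role_header_start: TokenIds
    system_role: TokenIds
    user_role: TokenIds


class PromptTokens(TypedDict):
    schema_tokens: list[SchemaTokens]
    special_tokens: SpecialTokens


def toy_tokenize(text: str) -> TokenIds:
    """Hypothetical tokenizer: one id per character (its Unicode code point)."""
    return [ord(ch) for ch in text]


special: SpecialTokens = {
    "assistant_role": toy_tokenize("assistant"),
    "bos": toy_tokenize("<|begin_of_text|>"),  # assumed BOS string
    "message_end": toy_tokenize("<|im_end|>\n"),
    "role_header_end": toy_tokenize("\n"),
    "role_header_start": toy_tokenize("<|im_start|>"),
    "system_role": toy_tokenize("system"),
    "user_role": toy_tokenize("user"),
}

# One SchemaTokens entry per chat message; the special tokens are shared.
prompt_tokens: PromptTokens = {
    "schema_tokens": [
        {"role": special["user_role"], "content": toy_tokenize("Hi")},
    ],
    "special_tokens": special,
}
print(prompt_tokens["schema_tokens"][0]["content"])
```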

schema_tokens `instance-attribute`

schema_tokens: list[SchemaTokens]

The tokenized schema components of the prompt.

special_tokens `instance-attribute`

special_tokens: SpecialTokens

The tokenized special components of the prompt.

SchemaTokens

Bases: TypedDict

Tokenized intermediate prompt schema.

Attributes:

Name	Type	Description
`content`	`Tensor`	The content of the message.
`role`	`Tensor`	The role of the message.

content `instance-attribute`

content: Tensor

The content of the message.

role `instance-attribute`

role: Tensor

The role of the message.

SpecialTokens

Bases: TypedDict

Tokenized special components of the prompt.

Attributes:

Name	Type	Description
`assistant_role`	`Tensor`	The assistant role.
`bos`	`Tensor`	The beginning-of-sequence (BOS) token.
`message_end`	`Tensor`	The end of a message.
`role_header_end`	`Tensor`	The end of the role header.
`role_header_start`	`Tensor`	The start of the role header.
`system_role`	`Tensor`	The system role.
`user_role`	`Tensor`	The user role.
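
One plausible way these pieces fit together, following the order in the chat template, is sketched below. This is an assumption about how the components combine, not the library's implementation: `flatten_prompt` and `toy_tokenize` are hypothetical names, plain lists of ints stand in for tensors, and the default system message that the template injects is left out for brevity.

```python
def toy_tokenize(text: str) -> list[int]:
    """Hypothetical tokenizer: one id per character, so ids can be decoded back."""
    return [ord(ch) for ch in text]


def flatten_prompt(
    schema_tokens: list[dict[str, list[int]]],
    special_tokens: dict[str, list[int]],
    add_generation_prompt: bool = False,
) -> list[int]:
    """Concatenate special and schema tokens in chat-template order (a sketch)."""
    ids = list(special_tokens["bos"])
    for message in schema_tokens:
        ids += special_tokens["role_header_start"]
        ids += message["role"]
        ids += special_tokens["role_header_end"]
        ids += message["content"]
        ids += special_tokens["message_end"]
    if add_generation_prompt:
        # Open an assistant header so the model continues from there.
        ids += special_tokens["role_header_start"]
        ids += special_tokens["assistant_role"]
        ids += special_tokens["role_header_end"]
    return ids


special = {
    "assistant_role": toy_tokenize("assistant"),
    "bos": toy_tokenize("<|begin_of_text|>"),  # assumed BOS string
    "message_end": toy_tokenize("<|im_end|>\n"),
    "role_header_end": toy_tokenize("\n"),
    "role_header_start": toy_tokenize("<|im_start|>"),
    "system_role": toy_tokenize("system"),
    "user_role": toy_tokenize("user"),
}
messages = [{"role": special["user_role"], "content": toy_tokenize("Hi")}]

ids = flatten_prompt(messages, special, add_generation_prompt=True)
# The toy tokenizer is invertible, so the ids can be decoded for inspection.
decoded = "".join(chr(i) for i in ids)
print(decoded)
```

With real tensors, the same ordering could be expressed with `torch.cat` over the components instead of list concatenation.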

assistant_role `instance-attribute`

assistant_role: Tensor

The assistant role.

bos `instance-attribute`

bos: Tensor

The beginning-of-sequence (BOS) token.

message_end `instance-attribute`

message_end: Tensor

The end of a message.

role_header_end `instance-attribute`

role_header_end: Tensor

The end of the role header.

role_header_start `instance-attribute`

role_header_start: Tensor

The start of the role header.

system_role `instance-attribute`

system_role: Tensor

The system role.

user_role `instance-attribute`

user_role: Tensor

The user role.

tokenize

tokenize(text: str) -> torch.Tensor

Tokenize the text.

Parameters:

Name	Type	Description	Default
`text`	`str`	The text to tokenize.	required

Returns:

Type	Description
`torch.Tensor`	An int64 tensor of token ids.
