hermes

Mapper classes, designed to be compatible with datasets.Dataset.map, for building Hermes prompts for Stained Glass Transform training and testing.

HERMES_SPECIAL_STRINGS module-attribute

HERMES_SPECIAL_STRINGS: Final[ChatSpecialStrings] = (
    ChatSpecialStrings(
        ROLES=ChatRoleStrings(
            SYSTEM_ROLE="system",
            USER_ROLE="user",
            ASSISTANT_ROLE="assistant",
        ),
        ROLE_HEADER_START="<|im_start|>",
        ROLE_HEADER_END="\n",
        MESSAGE_END="<|im_end|>\n",
    )
)

Special string components of the Hermes prompt.

Based on the Hugging Face Hub chat template for 'NousResearch/Hermes-3-Llama-3.1-8B'.

The prompt is structured according to the following Jinja chat template:
{{bos_token}}{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system
You are a helpful assistant.<|im_end|>
' }}{% endif %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
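
As an illustration of how the special strings combine, the template above can be approximated in plain Python. This is a minimal sketch, not the library's implementation: the function `render_hermes_prompt` and its parameters are hypothetical names introduced here, and the BOS token is passed in by the caller because its value depends on the tokenizer.

```python
# String values copied from HERMES_SPECIAL_STRINGS above.
ROLE_HEADER_START = "<|im_start|>"
ROLE_HEADER_END = "\n"
MESSAGE_END = "<|im_end|>\n"

# Default system message taken from the chat template above.
DEFAULT_SYSTEM_MESSAGE = "You are a helpful assistant."


def render_hermes_prompt(messages, bos_token="", add_generation_prompt=False):
    """Hypothetical helper that mirrors the Jinja template in plain Python.

    `messages` is a list of {"role": str, "content": str} dicts.
    """
    parts = [bos_token]
    # The template injects a default system message when the first
    # message is not a system message.
    if messages and messages[0]["role"] != "system":
        parts.append(
            ROLE_HEADER_START + "system" + ROLE_HEADER_END
            + DEFAULT_SYSTEM_MESSAGE + MESSAGE_END
        )
    for message in messages:
        parts.append(
            ROLE_HEADER_START + message["role"] + ROLE_HEADER_END
            + message["content"] + MESSAGE_END
        )
    # An open assistant header asks the model to generate the next reply.
    if add_generation_prompt:
        parts.append(ROLE_HEADER_START + "assistant" + ROLE_HEADER_END)
    return "".join(parts)


example_prompt = render_hermes_prompt(
    [{"role": "user", "content": "Hi"}],
    bos_token="<s>",  # placeholder; the real BOS token comes from the tokenizer
    add_generation_prompt=True,
)
```

Each message is wrapped as role header start, role, role header end, content, message end, which is the same decomposition that the ROLE_HEADER_START, ROLE_HEADER_END, and MESSAGE_END fields describe.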

HermesChatTokenizerMapper dataclass

Bases: ChatTokenizerMapper

Tokenizes and builds the intermediate tensor components of a prompt.

Added in version 0.104.0.

special_tokens class-attribute instance-attribute

special_tokens: SpecialTokens = field(init=False)

The tokenized special prompt strings.

tokenizer instance-attribute

The LLM tokenizer to use.

PromptTokens

Bases: TypedDict

Collection of all tokenized components of the prompt.

schema_tokens instance-attribute

schema_tokens: list[SchemaTokens]

The tokenized schema components of the prompt.

special_tokens instance-attribute

special_tokens: SpecialTokens

The tokenized special components of the prompt.

SchemaTokens

Bases: TypedDict

Tokenized intermediate prompt schema.

content instance-attribute

content: Tensor

The content of the message.

role instance-attribute

role: Tensor

The role of the message.

SpecialTokens

Bases: TypedDict

Tokenized special components of the prompt.

assistant_role instance-attribute

assistant_role: Tensor

The assistant role.

bos instance-attribute

bos: Tensor

The beginning-of-sequence (BOS) token.

message_end instance-attribute

message_end: Tensor

The end of a message.

role_header_end instance-attribute

role_header_end: Tensor

The end of the role header.

role_header_start instance-attribute

role_header_start: Tensor

The start of the role header.

system_role instance-attribute

system_role: Tensor

The system role.

user_role instance-attribute

user_role: Tensor

The user role.
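
To make the relationship between PromptTokens, SchemaTokens, and SpecialTokens concrete, the sketch below shows one way the tokenized components could be concatenated into a single sequence of token ids. It rests on several assumptions: the ordering follows the chat template rather than any documented assembly routine, plain Python lists stand in for the Tensor fields, the token ids are made up, and `assemble_prompt_ids` is a hypothetical helper that is not part of the module.

```python
def assemble_prompt_ids(prompt_tokens):
    """Hypothetical helper: flatten a PromptTokens-like dict into one id list.

    Assumes the chat template ordering: BOS, then for each message
    role header start, role, role header end, content, message end.
    """
    special = prompt_tokens["special_tokens"]
    ids = list(special["bos"])
    for schema in prompt_tokens["schema_tokens"]:
        ids += special["role_header_start"]
        ids += schema["role"]
        ids += special["role_header_end"]
        ids += schema["content"]
        ids += special["message_end"]
    return ids


# Made-up token ids for illustration only; lists stand in for tensors.
special_tokens = {
    "bos": [1],
    "role_header_start": [10],
    "role_header_end": [11],
    "message_end": [12, 11],
    "system_role": [20],
    "user_role": [21],
    "assistant_role": [22],
}
prompt_tokens = {
    "special_tokens": special_tokens,
    "schema_tokens": [
        {"role": special_tokens["user_role"], "content": [100, 101]},
    ],
}

flat_ids = assemble_prompt_ids(prompt_tokens)
```

Keeping the special components separate from the per-message schema components means the fixed scaffolding of the prompt only needs to be tokenized once, while role and content vary per message.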

tokenize

tokenize(text: str) -> torch.Tensor

Tokenize the text.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text | str | The text to tokenize. | required |

Returns:

| Type | Description |
| --- | --- |
| torch.Tensor | An int64 tensor of token ids. |