serialization
Classes:
Name | Description |
---|---|
B64EncodedFile |
JSON-decodable dictionary designating that it contains base 64 encoded binary file. |
Functions:
Name | Description |
---|---|
deserialize_tokenizer |
Deserialize a tokenizer from a single json-serializable dictionary. |
is_b64_encoded_file |
Test if a mapping represents a base64 encoded binary file. |
serialize_tokenizer |
Serialize a tokenizer to a single json-serializable dictionary. |
B64EncodedFile
¶
Bases: TypedDict
JSON-decodable dictionary designating that it contains base 64 encoded binary file.
deserialize_tokenizer
¶
deserialize_tokenizer(
serialized_tokenizer: Mapping[
str, Mapping[str, Any] | str | B64EncodedFile
],
) -> transformers.PreTrainedTokenizerBase
Deserialize a tokenizer from a single json-serializable dictionary.
Warning
Because of implementation details internal to HuggingFace Tokenizers, this uses a temporary directory as a buffer when creating the tokenizer config JSON dictionaries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
|
Mapping[str, Mapping[str, Any] | str | B64EncodedFile]
|
The serialized tokenizer to deserialize. |
required |
Returns:
Type | Description |
---|---|
transformers.PreTrainedTokenizerBase
|
The deserialized tokenizer. |
is_b64_encoded_file
¶
is_b64_encoded_file(
mapping: Mapping[str, Any] | B64EncodedFile,
) -> TypeIs[B64EncodedFile]
Test if a mapping represents a base64 encoded binary file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
|
Mapping[str, Any] | B64EncodedFile
|
Mapping under test. |
required |
Returns:
Type | Description |
---|---|
TypeIs[B64EncodedFile]
|
Whether the mapping is a B64 encoded file. |
serialize_tokenizer
cached
¶
serialize_tokenizer(
tokenizer: PreTrainedTokenizerBase,
) -> dict[str, dict[str, Any] | str | B64EncodedFile]
Serialize a tokenizer to a single json-serializable dictionary.
Warning
Because of implementation details internal to HuggingFace Tokenizers, this uses a temporary directory as a buffer when creating the tokenizer config JSON dictionaries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
|
PreTrainedTokenizerBase
|
The tokenizer to serialize. |
required |
Returns:
Type | Description |
---|---|
dict[str, dict[str, Any] | str | B64EncodedFile]
|
A dictionary containing the serialized tokenizer. |