serialization

Classes:

B64EncodedFile: A JSON-decodable dictionary that designates its contents as a base64-encoded binary file.

Functions:

deserialize_tokenizer: Deserialize a tokenizer from a single JSON-serializable dictionary.

is_b64_encoded_file: Test whether a mapping represents a base64-encoded binary file.

serialize_tokenizer: Serialize a tokenizer to a single JSON-serializable dictionary.

B64EncodedFile

Bases: TypedDict

A JSON-decodable dictionary that designates its contents as a base64-encoded binary file.
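This page does not list the fields of B64EncodedFile, so the following is only a minimal sketch of the general idea: wrapping raw bytes in a dictionary that survives a JSON round trip. The class name `B64FileSketch` and the `encoding` and `data` keys are hypothetical and may not match the real type.

```python
import base64
import json
from typing import Literal, TypedDict


class B64FileSketch(TypedDict):
    """Hypothetical stand-in for B64EncodedFile; the real field names may differ."""

    encoding: Literal["base64"]  # assumed marker field
    data: str  # assumed field holding the base64 text


def encode_bytes(raw: bytes) -> B64FileSketch:
    """Wrap raw bytes in a JSON-safe dictionary."""
    return {"encoding": "base64", "data": base64.b64encode(raw).decode("ascii")}


def decode_bytes(wrapped: B64FileSketch) -> bytes:
    """Recover the original bytes from the wrapper."""
    return base64.b64decode(wrapped["data"])


# Bytes that are not valid UTF-8 still survive a JSON round trip once wrapped.
payload = b"\x00\xff\x10binary"
restored = decode_bytes(json.loads(json.dumps(encode_bytes(payload))))
```

A marker field of this kind lets a reader of the JSON distinguish an encoded binary file from an ordinary nested configuration dictionary.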

deserialize_tokenizer

deserialize_tokenizer(
    serialized_tokenizer: Mapping[
        str, Mapping[str, Any] | str | B64EncodedFile
    ],
) -> transformers.PreTrainedTokenizerBase

Deserialize a tokenizer from a single JSON-serializable dictionary.

Warning

Because of implementation details internal to Hugging Face Tokenizers, this function uses a temporary directory as a buffer when creating the tokenizer config JSON dictionaries.

Parameters:

serialized_tokenizer (Mapping[str, Mapping[str, Any] | str | B64EncodedFile], required): The serialized tokenizer to deserialize.

Returns:

transformers.PreTrainedTokenizerBase: The deserialized tokenizer.
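The temporary-directory buffering mentioned in the warning can be illustrated with the standard library alone. The sketch below is not the library's implementation: the helper name `materialize_and_load`, the `load` callback, and the `encoding`/`data` layout for binary entries are all assumptions made for illustration.

```python
import base64
import json
import tempfile
from collections.abc import Callable, Mapping
from pathlib import Path
from typing import Any, TypeVar

T = TypeVar("T")


def materialize_and_load(
    serialized: Mapping[str, Any],
    load: Callable[[Path], T],
) -> T:
    """Write each entry to a temporary directory, then hand the directory to `load`."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for filename, content in serialized.items():
            target = root / filename
            if isinstance(content, str):
                # Plain-text file stored verbatim.
                target.write_text(content, encoding="utf-8")
            elif set(content) == {"encoding", "data"} and content["encoding"] == "base64":
                # Hypothetical binary wrapper: decode back to raw bytes.
                target.write_bytes(base64.b64decode(content["data"]))
            else:
                # Any other mapping is treated as a parsed JSON file.
                target.write_text(json.dumps(content), encoding="utf-8")
        # `load` must finish reading before the directory is deleted.
        return load(root)
```

A directory-based loader such as `transformers.AutoTokenizer.from_pretrained` could plausibly be passed as `load`, although this page does not say which loader the function actually uses.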

is_b64_encoded_file

is_b64_encoded_file(
    mapping: Mapping[str, Any] | B64EncodedFile,
) -> TypeIs[B64EncodedFile]

Test whether a mapping represents a base64-encoded binary file.

Parameters:

mapping (Mapping[str, Any] | B64EncodedFile, required): Mapping under test.

Returns:

TypeIs[B64EncodedFile]: Whether the mapping is a base64-encoded file.

serialize_tokenizer (cached)

serialize_tokenizer(
    tokenizer: PreTrainedTokenizerBase,
) -> dict[str, dict[str, Any] | str | B64EncodedFile]

Serialize a tokenizer to a single JSON-serializable dictionary.

Warning

Because of implementation details internal to Hugging Face Tokenizers, this function uses a temporary directory as a buffer when creating the tokenizer config JSON dictionaries.

Parameters:

tokenizer (PreTrainedTokenizerBase, required): The tokenizer to serialize.

Returns:

dict[str, dict[str, Any] | str | B64EncodedFile]: A dictionary containing the serialized tokenizer.
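The return type suggests one entry per saved file: parsed JSON as a dictionary, other text as a string, and binary content as a base64 wrapper. The following stdlib-only sketch shows how such a dictionary could be assembled from a temporary directory. It is an illustration under stated assumptions, not the library's code: `collect_saved_files`, the `save` callback, the UTF-8 test for "binary", and the `encoding`/`data` keys are all hypothetical.

```python
import base64
import json
import tempfile
from collections.abc import Callable
from pathlib import Path
from typing import Any


def collect_saved_files(save: Callable[[Path], None]) -> dict[str, Any]:
    """Let `save` write files into a temporary directory, then fold them into one dict."""
    collected: dict[str, Any] = {}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        save(root)
        for path in sorted(root.iterdir()):
            if not path.is_file():
                continue
            raw = path.read_bytes()
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError:
                # Binary file: wrap it in the hypothetical base64 layout.
                collected[path.name] = {
                    "encoding": "base64",
                    "data": base64.b64encode(raw).decode("ascii"),
                }
                continue
            if path.suffix == ".json":
                # JSON files are stored parsed, so the result nests cleanly.
                collected[path.name] = json.loads(text)
            else:
                collected[path.name] = text
    return collected
```

With a real tokenizer, `save` could plausibly be `lambda d: tokenizer.save_pretrained(d)`; whether serialize_tokenizer works this way internally is not stated in the source.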