serialization
Classes:
| Name | Description | 
|---|---|
| B64EncodedFile | JSON-decodable dictionary marking its contents as a base64-encoded binary file. | 
Functions:
| Name | Description | 
|---|---|
| deserialize_tokenizer | Deserialize a tokenizer from a single json-serializable dictionary. | 
| is_b64_encoded_file | Test if a mapping represents a base64 encoded binary file. | 
| serialize_tokenizer | Serialize a tokenizer to a single json-serializable dictionary. | 
    
B64EncodedFile
Bases: TypedDict
JSON-decodable dictionary marking its contents as a base64-encoded binary file.
deserialize_tokenizer(
    serialized_tokenizer: Mapping[
        str, Mapping[str, Any] | str | B64EncodedFile
    ],
) -> transformers.PreTrainedTokenizerBase
Deserialize a tokenizer from a single json-serializable dictionary.
Warning
Because of implementation details internal to HuggingFace Tokenizers, this uses a temporary directory as a buffer when creating the tokenizer config JSON dictionaries.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| serialized_tokenizer | Mapping[str, Mapping[str, Any] \| str \| B64EncodedFile] | The serialized tokenizer to deserialize. | required | 
Returns:
| Type | Description | 
|---|---|
| transformers.PreTrainedTokenizerBase | The deserialized tokenizer. | 
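The warning above mentions a temporary directory used as a buffer. A minimal sketch of that general pattern follows; it is not this module's implementation. The helper name `restore_via_tempdir`, the `loader` parameter, and the `{"encoding": "base64", "data": ...}` marker layout are all hypothetical, since the real keys of `B64EncodedFile` are not shown in this reference.

```python
import base64
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Mapping, TypeVar

T = TypeVar("T")


def restore_via_tempdir(files: Mapping[str, Any], loader: Callable[[Path], T]) -> T:
    """Write each entry to a temporary directory, then hand the directory to `loader`."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, content in files.items():
            target = root / name
            if isinstance(content, str):
                # Plain text file, stored verbatim.
                target.write_text(content, encoding="utf-8")
            elif set(content) == {"encoding", "data"} and content["encoding"] == "base64":
                # Hypothetical base64 marker: decode back to raw bytes.
                target.write_bytes(base64.b64decode(content["data"]))
            else:
                # Any other mapping is treated as the parsed body of a JSON file.
                target.write_text(json.dumps(content), encoding="utf-8")
        # The loader must finish reading before the directory is deleted.
        return loader(root)


payload = {
    "tokenizer_config.json": {"model_max_length": 512},
    "merges.txt": "a b",
    "spiece.model": {
        "encoding": "base64",
        "data": base64.b64encode(b"\x00\x01").decode("ascii"),
    },
}

# Stand-in loaders for illustration: one lists the files, one reads the binary back.
listing = restore_via_tempdir(payload, lambda d: sorted(p.name for p in d.iterdir()))
binary = restore_via_tempdir(payload, lambda d: (d / "spiece.model").read_bytes())
```

With `transformers` installed, a loader such as `AutoTokenizer.from_pretrained` could plausibly be passed instead of the stand-ins, though whether `deserialize_tokenizer` works this way internally is an assumption.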
is_b64_encoded_file(
    mapping: Mapping[str, Any] | B64EncodedFile,
) -> TypeIs[B64EncodedFile]
Test if a mapping represents a base64 encoded binary file.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| mapping | Mapping[str, Any] \| B64EncodedFile | Mapping under test. | required | 
Returns:
| Type | Description | 
|---|---|
| TypeIs[B64EncodedFile] | Whether the mapping is a base64-encoded file. | 
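To illustrate the kind of check such a predicate might perform, here is a rough sketch. The `ExampleB64File` layout and the `looks_like_b64_file` helper are hypothetical stand-ins, because this reference does not list the actual keys of `B64EncodedFile`. The sketch returns a plain `bool` so it runs on older Python versions, whereas the documented function returns `TypeIs[B64EncodedFile]` so that type checkers can narrow the argument.

```python
import base64
import binascii
from typing import Any, Mapping, TypedDict


class ExampleB64File(TypedDict):
    """Hypothetical layout; the real B64EncodedFile keys may differ."""

    encoding: str  # always "base64" in this sketch
    data: str  # ASCII text holding the base64-encoded bytes


def looks_like_b64_file(mapping: Mapping[str, Any]) -> bool:
    """Return True only if `mapping` matches the hypothetical marker layout."""
    if set(mapping) != {"encoding", "data"}:
        return False
    if mapping["encoding"] != "base64" or not isinstance(mapping["data"], str):
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(mapping["data"], validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


encoded: ExampleB64File = {
    "encoding": "base64",
    "data": base64.b64encode(b"\x00\x01binary").decode("ascii"),
}
plain_json = {"model": {"type": "BPE"}}
```

A caller would typically branch on the predicate: decode the payload when it returns true, and treat the mapping as ordinary JSON content otherwise.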
cached
serialize_tokenizer(
    tokenizer: PreTrainedTokenizerBase,
) -> dict[str, dict[str, Any] | str | B64EncodedFile]
Serialize a tokenizer to a single json-serializable dictionary.
Warning
Because of implementation details internal to HuggingFace Tokenizers, this uses a temporary directory as a buffer when creating the tokenizer config JSON dictionaries.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| tokenizer | PreTrainedTokenizerBase | The tokenizer to serialize. | required | 
Returns:
| Type | Description | 
|---|---|
| dict[str, dict[str, Any] \| str \| B64EncodedFile] | A dictionary containing the serialized tokenizer. |
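The return type suggests one entry per saved file: parsed JSON as a dictionary, text as a string, and binary data wrapped in a base64 marker. The sketch below shows one way a directory of files could be folded into such a dictionary. It is an illustration under assumptions, not this module's code: `collect_via_tempdir`, the `saver` callback, and the `{"encoding": "base64", "data": ...}` marker are hypothetical names.

```python
import base64
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def collect_via_tempdir(saver: Callable[[Path], object]) -> dict[str, Any]:
    """Let `saver` write files into a temporary directory, then fold them into one dict."""
    result: dict[str, Any] = {}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        saver(root)
        for path in sorted(root.iterdir()):
            if not path.is_file():
                continue
            raw = path.read_bytes()
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError:
                # Binary file: wrap it in the hypothetical base64 marker.
                result[path.name] = {
                    "encoding": "base64",
                    "data": base64.b64encode(raw).decode("ascii"),
                }
                continue
            # JSON files become nested dictionaries; other text stays a string.
            result[path.name] = json.loads(text) if path.suffix == ".json" else text
    return result


def fake_save(directory: Path) -> None:
    """Stand-in for a tokenizer's save step, used only for this illustration."""
    (directory / "tokenizer_config.json").write_text('{"model_max_length": 512}', encoding="utf-8")
    (directory / "merges.txt").write_text("a b", encoding="utf-8")
    (directory / "spiece.model").write_bytes(b"\xff\xfe\x00")


snapshot = collect_via_tempdir(fake_save)
wire = json.dumps(snapshot)  # the whole snapshot is JSON-serializable
```

With `transformers` installed, a bound method such as `tokenizer.save_pretrained` could plausibly serve as the `saver`, though whether `serialize_tokenizer` does this internally is an assumption.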