serialization

Utilities for serializing and deserializing large, nested dictionaries.

Classes:

Name	Description
`IndexFileMalformedError`	Exception raised when the index file is malformed or missing required keys.
`MissingIndexFileError`	Exception raised when the index file is missing in the ZIP archive.
`SchemaZIPSerializer`	Serialize and deserialize a large, nested dictionary into a ZIP archive

Functions:

Name	Description
`import_class_from_fully_qualified_name`	Dynamically imports and returns a class or attribute from a fully qualified name.

IndexFileMalformedError ¶

Bases: KeyError

Exception raised when the index file is malformed or missing required keys.

MissingIndexFileError ¶

Bases: KeyError

Exception raised when the index file is missing in the ZIP archive.

SchemaZIPSerializer ¶

Serialize and deserialize a large, nested dictionary into a ZIP archive (returned as bytes) based on a user-provided schema mapping of key paths to file templates. Supports dynamic per-key files via {key} in templates.

Schema mapping keys are tuples representing the dictionary path, e.g. ('config', 'noise_tokenizer'). Values are relative paths within the ZIP, which may include a {key} placeholder for dynamic subfiles.

The ZIP will contain:

- A root schema file (default: 'index.json') holding the mapping and a skeleton of the data with $ref placeholders.
- One or more JSON files per the mapping.
- Any additional files passed via extra_files.

Use dumps(data, extra_files=...) -> bytes to produce a ZIP, and loads(zip_bytes) -> (dict, extra_files) to reconstruct. Unmapped keys are included inline in the schema and preserved on deserialization.

Examples:

>>> data = {
...     "config": {
...         "settings": {"x": 1, "y": 2},
...         "noise_tokenizer": {"t1": {"a": 0.1}, "t2": {"a": 0.2}},
...     },
...     "users": {"alice": {"id": 1}, "bob": {"id": 2}},
...     "notes": {"misc": "inline"},
... }

Mappings must contain the root index file, and can contain any chain of nested keys. If a filename in the mapping contains a {key} placeholder, the serializer will create a separate file for each key in the dictionary at that path. (It is an error to use {key} in a mapping that does not point to a dictionary.) Mappings also do not need to be complete; any keys not in the mapping will be included inline in the skeleton in the index file.

>>> mapping = {
...     (): "index.json",
...     ("config", "settings"): "config/settings.json",
...     ("config", "noise_tokenizer"): "noise/{key}.json",
...     ("users",): "users.json",
... }
>>> extra_files = {"README.txt": "This is a test ZIP."}
>>> serializer = SchemaZIPSerializer(mapping)
>>> zip_bytes = serializer.dumps(data, extra_files=extra_files)
>>> isinstance(zip_bytes, bytes)
True
>>> restored, extras = SchemaZIPSerializer.loads(zip_bytes)
>>> restored == data
True
>>> extras["README.txt"].decode() == "This is a test ZIP."
True

Inspecting ZIP contents:

>>> import io
>>> import zipfile
>>> zf = zipfile.ZipFile(io.BytesIO(zip_bytes))
>>> sorted(zf.namelist())
['README.txt', 'config/settings.json', 'index.json', 'noise/t1.json', 'noise/t2.json', 'users.json']
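
The file names in that listing follow from how the filename templates are expanded. The block below is a minimal sketch of that expansion, not the library's implementation: `expand_mapping` is a hypothetical helper that only works out which file would hold which sub-dictionary, and the `TypeError` it raises for a misplaced {key} is an assumption, since the documentation does not name the error type.

```python
from typing import Any, Mapping


def expand_mapping(
    data: dict[str, Any], mapping: Mapping[tuple[str, ...], str]
) -> dict[str, Any]:
    """Return {filename: content} for every non-root mapping entry.

    Hypothetical helper for illustration only; the real serializer also
    writes the index file and its $ref skeleton.
    """
    files: dict[str, Any] = {}
    for path, template in mapping.items():
        if not path:  # the root entry names the index file itself
            continue
        node: Any = data
        for key in path:  # walk down the nested dictionary
            node = node[key]
        if "{key}" in template:
            if not isinstance(node, dict):
                # Assumed error type; the docs only say this is an error.
                raise TypeError(f"{path!r} uses {{key}} but is not a dict")
            for key, value in node.items():  # one file per child key
                files[template.replace("{key}", key)] = value
        else:
            files[template] = node
    return files


data = {
    "config": {
        "settings": {"x": 1, "y": 2},
        "noise_tokenizer": {"t1": {"a": 0.1}, "t2": {"a": 0.2}},
    },
    "users": {"alice": {"id": 1}, "bob": {"id": 2}},
    "notes": {"misc": "inline"},
}
mapping = {
    (): "index.json",
    ("config", "settings"): "config/settings.json",
    ("config", "noise_tokenizer"): "noise/{key}.json",
    ("users",): "users.json",
}
files = expand_mapping(data, mapping)
print(sorted(files))
```

The unmapped "notes" key produces no file of its own, which matches the statement that unmapped keys stay inline in the index file.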

Deserialization still works regardless of the mapping used:

>>> mapping2 = {
...     (): "index.json",
... }
>>> zip_bytes_2 = SchemaZIPSerializer(mapping2).dumps(data)
>>> data2, _ = SchemaZIPSerializer.loads(zip_bytes_2)
>>> data2 == data
True

Added in version v0.144.0.

Methods:

Name	Description
`__init__`	Initialize the serializer with a mapping of paths to filename templates.
`dumps`	Serialize the data into a ZIP archive.
`loads`	Deserialize the ZIP archive back into its original data dictionary and extra files.

Attributes:

Name	Type	Description
`REF_KEY`	`Final[str]`	Key used to indicate a reference to a file in the ZIP archive. This generally should not be changed.
`SCHEMA_FILENAME`	`Final[str]`	Default filename for the root schema file in the ZIP. This generally should not be changed.

REF_KEY `class-attribute` `instance-attribute` ¶

REF_KEY: Final[str] = '$ref'

Key used to indicate a reference to a file in the ZIP archive. This generally should not be changed.
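
To illustrate what a reference placeholder could look like, here is a rough sketch of resolving a skeleton against a set of files. It assumes that a placeholder is a dictionary whose only key is the `$ref` value and whose value is a file name; the actual index layout is not documented here and may differ. `resolve_refs`, `files`, and `skeleton` are hypothetical names used for illustration.

```python
import json
from typing import Any

REF_KEY = "$ref"  # same value as the documented class attribute


def resolve_refs(node: Any, files: dict[str, bytes]) -> Any:
    """Recursively replace {"$ref": name} placeholders with parsed JSON.

    Assumes a placeholder is a dict whose only key is REF_KEY; the real
    index layout may differ.
    """
    if isinstance(node, dict):
        if set(node) == {REF_KEY}:
            # Load the referenced file and resolve any nested placeholders.
            return resolve_refs(json.loads(files[node[REF_KEY]]), files)
        return {key: resolve_refs(value, files) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, files) for item in node]
    return node  # scalars are returned unchanged


# Hypothetical archive contents, keyed by file name.
files = {
    "users.json": b'{"alice": {"id": 1}, "bob": {"id": 2}}',
    "noise/t1.json": b'{"a": 0.1}',
}
# Hypothetical skeleton: mapped paths hold placeholders, the rest is inline.
skeleton = {
    "users": {"$ref": "users.json"},
    "config": {"noise_tokenizer": {"t1": {"$ref": "noise/t1.json"}}},
    "notes": {"misc": "inline"},
}
restored = resolve_refs(skeleton, files)
```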

SCHEMA_FILENAME `class-attribute` `instance-attribute` ¶

SCHEMA_FILENAME: Final[str] = 'index.json'

Default filename for the root schema file in the ZIP. This generally should not be changed.

__init__ ¶

__init__(mapping: Mapping[tuple[str, ...], str]) -> None

Initialize the serializer with a mapping of paths to filename templates.

Parameters:

Name	Type	Description	Default
`mapping` ¶	`Mapping[tuple[str, ...], str]`	A dictionary mapping tuples of strings (representing paths in the nested dictionary) to filename templates. The templates can include a `{key}` placeholder for dynamic subfiles, where each key in the subdictionary will get its own file, with the key replacing `{key}` in the filename.	required

dumps ¶

dumps(
    data: dict[str, Any],
    extra_files: dict[str, str | bytes] | None = None,
) -> bytes

Serialize the data into a ZIP archive.

Parameters:

Name	Type	Description	Default
`data` ¶	`dict[str, Any]`	The data to serialize. Must be a dictionary.	required
`extra_files` ¶	`dict[str, str \| bytes] \| None`	Optional additional files to include in the ZIP. Keys are filenames, values are file contents (str or bytes). If str, it will be encoded to bytes using UTF-8. If None, no extra files are added.	`None`

Returns:

Type	Description
`bytes`	The serialized ZIP archive as bytes.
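
The handling of `extra_files` described above can be pictured with the standard library alone. This is a sketch under the assumption that extra files are written with `zipfile.ZipFile.writestr`; `pack_extra_files` is a hypothetical helper, and the real method also writes the index and data files.

```python
from __future__ import annotations

import io
import zipfile


def pack_extra_files(extra_files: dict[str, str | bytes] | None = None) -> bytes:
    """Write extra files into an in-memory ZIP, encoding str values as UTF-8."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # None is treated the same as an empty dictionary.
        for name, content in (extra_files or {}).items():
            if isinstance(content, str):
                content = content.encode("utf-8")
            zf.writestr(name, content)
    return buffer.getvalue()


zip_bytes = pack_extra_files({"README.txt": "This is a test ZIP.", "blob.bin": b"\x00\x01"})
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    readme = zf.read("README.txt")  # always bytes, even though a str went in
    blob = zf.read("blob.bin")
```

Because text is encoded on the way in, contents read back from the archive are bytes, so a str extra file has to be decoded by the caller.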

loads `classmethod` ¶

loads(
    zip_bytes: bytes, index_file_name: str | None = None
) -> tuple[dict[str, Any], dict[str, bytes]]

Deserialize the ZIP archive back into its original data dictionary and extra files.

Parameters:

Name	Type	Description	Default
`zip_bytes` ¶	`bytes`	The ZIP archive as bytes.	required
`index_file_name` ¶	`str \| None`	The name of the root schema file in the ZIP. Defaults to the class constant `SCHEMA_FILENAME` when not specified. This should generally be left unset; it exists so that older ZIP files can still be opened if the class constant changes.	`None`

Returns:

Type	Description
`tuple[dict[str, Any], dict[str, bytes]]`	A tuple containing the reconstructed data dictionary and a dictionary of extra files, where keys are filenames and values are file contents as bytes.

Raises:

Type	Description
`MissingIndexFileError`	If the specified index file is missing from the ZIP archive.
`IndexFileMalformedError`	If the index file is malformed or missing required keys.
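
Both exceptions derive from `KeyError`, so a caller can handle them separately or together. The sketch below uses local stand-in classes with the same names and base class so that it runs without the library; `read_index` is a hypothetical helper, and the checks it performs are assumptions about what "missing" and "malformed" might mean, not the library's actual validation.

```python
import io
import json
import zipfile
from typing import Any


class MissingIndexFileError(KeyError):
    """Stand-in for the documented exception (same base class)."""


class IndexFileMalformedError(KeyError):
    """Stand-in for the documented exception (same base class)."""


def read_index(zip_bytes: bytes, index_file_name: str = "index.json") -> dict[str, Any]:
    """Hypothetical helper: locate and parse the index file."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        if index_file_name not in zf.namelist():
            raise MissingIndexFileError(index_file_name)
        try:
            index = json.loads(zf.read(index_file_name))
        except ValueError as exc:  # covers invalid JSON and invalid UTF-8
            raise IndexFileMalformedError(str(exc)) from exc
    if not isinstance(index, dict):
        raise IndexFileMalformedError("index must be a JSON object")
    return index  # required-key checks are omitted; they are not documented here


# Build an archive that has no index file at all.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("README.txt", "no index here")

try:
    read_index(buffer.getvalue())
    outcome = "ok"
except MissingIndexFileError as exc:
    outcome = f"missing index: {exc}"
except IndexFileMalformedError as exc:
    outcome = f"malformed index: {exc}"
```

A single `except KeyError` clause would catch both cases, at the cost of also catching unrelated key lookups.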

import_class_from_fully_qualified_name ¶

import_class_from_fully_qualified_name(
    fully_qualified_class_name: str,
) -> Any

Dynamically imports and returns a class or attribute from a fully qualified name.

Parameters:

Name	Type	Description	Default
`fully_qualified_class_name` ¶	`str`	The fully qualified name of the class or attribute to import, in the format 'module.submodule.ClassName' or 'module.submodule.ClassName.attribute'.	required

Returns:

Type	Description
`Any`	The imported class or attribute.

Raises:

Type	Description
`ValueError`	If the provided fully qualified name does not contain a module path.
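
For readers unfamiliar with dotted-name imports, the following sketch shows one way such a lookup can be written with `importlib`. It is not the library's implementation: `import_from_name` is a hypothetical name, and the strategy of trying the longest module prefix first is an assumption chosen so that both 'module.ClassName' and 'module.ClassName.attribute' resolve.

```python
import importlib
from typing import Any


def import_from_name(fully_qualified_name: str) -> Any:
    """Sketch of a dotted-name import; not the library's implementation."""
    parts = fully_qualified_name.split(".")
    if len(parts) < 2:
        raise ValueError(f"{fully_qualified_name!r} does not contain a module path")
    # Try the longest module prefix first, then treat the rest as attributes.
    for split in range(len(parts) - 1, 0, -1):
        try:
            obj: Any = importlib.import_module(".".join(parts[:split]))
        except ModuleNotFoundError:
            continue  # prefix is not a module; try a shorter one
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return obj
    raise ModuleNotFoundError(f"no importable module in {fully_qualified_name!r}")


ordered_dict_cls = import_from_name("collections.OrderedDict")
fromkeys = import_from_name("collections.OrderedDict.fromkeys")
```

Note that this sketch treats any `ModuleNotFoundError` as "not a module", so a missing dependency inside the target module would be reported less clearly than in a production implementation.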
