serialization

Utilities for serializing and deserializing large, nested dictionaries.

Classes:

Name Description
IndexFileMalformedError

Exception raised when the index file is malformed or missing required keys.

MissingIndexFileError

Exception raised when the index file is missing in the ZIP archive.

SchemaZIPSerializer

Serialize and deserialize a large, nested dictionary into a ZIP archive

Functions:

Name Description
import_class_from_fully_qualified_name

Dynamically imports and returns a class or attribute from a fully qualified name.

IndexFileMalformedError

Bases: KeyError

Exception raised when the index file is malformed or missing required keys.

MissingIndexFileError

Bases: KeyError

Exception raised when the index file is missing in the ZIP archive.
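Because both exceptions derive from KeyError, a generic `except KeyError` clause catches them too, so specific handlers need to come first. The sketch below illustrates that ordering with stand-in classes defined locally; they only mirror the documented base class and are not the library's own definitions, and `describe` is a hypothetical helper.

```python
# Stand-in classes (assumption): same names and base class as documented,
# defined here only so the example runs without the library.
class MissingIndexFileError(KeyError):
    """Stand-in for the documented exception."""


class IndexFileMalformedError(KeyError):
    """Stand-in for the documented exception."""


def describe(exc_type: type) -> str:
    """Raise exc_type and report which handler caught it (hypothetical helper)."""
    try:
        raise exc_type("index.json")
    except MissingIndexFileError:
        return "missing"
    except IndexFileMalformedError:
        return "malformed"
    except KeyError:
        # Reached only for KeyErrors that are neither of the two subclasses.
        return "other KeyError"
```

If the generic `except KeyError` clause were listed first, it would handle all three cases and the specific branches would never run.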

SchemaZIPSerializer

Serialize and deserialize a large, nested dictionary into a ZIP archive (returned as bytes) based on a user-provided schema mapping of key paths to file templates. Supports dynamic per-key files via {key} in templates.

Schema mapping keys are tuples representing the dictionary path, e.g. ('config', 'noise_tokenizer'). Values are relative paths within the ZIP, which may include a {key} placeholder for dynamic subfiles.

The ZIP will contain:

- A root schema file (default: 'index.json') holding the mapping and a skeleton of the data with $ref placeholders.
- One or more JSON files per the mapping.
- Any additional files passed via extra_files.

Use dumps(data, extra_files=...) -> bytes to produce a ZIP, and loads(zip_bytes) -> (dict, extra_files) to reconstruct. Unmapped keys are included inline in the schema and preserved on deserialization.
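The idea of a skeleton with $ref placeholders can be illustrated with the standard library alone. The sketch below is hypothetical: the exact layout of the real index file is not documented here, and `extract_subtree` is an illustrative helper, not part of this module. It handles a single static path and ignores {key} templates.

```python
import copy
import json

REF_KEY = "$ref"  # same value as the documented SchemaZIPSerializer.REF_KEY


def extract_subtree(data: dict, path: tuple, filename: str):
    """Return (skeleton, files) with the subtree at `path` moved into `filename`.

    Hypothetical helper: the real serializer's index layout may differ.
    """
    skeleton = copy.deepcopy(data)  # leave the caller's data untouched
    parent = skeleton
    for key in path[:-1]:
        parent = parent[key]
    subtree = parent[path[-1]]
    # Replace the subtree with a placeholder pointing at the external file.
    parent[path[-1]] = {REF_KEY: filename}
    return skeleton, {filename: json.dumps(subtree)}


data = {"config": {"settings": {"x": 1, "y": 2}}, "notes": {"misc": "inline"}}
skeleton, files = extract_subtree(
    data, ("config", "settings"), "config/settings.json"
)
```

Here `skeleton` keeps the unmapped "notes" key inline, while "config" -> "settings" holds only a reference to the JSON payload stored in `files`.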

Examples:

>>> data = {
...     "config": {
...         "settings": {"x": 1, "y": 2},
...         "noise_tokenizer": {"t1": {"a": 0.1}, "t2": {"a": 0.2}},
...     },
...     "users": {"alice": {"id": 1}, "bob": {"id": 2}},
...     "notes": {"misc": "inline"},
... }

Mappings must contain the root index file, and can contain any chain of nested keys. If a filename in the mapping contains a {key} placeholder, the serializer will create a separate file for each key in the dictionary at that path. (It is an error to use {key} in a mapping that does not point to a dictionary.) Mappings also do not need to be complete; any keys not in the mapping will be included inline in the skeleton in the index file.

>>> mapping = {
...     (): "index.json",
...     ("config", "settings"): "config/settings.json",
...     ("config", "noise_tokenizer"): "noise/{key}.json",
...     ("users",): "users.json",
... }
>>> extra_files = {"README.txt": "This is a test ZIP."}
>>> serializer = SchemaZIPSerializer(mapping)
>>> zip_bytes = serializer.dumps(data, extra_files=extra_files)
>>> isinstance(zip_bytes, bytes)
True
>>> restored, extras = SchemaZIPSerializer.loads(zip_bytes)
>>> restored == data
True
>>> extras["README.txt"].decode() == "This is a test ZIP."
True

Inspecting ZIP contents:

>>> import io
>>> import zipfile
>>> zf = zipfile.ZipFile(io.BytesIO(zip_bytes))
>>> sorted(zf.namelist())
['README.txt', 'config/settings.json', 'index.json', 'noise/t1.json', 'noise/t2.json', 'users.json']

Deserialization still works regardless of the mapping used:

>>> mapping2 = {
...     (): "index.json",
... }
>>> zip_bytes_2 = SchemaZIPSerializer(mapping2).dumps(data)
>>> data2, _ = SchemaZIPSerializer.loads(zip_bytes_2)
>>> data2 == data
True

Added in version v0.144.0.

Methods:

Name Description
__init__

Initialize the serializer with a mapping of paths to filename templates.

dumps

Serialize the data into a ZIP archive.

loads

Deserialize the ZIP archive back into its original data dictionary and extra files.

Attributes:

Name Type Description
REF_KEY Final[str]

Key used to indicate a reference to a file in the ZIP archive. This generally should not be changed.

SCHEMA_FILENAME Final[str]

Default filename for the root schema file in the ZIP. This generally should not be changed.

REF_KEY class-attribute instance-attribute

REF_KEY: Final[str] = '$ref'

Key used to indicate a reference to a file in the ZIP archive. This generally should not be changed.

SCHEMA_FILENAME class-attribute instance-attribute

SCHEMA_FILENAME: Final[str] = 'index.json'

Default filename for the root schema file in the ZIP. This generally should not be changed.

__init__

__init__(mapping: Mapping[tuple[str, ...], str]) -> None

Initialize the serializer with a mapping of paths to filename templates.

Parameters:

Name Type Description Default

mapping

Mapping[tuple[str, ...], str]

A dictionary mapping tuples of strings (representing paths in the nested dictionary) to filename templates. The templates can include a {key} placeholder for dynamic subfiles, where each key in the subdictionary will get its own file, with the key replacing {key} in the filename.

required
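How a {key} template might expand into one filename per key can be sketched with `str.format`. This is an illustration only: `expand_template` is a hypothetical helper, and raising TypeError for a non-dictionary value is an assumption, since the actual error type is not documented here.

```python
def expand_template(template: str, subdict: dict) -> dict:
    """Map each key of `subdict` to its own filename (hypothetical helper)."""
    if not isinstance(subdict, dict):
        # Assumption: the real serializer may raise a different exception.
        raise TypeError("a {key} template must point to a dictionary")
    return {template.format(key=key): value for key, value in subdict.items()}


noise_files = expand_template(
    "noise/{key}.json",
    {"t1": {"a": 0.1}, "t2": {"a": 0.2}},
)
```

With the data from the class example, this yields one entry for 'noise/t1.json' and one for 'noise/t2.json', matching the filenames listed in the ZIP contents there.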

dumps

dumps(
    data: dict[str, Any],
    extra_files: dict[str, str | bytes] | None = None,
) -> bytes

Serialize the data into a ZIP archive.

Parameters:

Name Type Description Default

data

dict[str, Any]

The data to serialize. Must be a dictionary.

required

extra_files

dict[str, str | bytes] | None

Optional additional files to include in the ZIP. Keys are filenames, values are file contents (str or bytes). If str, it will be encoded to bytes using UTF-8. If None, no extra files are added.

None

Returns:

Type Description
bytes

The serialized ZIP archive as bytes.

loads classmethod

loads(
    zip_bytes: bytes, index_file_name: str | None = None
) -> tuple[dict[str, Any], dict[str, bytes]]

Deserialize the ZIP archive back into its original data dictionary and extra files.

Parameters:

Name Type Description Default

zip_bytes

bytes

The ZIP archive as bytes.

required

index_file_name

str | None

The name of the root schema file in the ZIP. When not specified, defaults to the class constant SCHEMA_FILENAME. This should generally be left unset; it is exposed so that older ZIP files can still be opened if the class constant is changed.

None

Returns:

Type Description
tuple[dict[str, Any], dict[str, bytes]]

A tuple containing:

- The reconstructed data dictionary.
- A dictionary of extra files, where keys are filenames and values are file contents as bytes.

Raises:

Type Description
MissingIndexFileError

If the specified index file is missing from the ZIP archive.

IndexFileMalformedError

If the index file is malformed or missing required keys.
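Before calling loads on an archive of unknown origin, it can help to check which index filename it actually contains. The sketch below uses only the standard library; `has_index_file` is a hypothetical helper, and 'schema.json' is an invented legacy filename used purely for illustration.

```python
import io
import zipfile


def has_index_file(zip_bytes: bytes, index_file_name: str = "index.json") -> bool:
    """Report whether the archive contains the given root file (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return index_file_name in zf.namelist()


# Build a tiny archive whose root file uses a non-default (invented) name.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("schema.json", "{}")
legacy_zip = buffer.getvalue()
```

An archive like `legacy_zip` is the kind of case the index_file_name parameter is meant for: the default name is absent, so the older name would have to be passed explicitly.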

import_class_from_fully_qualified_name

import_class_from_fully_qualified_name(
    fully_qualified_class_name: str,
) -> Any

Dynamically imports and returns a class or attribute from a fully qualified name.

Parameters:

Name Type Description Default

fully_qualified_class_name

str

The fully qualified name of the class or attribute to import, in the format 'module.submodule.ClassName' or 'module.submodule.ClassName.attribute'.

required

Returns:

Name Type Description
Any Any

The imported class or attribute.

Raises:

Type Description
ValueError

If the provided fully qualified name does not contain a module path.
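One way such a lookup could work is sketched below with importlib. This is an assumption about the general technique, not the module's actual implementation: `import_from_qualified_name` is a hypothetical name, and it tries the longest importable module prefix first, then resolves the remaining parts with getattr.

```python
import importlib
from typing import Any


def import_from_qualified_name(name: str) -> Any:
    """Resolve 'pkg.mod.Class' or 'pkg.mod.Class.attr' (hypothetical sketch)."""
    parts = name.split(".")
    if len(parts) < 2:
        raise ValueError(f"{name!r} does not contain a module path")
    # Try the longest module prefix first, then walk the rest as attributes.
    for split in range(len(parts) - 1, 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj: Any = importlib.import_module(module_name)
        except ModuleNotFoundError:
            continue  # prefix is not a module; try a shorter one
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"could not import any module prefix of {name!r}")


ordered_dict_cls = import_from_qualified_name("collections.OrderedDict")
decode_method = import_from_qualified_name("json.JSONDecoder.decode")
```

A bare name such as 'OrderedDict' has no module path, so the sketch raises ValueError, in line with the documented behaviour.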