serialization

Utilities for serializing and deserializing large, nested dictionaries.

Classes:

Name Description
IndexFileMalformedError

Exception raised when the index file is malformed or missing required keys.

MissingIndexFileError

Exception raised when the index file is missing in the ZIP archive.

SchemaZIPSerializer

Serialize and deserialize a large, nested dictionary into a ZIP archive

Functions:

Name Description
filter_keys_from_dict_missing_in_signature

Filter keys from a dictionary that are not in the signature of a given function.

get_fully_qualified_class_name_for_import

Get the fully qualified name of a class, including its module and any containing classes.

import_class_from_fully_qualified_name

Dynamically imports and returns a class or attribute from a fully qualified name.

IndexFileMalformedError

Bases: KeyError

Exception raised when the index file is malformed or missing required keys.

MissingIndexFileError

Bases: KeyError

Exception raised when the index file is missing in the ZIP archive.

SchemaZIPSerializer

Serialize and deserialize a large, nested dictionary into a ZIP archive (returned as bytes) based on a user-provided schema mapping of key paths to file templates. Supports dynamic per-key files via {key} in templates.

Schema mapping keys are tuples representing the dictionary path, e.g. ('config', 'noise_tokenizer'). Values are relative paths within the ZIP, which may include a {key} placeholder for dynamic subfiles.

The ZIP will contain:

- A root schema file (default: 'index.json') holding the mapping and a skeleton of the data with $ref placeholders.
- One or more JSON files per the mapping.
- Any additional files passed via extra_files.

Use dumps(data, extra_files=...) -> bytes to produce a ZIP, and loads(zip_bytes) -> (dict, extra_files) to reconstruct. Unmapped keys are included inline in the schema and preserved on deserialization.

Examples:

>>> data = {
...     "config": {
...         "settings": {"x": 1, "y": 2},
...         "noise_tokenizer": {"t1": {"a": 0.1}, "t2": {"a": 0.2}},
...     },
...     "users": {"alice": {"id": 1}, "bob": {"id": 2}},
...     "notes": {"misc": "inline"},
... }

Mappings must contain the root index file, and can contain any chain of nested keys. If a filename in the mapping contains a {key} placeholder, the serializer will create a separate file for each key in the dictionary at that path. (It is an error to use {key} in a mapping that does not point to a dictionary.) Mappings also do not need to be complete; any keys not in the mapping will be included inline in the skeleton in the index file.

>>> mapping = {
...     (): "index.json",
...     ("config", "settings"): "config/settings.json",
...     ("config", "noise_tokenizer"): "noise/{key}.json",
...     ("users",): "users.json",
... }
>>> extra_files = {"README.txt": "This is a test ZIP."}
>>> serializer = SchemaZIPSerializer(mapping, zipfile.ZIP_DEFLATED)
>>> zip_bytes = serializer.dumps(data, extra_files=extra_files)
>>> isinstance(zip_bytes, bytes)
True
>>> restored, extras = SchemaZIPSerializer.loads(zip_bytes)
>>> restored == data
True
>>> extras["README.txt"].decode() == "This is a test ZIP."
True

Inspecting ZIP contents:

>>> zf = zipfile.ZipFile(io.BytesIO(zip_bytes))
>>> sorted(zf.namelist())
['README.txt', 'config/settings.json', 'index.json', 'noise/t1.json', 'noise/t2.json', 'users.json']

Deserialization still works regardless of the mapping used:

>>> mapping2 = {
...     (): "index.json",
... }
>>> zip_bytes_2 = SchemaZIPSerializer(
...     mapping2, compression=zipfile.ZIP_DEFLATED
... ).dumps(data)
>>> data2, _ = SchemaZIPSerializer.loads(zip_bytes_2)
>>> data2 == data
True
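The {key} expansion described above can be sketched without the library. This is a minimal illustration, not the serializer's actual code: expand_filename_template is a hypothetical helper, and raising TypeError for a non-dictionary value is an assumption (the documentation only says it is an error).

```python
from typing import Any


def expand_filename_template(template: str, subtree: Any) -> dict[str, Any]:
    """Hypothetical helper: map ZIP member names to the values they would hold."""
    if "{key}" not in template:
        # A plain template stores the whole subtree in a single file.
        return {template: subtree}
    if not isinstance(subtree, dict):
        # Assumed error type; the docs only state that this case is an error.
        raise TypeError("{key} templates require a dictionary at the mapped path")
    # One file per key, with the key substituted into the template.
    return {
        template.replace("{key}", str(key)): value
        for key, value in subtree.items()
    }


noise = {"t1": {"a": 0.1}, "t2": {"a": 0.2}}
per_key_files = expand_filename_template("noise/{key}.json", noise)
single_file = expand_filename_template("users.json", {"alice": {"id": 1}})
print(sorted(per_key_files))  # ['noise/t1.json', 'noise/t2.json']
```

The per-key names match the noise/t1.json and noise/t2.json entries seen in the ZIP listing above.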

Methods:

Name Description
__init__

Initialize the serializer with a mapping of paths to filename templates.

dumps

Serialize the data into a ZIP archive.

loads

Deserialize the ZIP archive back into its original data dictionary and extra files.

Attributes:

Name Type Description
REF_KEY Final[str]

Key used to indicate a reference to a file in the ZIP archive. This generally should not be changed.

SCHEMA_FILENAME Final[str]

Default filename for the root schema file in the ZIP. This generally should not be changed.

REF_KEY class-attribute instance-attribute

REF_KEY: Final[str] = '$ref'

Key used to indicate a reference to a file in the ZIP archive. This generally should not be changed.

SCHEMA_FILENAME class-attribute instance-attribute

SCHEMA_FILENAME: Final[str] = 'index.json'

Default filename for the root schema file in the ZIP. This generally should not be changed.
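To make the role of the two constants concrete, here is a rough sketch of how a skeleton with $ref placeholders could be derived from a mapping. The exact layout of the index file is not documented here, so the shape below is an assumption; build_skeleton is a hypothetical helper, it ignores {key} templates, and it assumes mapped paths do not nest inside one another.

```python
import copy
from typing import Any

REF_KEY = "$ref"  # same value as the documented class attribute
SCHEMA_FILENAME = "index.json"  # same value as the documented class attribute


def build_skeleton(
    data: dict[str, Any], mapping: dict[tuple[str, ...], str]
) -> dict[str, Any]:
    """Hypothetical helper: swap each mapped subtree for a reference to its file."""
    skeleton = copy.deepcopy(data)
    for path, filename in mapping.items():
        if not path:
            continue  # the empty path names the index file itself
        parent = skeleton
        for key in path[:-1]:
            parent = parent[key]
        # Assumed placeholder shape: a one-entry dict keyed by REF_KEY.
        parent[path[-1]] = {REF_KEY: filename}
    return skeleton


data = {
    "config": {"settings": {"x": 1, "y": 2}},
    "users": {"alice": {"id": 1}},
    "notes": {"misc": "inline"},
}
mapping = {
    (): SCHEMA_FILENAME,
    ("config", "settings"): "config/settings.json",
    ("users",): "users.json",
}
skeleton = build_skeleton(data, mapping)
```

In this sketch the unmapped "notes" entry stays inline in the skeleton, which is the behavior the class description gives for unmapped keys.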

__init__

__init__(
    mapping: SerializationSchemaMapping, compression: int
) -> None

Initialize the serializer with a mapping of paths to filename templates.

Parameters:

Name Type Description Default

mapping

SerializationSchemaMapping

A dictionary mapping tuples of strings (representing paths in the nested dictionary) to filename templates. The templates can include a {key} placeholder for dynamic subfiles, where each key in the subdictionary will get its own file, with the key replacing {key} in the filename.

required

compression

int

The compression method to use for the ZIP file. Usually represented by zipfile.ZIP_DEFLATED or zipfile.ZIP_STORED.

required
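The compression argument takes the same integer constants that the standard zipfile module uses. The sketch below uses only the standard library to show the practical difference between the two constants mentioned above; zip_size is a hypothetical helper and the payload is made up for illustration.

```python
import io
import zipfile


def zip_size(payload: bytes, compression: int) -> int:
    """Hypothetical helper: size of a one-member ZIP built with the given method."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=compression) as zf:
        zf.writestr("data.json", payload)
    return len(buffer.getvalue())


# Highly repetitive JSON-like payload, so DEFLATE has plenty to compress.
payload = b'{"a": 0.1}' * 1000
stored_size = zip_size(payload, zipfile.ZIP_STORED)
deflated_size = zip_size(payload, zipfile.ZIP_DEFLATED)
```

ZIP_STORED writes members uncompressed, so it is faster but larger; ZIP_DEFLATED is usually the better choice for JSON content.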

dumps

dumps(
    data: dict[str, Any],
    extra_files: dict[str, str | bytes] | None = None,
) -> bytes

Serialize the data into a ZIP archive.

Parameters:

Name Type Description Default

data

dict[str, Any]

The data to serialize. Must be a dictionary.

required

extra_files

dict[str, str | bytes] | None

Optional additional files to include in the ZIP. Keys are filenames, values are file contents (str or bytes). If str, it will be encoded to bytes using UTF-8. If None, no extra files are added.

None

Returns:

Type Description
bytes

The serialized ZIP archive as bytes.

loads classmethod

loads(
    zip_bytes: bytes, index_file_name: str | None = None
) -> tuple[dict[str, Any], dict[str, bytes]]

Deserialize the ZIP archive back into its original data dictionary and extra files.

Parameters:

Name Type Description Default

zip_bytes

bytes

The ZIP archive as bytes.

required

index_file_name

str | None

The name of the root schema file in the ZIP. When not specified, defaults to the class constant SCHEMA_FILENAME. This should generally not be specified; it is exposed so that old ZIP files can still be opened if the class constant is changed.

None

Returns:

Type Description
tuple[dict[str, Any], dict[str, bytes]]

A tuple containing:

- The reconstructed data dictionary.
- A dictionary of extra files, where keys are filenames and values are file contents as bytes.

Raises:

Type Description
MissingIndexFileError

If the specified index file is missing from the ZIP archive.

IndexFileMalformedError

If the index file is malformed or missing required keys.
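Because both exceptions derive from KeyError, a caller can handle them together or separately. The sketch below uses stand-in classes with the documented base class and a hypothetical read_index helper; the checks the real loads performs are not documented here, so the validation shown is an assumption.

```python
import io
import json
import zipfile


class MissingIndexFileError(KeyError):
    """Stand-in with the documented base class; not the library's class."""


class IndexFileMalformedError(KeyError):
    """Stand-in with the documented base class; not the library's class."""


def read_index(zip_bytes: bytes, index_file_name: str = "index.json") -> dict:
    """Hypothetical helper: load the root index file or raise a specific error."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        if index_file_name not in zf.namelist():
            raise MissingIndexFileError(index_file_name)
        try:
            index = json.loads(zf.read(index_file_name))
        except json.JSONDecodeError as exc:
            raise IndexFileMalformedError(index_file_name) from exc
    if not isinstance(index, dict):
        # Assumed check: the index is expected to be a JSON object.
        raise IndexFileMalformedError(index_file_name)
    return index


# Build an archive that has no index file at all.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as zf:
    zf.writestr("README.txt", "no index here")

try:
    read_index(buffer.getvalue())
    outcome = "loaded"
except KeyError as exc:  # catches both stand-in exceptions
    outcome = type(exc).__name__
```

Catching KeyError is the broad option; catching the two classes individually lets a caller distinguish a missing index from a corrupt one.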

filter_keys_from_dict_missing_in_signature

filter_keys_from_dict_missing_in_signature(
    state: Mapping[str, Any], func: Callable[..., Any]
) -> dict[str, Any]

Filter keys from a dictionary that are not in the signature of a given function.

This is useful for cleaning up state dictionaries before passing them to a function that may not expect all the keys.

If func accepts a **kwargs parameter (VAR_KEYWORD), all keys are considered valid and a shallow copy of state is returned without any filtering or warnings.

Parameters:

Name Type Description Default

state

Mapping[str, Any]

The original dictionary containing the state.

required

func

Callable[..., Any]

The function whose signature will be used to filter the keys.

required

Returns:

Type Description
dict[str, Any]

A new dictionary containing only the keys that are present in the function's signature, or a shallow copy of all keys when the function accepts **kwargs.
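The filtering rule can be approximated with inspect.signature. This is a minimal sketch under the behavior described above, not the library's code: filter_by_signature is a hypothetical name, and unlike the real function (which apparently warns when it filters) the sketch drops keys silently.

```python
import inspect
from collections.abc import Callable, Mapping
from typing import Any


def filter_by_signature(
    state: Mapping[str, Any], func: Callable[..., Any]
) -> dict[str, Any]:
    """Hypothetical sketch: keep only the keys that func names as parameters."""
    parameters = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values()):
        # **kwargs accepts everything, so return a shallow copy unfiltered.
        return dict(state)
    return {key: value for key, value in state.items() if key in parameters}


def make_layer(width: int, depth: int = 2) -> tuple[int, int]:
    return (width, depth)


def make_anything(**kwargs: Any) -> dict[str, Any]:
    return kwargs


state = {"width": 8, "depth": 3, "legacy_flag": True}
filtered = filter_by_signature(state, make_layer)
unfiltered = filter_by_signature(state, make_anything)
layer = make_layer(**filtered)
```

Here legacy_flag is dropped for make_layer, so the call with **filtered succeeds, while make_anything receives every key.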

get_fully_qualified_class_name_for_import

get_fully_qualified_class_name_for_import(
    cls: type[Any],
) -> str

Get the fully qualified name of a class, including its module and any containing classes.

Parameters:

Name Type Description Default

cls

type[Any]

The class to get the fully qualified name for.

required

Examples:

>>> type_str = get_fully_qualified_class_name_for_import(SchemaZIPSerializer)
>>> print(type_str)
stainedglass_core.utils.serialization.SchemaZIPSerializer
>>> imported_type = import_class_from_fully_qualified_name(type_str)
>>> print(imported_type)
<class 'stainedglass_core.utils.serialization.SchemaZIPSerializer'>

Returns:

Type Description
str

The fully qualified name of the class, in the format 'module.submodule.ClassName' or 'module.submodule.OuterClass.InnerClass'.
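A name in the documented format can be assembled from a class's __module__ and __qualname__ attributes. The sketch below assumes that approach; fully_qualified_name is a hypothetical stand-in, and the library may handle edge cases differently.

```python
import json


def fully_qualified_name(cls: type) -> str:
    """Hypothetical sketch: join the defining module and the qualified class name."""
    # __qualname__ includes containing classes, e.g. "Outer.Inner".
    return f"{cls.__module__}.{cls.__qualname__}"


class Outer:
    class Inner:
        pass


decoder_name = fully_qualified_name(json.JSONDecoder)
inner_name = fully_qualified_name(Outer.Inner)
print(decoder_name)  # json.decoder.JSONDecoder
```

For the nested class, the result ends in Outer.Inner, prefixed by whichever module defines it.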

import_class_from_fully_qualified_name

import_class_from_fully_qualified_name(
    fully_qualified_class_name: str,
) -> Any

Dynamically imports and returns a class or attribute from a fully qualified name.

Parameters:

Name Type Description Default

fully_qualified_class_name

str

The fully qualified name of the class or attribute to import, in the format 'module.submodule.ClassName' or 'module.submodule.ClassName.attribute'.

required

Returns:

Name Type Description
Any Any

The imported class or attribute.

Raises:

Type Description
ValueError

If the provided fully qualified name does not contain a module path.
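One way to resolve such a name is to import the longest importable module prefix and then walk the remaining parts as attributes. The sketch below takes that approach with importlib; import_from_qualified_name is a hypothetical stand-in, and the resolution strategy is an assumption rather than the library's documented algorithm.

```python
import importlib
import json
from typing import Any


def import_from_qualified_name(name: str) -> Any:
    """Hypothetical sketch: resolve 'module.submodule.Class[.attribute]'."""
    parts = name.split(".")
    if len(parts) < 2:
        raise ValueError(f"{name!r} does not contain a module path")
    # Try the longest module prefix first, then shorter ones.
    for split in range(len(parts) - 1, 0, -1):
        try:
            obj: Any = importlib.import_module(".".join(parts[:split]))
        except ModuleNotFoundError:
            continue
        # Walk the remaining parts as attributes (nested classes, methods, ...).
        for attribute in parts[split:]:
            obj = getattr(obj, attribute)
        return obj
    raise ImportError(f"could not import any module prefix of {name!r}")


decoder_cls = import_from_qualified_name("json.decoder.JSONDecoder")
decode_method = import_from_qualified_name("json.decoder.JSONDecoder.decode")
```

A bare name such as "JSONDecoder" has no module path, so the sketch raises ValueError, matching the documented error condition.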