serialization

Utilities for serializing and deserializing large, nested dictionaries.

Classes:

Name	Description
`IndexFileMalformedError`	Exception raised when the index file is malformed or missing required keys.
`MissingIndexFileError`	Exception raised when the index file is missing in the ZIP archive.
`SchemaZIPSerializer`	Serialize and deserialize a large, nested dictionary into a ZIP archive.

Functions:

Name	Description
`filter_keys_from_dict_missing_in_signature`	Filter keys from a dictionary that are not in the signature of a given function.
`get_fully_qualified_class_name_for_import`	Get the fully qualified name of a class, including its module and any containing classes.
`import_class_from_fully_qualified_name`	Dynamically imports and returns a class or attribute from a fully qualified name.

IndexFileMalformedError ¶

Bases: KeyError

Exception raised when the index file is malformed or missing required keys.

MissingIndexFileError ¶

Bases: KeyError

Exception raised when the index file is missing in the ZIP archive.

SchemaZIPSerializer ¶

Serialize and deserialize a large, nested dictionary into a ZIP archive (returned as bytes) based on a user-provided schema mapping of key paths to file templates. Supports dynamic per-key files via {key} in templates.

Schema mapping keys are tuples representing the dictionary path, e.g. ('config', 'noise_tokenizer'). Values are relative paths within the ZIP, which may include a {key} placeholder for dynamic subfiles.

The ZIP will contain:

- A root schema file (default: 'index.json') holding the mapping and a skeleton of the data with $ref placeholders.
- One or more JSON files per the mapping.
- Any additional files passed via extra_files.

Use dumps(data, extra_files=...) -> bytes to produce a ZIP, and loads(zip_bytes) -> (dict, extra_files) to reconstruct. Unmapped keys are included inline in the schema and preserved on deserialization.

Examples:

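>>> import io
>>> import zipfile
>>> from stainedglass_core.utils.serialization import SchemaZIPSerializer
>>> data = {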
...     "config": {
...         "settings": {"x": 1, "y": 2},
...         "noise_tokenizer": {"t1": {"a": 0.1}, "t2": {"a": 0.2}},
...     },
...     "users": {"alice": {"id": 1}, "bob": {"id": 2}},
...     "notes": {"misc": "inline"},
... }

Mappings must contain the root index file, and can contain any chain of nested keys. If a filename in the mapping contains a {key} placeholder, the serializer will create a separate file for each key in the dictionary at that path. (It is an error to use {key} in a mapping that does not point to a dictionary.) Mappings also do not need to be complete; any keys not in the mapping will be included inline in the skeleton in the index file.

>>> mapping = {
...     (): "index.json",
...     ("config", "settings"): "config/settings.json",
...     ("config", "noise_tokenizer"): "noise/{key}.json",
...     ("users",): "users.json",
... }
>>> extra_files = {"README.txt": "This is a test ZIP."}
>>> serializer = SchemaZIPSerializer(mapping, zipfile.ZIP_DEFLATED)
>>> zip_bytes = serializer.dumps(data, extra_files=extra_files)
>>> isinstance(zip_bytes, bytes)
True
>>> restored, extras = SchemaZIPSerializer.loads(zip_bytes)
>>> restored == data
True
>>> extras["README.txt"].decode() == "This is a test ZIP."
True

Inspecting ZIP contents:

>>> zf = zipfile.ZipFile(io.BytesIO(zip_bytes))
>>> sorted(zf.namelist())
['README.txt', 'config/settings.json', 'index.json', 'noise/t1.json', 'noise/t2.json', 'users.json']

Deserialization still works regardless of the mapping used:

>>> mapping2 = {
...     (): "index.json",
... }
>>> zip_bytes_2 = SchemaZIPSerializer(
...     mapping2, compression=zipfile.ZIP_DEFLATED
... ).dumps(data)
>>> data2, _ = SchemaZIPSerializer.loads(zip_bytes_2)
>>> data2 == data
True
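To make the archive layout described above more concrete, the following is a minimal, standard-library-only sketch of the general idea: mapped subtrees are written to their own JSON files and replaced by `$ref` placeholders in a skeleton. The helper names `split_by_mapping` and `write_zip`, and the exact shape of the skeleton, are assumptions made for illustration; the actual index file format written by `SchemaZIPSerializer` may differ.

```python
import copy
import io
import json
import zipfile

REF_KEY = "$ref"  # same value as the documented SchemaZIPSerializer.REF_KEY


def split_by_mapping(data, mapping):
    """Hypothetical helper: return (skeleton, files) for a path -> template mapping."""
    skeleton = copy.deepcopy(data)
    files = {}
    # Process deeper paths first so nested entries are extracted before their parents.
    for path, template in sorted(mapping.items(), key=lambda kv: -len(kv[0])):
        if not path:
            continue  # the root entry only names the index file itself
        parent = skeleton
        for part in path[:-1]:
            parent = parent[part]
        subtree = parent[path[-1]]
        if "{key}" in template:
            if not isinstance(subtree, dict):
                raise TypeError(f"{path} must point to a dict to use {{key}}")
            refs = {}
            for key, value in subtree.items():
                filename = template.format(key=key)
                files[filename] = value  # one file per key
                refs[key] = {REF_KEY: filename}
            parent[path[-1]] = refs
        else:
            files[template] = subtree
            parent[path[-1]] = {REF_KEY: template}
    return skeleton, files


def write_zip(skeleton, files, index_name="index.json"):
    """Hypothetical helper: write the skeleton and the split files into a ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(index_name, json.dumps(skeleton))
        for filename, content in files.items():
            zf.writestr(filename, json.dumps(content))
    return buffer.getvalue()


data = {
    "config": {
        "settings": {"x": 1, "y": 2},
        "noise_tokenizer": {"t1": {"a": 0.1}, "t2": {"a": 0.2}},
    },
    "users": {"alice": {"id": 1}, "bob": {"id": 2}},
    "notes": {"misc": "inline"},
}
mapping = {
    (): "index.json",
    ("config", "settings"): "config/settings.json",
    ("config", "noise_tokenizer"): "noise/{key}.json",
    ("users",): "users.json",
}
skeleton, files = split_by_mapping(data, mapping)
zip_bytes = write_zip(skeleton, files)
names = sorted(zipfile.ZipFile(io.BytesIO(zip_bytes)).namelist())
```

In this sketch the unmapped `"notes"` entry stays inline in the skeleton, while `"users"` is replaced by a single `$ref` and each key under `("config", "noise_tokenizer")` gets its own file.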

Methods:

Name	Description
`__init__`	Initialize the serializer with a mapping of paths to filename templates.
`dumps`	Serialize the data into a ZIP archive.
`loads`	Deserialize the ZIP archive back into its original data dictionary and extra files.

Attributes:

Name	Type	Description
`REF_KEY`	`Final[str]`	Key used to indicate a reference to a file in the ZIP archive. This generally should not be changed.
`SCHEMA_FILENAME`	`Final[str]`	Default filename for the root schema file in the ZIP. This generally should not be changed.

REF_KEY `class-attribute` `instance-attribute` ¶

REF_KEY: Final[str] = '$ref'

Key used to indicate a reference to a file in the ZIP archive. This generally should not be changed.

SCHEMA_FILENAME `class-attribute` `instance-attribute` ¶

SCHEMA_FILENAME: Final[str] = 'index.json'

Default filename for the root schema file in the ZIP. This generally should not be changed.

__init__ ¶

__init__(
    mapping: SerializationSchemaMapping, compression: int
) -> None

Initialize the serializer with a mapping of paths to filename templates.

Parameters:

Name	Type	Description	Default
`mapping` ¶	`SerializationSchemaMapping`	A dictionary mapping tuples of strings (representing paths in the nested dictionary) to filename templates. The templates can include a `{key}` placeholder for dynamic subfiles, where each key in the subdictionary will get its own file, with the key replacing `{key}` in the filename.	required
`compression` ¶	`int`	The compression method to use for the ZIP file. Usually represented by zipfile.ZIP_DEFLATED or zipfile.ZIP_STORED.	required
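As a rough illustration of what the `compression` argument controls, the sketch below uses only the standard `zipfile` module to compare `ZIP_STORED` (no compression) with `ZIP_DEFLATED` on a repetitive JSON-like payload. The helper `zip_size` and the payload are hypothetical and are not part of this module.

```python
import io
import zipfile


def zip_size(payload: bytes, compression: int) -> int:
    """Hypothetical helper: size in bytes of a one-file ZIP using the given method."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as zf:
        zf.writestr("data.json", payload)
    return len(buffer.getvalue())


payload = b'{"a": 0.1}' * 1000  # highly repetitive, so it compresses well
stored_size = zip_size(payload, zipfile.ZIP_STORED)
deflated_size = zip_size(payload, zipfile.ZIP_DEFLATED)
```

`ZIP_STORED` keeps the payload as-is and only adds container overhead, so it tends to be faster but larger; `ZIP_DEFLATED` trades CPU time for a smaller archive.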

dumps ¶

dumps(
    data: dict[str, Any],
    extra_files: dict[str, str | bytes] | None = None,
) -> bytes

Serialize the data into a ZIP archive.

Parameters:

Name	Type	Description	Default
`data` ¶	`dict[str, Any]`	The data to serialize. Must be a dictionary.	required
`extra_files` ¶	`dict[str, str \| bytes] \| None`	Optional additional files to include in the ZIP. Keys are filenames, values are file contents (str or bytes). If str, it will be encoded to bytes using UTF-8. If None, no extra files are added.	`None`

Returns:

Type	Description
`bytes`	The serialized ZIP archive as bytes.

loads `classmethod` ¶

loads(
    zip_bytes: bytes, index_file_name: str | None = None
) -> tuple[dict[str, Any], dict[str, bytes]]

Deserialize the ZIP archive back into its original data dictionary and extra files.

Parameters:

Name	Type	Description	Default
`zip_bytes` ¶	`bytes`	The ZIP archive as bytes.	required
`index_file_name` ¶	`str \| None`	The name of the root schema file in the ZIP. When not specified, defaults to the class constant. This should generally not be specified, but is exposed to allow for the ability to open old ZIP files if the class constant is changed.	`None`

Returns:

Type	Description
`tuple[dict[str, Any], dict[str, bytes]]`	A tuple of (data, extra_files): the reconstructed data dictionary, and a dictionary of extra files whose keys are filenames and whose values are file contents as bytes.

Raises:

Type	Description
`MissingIndexFileError`	If the specified index file is missing from the ZIP archive.
`IndexFileMalformedError`	If the index file is malformed or missing required keys.
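Because both documented exceptions derive from `KeyError`, callers can catch them individually or with a single `except KeyError`. The sketch below illustrates that pattern with local stand-in classes and a hypothetical `read_index` helper built on the standard library; it is not the library's implementation, and the checks that `loads` actually performs on the index file are not specified here.

```python
import io
import json
import zipfile


class MissingIndexFileError(KeyError):
    """Local stand-in for the documented exception of the same name."""


class IndexFileMalformedError(KeyError):
    """Local stand-in for the documented exception of the same name."""


def read_index(zip_bytes: bytes, index_file_name: str = "index.json") -> dict:
    """Hypothetical helper: load the root index file or raise a descriptive error."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        if index_file_name not in zf.namelist():
            raise MissingIndexFileError(index_file_name)
        try:
            index = json.loads(zf.read(index_file_name))
        except json.JSONDecodeError as exc:
            raise IndexFileMalformedError(index_file_name) from exc
    if not isinstance(index, dict):
        raise IndexFileMalformedError(index_file_name)
    return index


# Build an archive that has no index file at all.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("README.txt", "no index here")
archive_without_index = buffer.getvalue()

try:
    read_index(archive_without_index)
except KeyError as exc:  # would also catch IndexFileMalformedError
    caught = type(exc).__name__
```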

filter_keys_from_dict_missing_in_signature ¶

filter_keys_from_dict_missing_in_signature(
    state: Mapping[str, Any], func: Callable[..., Any]
) -> dict[str, Any]

Filter keys from a dictionary that are not in the signature of a given function.

This is useful for cleaning up state dictionaries before passing them to a function that may not expect all the keys.

If func accepts a **kwargs parameter (VAR_KEYWORD), all keys are considered valid and a shallow copy of state is returned without any filtering or warnings.

Parameters:

Name	Type	Description	Default
`state` ¶	`Mapping[str, Any]`	The original dictionary containing the state.	required
`func` ¶	`Callable[..., Any]`	The function whose signature will be used to filter the keys.	required

Returns:

Type	Description
`dict[str, Any]`	A new dictionary containing only the keys that are present in the function's signature, or a shallow copy of all keys when the function accepts `**kwargs`.
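The filtering behavior described above can be approximated with `inspect.signature`. This is a minimal sketch under stated assumptions, not the library's code: the name `filter_to_signature` is hypothetical, and the sketch does not emit the warnings that the description implies for dropped keys.

```python
import inspect
from collections.abc import Callable, Mapping
from typing import Any


def filter_to_signature(
    state: Mapping[str, Any], func: Callable[..., Any]
) -> dict[str, Any]:
    """Hypothetical sketch: keep only the keys that name a parameter of func."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing needs to be filtered.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(state)
    return {key: value for key, value in state.items() if key in params}


def make_model(hidden_size, dropout=0.1):
    return hidden_size, dropout


def flexible(**kwargs):
    return kwargs


state = {"hidden_size": 8, "dropout": 0.5, "legacy_flag": True}
filtered = filter_to_signature(state, make_model)
unfiltered = filter_to_signature(state, flexible)
result = make_model(**filtered)  # safe: "legacy_flag" was dropped
```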

get_fully_qualified_class_name_for_import ¶

get_fully_qualified_class_name_for_import(
    cls: type[Any],
) -> str

Get the fully qualified name of a class, including its module and any containing classes.

Parameters:

Name	Type	Description	Default
`cls` ¶	`type[Any]`	The class to get the fully qualified name for.	required

Examples:

>>> type_str = get_fully_qualified_class_name_for_import(SchemaZIPSerializer)
>>> print(type_str)
stainedglass_core.utils.serialization.SchemaZIPSerializer

>>> imported_type = import_class_from_fully_qualified_name(type_str)
>>> print(imported_type)
<class 'stainedglass_core.utils.serialization.SchemaZIPSerializer'>

Returns:

Type	Description
`str`	The fully qualified name of the class, in the format 'module.submodule.ClassName' or 'module.submodule.OuterClass.InnerClass'.
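The documented name format matches what Python exposes through a class's `__module__` and `__qualname__` attributes. The following sketch, using the hypothetical name `qualified_name`, shows one way such a string could be built; whether the library does exactly this is an assumption.

```python
import json


def qualified_name(cls: type) -> str:
    """Hypothetical sketch: join the defining module and the qualified class name."""
    return f"{cls.__module__}.{cls.__qualname__}"


class Outer:
    class Inner:
        pass


top_level = qualified_name(json.JSONDecoder)
nested = qualified_name(Outer.Inner)  # ends with "Outer.Inner"
```

`__qualname__` includes any containing classes, which is what produces the 'module.submodule.OuterClass.InnerClass' form for nested classes.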

import_class_from_fully_qualified_name ¶

import_class_from_fully_qualified_name(
    fully_qualified_class_name: str,
) -> Any

Dynamically imports and returns a class or attribute from a fully qualified name.

Parameters:

Name	Type	Description	Default
`fully_qualified_class_name` ¶	`str`	The fully qualified name of the class or attribute to import, in the format 'module.submodule.ClassName' or 'module.submodule.ClassName.attribute'.	required

Returns:

Type	Description
`Any`	The imported class or attribute.

Raises:

Type	Description
`ValueError`	If the provided fully qualified name does not contain a module path.
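One plausible way to resolve such a name is to import the longest importable module prefix with `importlib.import_module` and then walk the remaining parts with `getattr`. The sketch below, with the hypothetical name `import_from_qualified_name`, illustrates that approach; it is an assumption about the technique, not the library's actual implementation.

```python
import importlib
import json
from typing import Any


def import_from_qualified_name(name: str) -> Any:
    """Hypothetical sketch: resolve 'pkg.module.Class.attr' to the named object."""
    parts = name.split(".")
    if len(parts) < 2:
        raise ValueError(f"{name!r} does not contain a module path")
    # Try the longest module prefix first, then treat the rest as attributes.
    for split in range(len(parts) - 1, 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except ModuleNotFoundError:
            continue  # this prefix is not a module; try a shorter one
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"could not import any module prefix of {name!r}")


decoder_cls = import_from_qualified_name("json.decoder.JSONDecoder")
fromkeys = import_from_qualified_name("collections.OrderedDict.fromkeys")
```

Trying shorter prefixes is what lets the same function handle both 'module.submodule.ClassName' and 'module.submodule.ClassName.attribute'.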
