serialization
Utilities for serializing and deserializing large, nested dictionaries.
Classes:
Name | Description |
---|---|
IndexFileMalformedError | Exception raised when the index file is malformed or missing required keys. |
MissingIndexFileError | Exception raised when the index file is missing in the ZIP archive. |
SchemaZIPSerializer | Serialize and deserialize a large, nested dictionary into a ZIP archive. |
Functions:
Name | Description |
---|---|
import_class_from_fully_qualified_name | Dynamically imports and returns a class or attribute from a fully qualified name. |
IndexFileMalformedError
Bases: KeyError
Exception raised when the index file is malformed or missing required keys.
MissingIndexFileError
Bases: KeyError
Exception raised when the index file is missing in the ZIP archive.
SchemaZIPSerializer
Serialize and deserialize a large, nested dictionary into a ZIP archive (returned as bytes) based on a user-provided schema mapping of key paths to file templates. Supports dynamic per-key files via {key} in templates.
Schema mapping keys are tuples representing the dictionary path, e.g. ('config', 'noise_tokenizer'). Values are relative paths within the ZIP, which may include a {key} placeholder for dynamic subfiles.
The ZIP will contain:
- A root schema file (default: 'index.json') holding the mapping and a skeleton of the data with $ref placeholders.
- One or more JSON files per the mapping.
- Any additional files passed via extra_files.
Use dumps(data, extra_files=...) -> bytes to produce a ZIP, and loads(zip_bytes) -> (dict, extra_files) to reconstruct.
Unmapped keys are included inline in the schema and preserved on deserialization.
Examples:
>>> data = {
...     "config": {
...         "settings": {"x": 1, "y": 2},
...         "noise_tokenizer": {"t1": {"a": 0.1}, "t2": {"a": 0.2}},
...     },
...     "users": {"alice": {"id": 1}, "bob": {"id": 2}},
...     "notes": {"misc": "inline"},
... }
Mappings must contain the root index file, and can contain any chain of nested keys. If a filename in the mapping contains a {key} placeholder, the serializer will create a separate file for each key in the dictionary at that path. (It is an error to use {key} in a mapping that does not point to a dictionary.) Mappings also do not need to be complete; any keys not in the mapping will be included inline in the skeleton in the index file.
>>> mapping = {
...     (): "index.json",
...     ("config", "settings"): "config/settings.json",
...     ("config", "noise_tokenizer"): "noise/{key}.json",
...     ("users",): "users.json",
... }
>>> extra_files = {"README.txt": "This is a test ZIP."}
>>> serializer = SchemaZIPSerializer(mapping)
>>> zip_bytes = serializer.dumps(data, extra_files=extra_files)
>>> isinstance(zip_bytes, bytes)
True
>>> restored, extras = SchemaZIPSerializer.loads(zip_bytes)
>>> restored == data
True
>>> extras["README.txt"].decode() == "This is a test ZIP."
True
Inspecting ZIP contents:
>>> import io
>>> import zipfile
>>> zf = zipfile.ZipFile(io.BytesIO(zip_bytes))
>>> sorted(zf.namelist())
['README.txt', 'config/settings.json', 'index.json', 'noise/t1.json', 'noise/t2.json', 'users.json']
Deserialization still works regardless of the mapping used:
>>> mapping2 = {
...     (): "index.json",
... }
>>> zip_bytes_2 = SchemaZIPSerializer(mapping2).dumps(data)
>>> data2, _ = SchemaZIPSerializer.loads(zip_bytes_2)
>>> data2 == data
True
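To make the file layout described above more concrete, the following is a minimal sketch of how a mapping with a {key} template could be expanded into per-file payloads plus a skeleton containing $ref placeholders. It is not the library's implementation: the helper name split_by_mapping, the exact placeholder shape ({"$ref": filename}), and the assumption that mapped paths do not nest inside one another are all illustrative choices.

```python
import copy
from typing import Any

REF_KEY = "$ref"  # assumed placeholder key, named after the documented REF_KEY


def split_by_mapping(
    data: dict[str, Any], mapping: dict[tuple[str, ...], str]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Hypothetical helper: return (skeleton, files) for a nested dict.

    Assumes mapped paths do not nest inside one another.
    """
    skeleton = copy.deepcopy(data)
    files: dict[str, Any] = {}
    for path, template in mapping.items():
        if not path:
            continue  # the root entry only names the index file itself
        # Walk down to the parent of the mapped key.
        parent = skeleton
        for part in path[:-1]:
            parent = parent[part]
        value = parent[path[-1]]
        if "{key}" in template:
            if not isinstance(value, dict):
                raise TypeError(f"{path!r} uses {{key}} but is not a dict")
            # One file per key in the dictionary at this path.
            refs = {}
            for key, sub_value in value.items():
                filename = template.format(key=key)
                files[filename] = sub_value
                refs[key] = {REF_KEY: filename}
            parent[path[-1]] = refs
        else:
            files[template] = value
            parent[path[-1]] = {REF_KEY: template}
    return skeleton, files


data = {
    "config": {
        "settings": {"x": 1, "y": 2},
        "noise_tokenizer": {"t1": {"a": 0.1}, "t2": {"a": 0.2}},
    },
    "users": {"alice": {"id": 1}, "bob": {"id": 2}},
    "notes": {"misc": "inline"},
}
mapping = {
    (): "index.json",
    ("config", "settings"): "config/settings.json",
    ("config", "noise_tokenizer"): "noise/{key}.json",
    ("users",): "users.json",
}

skeleton, files = split_by_mapping(data, mapping)
print(sorted(files))
print(skeleton["notes"])  # unmapped keys stay inline in the skeleton
```

In this sketch the unmapped "notes" entry is left untouched in the skeleton, which mirrors the documented behavior that unmapped keys are kept inline in the index file.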
Added in version v0.144.0.
Methods:
Name | Description |
---|---|
__init__ | Initialize the serializer with a mapping of paths to filename templates. |
dumps | Serialize the data into a ZIP archive. |
loads | Deserialize the ZIP archive back into its original data dictionary and extra files. |
Attributes:
Name | Type | Description |
---|---|---|
REF_KEY | Final[str] | Key used to indicate a reference to a file in the ZIP archive. This generally should not be changed. |
SCHEMA_FILENAME | Final[str] | Default filename for the root schema file in the ZIP. This generally should not be changed. |
REF_KEY
class-attribute instance-attribute
Key used to indicate a reference to a file in the ZIP archive. This generally should not be changed.
SCHEMA_FILENAME
class-attribute instance-attribute
Default filename for the root schema file in the ZIP. This generally should not be changed.
__init__
Initialize the serializer with a mapping of paths to filename templates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
| Mapping[tuple[str, ...], str] | A dictionary mapping tuples of strings (representing paths in the nested dictionary) to filename templates. The templates can include a {key} placeholder for dynamic per-key files. | required |
dumps
Serialize the data into a ZIP archive.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | dict[str, Any] | The data to serialize. Must be a dictionary. | required |
extra_files | dict[str, str \| bytes] \| None | Optional additional files to include in the ZIP. Keys are filenames, values are file contents (str or bytes). A str value is encoded to bytes using UTF-8. If None, no extra files are added. | None |
Returns:
Type | Description |
---|---|
bytes | The serialized ZIP archive as bytes. |
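The str-or-bytes handling of extra files can be illustrated with the standard library alone. The sketch below is an assumption about how such normalization might look, not the serializer's actual code; the helper name zip_extra_files is hypothetical.

```python
from __future__ import annotations

import io
import zipfile


def zip_extra_files(extra_files: dict[str, str | bytes] | None = None) -> bytes:
    """Hypothetical helper: write extra files into an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # None is treated the same as "no extra files".
        for name, content in (extra_files or {}).items():
            if isinstance(content, str):
                content = content.encode("utf-8")  # str is stored as UTF-8 bytes
            zf.writestr(name, content)
    return buffer.getvalue()


zip_bytes = zip_extra_files({"README.txt": "This is a test ZIP.", "blob.bin": b"\x00\x01"})

# Reading back always yields bytes, whatever type was passed in.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    extras = {name: zf.read(name) for name in zf.namelist()}
print(extras["README.txt"].decode("utf-8"))
```

Because the contents come back as bytes, callers that passed a str need to decode it themselves, as the README.txt example earlier on this page does.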
loads
classmethod
loads(
zip_bytes: bytes, index_file_name: str | None = None
) -> tuple[dict[str, Any], dict[str, bytes]]
Deserialize the ZIP archive back into its original data dictionary and extra files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
zip_bytes | bytes | The ZIP archive as bytes. | required |
index_file_name | str \| None | The name of the root schema file in the ZIP. When not specified, defaults to the class constant. It should generally be left unspecified; it is exposed so that old ZIP files can still be opened if the class constant changes. | None |
Returns:
Type | Description |
---|---|
tuple[dict[str, Any], dict[str, bytes]] | A tuple containing the reconstructed data dictionary and a dictionary of extra files, where keys are filenames and values are file contents as bytes. |
Raises:
Type | Description |
---|---|
MissingIndexFileError | If the specified index file is missing from the ZIP archive. |
IndexFileMalformedError | If the index file is malformed or missing required keys. |
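Both exceptions are documented as subclasses of KeyError, so a caller can handle them individually or together with a single except KeyError clause. The sketch below illustrates that pattern with stand-in classes and a hypothetical read_index helper; it does not import the real module, and the check for required keys is omitted because the expected index layout is not specified here.

```python
import io
import json
import zipfile


class MissingIndexFileError(KeyError):
    """Stand-in for the documented exception (same base class)."""


class IndexFileMalformedError(KeyError):
    """Stand-in for the documented exception (same base class)."""


def read_index(zip_bytes: bytes, index_file_name: str = "index.json") -> dict:
    """Hypothetical helper: load the root index file from a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        if index_file_name not in zf.namelist():
            raise MissingIndexFileError(index_file_name)
        try:
            index = json.loads(zf.read(index_file_name))
        except json.JSONDecodeError as exc:
            raise IndexFileMalformedError(str(exc)) from exc
    if not isinstance(index, dict):
        raise IndexFileMalformedError("index root must be a JSON object")
    return index


# Build an archive that has no index file at all.
empty = io.BytesIO()
with zipfile.ZipFile(empty, "w"):
    pass

try:
    read_index(empty.getvalue())
except KeyError as exc:  # catches both stand-in exception types
    caught = type(exc).__name__
print(caught)
```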
import_class_from_fully_qualified_name
import_class_from_fully_qualified_name(
fully_qualified_class_name: str,
) -> Any
Dynamically imports and returns a class or attribute from a fully qualified name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fully_qualified_class_name | str | The fully qualified name of the class or attribute to import, in the format 'module.submodule.ClassName' or 'module.submodule.ClassName.attribute'. | required |
Returns:
Name | Type | Description |
---|---|---|
Any | Any | The imported class or attribute. |
Raises:
Type | Description |
---|---|
ValueError | If the provided fully qualified name does not contain a module path. |
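For readers unfamiliar with dynamic imports, the following sketch shows one way a function with this contract could be built on importlib. It is an illustrative re-implementation under assumptions, not the library's code: the name import_by_name and the strategy of trying the longest importable module prefix first are both hypothetical.

```python
import importlib
from typing import Any


def import_by_name(fully_qualified_name: str) -> Any:
    """Hypothetical re-implementation, for illustration only."""
    parts = fully_qualified_name.split(".")
    if len(parts) < 2:
        raise ValueError(f"{fully_qualified_name!r} does not contain a module path")
    # Try the longest module prefix first, then resolve the remaining parts
    # as attributes (e.g. a class, then an attribute on that class).
    # Note: catching ModuleNotFoundError here can also hide a missing
    # dependency inside the target module; a real implementation may differ.
    for split in range(len(parts) - 1, 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj: Any = importlib.import_module(module_name)
        except ModuleNotFoundError:
            continue
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return obj
    raise ModuleNotFoundError(f"No importable module in {fully_qualified_name!r}")


ordered_dict_cls = import_by_name("collections.OrderedDict")
fromkeys = import_by_name("collections.OrderedDict.fromkeys")
print(ordered_dict_cls.__name__)
print(list(fromkeys("ab")))
```

The two calls correspond to the two documented name formats: 'module.submodule.ClassName' and 'module.submodule.ClassName.attribute'.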