torchao
torchao integration modules.
Modules:
| Name | Description |
|---|---|
xnnpack_quantizer |
XNNPACK-compatible quantization for Stained Glass Transform modules using torchao. |
Classes:
| Name | Description |
|---|---|
StainedGlassTransformForTextQuantizer |
Quantizer for |
Functions:
| Name | Description |
|---|---|
build_embedding_config |
Build an XNNPACK-compatible embedding quantization config from dtype parameters. |
build_linear_config |
Build an XNNPACK-compatible linear quantization config from string parameters. |
quantize_module_ |
Quantize a module's embedding and linear layers in-place using torchao. |
Attributes:
| Name | Type | Description |
|---|---|---|
WeightDtype |
TypeAlias
|
Allowed weight dtype strings for XNNPACK-compatible quantization configs. |
WeightDtype
module-attribute
¶
Allowed weight dtype strings for XNNPACK-compatible quantization configs.
"int4" and "int8" resolve to the corresponding torch.dtype integer
types. Pass None to the build_*_config helpers to skip quantizing the
corresponding layer type.
StainedGlassTransformForTextQuantizer
¶
Quantizer for StainedGlassTransformForText modules.
Switch the transform to eval mode for quantization, then
delegate to quantize_module_ for the actual quantization.
Example::
>>> from stainedglass_core import transform as sg_transform
>>> from stainedglass_core.integrations import torchao as sg_torchao
>>> sgt = sg_transform.StainedGlassTransformForText.from_pretrained(
... "MODEL_ID", noise_layer_attention="sdpa"
... ) # doctest: +SKIP
>>> sgt = sg_torchao.StainedGlassTransformForTextQuantizer.quantize(sgt) # doctest: +SKIP
Added in version v3.22.0. Added API for quantizing StainedGlassTransformForText modules using torchao.
Warning
This API is experimental and subject to change: The StainedGlassTransformForTextQuantizer API is currently under development and is subject to change.
Methods:
| Name | Description |
|---|---|
quantize |
Quantize a |
quantize
classmethod
¶
quantize(sgt: StainedGlassTransformForText, *, embedding_config: IntxWeightOnlyConfig | _Unset | None = <stainedglass_core.integrations.torchao.xnnpack_quantizer._Unset object at 0x7f5b70b99940>, linear_config: Int8DynamicActivationIntxWeightConfig | _Unset | None = <stainedglass_core.integrations.torchao.xnnpack_quantizer._Unset object at 0x7f5b70b99940>, filter_fn: Callable[[Module, str], bool] | None = None) -> sg_text.StainedGlassTransformForText
Quantize a StainedGlassTransformForText in-place.
Quantize embedding and linear layers using XNNPACK-compatible torchao quantization schemes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
|
StainedGlassTransformForText
|
The |
required |
|
IntxWeightOnlyConfig | _Unset | None
|
Config for |
<stainedglass_core.integrations.torchao.xnnpack_quantizer._Unset object at 0x7f5b70b99940>
|
|
Int8DynamicActivationIntxWeightConfig | _Unset | None
|
Config for |
<stainedglass_core.integrations.torchao.xnnpack_quantizer._Unset object at 0x7f5b70b99940>
|
|
Callable[[Module, str], bool] | None
|
Optional filter function for linear quantization. |
None
|
Returns:
| Type | Description |
|---|---|
sg_text.StainedGlassTransformForText
|
The same |
sg_text.StainedGlassTransformForText
|
in-place. |
build_embedding_config
¶
build_embedding_config(
weight_dtype: WeightDtype | None = "int8",
) -> (
torchao.quantization.quant_api.IntxWeightOnlyConfig
| None
)
Build an XNNPACK-compatible embedding quantization config from dtype parameters.
Create an IntxWeightOnlyConfig with PerAxis(0) granularity suitable
for nn.Embedding layers in the XNNPACK backend.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
|
WeightDtype | None
|
Torch integer dtype name ( |
'int8'
|
Returns:
| Type | Description |
|---|---|
torchao.quantization.quant_api.IntxWeightOnlyConfig | None
|
An |
build_linear_config
¶
build_linear_config(
weight_dtype: WeightDtype | None = "int4",
weight_group_size: int = 32,
) -> (
torchao.quantization.quant_api.Int8DynamicActivationIntxWeightConfig
| None
)
Build an XNNPACK-compatible linear quantization config from string parameters.
Create an Int8DynamicActivationIntxWeightConfig with PerGroup
granularity suitable for nn.Linear layers in the XNNPACK backend.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
|
WeightDtype | None
|
Torch integer dtype name ( |
'int4'
|
|
int
|
Group size for |
32
|
Returns:
| Type | Description |
|---|---|
torchao.quantization.quant_api.Int8DynamicActivationIntxWeightConfig | None
|
An |
torchao.quantization.quant_api.Int8DynamicActivationIntxWeightConfig | None
|
weight_dtype is |
quantize_module_
¶
quantize_module_(module: Module, *, embedding_config: IntxWeightOnlyConfig | _Unset | None = <stainedglass_core.integrations.torchao.xnnpack_quantizer._Unset object at 0x7f5b70b99940>, linear_config: Int8DynamicActivationIntxWeightConfig | IntxWeightOnlyConfig | _Unset | None = <stainedglass_core.integrations.torchao.xnnpack_quantizer._Unset object at 0x7f5b70b99940>, filter_fn: Callable[[Module, str], bool] | None = None) -> None
Quantize a module's embedding and linear layers in-place using torchao.
Apply XNNPACK-compatible quantization schemes: IntxWeightOnlyConfig for
nn.Embedding layers and Int8DynamicActivationIntxWeightConfig for
nn.Linear layers.
The default linear filter function safely skips linear layers with None
weights, which occur in StainedGlassTransformForText when unused base
model layers are removed during serialization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
|
Module
|
The |
required |
|
IntxWeightOnlyConfig | _Unset | None
|
Config for |
<stainedglass_core.integrations.torchao.xnnpack_quantizer._Unset object at 0x7f5b70b99940>
|
|
Int8DynamicActivationIntxWeightConfig | IntxWeightOnlyConfig | _Unset | None
|
Config for |
<stainedglass_core.integrations.torchao.xnnpack_quantizer._Unset object at 0x7f5b70b99940>
|
|
Callable[[Module, str], bool] | None
|
Optional filter function for linear quantization. Receives
|
None
|
Example
from stainedglass_core.integrations import torchao as sg_torchao
sg_torchao.quantize_module_(model) # doctest: +SKIP sg_torchao.quantize_module_(model, linear_config=None) # doctest: +SKIP sg_torchao.quantize_module_(model, embedding_config=None) # doctest: +SKIP
Warning
This API is experimental and subject to change: The quantize_module_ API is currently under development and is subject to change.