Computing the Voronoi l2 Distance of an LLM's Embedding Table

The inverse power law geometric catalyst loss takes a parameter, inverse_l2_power_law_characteristic_length, whose recommended value is the 5th percentile of the Voronoi l2 distances between embeddings. This notebook shows how to compute that value for the Llama 3.2 1B Instruct model.

Import necessary modules

In [1]:

from __future__ import annotations

from typing import Final

import torch
import transformers
from torch import nn

from stainedglass_core import utils

/home/kyle/.conda/envs/sgc/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

Load the LLM and set the percentile to evaluate

In [ ]:

PERCENTILE: Final[float] = 0.05  # Fifth percentile
PRETRAINED_MODEL_NAME_OR_PATH: Final[str] = (
    "/models/meta-llama/Llama-3.2-1B-Instruct"
)

In [ ]:

model = transformers.AutoModelForCausalLM.from_pretrained(
    PRETRAINED_MODEL_NAME_OR_PATH
)

Improve memory efficiency

The computation only needs the LLM's embedding table, so we copy the embedding table's weight matrix and then delete the references to both the embedding table and the model. This frees the memory held by the rest of the model, but is not strictly required to compute the Voronoi l2 distance.

The cell below also selects the device to use, preferring CUDA, then MPS, then CPU.

In [3]:

# Pick the first available backend: CUDA, then Apple MPS, then CPU.
DEVICE: torch.device
if torch.cuda.is_available():
    DEVICE = torch.device("cuda")
elif torch.backends.mps.is_available():
    DEVICE = torch.device("mps")
else:
    DEVICE = torch.device("cpu")

# Copy only the embedding weights to the device, then drop the model references.
embeddings_table = model.get_input_embeddings()
assert isinstance(embeddings_table, nn.Embedding)
embeddings_table_matrix = embeddings_table.weight.detach().clone().to(DEVICE)
del embeddings_table
del model
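
For a rough sense of the sizes involved, the sketch below estimates the footprint of the copied weight matrix and, for comparison, of a dense matrix holding every pairwise distance. The vocabulary size (128256), hidden size (2048), and float32 storage are assumptions about this checkpoint, not values reported by the notebook; in practice, read them from embeddings_table_matrix.shape and embeddings_table_matrix.dtype. The helper tensor_bytes is a hypothetical name used only for illustration.

```python
def tensor_bytes(rows: int, cols: int, bytes_per_element: int) -> int:
    """Return the bytes needed to store a dense rows x cols matrix."""
    return rows * cols * bytes_per_element


# Assumed values for illustration; not reported by the notebook.
VOCAB_SIZE = 128_256
HIDDEN_SIZE = 2_048
FLOAT32_BYTES = 4

embedding_table_bytes = tensor_bytes(VOCAB_SIZE, HIDDEN_SIZE, FLOAT32_BYTES)
pairwise_matrix_bytes = tensor_bytes(VOCAB_SIZE, VOCAB_SIZE, FLOAT32_BYTES)

GIB = 1024**3
print(f"Embedding table: {embedding_table_bytes / GIB:.2f} GiB")
print(f"Dense pairwise distance matrix: {pairwise_matrix_bytes / GIB:.2f} GiB")
```

Under these assumptions the table itself is roughly 1 GiB, while a dense pairwise distance matrix would need about 61 GiB, so any distance computation over the full vocabulary has to proceed in pieces rather than materialize every pair at once.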

Compute the Voronoi l2 distance from the LLM's embedding table weight

In [ ]:

voronoi_l2_distance_percentile = (
    utils.voronoi.compute_voronoi_l2_distance_percentile(
        embeddings_table_matrix, PERCENTILE
    )
)
print(
    f"The {PERCENTILE} percentile of the Voronoi l2 distance is {voronoi_l2_distance_percentile.item()}"
)

100%|██████████| 128255/128255 [04:46<00:00, 447.32it/s]

The 0.05 percentile of the Voronoi l2 distance is 0.517906665802002
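
The notebook does not show how compute_voronoi_l2_distance_percentile is implemented. For intuition only, the dependency-free sketch below assumes that the Voronoi l2 distance of an embedding is the l2 distance to its nearest neighboring embedding, and then takes a linearly interpolated quantile of those distances. Both the assumed definition and the helper names (nearest_neighbor_distances, quantile) are illustrative and may differ from what stainedglass_core actually computes.

```python
import math


def nearest_neighbor_distances(points):
    """Return, for each point, the l2 distance to its closest other point."""
    if len(points) < 2:
        raise ValueError("At least two points are required.")
    distances = []
    for i, point in enumerate(points):
        nearest = min(
            math.dist(point, other)
            for j, other in enumerate(points)
            if j != i
        )
        distances.append(nearest)
    return distances


def quantile(values, q):
    """Return the linearly interpolated quantile of values for q in [0, 1]."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Toy 2-D "embeddings": two points sit close together, two are isolated.
toy_embeddings = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0), (10.0, 0.0)]
toy_distances = nearest_neighbor_distances(toy_embeddings)
toy_fifth_percentile = quantile(toy_distances, 0.05)
print(toy_distances)
print(toy_fifth_percentile)
```

A low quantile such as 0.05 is dominated by the most tightly packed embeddings, which is why the two nearby toy points determine the result here. This brute-force loop is quadratic in the number of points and is only practical for toy inputs; a full vocabulary calls for a batched, vectorized implementation such as the library function used above.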