reconstruction

Utilities for measuring how well (transformed) embeddings can be reconstructed back to their original tokens.

Functions:

Name Description
reconstruction_rank

Compute the reconstruction rank for each element of input_ids from its corresponding element in ranked_neighbors.

reconstruction_rank_histogram

Count the number of observations of each reconstruction rank.

symmetric_tokens_transformed_at_k

Measure if each reconstruction rank is outside the top-k and bottom-k ranks.

tokens_transformed_at_k

Measure if each reconstruction rank is outside the top-k ranks.

reconstruction_rank

reconstruction_rank(
    input_ids: Tensor, ranked_neighbors: Tensor
) -> torch.Tensor

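Compute the reconstruction rank for each element of input_ids from its corresponding element in ranked_neighbors. The reconstruction rank is the number of clean embeddings that are closer (by some metric) to a transformed embedding than its corresponding clean embedding, so a rank of 0 means the original token is the nearest neighbor.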

Parameters:

Name Type Description Default

input_ids

Tensor

The ground-truth indices of the clean embeddings, with shape (batch_size, sequence_length, 1).

required

ranked_neighbors

Tensor

For each transformed embedding, the indices of the clean embeddings ranked by closeness to it, with shape (batch_size, sequence_length, num_embeddings).

required

Returns:

Type Description
torch.Tensor

The reconstruction rank for each element of input_ids of shape (batch_size, sequence_length).

Raises:

Type Description
ValueError

If input_ids is not a 3-D tensor whose final dimension has size 1.

Added in version v3.36.0.
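The lookup described above can be sketched in a few lines of PyTorch. This is an illustrative reimplementation under stated assumptions, not the library's code: the name reconstruction_rank_sketch is hypothetical, and the sketch assumes each row of ranked_neighbors is ordered closest-first and contains every embedding index exactly once.

```python
import torch


def reconstruction_rank_sketch(
    input_ids: torch.Tensor, ranked_neighbors: torch.Tensor
) -> torch.Tensor:
    """Hypothetical sketch: position of each ground-truth id in its ranked list."""
    if input_ids.ndim != 3 or input_ids.shape[-1] != 1:
        raise ValueError("input_ids must be 3-D with a final dimension of 1")
    # Broadcast (B, S, 1) against (B, S, N): True where the neighbor is the true id.
    matches = ranked_neighbors == input_ids
    # Assumption: exactly one match per row, so argmax returns its position.
    return matches.int().argmax(dim=-1)


# One batch, two tokens, three embeddings in the vocabulary.
input_ids = torch.tensor([[[2], [0]]])                       # (1, 2, 1)
ranked_neighbors = torch.tensor([[[2, 0, 1], [1, 2, 0]]])    # (1, 2, 3)

ranks = reconstruction_rank_sketch(input_ids, ranked_neighbors)
print(ranks)  # tensor([[0, 2]])
```

In this example the first token is its own nearest neighbor (rank 0), while the second token's clean embedding is the farthest of the three (rank 2).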

reconstruction_rank_histogram

reconstruction_rank_histogram(
    num_embeddings: int, reconstruction_ranks: Tensor
) -> torch.Tensor

Count the number of observations of each reconstruction rank.

Parameters:

Name Type Description Default

num_embeddings

int

The number of embeddings in the model.

required

reconstruction_ranks

Tensor

A tensor of reconstruction ranks: integers from 0 to num_embeddings - 1, each giving the number of clean embeddings that are closer (by some metric) to a transformed embedding than its corresponding clean embedding.

required

Returns:

Type Description
torch.Tensor

A 1-D tensor of shape (num_embeddings,) whose i-th entry is the number of observations of reconstruction rank i.

Added in version v3.36.0.
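A histogram of this kind can be approximated with torch.bincount. The following is a minimal sketch, not the library's implementation; the name rank_histogram_sketch is hypothetical, and the sketch assumes the ranks are non-negative integers below num_embeddings.

```python
import torch


def rank_histogram_sketch(
    num_embeddings: int, reconstruction_ranks: torch.Tensor
) -> torch.Tensor:
    """Hypothetical sketch: count how often each rank 0..num_embeddings-1 occurs."""
    # bincount needs a 1-D integer tensor; minlength pads unobserved ranks with 0.
    return torch.bincount(reconstruction_ranks.flatten(), minlength=num_embeddings)


reconstruction_ranks = torch.tensor([[0, 2, 0], [1, 0, 2]])
histogram = rank_histogram_sketch(4, reconstruction_ranks)
print(histogram)  # tensor([3, 1, 2, 0])
```

Rank 3 never occurs in the example, so its bin is 0, and the bins sum to the number of ranks observed.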

symmetric_tokens_transformed_at_k

symmetric_tokens_transformed_at_k(
    num_embeddings: int,
    reconstruction_ranks: Tensor,
    k: int,
) -> torch.Tensor

Measure if each reconstruction rank is outside the top-k and bottom-k ranks.

Parameters:

Name Type Description Default

num_embeddings

int

The number of embeddings in the model.

required

reconstruction_ranks

Tensor

A tensor of reconstruction ranks: integers from 0 to num_embeddings - 1, each giving the number of clean embeddings that are closer (by some metric) to a transformed embedding than its corresponding clean embedding.

required

k

int

The cutoff rank at or above which a token is considered "transformed". num_embeddings - k is the corresponding cutoff for the bottom-k ranks. Should be at least 1 and at most ceil(num_embeddings / 2).

required

Returns:

Type Description
torch.Tensor

A boolean tensor indicating whether each reconstruction rank is >= k and < num_embeddings - k.

Added in version v3.36.0.
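The documented return condition (rank >= k and rank < num_embeddings - k) translates directly into an elementwise comparison. The sketch below is illustrative only; the name symmetric_transformed_sketch is hypothetical and no input validation is performed.

```python
import torch


def symmetric_transformed_sketch(
    num_embeddings: int, reconstruction_ranks: torch.Tensor, k: int
) -> torch.Tensor:
    """Hypothetical sketch: True where a rank lies in [k, num_embeddings - k)."""
    outside_top_k = reconstruction_ranks >= k
    outside_bottom_k = reconstruction_ranks < num_embeddings - k
    return outside_top_k & outside_bottom_k


reconstruction_ranks = torch.arange(6)  # every possible rank for 6 embeddings
mask = symmetric_transformed_sketch(6, reconstruction_ranks, k=2)
print(mask)  # tensor([False, False,  True,  True, False, False])
```

With k equal to ceil(num_embeddings / 2), the interval [k, num_embeddings - k) is empty, so every rank is reported as not transformed. This is consistent with the documented upper bound on k.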

tokens_transformed_at_k

tokens_transformed_at_k(
    reconstruction_ranks: Tensor, k: int
) -> torch.Tensor

Measure if each reconstruction rank is outside the top-k ranks.

Parameters:

Name Type Description Default

reconstruction_ranks

Tensor

A tensor of reconstruction ranks: integers from 0 to num_embeddings - 1, each giving the number of clean embeddings that are closer (by some metric) to a transformed embedding than its corresponding clean embedding.

required

k

int

The cutoff rank at or above which a token is considered "transformed". Should be at least 1 and at most num_embeddings.

required

Returns:

Type Description
torch.Tensor

A boolean tensor indicating whether each reconstruction rank is >= k.

Added in version v3.36.0.
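The one-sided check reduces to a single comparison against k. The sketch below is a hypothetical reimplementation (the name transformed_at_k_sketch is not part of the library), and the final averaging step is an assumed example of how the boolean mask might be summarized, not documented behavior.

```python
import torch


def transformed_at_k_sketch(reconstruction_ranks: torch.Tensor, k: int) -> torch.Tensor:
    """Hypothetical sketch: True where the original token is outside the top-k neighbors."""
    return reconstruction_ranks >= k


reconstruction_ranks = torch.tensor([[0, 1, 3], [5, 0, 2]])
transformed = transformed_at_k_sketch(reconstruction_ranks, k=2)
print(transformed)
# tensor([[False, False,  True],
#         [ True, False,  True]])

# Assumed usage: fraction of tokens considered transformed at this k.
fraction = transformed.float().mean().item()
print(fraction)  # 0.5
```

Because ranks are zero-based, k = 1 marks a token as transformed whenever its own clean embedding is not the nearest neighbor.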