mixture_of_experts

Classes:

| Name | Description |
| ---- | ----------- |
| `DeepSpeedMoE` | Provides easier access to the auxiliary loss. |

DeepSpeedMoE

Bases: MoE

Provides easier access to the auxiliary loss.

Methods:

| Name | Description |
| ---- | ----------- |
| `__init__` | Construct a DeepSpeedMoE layer. |
| `forward` | Return the hidden states and save the auxiliary loss. |

__init__

```python
__init__(*args: Any, **kwargs: Any)
```

Construct a DeepSpeedMoE layer.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `*args` | `Any` | The positional arguments to pass to the Mixture of Experts. | `()` |
| `**kwargs` | `Any` | The keyword arguments to pass to the Mixture of Experts. | `{}` |
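
A minimal construction sketch, not confirmed by this page: since `*args` and `**kwargs` are forwarded to the underlying Mixture of Experts, the keyword names below follow DeepSpeed's `deepspeed.moe.layer.MoE` constructor (`hidden_size`, `expert`, `num_experts`, `k`); the expert module, the import path, and all values are hypothetical.

```python
import torch.nn as nn

from mixture_of_experts import DeepSpeedMoE  # module path as shown on this page

# Hypothetical expert: any module mapping hidden_size -> hidden_size.
expert = nn.Sequential(
    nn.Linear(512, 2048),
    nn.GELU(),
    nn.Linear(2048, 512),
)

# All arguments are forwarded unchanged to the Mixture of Experts; the
# keyword names below follow deepspeed.moe.layer.MoE and are assumptions
# as far as this page is concerned.
moe = DeepSpeedMoE(
    hidden_size=512,
    expert=expert,
    num_experts=8,
    k=2,  # top-2 gating
)
```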

forward

```python
forward(hidden_states: Tensor, used_token: Tensor | None = None) -> Tensor
```

Return the hidden states and save the auxiliary loss.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `hidden_states` | `Tensor` | The hidden states to evaluate. | *required* |
| `used_token` | `Tensor \| None` | Optional mask restricting evaluation to used tokens. | `None` |

Returns:

| Type | Description |
| ---- | ----------- |
| `Tensor` | The hidden states after evaluation. |
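
A usage sketch of `forward`, continuing the construction example above: DeepSpeed's base `MoE.forward` returns a tuple that includes the auxiliary loss, whereas this wrapper returns only the hidden states and saves the loss on the layer. The attribute name `aux_loss` below is an assumption for illustration; this page does not name where the saved loss lives.

```python
import torch

x = torch.randn(4, 16, 512)  # (batch, sequence, hidden_size)
out = moe(x)                 # a single Tensor with the same shape as x

# Combine a task loss with the saved auxiliary (load-balancing) loss.
# The attribute name `aux_loss` is an assumption; the page only states
# that forward() saves the auxiliary loss.
main_loss = out.pow(2).mean()  # stand-in for a real task loss
total_loss = main_loss + 0.01 * moe.aux_loss
total_loss.backward()
```

Returning a bare `Tensor` rather than DeepSpeed's tuple lets the layer drop into existing `nn.Module` stacks unchanged, with the auxiliary loss collected from the layer afterwards.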