mixture_of_experts

DeepSpeedMoE

Bases: MoE

Provides easier access to the auxiliary loss.

__init__

__init__(*args: Any, **kwargs: Any)

Construct a DeepSpeedMoE layer.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `*args` | `Any` | The positional arguments to pass to the Mixture of Experts layer. | `()` |
| `**kwargs` | `Any` | The keyword arguments to pass to the Mixture of Experts layer. | `{}` |
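Because the arguments are forwarded unchanged to the underlying DeepSpeed Mixture of Experts layer, constructing `DeepSpeedMoE` looks like constructing `deepspeed.moe.layer.MoE` directly. A minimal sketch follows; the import path `mixture_of_experts.DeepSpeedMoE` and the surrounding dimensions are illustrative assumptions, not part of this page.

```python
import torch

# Assumed import path for the wrapper documented on this page.
from mixture_of_experts import DeepSpeedMoE

hidden_size = 512

# The expert is an ordinary module; each expert gets its own copy of these weights.
expert = torch.nn.Sequential(
    torch.nn.Linear(hidden_size, 4 * hidden_size),
    torch.nn.GELU(),
    torch.nn.Linear(4 * hidden_size, hidden_size),
)

# *args / **kwargs are passed straight through to deepspeed.moe.layer.MoE.
moe = DeepSpeedMoE(
    hidden_size=hidden_size,
    expert=expert,
    num_experts=8,
    k=2,  # top-2 gating
)
```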

forward

forward(hidden_states: Tensor, used_token: Tensor | None = None) -> Tensor

Return the hidden states and save the auxiliary loss.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `hidden_states` | `Tensor` | The hidden states to evaluate. | *required* |
| `used_token` | `Tensor \| None` | An optional mask selecting only the used tokens. | `None` |

Returns:

| Type | Description |
|------|-------------|
| `Tensor` | The hidden states after evaluation. |
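The point of the wrapper is that `forward` returns only the hidden states while saving the auxiliary (load-balancing) loss on the module, so it can be folded into the training loss afterwards. A minimal sketch continuing the construction example above; the attribute name used to read back the saved loss is an assumption for illustration.

```python
hidden_states = torch.randn(2, 16, hidden_size)  # (batch, seq_len, hidden_size)

out = moe(hidden_states)  # Tensor with the same shape as the input

# The auxiliary loss is saved on the module rather than returned;
# `last_aux_loss` is an assumed attribute name, not confirmed by this page.
aux_loss = moe.last_aux_loss

# Placeholder task loss; in practice this comes from the rest of the model.
loss = out.pow(2).mean() + 0.01 * aux_loss
```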