# mixture_of_experts

## DeepSpeedMoE

Bases: `MoE`

Provides easier access to the auxiliary loss.

### `__init__`
### `forward`

```
forward(
    hidden_states: Tensor, used_token: Tensor | None = None
) -> Tensor
```
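The signature above returns only a single tensor, while DeepSpeed's underlying `MoE` layer returns a tuple that includes the auxiliary (load-balancing) loss. A minimal, framework-free sketch of the wrapper pattern this class likely implements is shown below; `BaseMoE` and its tuple return shape are assumptions standing in for `deepspeed.moe.layer.MoE`, and the dummy arithmetic replaces real tensor work.

```python
class BaseMoE:
    """Hypothetical stand-in for deepspeed.moe.layer.MoE, whose forward
    is assumed to return (output, aux_loss, expert_counts)."""

    def forward(self, hidden_states, used_token=None):
        output = [h * 2 for h in hidden_states]  # dummy expert computation
        aux_loss = 0.01                          # dummy load-balancing loss
        expert_counts = [len(hidden_states)]     # dummy routing statistics
        return output, aux_loss, expert_counts


class DeepSpeedMoE(BaseMoE):
    """Caches the auxiliary loss so callers get it as an attribute
    instead of unpacking a tuple (sketch, not the actual source)."""

    def __init__(self):
        self.aux_loss = None

    def forward(self, hidden_states, used_token=None):
        # Stash the aux loss, return only the output tensor.
        output, self.aux_loss, _ = super().forward(hidden_states, used_token)
        return output


moe = DeepSpeedMoE()
out = moe.forward([1.0, 2.0])
print(out)           # [2.0, 4.0]
print(moe.aux_loss)  # 0.01
```

After a forward pass, training code can read `moe.aux_loss` and add it to the main loss, which is the "easier access" the docstring refers to.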