mixture_of_experts

Classes:

Name	Description
`DeepSpeedMoE`	`MoE` layer that saves the auxiliary loss on each forward pass for easier access.

DeepSpeedMoE

Bases: MoE

Provides easier access to the auxiliary loss: `forward` returns only the hidden states and saves the auxiliary loss on the layer.

Methods:

Name	Description
`__init__`	Construct a `DeepSpeedMoE` layer.
`forward`	Return the hidden states and save the auxiliary loss.

__init__(*args: Any, **kwargs: Any)

Construct a DeepSpeedMoE layer.

Parameters:

Name	Type	Description	Default
`*args`	`Any`	The positional arguments to pass to the parent `MoE` constructor.	`()`
`**kwargs`	`Any`	The keyword arguments to pass to the parent `MoE` constructor.	`{}`

forward(
    hidden_states: Tensor, used_token: Tensor | None = None
) -> Tensor

Return the hidden states and save the auxiliary loss.

Parameters:

Name	Type	Description	Default
`hidden_states`	`Tensor`	The hidden states to evaluate.	required
`used_token`	`Tensor | None`	Optional mask that selects only the used tokens.	`None`

Returns:

Type	Description
`Tensor`	The hidden states after evaluation.
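
The behaviour described above (return only the hidden states, keep the auxiliary loss on the layer) can be sketched without any DeepSpeed dependency. This is a minimal illustration, not the actual implementation: `StandInMoE`, `AuxLossSavingMoE`, the `aux_loss` attribute name, and the constructor keyword arguments are all hypothetical, and the sketch assumes the parent layer's `forward` returns a tuple of (output, auxiliary loss, expert counts). Plain Python lists stand in for tensors.

```python
from typing import Any, List, Optional, Sequence, Tuple


class StandInMoE:
    """Hypothetical stand-in for the parent MoE layer (not the DeepSpeed class)."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        self.args = args
        self.kwargs = kwargs

    def forward(
        self,
        hidden_states: Sequence[float],
        used_token: Optional[Sequence[int]] = None,
    ) -> Tuple[List[float], float, List[int]]:
        # Assumed return shape: (output, auxiliary loss, per-expert token counts).
        # The values are dummies; used_token is accepted but ignored here.
        output = [2.0 * x for x in hidden_states]
        return output, 0.25, [len(hidden_states)]


class AuxLossSavingMoE(StandInMoE):
    """Sketch of a wrapper that keeps the auxiliary loss of the last forward pass."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        # All arguments are passed through unchanged to the parent constructor.
        super().__init__(*args, **kwargs)
        self.aux_loss: Optional[float] = None  # hypothetical attribute name

    def forward(
        self,
        hidden_states: Sequence[float],
        used_token: Optional[Sequence[int]] = None,
    ) -> List[float]:
        output, aux_loss, _expert_counts = super().forward(hidden_states, used_token)
        self.aux_loss = aux_loss  # saved for later use, e.g. in the training loss
        return output  # callers receive only the hidden states


# Illustrative keyword arguments only; they are not the real constructor signature.
layer = AuxLossSavingMoE(hidden_size=4, num_experts=2)
out = layer.forward([1.0, 2.0])
```

Returning a single value keeps the layer interchangeable with an ordinary feed-forward block, while the training loop can still read the saved auxiliary loss after the call and add it to the task loss.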