mixture_of_experts
Classes:
Name | Description |
---|---|
DeepSpeedMoE | Provides easier access to the auxiliary loss. |
DeepSpeedMoE
Bases: MoE
Provides easier access to the auxiliary loss.
Methods:
Name | Description |
---|---|
__init__ | Construct a |
forward | Return the hidden states and save the auxiliary loss. |
__init__
forward
forward(
hidden_states: Tensor, used_token: Tensor | None = None
) -> Tensor
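The pattern described for `forward` (return only the hidden states, keep the auxiliary loss on the instance) can be illustrated with a minimal, dependency-free sketch. Everything here is a hypothetical stand-in: `ToyMoE`, `ToyMoEWithSavedLoss`, the `aux_loss` attribute name, and the assumption that the base class returns an `(output, aux_loss)` pair are illustrative only and are not taken from the documented class, which operates on tensors.

```python
class ToyMoE:
    """Hypothetical stand-in for the MoE base class (not the real API)."""

    def forward(self, hidden_states, used_token=None):
        # Assumption: the base layer returns the output together with an
        # auxiliary (load-balancing) loss. Both values here are made up.
        output = [2.0 * x for x in hidden_states]
        aux_loss = 0.25
        return output, aux_loss


class ToyMoEWithSavedLoss(ToyMoE):
    """Returns only the hidden states and saves the auxiliary loss."""

    def __init__(self):
        # Hypothetical attribute name; None until forward() has run.
        self.aux_loss = None

    def forward(self, hidden_states, used_token=None):
        output, aux_loss = super().forward(hidden_states, used_token)
        self.aux_loss = aux_loss  # stash it for the training loop
        return output


layer = ToyMoEWithSavedLoss()
out = layer.forward([1.0, 2.0])

# The caller reads the saved loss afterwards and adds it to the main loss.
# The 0.01 weight is an arbitrary illustrative coefficient.
total_loss = sum(out) + 0.01 * layer.aux_loss
```

With this shape, callers that expect a layer to return a single value can use the wrapper unchanged, while the training loop reads the saved loss from the instance after each forward pass.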