# mixture_of_experts

## DeepSpeedMoE

Bases: `MoE`

Provides easier access to the auxiliary loss.

### `__init__`
### `forward`

```
forward(
    hidden_states: Tensor, used_token: Tensor | None = None
) -> Tensor
```
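The signature above returns only a single tensor, while DeepSpeed's underlying `MoE` layer returns a tuple that includes the auxiliary (load-balancing) loss. A minimal, framework-free sketch of the wrapper pattern this class likely implements is shown below; `BaseMoE` and its tuple return shape are assumptions standing in for `deepspeed.moe.layer.MoE`, and the dummy arithmetic replaces real tensor work.

```python
class BaseMoE:
    """Hypothetical stand-in for deepspeed.moe.layer.MoE, whose forward
    is assumed to return (output, aux_loss, expert_counts)."""

    def forward(self, hidden_states, used_token=None):
        output = [h * 2 for h in hidden_states]  # dummy expert computation
        aux_loss = 0.01                          # dummy load-balancing loss
        expert_counts = [len(hidden_states)]     # dummy routing statistics
        return output, aux_loss, expert_counts


class DeepSpeedMoE(BaseMoE):
    """Caches the auxiliary loss so callers get it as an attribute
    instead of unpacking a tuple (sketch, not the actual source)."""

    def __init__(self):
        self.aux_loss = None

    def forward(self, hidden_states, used_token=None):
        # Stash the aux loss, return only the output tensor.
        output, self.aux_loss, _ = super().forward(hidden_states, used_token)
        return output


moe = DeepSpeedMoE()
out = moe.forward([1.0, 2.0])
print(out)           # [2.0, 4.0]
print(moe.aux_loss)  # 0.01
```

After a forward pass, training code can read `moe.aux_loss` and add it to the main loss, which is the "easier access" the docstring refers to.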