mixture_of_experts

DeepSpeedMoE ¶

Bases: `MoE`

Provides easier access to the auxiliary loss.

__init__ ¶

    __init__(*args: Any, **kwargs: Any)

Construct a DeepSpeedMoE layer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `*args` | `Any` | The positional arguments to pass to the Mixture of Experts. | `()` |
| `**kwargs` | `Any` | The keyword arguments to pass to the Mixture of Experts. | `{}` |

forward ¶

    forward(hidden_states: Tensor, used_token: Tensor | None = None) -> Tensor

Return the hidden states and save the auxiliary loss.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `hidden_states` | `Tensor` | The hidden states to evaluate. | *required* |
| `used_token` | `Tensor \| None` | Mask only used tokens. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Tensor` | The hidden states after evaluation. |
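To make the intent of this wrapper concrete, below is a minimal sketch of how such a layer could stash the auxiliary loss during `forward`, assuming DeepSpeed's `deepspeed.moe.layer.MoE.forward` returns an `(output, aux_loss, expert_counts)` tuple. The `aux_loss` attribute name is an illustrative assumption, not necessarily part of the documented API.

```python
# A minimal sketch, not the library's implementation: it assumes
# deepspeed.moe.layer.MoE.forward returns (output, aux_loss, expert_counts).
from __future__ import annotations

from typing import Any

from torch import Tensor
from deepspeed.moe.layer import MoE


class DeepSpeedMoE(MoE):
    """Wrap DeepSpeed's MoE layer so the auxiliary loss is easy to retrieve."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        # Hypothetical attribute used to stash the most recent auxiliary loss.
        self.aux_loss: Tensor | None = None

    def forward(self, hidden_states: Tensor, used_token: Tensor | None = None) -> Tensor:
        # The parent forward returns a tuple; keep the auxiliary
        # (load-balancing) loss around and return only the hidden states.
        output, aux_loss, _ = super().forward(hidden_states, used_token)
        self.aux_loss = aux_loss
        return output
```

A training loop could then read `layer.aux_loss` after the forward pass and add it to the task loss before calling `backward()`, which is the usual way the load-balancing term enters the total objective.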