# mixture_of_experts

## DeepSpeedMoE

Bases: `MoE`

Provides easier access to the auxiliary loss.

### __init__

`__init__(*args: Any, **kwargs: Any)`

Construct a DeepSpeedMoE layer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `*args` | `Any` | The positional arguments to pass to the Mixture of Experts. | `()` |
| `**kwargs` | `Any` | The keyword arguments to pass to the Mixture of Experts. | `{}` |

### forward

`forward(hidden_states: Tensor, used_token: Tensor | None = None) -> Tensor`

Return the hidden states and save the auxiliary loss.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `hidden_states` | `Tensor` | The hidden states to evaluate. | *required* |
| `used_token` | `Tensor \| None` | Mask selecting only the used tokens. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Tensor` | The hidden states after evaluation. |
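A minimal usage sketch. It assumes the layer forwards its arguments to DeepSpeed's `MoE` (so constructor arguments such as `hidden_size`, `expert`, `num_experts`, and `k` apply), that the import path is this module, and that the saved auxiliary loss is exposed as an attribute named `aux_loss`; these names are assumptions, not taken from this page.

```python
import torch
import torch.nn as nn

# Import path is an assumption; adjust to wherever DeepSpeedMoE lives in your package.
from mixture_of_experts import DeepSpeedMoE

hidden_size = 256

# A simple feed-forward expert; any nn.Module with matching input/output width works.
expert = nn.Sequential(
    nn.Linear(hidden_size, 4 * hidden_size),
    nn.GELU(),
    nn.Linear(4 * hidden_size, hidden_size),
)

# *args / **kwargs are passed straight through to the underlying Mixture of Experts,
# so the usual DeepSpeed MoE arguments (hidden_size, expert, num_experts, k, ...) apply.
moe = DeepSpeedMoE(hidden_size, expert=expert, num_experts=8, k=1)

hidden_states = torch.randn(4, 16, hidden_size)  # (batch, tokens, hidden)
out = moe(hidden_states)  # forward() returns only the updated hidden states

# The auxiliary (load-balancing) loss is saved on the module instead of being returned;
# the attribute name "aux_loss" is an assumption - check the class for the exact name.
total_loss = out.pow(2).mean() + moe.aux_loss
total_loss.backward()
```

Saving the auxiliary loss on the module lets training loops add it to the task loss without threading an extra return value through the model's forward pass.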