mixture_of_experts
Classes:
| Name | Description | 
|---|---|
| DeepSpeedMoE | Provides easier access to the auxiliary loss. | 
    
Bases: MoE
Provides easier access to the auxiliary loss.
Methods:
| Name | Description | 
|---|---|
| __init__ | Construct a  | 
| forward | Return the hidden states and save the auxiliary loss. | 
    
forward(
    hidden_states: Tensor, used_token: Tensor | None = None
) -> Tensor
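
The pattern described for `forward` (return only the hidden states, keep the auxiliary loss for later) can be sketched as follows. This is a minimal, dependency-free illustration and not the actual implementation: `ToyMoE`, `ToyMoEWithSavedLoss`, and the attribute name `last_aux_loss` are hypothetical, plain Python lists stand in for tensors, and the sketch assumes the base class's `forward` returns a tuple of output, auxiliary loss, and expert counts.

```python
class ToyMoE:
    """Hypothetical stand-in for the MoE base class (not the real one)."""

    def forward(self, hidden_states, used_token=None):
        # Placeholder for the expert computation: double every value.
        output = [2.0 * x for x in hidden_states]
        # Placeholder for the load-balancing penalty: a fixed toy value.
        aux_loss = 0.25
        # Placeholder for per-expert token counts (a single "expert" here).
        expert_counts = [len(hidden_states)]
        return output, aux_loss, expert_counts


class ToyMoEWithSavedLoss(ToyMoE):
    """Returns only the hidden states and stores the auxiliary loss."""

    def __init__(self):
        # Hypothetical attribute name; the real class may expose it differently.
        self.last_aux_loss = None

    def forward(self, hidden_states, used_token=None):
        output, aux_loss, _ = super().forward(hidden_states, used_token)
        # Save the auxiliary loss so the training loop can read it afterwards.
        self.last_aux_loss = aux_loss
        return output


layer = ToyMoEWithSavedLoss()
out = layer.forward([1.0, 2.0, 3.0])

# The training loop adds the saved auxiliary loss to the task loss.
task_loss = sum(out)
total_loss = task_loss + 0.01 * layer.last_aux_loss
```

Because `forward` returns a single value, a layer like this can be dropped in wherever a plain feed-forward block is expected, and the auxiliary loss is collected separately when the total loss is assembled.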