Warm up DeepGEMM kernels. DeepGEMM JIT-compiles its kernels at first use; this warmup aims to JIT-compile, ahead of time, all the kernels that will be used during model execution.
  Source code in vllm/model_executor/warmup/deep_gemm_warmup.py
  
 _deepgemm_grouped_fp8_gemm_nt_contiguous_warmup(
    w1: Tensor,
    w2: Tensor,
    w1_scale: Tensor,
    w2_scale: Tensor,
    num_topk: int,
    max_tokens: int,
)
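A hedged sketch of what a grouped-GEMM warmup loop of this shape might do: for each candidate token count M, invoke the grouped FP8 GEMM once so its kernel gets JIT-compiled. `run_grouped_fp8_gemm` is a hypothetical stand-in for the real DeepGEMM call, and iterating every M up to `max_tokens` is a simplification.

```python
# Hypothetical sketch of a grouped FP8 GEMM warmup loop, not vLLM's code.
def grouped_fp8_gemm_warmup(w1_shape, w2_shape, num_topk, max_tokens,
                            run_grouped_fp8_gemm):
    results = []
    for m in range(1, max_tokens + 1):
        # Each token is routed to num_topk experts, so the grouped GEMM
        # sees m * num_topk rows in total.
        rows = m * num_topk
        results.append(run_grouped_fp8_gemm(rows, w1_shape, w2_shape))
    return results
```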
Extract weights, weight scales, and num_topk from a FusedMoE module.
Extract weights, weight scales, and quantization block sizes from the given LinearBase module.
Return True if the input module/layer can be processed with DeepGEMM.
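An eligibility predicate like this typically checks the weight dtype, quantization block sizes, and shape alignment. The sketch below is a minimal illustration under assumed conditions (FP8 weights, 128x128 quantization blocks, 128-aligned dimensions); the field names and the exact criteria are assumptions, not vLLM's.

```python
# Hypothetical eligibility check, assuming FP8 block-quantized weights
# with 128-aligned shapes; not vLLM's actual predicate.
from dataclasses import dataclass
from typing import Optional, Tuple

ALIGN = 128  # assumed alignment requirement

@dataclass
class Layer:
    dtype: str                              # e.g. "fp8_e4m3"
    block_size: Optional[Tuple[int, int]]   # quantization block sizes
    n: int
    k: int

def may_use_deep_gemm(layer: Layer) -> bool:
    return (
        layer.dtype == "fp8_e4m3"
        and layer.block_size == (ALIGN, ALIGN)
        and layer.n % ALIGN == 0
        and layer.k % ALIGN == 0
    )
```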
  Generate M values that cover all possible DeepGEMM kernel configurations. Reference: https://github.com/deepseek-ai/DeepGEMM/blob/79f48ee15a82dd5fad5cd9beaa393c1f755e6b55/csrc/jit_kernels/heuristics/common.hpp
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| max_tokens | int | Maximum number of tokens to warm up for. | required |
| n | int | The actual N dimension from the weight tensor. | required |
| device | device | The torch device to get properties from. | required |
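The referenced DeepGEMM heuristics pick a kernel configuration from the runtime M, so it suffices to warm up one representative M per configuration bucket rather than every value up to `max_tokens`. The sketch below illustrates that idea; the block-size candidates and bucketing scheme are assumptions, not DeepGEMM's exact heuristics.

```python
# Hypothetical M-value generation: one representative M per block-M
# bucket, plus the boundary values. Not DeepGEMM's actual heuristic.
def generate_warmup_m_values(max_tokens, block_ms=(64, 128, 256)):
    ms = {1, max_tokens}  # always cover the boundaries
    for block_m in block_ms:
        # One M per number of M-blocks the kernel would launch.
        m = block_m
        while m <= max_tokens:
            ms.add(m)
            m += block_m
    return sorted(m for m in ms if m <= max_tokens)
```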