Int8TpuConfig
Bases: QuantizationConfig
Int8 quantization config class for the TPU backend.
Source code in vllm/model_executor/layers/quantization/tpu_int8.py
  
 __init__(activation_scheme: str = 'none') -> None
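A minimal construction sketch, assuming the class is importable from the module path shown above and that 'none', the default, is the only supported activation scheme (i.e. weight-only quantization):

```python
from vllm.model_executor.layers.quantization.tpu_int8 import Int8TpuConfig

# Assumption: activation_scheme='none' (the default) means weights are
# quantized to int8 while activations stay in their original dtype.
config = Int8TpuConfig()
print(config.get_name())  # expected: 'tpu_int8'
```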
classmethod
 from_config(config: dict[str, Any]) -> Int8TpuConfig
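A hedged sketch of building the config from a checkpoint-style quantization dict; the exact keys that from_config reads are an assumption here:

```python
from vllm.model_executor.layers.quantization.tpu_int8 import Int8TpuConfig

# Assumption: the quantization dict carries an 'activation_scheme' entry.
quant_config_dict = {"activation_scheme": "none"}
config = Int8TpuConfig.from_config(quant_config_dict)
assert isinstance(config, Int8TpuConfig)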
    
 get_name() -> QuantizationMethods
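The returned name is the registry string used to select this method at engine level. A usage sketch, assuming a TPU host and a checkpoint whose weights vLLM quantizes on the fly at load time:

```python
from vllm import LLM

# Assumes a TPU backend; 'tpu_int8' is the string returned by get_name().
llm = LLM(model="facebook/opt-125m", quantization="tpu_int8")
```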
 
 get_quant_method(
    layer: Module, prefix: str
) -> Optional[TPUInt8LinearMethod]
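A sketch of the dispatch behaviour this method typically implements: vLLM's linear layers receive a TPUInt8LinearMethod instance, while unrelated modules get None. The dispatch on vLLM's LinearBase is an assumption about the implementation; a plain torch.nn.Linear stands in here only to show the None branch.

```python
import torch
from vllm.model_executor.layers.quantization.tpu_int8 import Int8TpuConfig

config = Int8TpuConfig()

# A plain nn.Linear is not one of vLLM's parallel linear layers, so the
# config is expected to return no quant method for it.
method = config.get_quant_method(torch.nn.Linear(16, 16),
                                 prefix="model.layers.0.mlp")
print(method)  # expected: None
```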
 
TPUInt8LinearMethod
Bases: LinearMethodBase
Int8 linear method for TPU quantization.
Source code in vllm/model_executor/layers/quantization/tpu_int8.py
  
 __init__(quant_config: Int8TpuConfig)
 
  
 create_weights(
    layer: Module,
    input_size_per_partition: int,
    output_partition_sizes: list[int],
    input_size: int,
    output_size: int,
    params_dtype: dtype,
    **extra_weight_attrs,
)
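A sketch of the call the model loader path makes when a quantized linear layer is constructed; user code normally never calls this directly. The bare nn.Module stand-in and the resulting 'weight' attribute name are assumptions:

```python
import torch
from vllm.model_executor.layers.quantization.tpu_int8 import (
    Int8TpuConfig,
    TPUInt8LinearMethod,
)

method = TPUInt8LinearMethod(Int8TpuConfig())
layer = torch.nn.Module()  # stand-in for a vLLM linear layer

# Registers the (not yet quantized) weight parameter on `layer`; the loader
# fills it from the checkpoint before process_weights_after_loading runs.
method.create_weights(
    layer=layer,
    input_size_per_partition=1024,
    output_partition_sizes=[4096],
    input_size=1024,
    output_size=4096,
    params_dtype=torch.bfloat16,
)
print(layer.weight.shape)  # expected: torch.Size([4096, 1024])
```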
  
 process_weights_after_loading(layer: Module) -> None
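Once the checkpoint weights are loaded into the parameter created above, this hook converts them to int8. Below is a minimal sketch of the symmetric per-output-channel quantization such a step typically performs; the scale layout and clipping range are assumptions, not vLLM's exact code:

```python
import torch


def quantize_int8_per_channel(
        weight: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Symmetric per-output-channel int8 quantization (illustrative only)."""
    # One scale per output row, chosen so the largest magnitude maps to 127.
    scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    qweight = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return qweight, scale.squeeze(-1)


w = torch.randn(4096, 1024)
qweight, scale = quantize_int8_per_channel(w)
# At matmul time the int8 weight is combined with `scale` to recover the
# output magnitude: y ~ x @ (qweight.t().float() * scale)
print(qweight.dtype, scale.shape)  # torch.int8 torch.Size([4096])
```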