_flashinfer_rotary_embedding(
    positions: Tensor,
    query: Tensor,
    key: Tensor,
    head_size: int,
    cos_sin_cache: Tensor,
    is_neox: bool,
) -> None
Custom op wrapper for flashinfer's rotary embedding.
This is an in-place operation that modifies query and key tensors directly.
Source code in vllm/model_executor/layers/rotary_embedding/common.py
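
The entry above documents no shapes. The sketch below illustrates the in-place rotary semantics in plain torch; it is a reference for the math, not flashinfer's kernel, and it assumes (not stated in this page) that query/key are laid out as `[num_tokens, num_heads * head_size]` and that `cos_sin_cache` stores cos in its first half and sin in its second half per position:

```python
import torch

def rope_inplace_sketch(
    positions: torch.Tensor,      # [num_tokens]
    query: torch.Tensor,          # [num_tokens, num_q_heads * head_size]
    key: torch.Tensor,            # [num_tokens, num_kv_heads * head_size]
    head_size: int,
    cos_sin_cache: torch.Tensor,  # assumed [max_position, head_size]
    is_neox: bool,
) -> None:
    # Look up per-token cos/sin: each [num_tokens, head_size // 2],
    # then add a head axis so they broadcast over all heads.
    cos, sin = cos_sin_cache[positions].chunk(2, dim=-1)
    cos, sin = cos.unsqueeze(1), sin.unsqueeze(1)
    for t in (query, key):
        x = t.view(t.shape[0], -1, head_size)  # assumes contiguous input
        if is_neox:
            # Neox style rotates the two halves of each head.
            x1, x2 = x.chunk(2, dim=-1)
        else:
            # GPT-J style rotates adjacent even/odd pairs.
            x1, x2 = x[..., 0::2], x[..., 1::2]
        o1 = x1 * cos - x2 * sin
        o2 = x2 * cos + x1 * sin
        # Write back through the view, mutating query/key in place.
        if is_neox:
            x.copy_(torch.cat((o1, o2), dim=-1))
        else:
            x[..., 0::2], x[..., 1::2] = o1, o2
```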
apply_rotary_emb_torch(
    x: Tensor,
    cos: Tensor,
    sin: Tensor,
    is_neox_style: bool,
) -> Tensor

Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| x | Tensor | [num_tokens, num_heads, head_size] | required | 
| cos | Tensor | [num_tokens, head_size // 2] | required | 
| sin | Tensor | [num_tokens, head_size // 2] | required | 
| is_neox_style | bool | Whether to use the Neox-style or GPT-J-style rotary positional embeddings. | required | 
Source code in vllm/model_executor/layers/rotary_embedding/common.py
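
For context, the `cos` and `sin` inputs in the table above are the standard RoPE tables. A minimal sketch of producing tensors with the documented shapes (variable names here are illustrative, not from the module):

```python
import torch

# Standard RoPE inverse frequencies for pairs (0, 2, 4, ...) of a head.
head_size, base = 64, 10000.0
inv_freq = 1.0 / (base ** (torch.arange(0, head_size, 2).float() / head_size))

positions = torch.arange(5)                       # [num_tokens]
freqs = torch.outer(positions.float(), inv_freq)  # [num_tokens, head_size // 2]
cos, sin = freqs.cos(), freqs.sin()

x = torch.randn(5, 8, head_size)  # [num_tokens, num_heads, head_size]
# x, cos, sin now match the shapes in the parameters table above.
```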
cached
 dispatch_rotary_emb_function(
    default: Callable[..., Tensor] | None = None,
) -> Callable[..., Tensor]
Source code in vllm/model_executor/layers/rotary_embedding/common.py
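
No docstring is rendered for this entry. The name describes a cached dispatch: decide once which rotary implementation to use, then reuse the choice on every call. A sketch of that pattern, under the assumption that a fused kernel (e.g. flash-attn's `apply_rotary_emb`) is preferred when importable and the provided `default` is the fallback; the actual selection logic is in the linked source:

```python
from functools import cache
from typing import Callable, Optional

import torch

@cache  # the availability check runs once; later calls reuse the result
def dispatch_rotary_emb_sketch(
    default: Optional[Callable[..., torch.Tensor]] = None,
) -> Callable[..., torch.Tensor]:
    if torch.cuda.is_available():
        try:
            # Prefer a fused CUDA kernel when one is importable.
            from flash_attn.layers.rotary import apply_rotary_emb
            return apply_rotary_emb
        except ImportError:
            pass
    # Otherwise fall back to the caller-provided torch-native default.
    assert default is not None, "no fused kernel available and no default given"
    return default
```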
 yarn_find_correction_dim(
    num_rotations: int,
    dim: int,
    base: float = 10000,
    max_position_embeddings: int = 2048,
) -> float
Source code in vllm/model_executor/layers/rotary_embedding/common.py
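
No docstring is rendered here either. In YaRN, this helper inverts the RoPE wavelength formula: pair `i` of a `dim`-dimensional rotary embedding has wavelength `2 * pi * base**(2i / dim)`, so the dimension that completes `num_rotations` full turns over `max_position_embeddings` tokens is found by solving for `i`. A re-implementation consistent with the YaRN reference code:

```python
import math

def yarn_find_correction_dim_sketch(
    num_rotations: int,
    dim: int,
    base: float = 10000,
    max_position_embeddings: int = 2048,
) -> float:
    # Solve max_position_embeddings / (2*pi * base**(2i/dim)) == num_rotations
    # for the (fractional) dimension-pair index i.
    return (dim * math.log(max_position_embeddings /
                           (num_rotations * 2 * math.pi))) / (2 * math.log(base))
```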
 yarn_find_correction_range(
    low_rot: int,
    high_rot: int,
    dim: int,
    base: float = 10000,
    max_position_embeddings: int = 2048,
) -> tuple[int, int]
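
The range variant brackets and clamps the result of the previous helper: `low_rot` (many rotations, short wavelengths) maps to a low dimension index and `high_rot` to a high one. A matching sketch, reusing `yarn_find_correction_dim_sketch` from the block above:

```python
import math

def yarn_find_correction_range_sketch(
    low_rot: int,
    high_rot: int,
    dim: int,
    base: float = 10000,
    max_position_embeddings: int = 2048,
) -> tuple[int, int]:
    # Floor/ceil the fractional indices, then clamp to the valid
    # dimension-pair range [0, dim - 1].
    low = math.floor(yarn_find_correction_dim_sketch(
        low_rot, dim, base, max_position_embeddings))
    high = math.ceil(yarn_find_correction_dim_sketch(
        high_rot, dim, base, max_position_embeddings))
    return max(low, 0), min(high, dim - 1)
```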