Searching protocol for "triton kernels"
Optimize diffusion model kernels
Optimize diffusion model inference speed.
Optimize diffusion model GPU kernels.
Overlap GPU compute with data loads.
Boost paged attention decode performance.
Add custom CUDA kernels to sgl-kernel
Extend sgl-kernel with custom CUDA kernels.
GPU-accelerated OpenFold3 for structure prediction.