Searching protocol for "RMSNorm"
Optimize diffusion model inference speed.
Stabilize deep networks, accelerate training.
Train modular MLP backbones with Flax NNX.
Speed up CUDA kernels for Diffusers.
Automate GPU kernel schema extraction.
Boost CUDA kernels for Diffusers on H100.
Optimize NVIDIA GPU kernels for AI models.
Automate reference test generation for kernel validation.
Optimize diffusion model GPU kernels.
Optimize diffusion model kernels.
Design transformer architectures.