Searching protocol for "nsys"
Deep GPU performance analysis.
Pinpoint code optimization targets.
Find and fix GPU kernel performance bottlenecks.
Master CUDA kernel development and profiling.
Optimize diffusion model inference speed.
Optimize diffusion model GPU kernels.
Optimize CPU/GPU usage to save resources and time.