Search results for "memory models"
Verify concurrent programs under weak memory models.
Shrink LLMs, boost performance.
Keep memory continuity across model switches.
Memory-efficient fine-tuning for large models.
Shrink LLMs, boost GPU efficiency.
Accelerate transformer models.
Boost GPU performance and efficiency.
Clarify memory concepts with C++ mental models.
Shrink LLMs, boost performance.
Lean, fast model quantization for inference.
Accelerate transformer models.
Accelerate transformer models.