inference

Community

Fast, memory-efficient LLM inference with vLLM.

Author: atrawog
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill enables fast, memory-efficient inference for large language models by combining Unsloth with a vLLM backend, reducing latency and resource usage in both interactive and batch scenarios.

Core Features & Use Cases

  • fast_inference with the vLLM backend for roughly 2x generation speedups
  • Model loading and LoRA adapter merging for efficient deployment
  • Thinking-model output parsing and memory management for robust workflows (see the sketch after this list)
  • Batch and interactive inference in Python environments (notebooks & apps)
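The output-parsing step can be sketched in a few lines. This is a minimal illustration rather than the skill's actual API: the <think>...</think> delimiters and the split_thinking helper are assumptions, so check your model's chat template for the exact tags it emits.

  import re

  def split_thinking(text: str) -> tuple[str, str]:
      # Separate a thinking model's <think>...</think> trace from its final
      # answer. The tag convention is an assumption; adjust it to your model.
      match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
      if match is None:
          return "", text.strip()          # no trace: treat everything as the answer
      thinking = match.group(1).strip()
      answer = text[match.end():].strip()  # the answer follows the closing tag
      return thinking, answer

  trace, answer = split_thinking("<think>2 + 2 = 4.</think>The answer is 4.")
  print(answer)  # -> The answer is 4.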

Quick Start

Run a sample inference: load a pre-quantized thinking model with fast_inference enabled and observe the accelerated generation.
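A minimal sketch of that flow, assuming the Unsloth and vLLM APIs (FastLanguageModel.from_pretrained with fast_inference=True, and model.fast_generate with vLLM SamplingParams); the model name is a placeholder, so substitute any pre-quantized thinking model you use:

  from unsloth import FastLanguageModel
  from vllm import SamplingParams

  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name="unsloth/Qwen3-8B-unsloth-bnb-4bit",  # placeholder model id
      max_seq_length=2048,
      load_in_4bit=True,    # pre-quantized 4-bit weights keep memory low
      fast_inference=True,  # route generation through the vLLM backend
  )

  prompt = tokenizer.apply_chat_template(
      [{"role": "user", "content": "Explain KV caching in one sentence."}],
      tokenize=False,
      add_generation_prompt=True,
  )

  outputs = model.fast_generate(
      [prompt],
      sampling_params=SamplingParams(temperature=0.7, max_tokens=256),
  )
  print(outputs[0].outputs[0].text)  # fast_generate returns vLLM RequestOutput objects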

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: inference
Download link: https://github.com/atrawog/overthink-plugins/archive/main.zip#inference

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
