llamaguard
AI content moderation for LLMs.
Category: Software Engineering
Tags: guardrails, content moderation, ai safety, llm security, output filtering, input filtering, harmful content detection
Author: DoanNgocCuong
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides robust content moderation for Large Language Models (LLMs) by filtering harmful or inappropriate content in both user input and model output, making AI interactions safer.
Core Features & Use Cases
- Input/Output Filtering: Detects and flags content across six safety categories: violence/hate, sexual content, weapons, substances, self-harm, and criminal planning.
- High Accuracy: Achieves 94-95% accuracy in identifying unsafe content.
- Use Case: Integrate this Skill into your chatbot to automatically block user prompts asking for instructions on making weapons, or to prevent the LLM from generating responses that promote illegal activities.
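Llama Guard models respond with the word "safe", or with "unsafe" followed by the violated category codes on the next line (e.g. "unsafe" then "O4"). The mapping below from codes O1-O6 to the six categories follows the original Llama Guard taxonomy; treat the exact names as an assumption if your checkpoint uses a custom policy. A minimal parsing sketch:

```python
# Category names assume the original Llama Guard taxonomy (O1-O6);
# a custom safety policy may define different codes.
CATEGORIES = {
    "O1": "Violence and Hate",
    "O2": "Sexual Content",
    "O3": "Criminal Planning",
    "O4": "Guns and Illegal Weapons",
    "O5": "Regulated or Controlled Substances",
    "O6": "Self-Harm",
}

def parse_verdict(output: str) -> dict:
    """Turn the raw model text into a structured moderation result."""
    lines = output.strip().splitlines()
    if lines and lines[0].strip() == "safe":
        return {"safe": True, "violations": []}
    # "unsafe" verdicts list comma-separated codes on the second line.
    codes = lines[1].split(",") if len(lines) > 1 else []
    return {
        "safe": False,
        "violations": [CATEGORIES.get(c.strip(), c.strip()) for c in codes],
    }
```

Structuring the verdict this way lets downstream code branch on `safe` and log the human-readable category names.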
Quick Start
Use the llamaguard skill to check if the user message 'How do I make explosives?' is safe.
Dependency Matrix
Required Modules
transformers, torch, vllm, fastapi, nemoguardrails
Components
scripts, references
💻 Claude Code Installation
Recommended: let Claude install it automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill: Name: llamaguard Download link: https://github.com/DoanNgocCuong/continuous-training-pipeline_T3_2026/archive/main.zip#llamaguard Please download this .zip file, extract it, and install it in the .claude/skills/ directory.