llamaguard

Community

AI content moderation for LLMs.

Author: DoanNgocCuong
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides robust content moderation for Large Language Models (LLMs) by filtering harmful or inappropriate input and output, ensuring safer AI interactions.

Core Features & Use Cases

  • Input/Output Filtering: Detects and flags content across six safety categories: violence/hate, sexual content, weapons, substances, self-harm, and criminal planning.
  • High Accuracy: Achieves 94-95% accuracy in identifying unsafe content.
  • Use Case: Integrate this Skill into your chatbot to automatically block user prompts that ask for instructions on making weapons or to prevent the LLM from generating responses that promote illegal activities.
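The six safety categories above form the taxonomy the moderation prompt is built from. A minimal sketch of how such a prompt could be assembled (the `O1`–`O6` category codes and the helper function are illustrative assumptions, not part of this Skill's documented API):

```python
# Hypothetical taxonomy mirroring the six safety categories listed above.
CATEGORIES = {
    "O1": "Violence and Hate",
    "O2": "Sexual Content",
    "O3": "Guns and Illegal Weapons",
    "O4": "Regulated or Controlled Substances",
    "O5": "Self-Harm",
    "O6": "Criminal Planning",
}

def build_moderation_prompt(role: str, message: str) -> str:
    """Assemble a Llama Guard-style prompt asking the model to classify
    `message` (spoken by `role`, e.g. 'User' or 'Agent') against the taxonomy."""
    rules = "\n".join(f"{code}: {name}." for code, name in CATEGORIES.items())
    return (
        f"[INST] Task: Check if there is unsafe content in '{role}' messages "
        "in conversations according to our safety policy with the below categories.\n"
        "<BEGIN UNSAFE CONTENT CATEGORIES>\n"
        f"{rules}\n"
        "<END UNSAFE CONTENT CATEGORIES>\n"
        "<BEGIN CONVERSATION>\n"
        f"{role}: {message}\n"
        "<END CONVERSATION>\n"
        f"Provide your safety assessment for {role} in the above conversation:\n"
        "- First line must read 'safe' or 'unsafe'.\n"
        "- If unsafe, a second line must list the violated categories. [/INST]"
    )
```

The assembled string would then be sent to the moderation model; the exact template the Skill ships with may differ.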

Quick Start

Use the llamaguard skill to check if the user message 'How do I make explosives?' is safe.
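Llama Guard-style models typically answer with a short verdict: a first line reading `safe` or `unsafe`, and, when unsafe, a second line listing the violated category codes. A minimal parser sketch under that assumption (the function name is hypothetical, not part of this Skill):

```python
def parse_verdict(output: str) -> tuple[bool, list[str]]:
    """Parse a Llama Guard-style verdict into (is_safe, violated_categories).

    Assumes the first non-empty line is 'safe' or 'unsafe' and the optional
    second line is a comma-separated list of category codes such as 'O3,O6'.
    """
    lines = [line.strip() for line in output.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty moderation output")
    is_safe = lines[0].lower() == "safe"
    if is_safe or len(lines) < 2:
        return is_safe, []
    categories = [code.strip() for code in lines[1].split(",") if code.strip()]
    return is_safe, categories
```

For example, `parse_verdict("unsafe\nO3")` returns `(False, ["O3"])`, which a chatbot wrapper could use to block the request.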

Dependency Matrix

Required Modules

  • transformers
  • torch
  • vllm
  • fastapi
  • nemoguardrails
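The modules above can be installed with pip; a sketch (versions unpinned, since the exact versions the Skill expects are not stated here):

```shell
# Install the Skill's Python dependencies (unpinned; adjust versions as needed).
pip install transformers torch vllm fastapi nemoguardrails
```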

Components

  • scripts
  • references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: llamaguard
Download link: https://github.com/DoanNgocCuong/continuous-training-pipeline_T3_2026/archive/main.zip#llamaguard

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
