Guardrails
Multi-layered security validation for LLM inputs and outputs.
Overview
Fortified LLM Client provides five types of guardrails to protect against unsafe or malicious LLM interactions:
- Regex - Fast pattern-based validation (custom patterns, length limits)
- Llama Guard - MLCommons safety taxonomy (13 categories S1-S13)
- Llama Prompt Guard - Jailbreak detection
- GPT OSS Safeguard - GPT-4 based policy validation
- Composite - Composable multi-provider validation
Key Concepts
Defense in Depth
Layer multiple guardrails for comprehensive protection:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
[guardrails.input]
type = "composite"
execution = "parallel"
aggregation = "all_must_pass"
# Layer 1: Fast regex checks
[[guardrails.input.providers]]
type = "regex"
max_length_bytes = 1048576
patterns_file = "patterns/input.txt"
# Layer 2: LLM-based validation
[[guardrails.input.providers]]
type = "llama_guard"
api_url = "http://localhost:11434/v1/chat/completions"
model = "llama-guard3:8b"
System Prompts Are Trusted
Important: Guardrails ONLY validate user-provided inputs, NOT system prompts.
- System prompts - Developer-controlled, trusted content
- User prompts - User-provided, must be validated
Input vs Output Guardrails
- Input Guardrails - Validate before sending to LLM (prevents harmful inputs)
- Output Guardrails - Validate LLM responses (ensures safe outputs)
Configuration Formats
Guardrails can be configured in two ways:
- Separate Input/Output: Use
[guardrails.input]and[guardrails.output]for different configurations - Unified: Use
[guardrails]to apply the same configuration to both input and output
See the Configuration Guide for details.
Quick Start
CLI (Simple Validation)
1
2
3
4
5
6
fortified-llm-client \
--api-url http://localhost:11434/v1/chat/completions \
--model llama3 \
--user-text "Your prompt" \
--enable-input-validation \
--max-input-length 1MB
Config File (Advanced Validation)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
api_url = "http://localhost:11434/v1/chat/completions"
model = "llama3"
[guardrails.input]
type = "regex"
max_length_bytes = 1048576
patterns_file = "patterns/input.txt"
severity_threshold = "medium"
[guardrails.output]
type = "regex"
max_length_bytes = 2097152
patterns_file = "patterns/output.txt"
severity_threshold = "high"
Pattern file format (patterns/input.txt):
1
2
3
CRITICAL | SSN | \b\d{3}-\d{2}-\d{4}\b
HIGH | Credit Card | \b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b
MEDIUM | Email | [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[A-Z|a-z]{2,}
Guardrail Types
| Type | Speed | Accuracy | Use Case |
|---|---|---|---|
| Regex | Fast (<10ms) | Good | Custom patterns, length limits (input & output) |
| Llama Guard | Slow (1-3s) | Excellent | Comprehensive safety (S1-S13) |
| Llama Prompt Guard | Slow (1-3s) | Excellent | Advanced jailbreak detection |
| GPT OSS Safeguard | Slow (2-5s) | Excellent | Custom policy validation |
| Composite | Variable | Best | Combine multiple strategies |
Section Contents
- Regex Guardrails - Fast pattern-based validation (input & output)
- Llama Guard - MLCommons safety taxonomy
- Llama Prompt Guard - Jailbreak detection
- GPT OSS Safeguard - Policy-based validation
- Composite Guardrails - Multi-provider strategies
- Custom Policies - Creating custom policy files
Choosing the Right Guardrail
For Development/Testing
1
2
3
4
5
[guardrails.input]
type = "patterns" # Fast, low cost
[guardrails.input.patterns]
detect_prompt_injection = true
For Production (Balanced)
1
2
3
4
5
6
7
8
9
10
11
[guardrails.input]
type = "hybrid"
execution_mode = "sequential"
# Fast check first
[[guardrails.input.hybrid.providers]]
type = "patterns"
# LLM check only if patterns pass
[[guardrails.input.hybrid.providers]]
type = "llama_guard"
For High-Security Environments
1
2
3
4
5
6
7
8
9
10
11
12
13
[guardrails.input]
type = "hybrid"
execution_mode = "parallel"
aggregation_mode = "all" # All must pass
[[guardrails.input.hybrid.providers]]
type = "patterns"
[[guardrails.input.hybrid.providers]]
type = "llama_guard"
[[guardrails.input.hybrid.providers]]
type = "gpt_oss_safeguard"
Next Steps
Start with Regex Guardrails for basic validation, then explore Composite Guardrails for defense-in-depth strategies.