Guardrails

Multi-layered security validation for LLM inputs and outputs.

Overview

Fortified LLM Client provides five types of guardrails to protect against unsafe or malicious LLM interactions:

  1. Regex - Fast pattern-based validation (custom patterns, length limits)
  2. Llama Guard - MLCommons safety taxonomy (13 categories S1-S13)
  3. Llama Prompt Guard - Jailbreak detection
  4. GPT OSS Safeguard - LLM-based custom policy validation
  5. Composite - Composable multi-provider validation

Key Concepts

Defense in Depth

Layer multiple guardrails for comprehensive protection:

[guardrails.input]
type = "composite"
execution = "parallel"
aggregation = "all_must_pass"

# Layer 1: Fast regex checks
[[guardrails.input.providers]]
type = "regex"
max_length_bytes = 1048576
patterns_file = "patterns/input.txt"

# Layer 2: LLM-based validation
[[guardrails.input.providers]]
type = "llama_guard"
api_url = "http://localhost:11434/v1/chat/completions"
model = "llama-guard3:8b"
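The composite semantics above (parallel execution, all_must_pass aggregation) can be sketched as follows. The two provider checks are hypothetical stand-ins for the regex and Llama Guard layers, not the client's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the regex layer (enforces max_length_bytes).
def regex_check(text: str) -> bool:
    return len(text.encode("utf-8")) <= 1_048_576

# Hypothetical stand-in for the LLM layer; a real deployment calls Llama Guard.
def llm_check(text: str) -> bool:
    return "ignore previous instructions" not in text.lower()

def composite_validate(text: str, providers) -> bool:
    """execution = "parallel", aggregation = "all_must_pass":
    run every provider concurrently and pass only if all of them pass."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda check: check(text), providers))
    return all(results)

print(composite_validate("What is the capital of France?", [regex_check, llm_check]))  # True
```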

System Prompts Are Trusted

Important: Guardrails ONLY validate user-provided inputs, NOT system prompts.

  • System prompts - Developer-controlled, trusted content
  • User prompts - User-provided, must be validated

Input vs Output Guardrails

  • Input Guardrails - Validate before sending to LLM (prevents harmful inputs)
  • Output Guardrails - Validate LLM responses (ensures safe outputs)
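The two validation points can be sketched as a request flow. Everything here (function names, the stand-in LLM and guard) is illustrative, not the client's actual API:

```python
class GuardrailViolation(Exception):
    """Raised when input or output validation fails."""

def guarded_completion(user_text, call_llm, input_guard, output_guard):
    # Input guardrail: reject harmful input before it reaches the LLM.
    if not input_guard(user_text):
        raise GuardrailViolation("input rejected")
    response = call_llm(user_text)
    # Output guardrail: reject unsafe responses before returning them.
    if not output_guard(response):
        raise GuardrailViolation("output rejected")
    return response

# Stand-in components for demonstration.
echo_llm = lambda text: f"echo: {text}"
allow_short = lambda text: len(text) < 100

print(guarded_completion("hi", echo_llm, allow_short, allow_short))  # echo: hi
```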

Configuration Formats

Guardrails can be configured in two ways:

  1. Separate Input/Output: Use [guardrails.input] and [guardrails.output] for different configurations
  2. Unified: Use [guardrails] to apply the same configuration to both input and output
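For example, a unified configuration might look like this (illustrative; patterns/shared.txt is a hypothetical file name, and the fields mirror the regex examples below):

```toml
# Applies to both input and output validation
[guardrails]
type = "regex"
max_length_bytes = 1048576
patterns_file = "patterns/shared.txt"
```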

See the Configuration Guide for details.

Quick Start

CLI (Simple Validation)

fortified-llm-client \
  --api-url http://localhost:11434/v1/chat/completions \
  --model llama3 \
  --user-text "Your prompt" \
  --enable-input-validation \
  --max-input-length 1MB

Config File (Advanced Validation)

api_url = "http://localhost:11434/v1/chat/completions"
model = "llama3"

[guardrails.input]
type = "regex"
max_length_bytes = 1048576
patterns_file = "patterns/input.txt"
severity_threshold = "medium"

[guardrails.output]
type = "regex"
max_length_bytes = 2097152
patterns_file = "patterns/output.txt"
severity_threshold = "high"

Pattern file format (patterns/input.txt):

CRITICAL | SSN | \b\d{3}-\d{2}-\d{4}\b
HIGH | Credit Card | \b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b
MEDIUM | Email | [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[A-Za-z]{2,}
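A minimal sketch of how a file in this format might be parsed and applied (the function names and matching behavior are assumptions for illustration, not the client's actual implementation):

```python
import re

def parse_patterns(text):
    """Parse 'SEVERITY | Name | regex' lines into (severity, name, compiled_regex)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two pipes only, so the regex field may contain '|'.
        severity, name, pattern = (field.strip() for field in line.split("|", 2))
        rules.append((severity, name, re.compile(pattern)))
    return rules

def scan(rules, text):
    """Return (severity, name) for every rule whose regex matches."""
    return [(sev, name) for sev, name, rx in rules if rx.search(text)]

rules = parse_patterns(r"CRITICAL | SSN | \b\d{3}-\d{2}-\d{4}\b")
print(scan(rules, "my number is 123-45-6789"))  # [('CRITICAL', 'SSN')]
```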

Guardrail Types

Type                 Speed         Accuracy   Use Case
Regex                Fast (<10ms)  Good       Custom patterns, length limits (input & output)
Llama Guard          Slow (1-3s)   Excellent  Comprehensive safety (S1-S13)
Llama Prompt Guard   Slow (1-3s)   Excellent  Advanced jailbreak detection
GPT OSS Safeguard    Slow (2-5s)   Excellent  Custom policy validation
Composite            Variable      Best       Combine multiple strategies


Choosing the Right Guardrail

For Development/Testing

[guardrails.input]
type = "regex"  # Fast, low cost
max_length_bytes = 1048576
patterns_file = "patterns/input.txt"

For Production (Balanced)

[guardrails.input]
type = "composite"
execution = "sequential"

# Fast check first
[[guardrails.input.providers]]
type = "regex"

# LLM check only if the regex layer passes
[[guardrails.input.providers]]
type = "llama_guard"

For High-Security Environments

[guardrails.input]
type = "composite"
execution = "parallel"
aggregation = "all_must_pass"  # All providers must pass

[[guardrails.input.providers]]
type = "regex"

[[guardrails.input.providers]]
type = "llama_guard"

[[guardrails.input.providers]]
type = "gpt_oss_safeguard"

Next Steps

Start with Regex Guardrails for basic validation, then explore Composite Guardrails for defense-in-depth strategies.

