Evaluation Pipeline
Step-by-step execution flow for LLM evaluation.
Overview
The evaluation pipeline in lib.rs::evaluate_internal() follows a strict sequence to ensure security, efficiency, and correctness.
Pipeline Steps
Step 1: PDF Extraction (Optional)
When: pdf_input is provided
Process:
- Validate PDF file exists
- Check file size ≤ MAX_PDF_SIZE_BYTES (50MB)
- Create temporary directory
- Execute Docling CLI:
docling <pdf> --output <temp> --format markdown - Read extracted Markdown text
- Replace
user_promptwith extracted text - Clean up temporary files
Code: src/pdf.rs::extract_pdf_text()
Error Handling: Fails if Docling not installed or extraction fails
Step 2: Input Guardrails (Optional)
When: Input guardrails configured in config file
Process:
- Load guardrail configuration from config file
- Create appropriate
GuardrailProvider(patterns, llama_guard, hybrid, etc.) - Validate
user_prompt(NOT system_prompt - system prompts are trusted) - If validation fails, return
ValidationErrorimmediately
Code: src/guardrails/config.rs::create_guardrail_provider()
Important: Only user-provided content is validated. System prompts are developer-controlled and trusted.
Example:
1
2
3
if let Some(input_guardrail) = &guardrails.input {
input_guardrail.validate(&config.user_prompt).await?;
}
Step 3: Token Validation (Optional)
When: validate_tokens is true
Process:
- Determine context limit (auto-detect or use
context_limitoverride) - Estimate system prompt tokens
- Estimate user prompt tokens
- Calculate response buffer (from
max_tokensor default) - Total = system + user + response_buffer
- If total > context_limit, return
ValidationError
Code: src/token_estimator.rs::estimate_tokens()
Benefit: Fails early before API call, saving cost and latency
Step 4: LLM Invocation
Process:
- Detect provider from API URL (or use explicit
provideroverride) - Create provider instance (
OpenAIProviderorOllamaProvider) - Build
LlmRequestwith all parameters - Call
provider.invoke(request) - Parse response and extract content
Code:
src/client.rs::LlmClient::call()src/providers/openai.rs::OpenAIProvider::invoke()src/providers/ollama.rs::OllamaProvider::invoke()
Error Handling: Returns ApiError for network/auth failures
Step 5: Output Guardrails (Optional)
When: Output guardrails configured in config file
Process:
- Load output guardrail configuration
- Create
GuardrailProvider - Validate LLM response content
- If validation fails, return
ValidationError
Code: src/guardrails/output.rs
Use Case: Detect toxic content, low quality responses, policy violations
Step 6: Metadata Generation
Process:
- Calculate total latency (pipeline start to end)
- Collect metadata:
- Model name
- Estimated tokens (from step 3 or post-hoc)
- Latency in milliseconds
- Timestamp (ISO 8601)
- Provider type
- Request parameters (temperature, max_tokens, etc.)
- Create
EvaluationResultwith content + metadata
Code: src/lib.rs::evaluate_internal()
Output:
1
2
3
4
5
6
EvaluationResult {
content: String, // LLM response
metadata: Metadata {
model, tokens_estimated, latency_ms, timestamp, ...
}
}
Complete Flow Diagram
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
Input (EvaluationConfig)
│
▼
┌─────────────────────────┐
│ PDF Extraction? │ ← Step 1 (optional)
│ Extract text → replace │
│ user_prompt │
└───────────┬─────────────┘
│
▼
┌─────────────────────────┐
│ Input Guardrails? │ ← Step 2 (optional)
│ Validate user_prompt │
│ (NOT system_prompt) │
└───────────┬─────────────┘
│
▼
┌─────────────────────────┐
│ Token Validation? │ ← Step 3 (optional)
│ Estimate & check limit │
└───────────┬─────────────┘
│
▼
┌─────────────────────────┐
│ LLM Invocation │ ← Step 4 (required)
│ Detect provider │
│ Call API │
│ Parse response │
└───────────┬─────────────┘
│
▼
┌─────────────────────────┐
│ Output Guardrails? │ ← Step 5 (optional)
│ Validate LLM response │
└───────────┬─────────────┘
│
▼
┌─────────────────────────┐
│ Metadata Generation │ ← Step 6 (required)
│ Collect stats, format │
└───────────┬─────────────┘
│
▼
Output (EvaluationResult)
Error Handling
Each step can fail with specific error types:
| Step | Error Type | Example |
|---|---|---|
| PDF Extraction | PdfError |
Docling not installed, file too large |
| Input Guardrails | ValidationError |
PII detected, prompt injection |
| Token Validation | ValidationError |
Token count exceeds limit |
| LLM Invocation | ApiError |
Network failure, invalid API key |
| Output Guardrails | ValidationError |
Toxic content detected |
| Metadata | InternalError |
Timestamp formatting error |
Pipeline behavior: First error stops execution, returns immediately
Performance Characteristics
Typical latency breakdown:
- PDF Extraction: 500ms - 5s (depends on PDF size/complexity)
- Input Guardrails: 10ms (patterns) to 2s (LLM-based)
- Token Validation: <10ms
- LLM Invocation: 1s - 30s (depends on model, response length)
- Output Guardrails: 10ms (patterns) to 2s (LLM-based)
- Metadata Generation: <1ms
Optimization tips:
- Use pattern-based guardrails before LLM-based (sequential hybrid)
- Enable token validation to fail fast for oversized prompts
- Cache PDF extractions if processing same file multiple times
See Also
- Layers - Architecture overview
- Providers - LLM provider details
- Guardrails - Validation strategies