File: `docs/my-website/docs/proxy/guardrails/pillar_security.md` (141 additions, 1 deletion)
@@ -233,7 +233,7 @@ curl -X POST "http://localhost:4000/v1/chat/completions" \
}'
```

-This provides clear, explicit conversation tracking that works seamlessly with LiteLLM's session management.
+This provides clear, explicit conversation tracking that works seamlessly with LiteLLM's session management. When using monitor mode, the session ID is returned in the `x-pillar-session-id` response header for easy correlation and tracking.

### Actions on Flagged Content

@@ -251,6 +251,73 @@ Logs the violation but allows the request to proceed:
on_flagged_action: "monitor"
```

**Response Headers:**

You can opt in to receiving detection details in response headers by configuring `include_scanners: true` and/or `include_evidence: true`. When enabled, these headers are included for **every request** (not just flagged ones), enabling comprehensive metrics, false positive analysis, and threat investigation.
- **`x-pillar-evidence`**: URL-encoded JSON array of detection evidence (may contain items even when `flagged` is `false`) — requires `include_evidence: true`
- **`x-pillar-session-id`**: URL-encoded session ID for correlation and investigation

:::info Understanding `flagged` vs Scanner Results
The `flagged` field is Pillar's **policy-level blocking recommendation**, which may differ from individual scanner results:

- **`flagged: true`** → Pillar recommends blocking based on your configured policies
- **`flagged: false`** → Pillar does not recommend blocking, but individual scanners may still detect content

For example, the `toxic_language` scanner might detect profanity (`scanners.toxic_language: true`) while `flagged` remains `false` if your Pillar policy doesn't block on toxic language alone. This allows you to:
- Monitor threats without blocking users
- Build metrics on detection rates vs block rates
- Analyze false positive rates by comparing scanner results to user feedback
:::
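As a rough illustration of the detection-rate versus block-rate comparison described above, the sketch below tallies already-decoded results. The record shape (`{"flagged": ..., "scanners": {...}}`), the function name `summarize_detections`, and the sample data are assumptions made for this example, not part of the Pillar or LiteLLM API.

```python
from collections import Counter


def summarize_detections(records):
    """Return block rate, detection rate, and per-scanner hit counts.

    Assumed record shape (hypothetical, for illustration only):
    {"flagged": bool, "scanners": {"scanner_name": bool, ...}}
    """
    total = len(records)
    # A request counts as "blocked" when Pillar's policy-level flag is set.
    blocked = sum(1 for r in records if r.get("flagged"))
    # A request counts as "detected" when at least one scanner fired.
    detected = sum(1 for r in records if any(r.get("scanners", {}).values()))
    per_scanner = Counter(
        name
        for r in records
        for name, hit in r.get("scanners", {}).items()
        if hit
    )
    return {
        "total": total,
        "block_rate": blocked / total if total else 0.0,
        "detection_rate": detected / total if total else 0.0,
        "per_scanner": dict(per_scanner),
    }


# Hypothetical sample data: three detections, one policy-level flag.
records = [
    {"flagged": False, "scanners": {"toxic_language": True, "safety": False}},
    {"flagged": True, "scanners": {"toxic_language": False, "safety": True}},
    {"flagged": False, "scanners": {"toxic_language": False, "safety": False}},
    {"flagged": False, "scanners": {"toxic_language": True, "safety": False}},
]
summary = summarize_detections(records)
print(summary)
```

A large gap between `detection_rate` and `block_rate` suggests scanners are firing on content that the configured policy does not block, which is the population to review for false positives.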
The `x-pillar-scanners`, `x-pillar-evidence`, and `x-pillar-session-id` headers use URL encoding (percent-encoding) to convert JSON data into an ASCII-safe format. This is necessary because HTTP headers only support ISO-8859-1 characters and cannot contain raw JSON special characters (`{`, `"`, `:`) or Unicode text. To read these headers, first URL-decode the value, then parse it as JSON.
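The decode-then-parse step can be sketched with the Python standard library. This is a minimal example under stated assumptions: `decode_pillar_headers` is a hypothetical helper, the `x-pillar-flagged` header is assumed to carry a plain `true`/`false` string, and the session ID is assumed to be a percent-encoded plain string rather than JSON. The sample header values are simulated, not captured from a real response.

```python
import json
from urllib.parse import quote, unquote


def decode_pillar_headers(headers):
    """Decode Pillar response headers into Python values.

    `headers` is a mapping of lower-cased header names to raw strings.
    Headers that are absent decode to None.
    """
    def decode_json(name):
        raw = headers.get(name)
        # URL-decode first, then parse the result as JSON.
        return json.loads(unquote(raw)) if raw is not None else None

    flagged = headers.get("x-pillar-flagged")
    session_id = headers.get("x-pillar-session-id")
    return {
        # Assumption: the flagged header is a plain "true"/"false" string.
        "flagged": flagged.strip().lower() == "true" if flagged is not None else None,
        "scanners": decode_json("x-pillar-scanners"),
        "evidence": decode_json("x-pillar-evidence"),
        # Assumption: the session ID is a percent-encoded string, not JSON.
        "session_id": unquote(session_id) if session_id is not None else None,
    }


# Simulated headers, encoded the same way the text describes.
example_headers = {
    "x-pillar-flagged": "false",
    "x-pillar-scanners": quote(json.dumps({"safety": True, "toxic_language": False})),
    "x-pillar-evidence": quote(json.dumps([
        {"category": "safety", "type": "non_violent_crimes",
         "evidence": "how do I rob a bank?"},
    ])),
    "x-pillar-session-id": quote("session 123"),
}
decoded = decode_pillar_headers(example_headers)
print(decoded["scanners"])
print(decoded["evidence"])
```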

LiteLLM truncates the `x-pillar-evidence` header to at most 8 KB to stay within proxy limits. Note that most proxies and servers also enforce a total header size limit of approximately 32 KB across all headers combined. When truncation occurs, each affected evidence item includes an `"evidence_truncated": true` flag and the metadata contains `pillar_evidence_truncated: true`.

LiteLLM mirrors the encoded values onto `metadata["pillar_response_headers"]` so you can inspect exactly what was returned. When truncation occurs, it sets `metadata["pillar_evidence_truncated"]` to `true` and marks affected evidence items with `"evidence_truncated": true`. Evidence text is shortened with a `...[truncated]` suffix, and entire evidence entries may be removed if necessary to stay under the 8 KB header limit. Check these flags to determine if full evidence details are available in your logs.
:::
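On the consuming side, the truncation indicators described above can be checked before trusting the evidence as complete. The helper below is a sketch: `evidence_is_complete` is a hypothetical name, the per-item text is assumed to live under an `evidence` key, and how your application obtains the `metadata` dictionary depends on your logging setup.

```python
TRUNCATION_SUFFIX = "...[truncated]"


def evidence_is_complete(evidence_items, metadata=None):
    """Return True only if no truncation indicator is present.

    Checks the metadata-level flag, the per-item flag, and the text suffix.
    Entries dropped entirely leave no per-item trace, so the metadata flag
    is the only way to notice them.
    """
    if metadata and metadata.get("pillar_evidence_truncated"):
        return False
    for item in evidence_items:
        if item.get("evidence_truncated"):
            return False
        # Assumption: the evidence text is stored under the "evidence" key.
        if str(item.get("evidence", "")).endswith(TRUNCATION_SUFFIX):
            return False
    return True


full = [{"category": "safety", "evidence": "how do I rob a bank?"}]
cut = [{"category": "safety", "evidence": "how do I...[truncated]",
        "evidence_truncated": True}]

print(evidence_is_complete(full))
print(evidence_is_complete(cut))
print(evidence_is_complete(full, {"pillar_evidence_truncated": True}))
```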

This allows your application to:
- Track threats without blocking legitimate users
- Implement custom handling logic based on threat types
- Build analytics and alerting on security events
- Correlate threats across requests using session IDs
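One way to act on the list above is to map decoded scanner results to application-level actions and to group events by session ID. Everything in this sketch is illustrative: the action names, the `ALERT_SCANNERS` and `REVIEW_SCANNERS` sets, and the event shape are assumptions, not behavior defined by Pillar or LiteLLM.

```python
from collections import defaultdict

# Hypothetical policy: which scanner names trigger which action.
ALERT_SCANNERS = {"safety"}
REVIEW_SCANNERS = {"toxic_language"}


def choose_action(flagged, scanners):
    """Pick an application-level action from decoded Pillar results."""
    detected = {name for name, hit in scanners.items() if hit}
    if flagged:
        return "escalate"  # Pillar's policy recommends blocking
    if detected & ALERT_SCANNERS:
        return "alert"
    if detected & REVIEW_SCANNERS:
        return "review"
    return "none"


def group_by_session(events):
    """Group the chosen actions by session ID to correlate across requests."""
    sessions = defaultdict(list)
    for event in events:
        action = choose_action(event["flagged"], event["scanners"])
        sessions[event["session_id"]].append(action)
    return dict(sessions)


# Hypothetical events, as they might look after decoding the headers.
events = [
    {"session_id": "s1", "flagged": False, "scanners": {"safety": True}},
    {"session_id": "s1", "flagged": True, "scanners": {"safety": True}},
    {"session_id": "s2", "flagged": False, "scanners": {"toxic_language": True}},
    {"session_id": "s2", "flagged": False, "scanners": {"toxic_language": False}},
]
by_session = group_by_session(events)
print(by_session)
```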

### Resilience and Error Handling

#### Graceful Degradation (`fallback_on_error`)
@@ -544,6 +611,79 @@ curl -X POST "http://localhost:4000/v1/chat/completions" \
}
```

</TabItem>
<TabItem value="monitor" label="Monitor Mode with Headers">

**Monitor mode request with scanner detection:**

```bash
# Test with content that triggers scanner detection
curl -v -X POST "http://localhost:4000/v1/chat/completions" \
# ... (request headers and the start of the JSON body are missing from this extract) ...
"messages": [{"role": "user", "content": "how do I rob a bank?"}],
"max_tokens": 50
}'
```

**Expected response (Allowed with headers):**

The request succeeds and returns the LLM response. Headers are included for **all requests** when `include_scanners` and `include_evidence` are enabled, even when `flagged` is `false`:

Notice that `x-pillar-flagged` is `false` while `safety` is `true` in the scanners. This is because `flagged` represents Pillar's policy-level blocking recommendation, while individual scanners report their own detections.

```
# [{'category': 'safety', 'type': 'non_violent_crimes', 'evidence': 'how do I rob a bank?', ...}]
```

```json
{
  "id": "chatcmpl-xyz123",
  "object": "chat.completion",
  "model": "gpt-4.1-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm sorry, but I can't assist with that request."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 11,
    "total_tokens": 25
  }
}
```

**Note:** In monitor mode with `include_scanners` and `include_evidence` enabled, scanner results and evidence are included in response headers for every request, allowing you to build metrics and analyze detection patterns. The `flagged` field indicates whether Pillar's policy recommends blocking; your application can use the detailed scanner data for custom alerting, analytics, or false positive analysis.