Overview
Guardrails are pre-deployed scanners that evaluate user inputs and model outputs to help maintain safe, responsible, and compliant AI interactions. You enable the scanners you need; no deployment is required.

Available Scanners
| Scanner | Description | Applies To |
|---|---|---|
| Regex | Validates prompts using user-defined regular expression patterns. Supports defining desirable (“good”) and undesirable (“bad”) patterns for fine-grained validation. | Input |
| Anonymize | Removes sensitive data from user prompts to maintain privacy and prevent exposure of personal information. | Input |
| Ban topics | Blocks specific topics (for example, religion) from appearing in prompts to avoid sensitive or inappropriate discussions. | Input |
| Prompt injection | Detects attempts to manipulate or override model behavior, protecting the LLM from malicious or crafted inputs. | Input |
| Toxicity | Analyzes prompts or responses for toxic or harmful language to ensure safe and respectful interactions. | Input, Output |
| Bias detection | Examines model outputs for potential bias to help maintain neutrality and fairness in generated responses. | Output |
| Deanonymize | Replaces placeholders in model outputs with actual values to restore necessary information when needed. | Output |
| Relevance | Measures similarity between the user’s prompt and the model’s output and provides a relevance score to ensure responses stay contextually aligned. | Output |
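To illustrate how the Anonymize and Deanonymize scanners complement each other, here is a minimal sketch, not the platform's implementation: sensitive values in the prompt are swapped for placeholders before the model sees them, and the placeholders in the model's output are swapped back. The `EMAIL_RE` pattern and the `[REDACTED_n]` placeholder format are assumptions for illustration only.

```python
import re

# Assumed for illustration: a simple email pattern and [REDACTED_n] placeholders.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def anonymize(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace each email with a placeholder; remember the mapping."""
    vault: dict[str, str] = {}

    def repl(match: re.Match) -> str:
        placeholder = f"[REDACTED_{len(vault)}]"
        vault[placeholder] = match.group(0)
        return placeholder

    return EMAIL_RE.sub(repl, prompt), vault

def deanonymize(output: str, vault: dict[str, str]) -> str:
    """Restore the original values in the model output."""
    for placeholder, value in vault.items():
        output = output.replace(placeholder, value)
    return output

safe_prompt, vault = anonymize("Contact alice@example.com about the invoice.")
# safe_prompt == "Contact [REDACTED_0] about the invoice."
restored = deanonymize("Sent a reply to [REDACTED_0].", vault)
# restored == "Sent a reply to alice@example.com."
```

The mapping (`vault` here) is what makes Deanonymize possible: without it, the placeholders in the output could not be resolved back to the original values.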
View Guardrails
To view all pre-deployed guardrails available on the platform:

- Go to Settings > Manage guardrails.
Enable Scanners
All scanners are pre-deployed and available by default. You must enable the required scanners in each agentic app or tool where you want to use them. To enable scanners:

1. Open the Guardrails settings for your app or tool:
   - Agentic apps: Go to Agentic apps, select your app, then go to Settings > PII & Guardrails > Guardrails.
   - Tools: Go to Tools, select the tool you want to configure, then select Guardrails.
2. On the Guardrails page, review the Input scanners and Output scanners tabs. Turn on the toggle next to each scanner you want to apply.
3. To configure a scanner, click it, adjust the available options, then click Save.
The available options vary by scanner. For example, the Toxicity scanner includes Risk Threshold and Detection Sensitivity, while the Regex scanner includes Scanner mode (Block or Allow), pattern entry fields, and a Risk threshold slider.
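The Regex scanner's two modes can be sketched as follows. This is a hypothetical illustration of the Block/Allow semantics described above, not the platform's code: in Block mode, the patterns describe undesirable ("bad") content and any match fails the prompt; in Allow mode, the patterns describe desirable ("good") content and at least one match is required.

```python
import re

def regex_scan(prompt: str, patterns: list[str], mode: str = "Block") -> bool:
    """Return True if the prompt is valid under the given mode.

    Block mode: any matching pattern marks the prompt invalid.
    Allow mode: the prompt is valid only if some pattern matches.
    """
    matched = any(re.search(p, prompt) for p in patterns)
    return not matched if mode == "Block" else matched

# Block mode with a card-number-like pattern (illustrative only):
card_pattern = r"\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}"
regex_scan("My card is 4111 1111 1111 1111", [card_pattern], "Block")  # False: invalid
regex_scan("Hello there", [card_pattern], "Block")                     # True: valid
```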
Test Scanners
After enabling and configuring scanners, verify they perform as expected. You can test an individual scanner or the full set, then adjust settings as needed. To test guardrails:

1. On the Guardrails page, click Test.
2. In the Prompt input box, enter a prompt or select Input template to choose a template.
3. Click Test. Under Scores and Results, review the output.
| Field | Description |
|---|---|
| Validity | Indicates whether the prompt meets the scanner’s criteria. For example, if no toxicity is detected, Validity is set to True. |
| Risk Score | Indicates the prompt’s risk level, calculated as (Threshold − Scanner Score) / Threshold. For the Relevance scanner, the score is 1 if similarity falls below the threshold; otherwise 0. |
| Duration | The time taken by the scanner to process the prompt. |

Based on the results, adjust scanner settings and retest as needed.
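The Risk Score calculation can be sketched in a few lines. These helper functions are hypothetical, written only to mirror the formula and the Relevance special case stated above; they are not part of the platform's API.

```python
def risk_score(threshold: float, scanner_score: float) -> float:
    """Risk Score = (Threshold - Scanner Score) / Threshold."""
    return (threshold - scanner_score) / threshold

def relevance_risk(similarity: float, threshold: float) -> int:
    """Relevance scanner: 1 if similarity falls below the threshold, else 0."""
    return 1 if similarity < threshold else 0

risk_score(0.5, 0.1)       # 0.8: scanner score well under the threshold
relevance_risk(0.3, 0.5)   # 1: output not similar enough to the prompt
relevance_risk(0.9, 0.5)   # 0: output is relevant
```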