Unusual High Confidence Content Filter Blocks Detected

Last updated a month ago on 2024-11-21
Created 8 months ago on 2024-05-05

About

Detects repeated high-confidence 'BLOCKED' actions coupled with specific 'Content Filter' policy violation having codes such as 'MISCONDUCT', 'HATE', 'SEXUAL', INSULTS', 'PROMPT_ATTACK', 'VIOLENCE' indicating persistent misuse or attempts to probe the model's ethical boundaries.
Tags
Domain: LLMData Source: AWS BedrockData Source: AWS S3Use Case: Policy ViolationMitre Atlas: T0051Mitre Atlas: T0054
Severity
medium
Risk Score
47
False Positive Examples
New model deployments.Testing updates to compliance policies.
License
Elastic License v2(opens in a new tab or window)

Definition

Integration Pack
Prebuilt Security Detection Rules
Related Integrations

(opens in a new tab or window)

Query
from logs-aws_bedrock.invocation-*
| MV_EXPAND gen_ai.compliance.violation_code
| MV_EXPAND gen_ai.policy.confidence
| MV_EXPAND gen_ai.policy.name 
| where gen_ai.policy.action == "BLOCKED" and gen_ai.policy.name == "content_policy" and gen_ai.policy.confidence LIKE "HIGH" and gen_ai.compliance.violation_code IN ("HATE", "MISCONDUCT", "SEXUAL", "INSULTS", "PROMPT_ATTACK", "VIOLENCE")
| keep user.id, gen_ai.compliance.violation_code
| stats block_count_per_violation = count() by user.id, gen_ai.compliance.violation_code 
| SORT block_count_per_violation DESC 
| keep user.id, gen_ai.compliance.violation_code, block_count_per_violation
| STATS violation_count = SUM(block_count_per_violation) by user.id
| WHERE violation_count > 5 
| SORT violation_count DESC 

Install detection rules in Elastic Security

Detect Unusual High Confidence Content Filter Blocks Detected in the Elastic Security detection engine by installing this rule into your Elastic Stack.

To setup this rule, check out the installation guide for Prebuilt Security Detection Rules(opens in a new tab or window).