Unusual High Confidence Content Filter Blocks Detected

Last updated 4 months ago on 2025-11-10

Created 2 years ago on 2024-05-05

About

Detects repeated high-confidence 'BLOCKED' actions coupled with specific 'Content Filter' policy violation having codes such as 'MISCONDUCT', 'HATE', 'SEXUAL', INSULTS', 'PROMPT_ATTACK', 'VIOLENCE' indicating persistent misuse or attempts to probe the model's ethical boundaries.

Tags: Domain: LLMData Source: AWS BedrockData Source: AWS S3Use Case: Policy ViolationMitre Atlas: T0051Mitre Atlas: T0054Language: esql
Severity: medium
Risk Score: 47
False Positive Examples: New model deployments.Testing updates to compliance policies.
License: Elastic License v2(external, opens in a new tab or window)

Definition

Integration Pack: Prebuilt Security Detection Rules
Related Integrations: aws_bedrock(external, opens in a new tab or window)
Query

✄𐘗text code block:
✄𐘗from logs-aws_bedrock.invocation-*

// Expand multi-value fields
| mv_expand gen_ai.compliance.violation_code
| mv_expand gen_ai.policy.confidence
| mv_expand gen_ai.policy.name
| mv_expand gen_ai.policy.action

// Filter for high-confidence content policy blocks with targeted violations
| where
  gen_ai.policy.action == "BLOCKED"
  and gen_ai.policy.name == "content_policy"
  and gen_ai.policy.confidence like "HIGH"
  and gen_ai.compliance.violation_code in ("HATE", "MISCONDUCT", "SEXUAL", "INSULTS", "PROMPT_ATTACK", "VIOLENCE")

// keep ECS + compliance fields
| keep
  user.id,
  gen_ai.compliance.violation_code

// count blocked violations per user per violation type
| stats
    Esql.ml_policy_blocked_violation_count = count()
  by
    user.id,
    gen_ai.compliance.violation_code

// Aggregate all violation types per user
| stats
    Esql.ml_policy_blocked_violation_total_count = sum(Esql.ml_policy_blocked_violation_count)
  by
    user.id

// Filter for users with more than 5 total violations
| where Esql.ml_policy_blocked_violation_total_count > 5

// sort by violation volume
| sort Esql.ml_policy_blocked_violation_total_count desc

Install detection rules in Elastic Security

Detect Unusual High Confidence Content Filter Blocks Detected in the Elastic Security detection engine by installing this rule into your Elastic Stack.

To setup this rule, check out the installation guide for Prebuilt Security Detection Rules(external, opens in a new tab or window).