[FR] Add keep metadata check to esql schema test #5441
base: main
Enhancement - Guidelines
These guidelines serve as a reminder of what to consider when adding a new schema feature to the code.
Documentation and Context
Code Standards and Practices
Testing
Additional Schema Related Checks
If #5433 merges and we are fine with Updated: Complete |
Mikaayenson
left a comment
Maybe clarify in docs/tests that this is specifically about ensuring those metadata fields are in the output row (i.e., in keep), not just in METADATA
# Match | followed by optional whitespace/newlines and then 'keep'
keep_pattern = re.compile(r"\|\s*keep\b", re.IGNORECASE | re.DOTALL)
if not keep_pattern.search(query_lower):
keep_pattern = re.compile(r"\|\s*keep\b\s+([^\|]+)", re.IGNORECASE | re.DOTALL)
How does this work with things like keep Esql.* / keep aws.*?
This pattern captures all of the kept fields as a single string. That string is parsed later into individual fields, which are checked for the required metadata fields or the presence of a *. Entries like Esql.* and aws.* in a keep are simply carried along as part of that keep string.
For example, take a rule with a query like this (also see the example in our unit test):
from logs-endpoint.events.process-*, logs-windows.sysmon_operational-*, logs-system.security-*, logs-windows.*, winlogbeat-*, logs-crowdstrike.fdr*, logs-m365_defender.event-* METADATA _id, _version, _index
| where
@timestamp > now() - 8 hours and
event.category == "process" and
event.type == "start" and
process.name == "rundll32.exe" and
process.command_line like "*DavSetCookie*"
| keep Esql.*, aws.*, event.*, host.*, process.*, user.*, *
For this query, keep_pattern.search(query_lower).group(1) returns 'esql.*, aws.*, event.*, host.*, process.*, user.*, *\n', which is how we use keep_pattern.
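A minimal sketch of that extraction step, for anyone who wants to try it outside the repo. The regex is the one from the diff above; the helper name `extract_keep_fields` and the shortened query are made up for illustration and are not the code in this PR.

```python
import re

# Regex quoted from the diff under review: capture everything after
# `| keep` up to the next pipe (or the end of the query).
KEEP_PATTERN = re.compile(r"\|\s*keep\b\s+([^\|]+)", re.IGNORECASE | re.DOTALL)


def extract_keep_fields(query: str) -> list[str]:
    """Hypothetical helper: list the fields of the first keep command."""
    match = KEEP_PATTERN.search(query.lower())
    if match is None:
        return []
    # group(1) is one comma-separated string; split it into clean field names.
    return [field.strip() for field in match.group(1).split(",") if field.strip()]


# Shortened, illustrative version of the query in the comment above.
query = """from logs-endpoint.events.process-* METADATA _id, _version, _index
| where event.category == "process"
| keep Esql.*, aws.*, event.*, host.*, process.*, user.*, *
"""

fields = extract_keep_fields(query)
print(fields)
```

Wildcard entries such as esql.* come back as ordinary list items, so any later check can treat them like plain field names.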
Co-authored-by: Mika Ayenson, PhD <Mikaayenson@users.noreply.github.com>
Co-authored-by: Jonhnathan <26856693+w0rk3r@users.noreply.github.com>
shashank-elastic
left a comment
Verified both aggregate and non-aggregate ES|QL rules.
Details
❯ python -m detection_rules view-rule rules/integrations/aws_bedrock/aws_bedrock_guardrails_multiple_violations_in_single_request.toml
Loaded config file: /Users/shashankks/elastic_workspace/detection-rules/.detection-rules-cfg.json
█▀▀▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄ ▄ █▀▀▄ ▄ ▄ ▄ ▄▄▄ ▄▄▄
█ █ █▄▄ █ █▄▄ █ █ █ █ █ █▀▄ █ █▄▄▀ █ █ █ █▄▄ █▄▄
█▄▄▀ █▄▄ █ █▄▄ █▄▄ █ ▄█▄ █▄█ █ ▀▄█ █ ▀▄ █▄▄█ █▄▄ █▄▄ ▄▄█
/Users/shashankks/elastic_workspace/detection-rules/detection_rules/index_mappings.py:366: ElasticsearchWarning: No limit defined, adding default limit of [1000]
response = elastic_client.esql.query(query=query)
{
"author": [
"Elastic"
],
"description": "Identifies multiple violations of AWS Bedrock guardrails within a single request, resulting in a block action, increasing the likelihood of malicious intent. Multiple violations implies that a user may be intentionally attempting to cirvumvent security controls, access sensitive information, or possibly exploit a vulnerability in the system.",
"false_positives": [
"Legitimate misunderstanding by users or overly strict policies"
],
"from": "now-60m",
"interval": "10m",
"language": "esql",
"license": "Elastic License v2",
"name": "AWS Bedrock Guardrails Detected Multiple Policy Violations Within a Single Blocked Request",
"note": "## Triage and analysis\n\n### Investigating AWS Bedrock Guardrails Detected Multiple Policy Violations Within a Single Blocked Request\n\nAmazon Bedrock Guardrail is a set of features within Amazon Bedrock designed to help businesses apply robust safety and privacy controls to their generative AI applications.\n\nIt enables users to set guidelines and filters that manage content quality, relevancy, and adherence to responsible AI practices.\n\nThrough Guardrail, organizations can define \"denied topics\" to prevent the model from generating content on specific, undesired subjects,\nand they can establish thresholds for harmful content categories, including hate speech, violence, or offensive language.\n\n#### Possible investigation steps\n\n- Identify the user account and the user request that caused multiple policy violations and whether it should perform this kind of action.\n- Investigate the user activity that might indicate a potential brute force attack.\n- Investigate other alerts associated with the user account during the past 48 hours.\n- Consider the time of day. 
If the user is a human (not a program or script), did the activity take place during a normal time of day?\n- Examine the account's prompts and responses in the last 24 hours.\n- If you suspect the account has been compromised, scope potentially compromised assets by tracking Amazon Bedrock model access, prompts generated, and responses to the prompts by the account in the last 24 hours.\n\n### False positive analysis\n\n- Verify the user account that caused multiple policy violations, is not testing any new model deployments or updated compliance policies in Amazon Bedrock guardrails.\n\n### Response and remediation\n\n- Initiate the incident response process based on the outcome of the triage.\n- Disable or limit the account during the investigation and response.\n- Identify the possible impact of the incident and prioritize accordingly; the following actions can help you gain context:\n - Identify the account role in the cloud environment.\n - Identify if the attacker is moving laterally and compromising other Amazon Bedrock Services.\n - Identify any regulatory or legal ramifications related to this activity.\n- Review the permissions assigned to the implicated user group or role behind these requests to ensure they are authorized and expected to access bedrock and ensure that the least privilege principle is being followed.\n- Determine the initial vector abused by the attacker and take action to prevent reinfection via the same vector.\n- Using the incident response data, update logging and audit policies to improve the mean time to detect (MTTD) and the mean time to respond (MTTR).\n",
"query": "from logs-aws_bedrock.invocation-*\n\n// Expand multi-value policy action field\n| mv_expand gen_ai.policy.action\n\n// Filter for policy-blocked requests\n| where gen_ai.policy.action == \"BLOCKED\"\n\n// count number of policy matches per request (multi-valued)\n| eval Esql.ml_policy_violations_mv_count = mv_count(gen_ai.policy.name)\n\n// Filter for requests with more than one policy match\n| where Esql.ml_policy_violations_mv_count > 1\n\n// keep relevant fields\n| keep\n gen_ai.policy.action,\n Esql.ml_policy_violations_mv_count,\n user.id,\n gen_ai.request.model.id,\n cloud.account.id\n\n// Aggregate requests with multiple violations\n| stats\n Esql.ml_policy_violations_total_unique_requests_count = count(*)\n by\n Esql.ml_policy_violations_mv_count,\n user.id,\n gen_ai.request.model.id,\n cloud.account.id\n\n// sort by number of unique requests\n| sort Esql.ml_policy_violations_total_unique_requests_count desc\n",
"references": [
"https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-components.html",
"https://atlas.mitre.org/techniques/AML.T0051",
"https://atlas.mitre.org/techniques/AML.T0054",
"https://www.elastic.co/security-labs/elastic-advances-llm-security"
],
"related_integrations": [
{
"package": "aws_bedrock",
"version": "^1.1.0"
}
],
"required_fields": [
{
"ecs": false,
"name": "Esql.ml_policy_violations_mv_count",
"type": "integer"
},
{
"ecs": false,
"name": "Esql.ml_policy_violations_total_unique_requests_count",
"type": "long"
},
{
"ecs": true,
"name": "cloud.account.id",
"type": "keyword"
},
{
"ecs": false,
"name": "gen_ai.request.model.id",
"type": "keyword"
},
{
"ecs": true,
"name": "user.id",
"type": "keyword"
}
],
"risk_score": 21,
"rule_id": "f4c2515a-18bb-47ce-a768-1dc4e7b0fe6c",
"setup": "## Setup\n\nThis rule requires that guardrails are configured in AWS Bedrock. For more information, see the AWS Bedrock documentation:\n\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-create.html\n",
"severity": "low",
"tags": [
"Domain: LLM",
"Data Source: AWS Bedrock",
"Data Source: AWS S3",
"Resources: Investigation Guide",
"Use Case: Policy Violation",
"Mitre Atlas: T0051",
"Mitre Atlas: T0054"
],
"timestamp_override": "event.ingested",
"type": "esql",
"version": 6
}
detection-rules on 5440-fr-unit-test-for-keep-metadata [$?] is 📦 v1.5.25 via 🐍 v3.12.8 (.venv) on ☁️ shashank.suryanarayana@elastic.co took 8s
❯ python -m detection_rules view-rule rules/windows/defense_evasion_posh_obfuscation_high_number_proportion.toml
Loaded config file: /Users/shashankks/elastic_workspace/detection-rules/.detection-rules-cfg.json
█▀▀▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄ ▄ █▀▀▄ ▄ ▄ ▄ ▄▄▄ ▄▄▄
█ █ █▄▄ █ █▄▄ █ █ █ █ █ █▀▄ █ █▄▄▀ █ █ █ █▄▄ █▄▄
█▄▄▀ █▄▄ █ █▄▄ █▄▄ █ ▄█▄ █▄█ █ ▀▄█ █ ▀▄ █▄▄█ █▄▄ █▄▄ ▄▄█
/Users/shashankks/elastic_workspace/detection-rules/detection_rules/index_mappings.py:366: ElasticsearchWarning: No limit defined, adding default limit of [1000]
response = elastic_client.esql.query(query=query)
{
"author": [
"Elastic"
],
"description": "Identifies PowerShell scripts with a disproportionately high number of numeric characters, often indicating the presence of obfuscated or encoded payloads. This behavior is typical of obfuscation methods involving byte arrays, character code manipulation, or embedded encoded strings used to deliver and execute malicious content.",
"from": "now-9m",
"language": "esql",
"license": "Elastic License v2",
"name": "Potential PowerShell Obfuscation via High Numeric Character Proportion",
"note": " ## Triage and analysis\n\n> **Disclaimer**:\n> This investigation guide was created using generative AI technology and has been reviewed to improve its accuracy and relevance. While every effort has been made to ensure its quality, we recommend validating the content and adapting it to suit your specific environment and operational needs.\n\n### Investigating Potential PowerShell Obfuscation via High Numeric Character Proportion\n\nPowerShell is a powerful scripting language used for system administration, but adversaries exploit its capabilities to obfuscate malicious scripts. Obfuscation often involves encoding payloads using numeric characters, making detection challenging. The detection rule identifies scripts with a high proportion of numeric characters, signaling potential obfuscation. By analyzing script length and numeric density, it flags suspicious activity, aiding in defense evasion detection.\n\n### Possible investigation steps\n\n- Review the script block text from the alert to understand the context and identify any obvious signs of obfuscation or malicious intent.\n- Examine the file path and host name fields to determine the origin and location of the script execution, which can help assess the potential impact and scope.\n- Check the user ID and agent ID fields to identify the user and system involved, which may provide insights into whether the activity is expected or suspicious.\n- Analyze the powershell.sequence and powershell.total fields to understand the sequence of script execution and the total number of scripts executed, which can indicate whether this is part of a larger pattern of behavior.\n- Investigate any related logs or alerts from the same host or user to identify patterns or correlations that might suggest broader malicious activity.\n\n### False positive analysis\n\n- Scripts with legitimate numeric-heavy content such as data processing or mathematical calculations may trigger the rule. 
To handle this, identify and whitelist specific scripts or script patterns that are known to be safe.\n- Automated scripts that generate or manipulate large datasets often contain high numeric content. Consider creating exceptions for scripts executed by trusted users or from known safe directories.\n- PowerShell scripts used for legitimate software installations or updates might include encoded data blocks. Review and exclude these scripts by verifying their source and purpose.\n- Scripts containing large hexadecimal strings for legitimate purposes, such as cryptographic operations, may be flagged. Use the exclusion pattern to filter out these known safe operations.\n- Regularly review and update the exclusion list to ensure it reflects the current environment and any new legitimate scripts that may be introduced.\n\n### Response and remediation\n\n- Immediately isolate the affected host to prevent further execution of potentially malicious scripts and limit lateral movement within the network.\n- Review the PowerShell script block text and script block ID to identify any malicious payloads or encoded strings. If confirmed malicious, remove or quarantine the script.\n- Conduct a thorough scan of the isolated host using updated antivirus and anti-malware tools to detect and remove any additional threats or remnants of the obfuscated script.\n- Analyze the file path and user ID associated with the script execution to determine if unauthorized access or privilege escalation occurred. 
Revoke any suspicious user access and reset credentials if necessary.\n- Escalate the incident to the security operations center (SOC) for further investigation and correlation with other alerts to assess the scope and impact of the threat across the network.\n- Implement enhanced monitoring and logging for PowerShell activities on all endpoints to detect similar obfuscation attempts in the future, focusing on scripts with high numeric character proportions.\n- Review and update endpoint protection policies to restrict the execution of scripts with high numeric density, ensuring compliance with security best practices and reducing the risk of obfuscation-based attacks.\n",
"query": "from logs-windows.powershell_operational* metadata _id, _version, _index\n| where event.code == \"4104\"\n\n// Filter out smaller scripts that are unlikely to implement obfuscation using the patterns we are looking for\n| eval Esql.script_block_length = length(powershell.file.script_block_text)\n| where Esql.script_block_length > 1000\n\n// replace the patterns we are looking for with the \ud83d\udd25 emoji to enable counting them\n// The emoji is used because it's unlikely to appear in scripts and has a consistent character length of 1\n| eval Esql.script_block_tmp = replace(powershell.file.script_block_text, \"\"\"[0-9]\"\"\", \"\ud83d\udd25\")\n\n// count how many patterns were detected by calculating the number of \ud83d\udd25 characters inserted\n| eval Esql.script_block_pattern_count = Esql.script_block_length - length(replace(Esql.script_block_tmp, \"\ud83d\udd25\", \"\"))\n\n// Calculate the ratio of special characters to total length\n| eval Esql.script_block_ratio = Esql.script_block_pattern_count::double / Esql.script_block_length::double\n\n// keep the fields relevant to the query, although this is not needed as the alert is populated using _id\n| keep\n Esql.script_block_pattern_count,\n Esql.script_block_ratio,\n Esql.script_block_length,\n Esql.script_block_tmp,\n powershell.file.*,\n file.directory,\n file.path,\n powershell.sequence,\n powershell.total,\n _id,\n _version,\n _index,\n host.name,\n agent.id,\n user.id\n\n// Filter for scripts with high numeric character ratio\n| where Esql.script_block_ratio > 0.35\n\n// Exclude Windows Defender Noisy Patterns\n| where not (\n file.directory == \"C:\\\\ProgramData\\\\Microsoft\\\\Windows Defender Advanced Threat Protection\\\\Downloads\" or\n file.directory like (\n \"C:\\\\\\\\ProgramData\\\\\\\\Microsoft\\\\\\\\Windows Defender Advanced Threat Protection\\\\\\\\DataCollection*\",\n \"C:\\\\\\\\Program Files\\\\\\\\SentinelOne\\\\\\\\Sentinel Agent*\"\n )\n )\n // ESQL requires this 
condition, otherwise it only returns matches where file.directory exists.\n or file.directory is null\n| where not powershell.file.script_block_text like \"*[System.IO.File]::Open('C:\\\\\\\\ProgramData\\\\\\\\Microsoft\\\\\\\\Windows Defender Advanced Threat Protection\\\\\\\\DataCollection*\"\n| where not powershell.file.script_block_text : \"26a24ae4-039d-4ca4-87b4-2f64180311f0\"\n",
"related_integrations": [
{
"package": "windows",
"version": "^3.0.0"
}
],
"required_fields": [
{
"ecs": false,
"name": "Esql.script_block_length",
"type": "integer"
},
{
"ecs": false,
"name": "Esql.script_block_pattern_count",
"type": "integer"
},
{
"ecs": false,
"name": "Esql.script_block_ratio",
"type": "double"
},
{
"ecs": false,
"name": "Esql.script_block_tmp",
"type": "keyword"
},
{
"ecs": false,
"name": "_id",
"type": "keyword"
},
{
"ecs": false,
"name": "_index",
"type": "keyword"
},
{
"ecs": false,
"name": "_version",
"type": "long"
},
{
"ecs": true,
"name": "agent.id",
"type": "keyword"
},
{
"ecs": true,
"name": "file.directory",
"type": "keyword"
},
{
"ecs": true,
"name": "file.path",
"type": "keyword"
},
{
"ecs": true,
"name": "host.name",
"type": "keyword"
},
{
"ecs": false,
"name": "powershell.file.script_block_entropy_bits",
"type": "double"
},
{
"ecs": false,
"name": "powershell.file.script_block_hash",
"type": "keyword"
},
{
"ecs": false,
"name": "powershell.file.script_block_id",
"type": "keyword"
},
{
"ecs": false,
"name": "powershell.file.script_block_length",
"type": "long"
},
{
"ecs": false,
"name": "powershell.file.script_block_surprisal_stdev",
"type": "double"
},
{
"ecs": false,
"name": "powershell.file.script_block_text",
"type": "text"
},
{
"ecs": false,
"name": "powershell.file.script_block_unique_symbols",
"type": "long"
},
{
"ecs": false,
"name": "powershell.sequence",
"type": "long"
},
{
"ecs": false,
"name": "powershell.total",
"type": "long"
},
{
"ecs": true,
"name": "user.id",
"type": "keyword"
}
],
"risk_score": 21,
"rule_id": "f9abcddc-a05d-4345-a81d-000b79aa5525",
"setup": "## Setup\n\nThe 'PowerShell Script Block Logging' logging policy must be enabled.\nSteps to implement the logging policy with Advanced Audit Configuration:\n\n```\nComputer Configuration >\nAdministrative Templates >\nWindows PowerShell >\nTurn on PowerShell Script Block Logging (Enable)\n```\n\nSteps to implement the logging policy via registry:\n\n```\nreg add \"hklm\\SOFTWARE\\Policies\\Microsoft\\Windows\\PowerShell\\ScriptBlockLogging\" /v EnableScriptBlockLogging /t REG_DWORD /d 1\n```\n",
"severity": "low",
"tags": [
"Domain: Endpoint",
"OS: Windows",
"Use Case: Threat Detection",
"Tactic: Defense Evasion",
"Data Source: PowerShell Logs",
"Resources: Investigation Guide"
],
"threat": [
{
"framework": "MITRE ATT&CK",
"tactic": {
"id": "TA0005",
"name": "Defense Evasion",
"reference": "https://attack.mitre.org/tactics/TA0005/"
},
"technique": [
{
"id": "T1027",
"name": "Obfuscated Files or Information",
"reference": "https://attack.mitre.org/techniques/T1027/"
},
{
"id": "T1140",
"name": "Deobfuscate/Decode Files or Information",
"reference": "https://attack.mitre.org/techniques/T1140/"
}
]
},
{
"framework": "MITRE ATT&CK",
"tactic": {
"id": "TA0002",
"name": "Execution",
"reference": "https://attack.mitre.org/tactics/TA0002/"
},
"technique": [
{
"id": "T1059",
"name": "Command and Scripting Interpreter",
"reference": "https://attack.mitre.org/techniques/T1059/",
"subtechnique": [
{
"id": "T1059.001",
"name": "PowerShell",
"reference": "https://attack.mitre.org/techniques/T1059/001/"
}
]
}
]
}
],
"timestamp_override": "event.ingested",
"type": "esql",
"version": 8
}
detection-rules on 5440-fr-unit-test-for-keep-metadata [$?] is 📦 v1.5.25 via 🐍 v3.12.8 (.venv) on ☁️ shashank.suryanarayana@elastic.co took 9s
❯ python -m detection_rules view-rule rules/windows/defense_evasion_posh_obfuscation_high_number_proportion.toml
Loaded config file: /Users/shashankks/elastic_workspace/detection-rules/.detection-rules-cfg.json
█▀▀▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄ ▄ █▀▀▄ ▄ ▄ ▄ ▄▄▄ ▄▄▄
█ █ █▄▄ █ █▄▄ █ █ █ █ █ █▀▄ █ █▄▄▀ █ █ █ █▄▄ █▄▄
█▄▄▀ █▄▄ █ █▄▄ █▄▄ █ ▄█▄ █▄█ █ ▀▄█ █ ▀▄ █▄▄█ █▄▄ █▄▄ ▄▄█
Error loading rule in /Users/shashankks/elastic_workspace/detection-rules/rules/windows/defense_evasion_posh_obfuscation_high_number_proportion.toml
Error (EsqlSemanticError): Rule: Potential PowerShell Obfuscation via High Numeric Character Proportion contains a keep clause without metadata fields '_id', '_version', and '_index' -> Add '_id', '_version', '_index' to the keep command.
detection-rules on 5440-fr-unit-test-for-keep-metadata [$!?] is 📦 v1.5.25 via 🐍 v3.12.8 (.venv) on ☁️ shashank.suryanarayana@elastic.co
❯
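The behavior verified above (aggregate rule loads, non-aggregate rule fails once the metadata fields are missing from keep) could be approximated as follows. This is a rough sketch, not the PR's implementation: `KeepMetadataError`, `check_keep_metadata`, and the `stats`-based aggregate detection are assumptions, a lone `*` is assumed to satisfy the check, and a query without any keep is simply ignored here because its handling is not shown in this thread.

```python
import re

KEEP_PATTERN = re.compile(r"\|\s*keep\b\s+([^\|]+)", re.IGNORECASE | re.DOTALL)
# Assumption: a query is treated as aggregate when it pipes into `stats`.
STATS_PATTERN = re.compile(r"\|\s*stats\b", re.IGNORECASE)
REQUIRED_METADATA = ("_id", "_version", "_index")


class KeepMetadataError(ValueError):
    """Hypothetical stand-in for the EsqlSemanticError in the output above."""


def check_keep_metadata(rule_name: str, query: str) -> None:
    """Raise if a non-aggregate query's keep drops the metadata fields."""
    query_lower = query.lower()
    if STATS_PATTERN.search(query_lower):
        return  # aggregate queries are exempt from the requirement
    match = KEEP_PATTERN.search(query_lower)
    if match is None:
        return  # no keep command: out of scope for this sketch
    fields = {f.strip() for f in match.group(1).split(",") if f.strip()}
    if "*" in fields:
        return  # assumption: a lone wildcard keeps every column
    missing = [f for f in REQUIRED_METADATA if f not in fields]
    if missing:
        raise KeepMetadataError(
            f"Rule: {rule_name} contains a keep clause without metadata "
            f"fields {missing} -> add them to the keep command."
        )


good = "from logs-* metadata _id, _version, _index\n| keep host.name, _id, _version, _index\n"
bad = "from logs-* metadata _id, _version, _index\n| keep host.name, user.id\n"

check_keep_metadata("example rule", good)  # passes silently
try:
    check_keep_metadata("example rule", bad)
except KeepMetadataError as err:
    print(err)
```

Note that the check looks only at the keep list, not at the METADATA clause of the from command, which matches the point raised in review that the fields must be in the output row.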
Pull Request
Issue link(s):
Resolves #5440
Related to #5439
Summary - What I changed
I added a schema test to enforce adding metadata fields to the keep portion of non-aggregate ES|QL queries.
Note: unit tests will continue to fail until the rules are updated to match the new keep requirement.
How To Test
Use view-rule on an ES|QL rule that does not have the metadata fields _id, _version, _index in its keep line. E.g.
Checklist
bug, enhancement, schema, maintenance, Rule: New, Rule: Deprecation, Rule: Tuning, Hunt: New, or Hunt: Tuning so guidelines can be generated
meta:rapid-merge label if planning to merge within 24 hours
Contributor checklist