Automated Sigma Rule Generation from MISP Threat Intelligence

TL;DR: I open-sourced Sigma Importer, a tool that generates thousands of high-quality, vendor-agnostic Sigma rules from threat intelligence feeds such as MISP.

Whether for data science in general or threat intelligence in particular, collecting data is hard, but using it wisely is even harder.

The open-source community has brought us many solutions to problems that arose throughout the constant fight against threat actors. While some projects are lesser known, others are widespread and actively used within the infosec community. Readers might recognize projects such as MISP (formerly known as the Malware Information Sharing Platform), which enables organizations to share threat intelligence, or Sigma, another open-source project that provides a generic and open signature format to share vendor-agnostic detection logic.

Over the years, my current employer NVISO has benefited from many similar open-source projects, and today I have the pleasure of giving back. This post introduces Sigma Importer, a tool that generates thousands of high-quality, vendor-agnostic Sigma rules from threat intelligence feeds such as MISP.

If you are as impatient as I am, you can skip directly to the presentation or to the project itself.

The Why

Whereas MISP focuses on sharing threat intelligence, Sigma focuses on advanced, fine-tuned detection. The two projects are unrelated and even target different use cases.

MISP provides tons of Indicators of Compromise (hereafter “IoCs”) shared as a constant feed. This feed is extremely valuable for rapidly spreading threat intelligence across the community. Using these indicators wisely, however, is not straightforward and often ends up being implemented as a “spray and pray” approach.

As one can easily guess, this approach often leads to many false positives. Either the provided intelligence is incorrect (you would be surprised by the number of times Google’s DNS is flagged as an IoC), or the detection logic lacks additional context. While the former problem can be fixed by ensuring your MISP community “thinks twice”, the latter is harder to handle. For example, given a malicious shortened URL (protocol, domain, path and parameters) reported by MISP, it would make no sense to raise an alert if the only IoC observed on an endpoint is the URL shortener’s domain.
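To make the shortened-URL example concrete, here is a minimal Go sketch contrasting a “spray and pray” match on the domain alone with a match that keeps the full URL as context. The URL and the naiveMatch/contextMatch functions are hypothetical illustrations, not part of sigmai or MISP.

```go
package main

import (
	"fmt"
	"net/url"
)

// reportedURL is a hypothetical malicious shortened URL, as a MISP event might report it.
const reportedURL = "https://short.example/Ab3dE?ref=mail"

// mustParse parses a URL and panics on error; acceptable for fixed example inputs.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

// naiveMatch treats the domain as a standalone IoC ("spray and pray"):
// any request to the same host is flagged.
func naiveMatch(ioc, observed *url.URL) bool {
	return ioc.Hostname() == observed.Hostname()
}

// contextMatch keeps the full URL as context: host, path and query must all match.
func contextMatch(ioc, observed *url.URL) bool {
	return ioc.Hostname() == observed.Hostname() &&
		ioc.Path == observed.Path &&
		ioc.RawQuery == observed.RawQuery
}

func main() {
	ioc := mustParse(reportedURL)
	benign := mustParse("https://short.example/SomeOtherLink")
	malicious := mustParse("https://short.example/Ab3dE?ref=mail")

	fmt.Println(naiveMatch(ioc, benign), contextMatch(ioc, benign))       // true false
	fmt.Println(naiveMatch(ioc, malicious), contextMatch(ioc, malicious)) // true true
}
```

Only the context-aware check tells the malicious link apart from any other traffic to the same shortener.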

Figure 1: A schema of individual MISP attributes generating false positives

Sigma, on the other hand, offers a more robust detection method at the cost of speed. Sigma’s rules have so far been written by hand and require thorough development to exclude false positives. While MISP focuses on the “what” (file names, hashes, IPs, domains, …), Sigma focuses on the “how” (access masks, parent-child relationships, command-line arguments, …).

As Sigma aims to detect specific behaviors rather than simple observables (as opposed to MISP), known threats will only be detected at execution time if their behavior is among the few for which detection rules exist. Furthermore, Sigma’s smaller rule set and longer development cycle inherently mean that recent and/or uncommon threats are often not detected in time. For example, Sigma rules detecting known malware keywords won’t trigger on new variants unless someone manually improves the rules.

Figure 2: A schema of Sigma rules generating false negatives

While MISP is more likely to generate false positives, Sigma’s rules are more prone to false negatives. The quality-quantity trade-off between the two becomes obvious once you realize that MISP holds hundreds of thousands of individual IoCs, compared to Sigma’s hundreds of manually crafted and tested rules.

So why not take the best of both worlds?

Generating Sigma Rules from MISP Threat Intelligence

With NVISO’s Managed Detection & Response customer base constantly growing, we recently reached the point where additional optimization was needed to drastically reduce the number of false positives requiring manual analyst triage. Many improvements were made (context-based whitelisting, general rule improvements, …), but one of the giants I also wanted to tame was MISP threat intelligence matching. As explained in the first part of this post, Indicators of Compromise retrieved through MISP are not of sufficient quality when handled individually. We need to combine these single IoCs (MISP attributes) with the additional context provided by MISP objects and events.

Today I am releasing Sigma Importer (a.k.a. “sigmai”, in reference to the Sigma Compiler “sigmac”). This small utility is designed to do the exact opposite of sigmac: it converts vendor-specific data sources (currently MISP) into Sigma’s generic and open signature format. By leveraging the Sigma format, we can ensure our threat intelligence is matched with relevant detection logic while targeting the entire Sigma arsenal (ArcSight, Elastic, QRadar, Splunk, Microsoft Defender ATP, …).

Figure 3: A schema of Sigma Importer generating Sigma rules from MISP threat intelligence

Demo or It Didn’t Happen…

… is what I’ve always wished to yell. So here we go! To install sigmai, you can either grab the binaries or, with Go installed, build it from source as follows:

go get github.com/0xThiebaut/sigmai

Once the above command has completed, the sigmai utility should be available. The following minimal example imports a single event (here, event 60870):

sigmai -s misp --misp-url https://misp.thiebaut.dev/ --misp-key "SECRET" --misp-events 60870 -q

Once the import is done, you will be presented with a rather lengthy rule covering different log sources. The snippet below is a trimmed version that gives an idea of how a multi-document Sigma rule generated by sigmai is structured.

action: global
title: 'OSINT: Emissary Panda – A potential new malicious tool'
id: 5b0562d3-8460-4482-93c4-05a3ac12042b
status: experimental
description: See MISP event 60870
author: dcso.de
level: medium
tags:
- APT
- tlp:white
- iep:traffic-light-protocol="WHITE"
- DCSO:tie="ALL"
- DCSO:sharing="PUBLIC"
- osint:source-type="blog-post"
---
logsource:
  category: firewall
detection:
  condition: event60870
  event60870:
    dst_ip:
    - 159.65.80.157
    - 103.59.144.183
---
logsource:
  product: windows
detection:
  condition:
  - event60870
  - all of event60870attr2049468mapping*
  - event60870object33488
  - event60870object33489
  - event60870object33490 and all of event60870object33490attr2049523mapping*
  - event60870object33491 and all of event60870object33491attr2049526mapping*
  event60870:
    DestinationIp:
    - 159.65.80.157
    - 103.59.144.183
  event60870attr2049468mappingFilename:
  - Image|endswith: '%APPDATA%\systemconfig\sys.bin.url'
  - ParentImage|endswith: '%APPDATA%\systemconfig\sys.bin.url'
  - CommandLine|contains: '%APPDATA%\systemconfig\sys.bin.url'
  - ParentCommandLine|contains: '%APPDATA%\systemconfig\sys.bin.url'
  - ProcessName: '%APPDATA%\systemconfig\sys.bin.url'
  - ParentProcessName: '%APPDATA%\systemconfig\sys.bin.url'
  event60870object33488:
    Hashes|contains:
    - c69d60b82252b6e7eaaeb710d5e1ebe5
    - 4c0211c91b4b9f99e52f4d385e6e3960b321a3b0
    - 4d65d371a789aabe1beadcc10b38da1f998cd3ec87d4cc1cfbf0af014b783822
    - 768:NHO6X9W62QIPe1HhDIRmnTEDtcvyvfSl0zeM:NHOymWBDLYg0zB
  event60870object33489:
    Hashes|contains: 93b972951685b4ae284583dbc3959725
  event60870object33490:
    Hashes|contains: 2b2bb4c132d808572f180fe4db3a0a3143a37fdece667f8e78778ee1e9717606
  event60870object33490attr2049523mappingFilename:
  - Image|endswith: sys.bin.url
  - ParentImage|endswith: sys.bin.url
  - CommandLine|contains: sys.bin.url
  - ParentCommandLine|contains: sys.bin.url
  - ProcessName: sys.bin.url
  - ParentProcessName: sys.bin.url
  event60870object33491:
    Hashes|contains: 3e718f39dfb2f6b8fba366fefa8b7c127db1e6795f3caad2d4a9f3753eea0adc
  event60870object33491attr2049526mappingFilename:
  - Image|endswith: sys.bin.url
  - ParentImage|endswith: sys.bin.url
  - CommandLine|contains: sys.bin.url
  - ParentCommandLine|contains: sys.bin.url
  - ProcessName: sys.bin.url
  - ParentProcessName: sys.bin.url
---
# Many more log sources (proxy, webserver, ...) are trimmed for readability...

Convinced it works, but still want more? Let me tell you what makes sigmai better than an average Python script…

The Power of Sigma Importer

The Quality

First and foremost, the generated rules are “good”. But what does this really mean?

Mapping threat intelligence to Sigma rules is far from a simple 1:1 mapping. For starters, some MISP fields translate to multiple Sigma fields (e.g. MISP’s domain could map to Sigma’s c-uri, cs-referrer, r-dns, …). Similarly, some Sigma fields can represent multiple MISP fields (e.g. Sigma’s Hashes|contains could hold any of MISP’s md5, sha256, ssdeep, …). This already makes it an N:N mapping, and there are more edge cases.

Some MISP attributes are composite and hence result in composite Sigma conditions. Take MISP’s filename|md5 attribute as an example: Sigma would represent it as the Hashes|contains field combined with any of Image|endswith, CommandLine|contains, ProcessName, and many more. This additional complexity turns our mapping into a conditional N:N mapping.

With attribute mappings covered, we can now add another layer: MISP objects. These objects are grouped sets of attributes representing a complex IoC. For instance, MISP can represent a file as an object containing multiple IoCs (e.g. filename, md5, sha256, …). When translated to Sigma, additional conditions apply, as we aim to match the filename and any of the hashes. Objects therefore add yet another layer of conditions on top of the already conditional N:N mapping.

So how does sigmai handle this? With what I would call an “algorithm”.

Algorithm: Something programmers use when they don’t want to explain their code.
Definition by Urban Dictionary

With over 200 lines of conversion magic, sigmai performs smart translations that result in optimized yet human-friendly rules. Generating high-quality rules that are both optimized and readable is what I truly call “good”.

The Features

Besides generating good rules, sigmai supports a wide variety of features. As I’m already impressed you’ve read this far, I’ll simply summarize what is described in more detail in the sigmai repository. Most notably, you should find the following features more than useful for day-to-day analysis or incident response:

Customize (set, clear, add, remove) the rules’ level, status and tags.
Run sigmai as daemon to continuously import your latest threat intelligence.
Select what you import: either specific events or those matching time ranges, tags, severities and/or keywords (e.g. your ongoing incident’s malware family).
Filter what you import: be it warning lists, published events or IDS attributes, sigmai lets you select exactly what you need.
Choose how to save the rules, either to separate files or to the standard output.

The Optimization

I designed sigmai for production infrastructure from the start, which means much of it has been optimized. As I’m sure interested readers will dive into the code, I’ll only briefly mention a few sigmai internals that are critical when importing hundreds of thousands of events.

First and foremost, sigmai is concurrent. With the default settings, 20 jobs safely run alongside each other to import events and convert them into rules. Testing showed that sigmai is powerful enough to slow down MISP, or even make it error out, while fetching data. If your instance has plenty of RAM, I would definitely suggest enabling more workers.

To let your MISP instance breathe, sigmai fetches data in pages, with a default page size of 500. As with concurrency, feel free to raise this value through the appropriate sigmai command-line flag if your instance can handle more.

Finally, none of the above would make sense without streams (and, in Go, channels). To keep sigmai’s footprint as low as possible, I aimed to hold as little as possible in memory. Concretely, while sigmai is still reading data from MISP as a stream, the generated Sigma rules are already being written out through Go channels (Go’s stream-like primitive). You could say we get information in as quickly as possible and get rid of it even quicker.

The Importance of a Meaningful Threat Intelligence Feed

As much as I would have liked it to be otherwise, sigmai is not a silver bullet. The quality of the generated rules will never surpass the quality of your threat intelligence feed. All too often, we observe false positives caused by automated or reckless submissions that an analyst then has to fix manually. To make the most of sigmai, curate your threat intelligence feeds and make sure you have a decent multi-MISP architecture to vet incoming events.

Further Expanding the Open-Source Horizon

While I’m thrilled with sigmai, many improvements remain to be made to further strengthen the industry’s detection capabilities. On sigmai’s side, sources other than MISP could be implemented (STIX, commercial feeds, …). While I’ll let the community voice its needs, I’ll continue driving my research towards other ideas…