Automated Anomaly-Detection in DNS Zones

TL;DR — I open-sourced Dnsbeat, a custom Elastic Beat which periodically parses DNS zones using zone transfers. It supports advanced configurations such as outbound IP address selection, TSIG authentication and fail-over name servers. A proof-of-concept is available.

Are you aware of what happens in your DNS zones? And in those of your company? According to US-CERT, attackers have already successfully redirected and intercepted web and mail traffic by compromising name servers. Besides the usual mitigations of credential rotation and 2FA, US-CERT advises organizations to audit their DNS records. To help with exactly that, I open-sourced Dnsbeat.

The objective of Dnsbeat is to perform regular zone transfers of the monitored domains. All entries of each zone are then parsed and sent to the Elastic Stack so that it reflects the current state of the zone.

+--------------+   Zone Transfers   +---------+ Entry State Events +---------------+
| Name Servers +------------------->+ Dnsbeat +------------------->+ Elastic Stack |
+--------------+                    +---------+                    +---------------+
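To make the parsing step more concrete, here is a minimal Python sketch of how one zone snapshot could be turned into one state event per record. It is not Dnsbeat's code: it assumes the records are already available as fully-qualified master-file lines ("name TTL class type rdata"), and it only fills event.dataset, event.original and dns.type, the fields used by the models later in this post. The helper name zone_to_events is hypothetical.

```python
from typing import Dict, Iterable, List


def zone_to_events(zone: str, records: Iterable[str]) -> List[Dict]:
    """Turn one zone snapshot into one state event per record.

    Hypothetical helper: assumes fully-qualified master-file lines of the
    form "<name> <ttl> <class> <type> <rdata...>", without $ORIGIN,
    $TTL directives or multi-line records.
    """
    events = []
    for line in records:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"unsupported record line: {line!r}")
        events.append({
            "event": {"dataset": zone, "original": line},
            "dns": {"type": fields[3].upper()},
        })
    return events


if __name__ == "__main__":
    # Illustrative records only, not the honeypot's actual zone content.
    snapshot = [
        "honeypot.local. 300 IN MX 10 mail.honeypot.local.",
        "lorem.honeypot.local. 300 IN CNAME ipsum.example.net.",
    ]
    for event in zone_to_events("honeypot.local.", snapshot):
        print(event["event"]["dataset"], event["dns"]["type"])
```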

Features

A single Dnsbeat instance can monitor multiple zones, each with its own independent configuration. Each zone supports the configuration of:

- An optional outbound IP address to use for the zone transfer request.
- A list of name servers to contact for the zone transfer, each with an optional TSIG authentication configuration.
- An optional file path for cases where the zone is retrieved through other means (such as Azure PowerShell for zones hosted on Microsoft Azure).
- The polling period, as well as various server-related timeouts.
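Dnsbeat supports fail-over name servers, but the exact retry behaviour is not described here. The following minimal Python sketch assumes the configured servers are simply tried in order until one transfer succeeds; transfer_with_failover and the transfer callable it receives are hypothetical names, not Dnsbeat's API.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def transfer_with_failover(servers: Sequence[str],
                           transfer: Callable[[str], T]) -> T:
    """Try each configured name server in order until one transfer succeeds.

    `transfer` is a hypothetical callable performing the zone transfer
    against one server; network errors are expected to raise OSError
    (socket timeouts and refused connections are OSError subclasses).
    """
    errors: List[str] = []
    for server in servers:
        try:
            return transfer(server)
        except OSError as exc:
            errors.append(f"{server}: {exc}")  # remember why, then fail over
    raise RuntimeError("all name servers failed: " + "; ".join(errors))


if __name__ == "__main__":
    def fake_transfer(server: str) -> List[str]:
        if server == "ns1.example.net":
            raise ConnectionRefusedError("connection refused")
        return [f"records from {server}"]

    print(transfer_with_failover(["ns1.example.net", "ns2.example.net"], fake_transfer))
```

Collecting the per-server errors before giving up keeps the final failure message useful when every server is unreachable.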

The processing of the entries retrieved by Dnsbeat is fully passive and PAP:RED-compliant: no ping, nslookup or other active operations are performed. Organizations willing to perform PAP:GREEN or lower actions can configure Dnsbeat to ship its events through Elastic Logstash, where filters such as geoip, which geolocates IP addresses, and dns, which resolves domain names such as those referenced by CNAME entries, can enrich them.

Finally, Dnsbeat ships with multiple default Kibana dashboards, allowing you to start monitoring as soon as results flow in.

Figure 1: A Screenshot of the Kibana Dnsbeat Overview Dashboards

Limitations

Some limitations do apply. Because of how zone transfers work, Dnsbeat must be configured for each zone individually since, as far as I am aware, there is no such thing as a wildcard transfer. Additionally, per RFC 5936, full zone transfers (AXFR) must occur over TCP. The name servers involved must therefore accept inbound TCP connections on port 53, or on whichever port is configured.
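To illustrate what travels over that TCP connection, the sketch below builds the wire-format AXFR query a client sends: on TCP, every DNS message is preceded by a two-byte length field (RFC 1035, section 4.2.2). It is a minimal illustration under stated assumptions (ASCII-only names, no TSIG signature, no handling of the multi-message response stream), not Dnsbeat's implementation, and the function names are hypothetical.

```python
import struct

QTYPE_AXFR = 252  # full zone transfer
QCLASS_IN = 1     # Internet class


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with the root label."""
    out = b""
    for label in (part for part in name.rstrip(".").split(".") if part):
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        out += struct.pack("!B", len(raw)) + raw
    return out + b"\x00"


def axfr_query_over_tcp(zone: str, query_id: int = 0x1234) -> bytes:
    """Build an AXFR query, prefixed with the 2-byte length required on TCP."""
    # Header: ID, flags (standard query, RD=0), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!6H", query_id, 0, 1, 0, 0, 0)
    question = encode_name(zone) + struct.pack("!2H", QTYPE_AXFR, QCLASS_IN)
    message = header + question
    return struct.pack("!H", len(message)) + message


if __name__ == "__main__":
    wire = axfr_query_over_tcp("honeypot.local.")
    print(len(wire), wire[:2].hex())  # 34 0020
```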

Another notable limitation, although it might be addressed at a later stage, is the absence of incremental zone transfers (IXFR) and change tracking. Dnsbeat currently reflects the state of each zone rather than the changes made to it. A side effect is that new IoCs can always be cross-referenced with the most recent zone states.
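Even without IXFR support, changes can still be derived downstream from the reported state by comparing two consecutive snapshots of the same zone. A minimal sketch, assuming each snapshot is the set of event.original values collected during one polling period; this comparison is not something Dnsbeat performs, and diff_snapshots is a hypothetical name.

```python
from typing import Iterable, Set, Tuple


def diff_snapshots(previous: Iterable[str],
                   current: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) records between two full zone snapshots."""
    prev, curr = set(previous), set(current)
    return curr - prev, prev - curr


if __name__ == "__main__":
    # Illustrative snapshots: one CNAME target changed between two polls.
    before = {"lorem.honeypot.local. 300 IN CNAME ipsum.example.net."}
    after = {"lorem.honeypot.local. 300 IN CNAME malicious.example.com."}
    added, removed = diff_snapshots(before, after)
    print("added:", sorted(added))
    print("removed:", sorted(removed))
```

Set difference fits zone data well, since a zone is itself a set of resource records whose ordering carries no meaning.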

A Honeypot as Proof-of-Concept

To demonstrate the usage of Dnsbeat, let’s build a “too permissive” honeypot name server. The objective of this proof-of-concept will be to detect anomalies in the name server’s state.

Architecture

The core architecture is the same as the one described initially. A single name server is configured to allow the transfer of our sample honeypot.local domain to the Dnsbeat host, which in turn is configured to ship the state of every monitored domain to an Elastic Stack.

The vulnerability we will introduce is an overly permissive allow-update clause in the Bind9 name server’s configuration, which allows malicious actors to add and delete records in our zone.

zone "honeypot.local." IN {
      type master;
      file "/etc/bind/master/db.honeypot.local";
      allow-update {
            any;
      };
};

Readers comfortable with docker-compose are welcome to play with the final setup, which already includes the improvements introduced below.

Adding Statistical Analysis

Detecting unwanted DNS states is easy on small zones… but how about large and numerous zones?

Statistical analysis can help us identify rare combinations as well as abnormal values. ee-outliers, built by NVISO, is a handy Elasticsearch-based tool that lets us build models to identify outliers. As shown below, ee-outliers regularly queries our Elastic Stack using a base query, computes statistical models and tags outlier records.

                 Statistical Queries
+---------------+------------------->+-------------+
| Elastic Stack |                    | ee-outliers |
+---------------+<-------------------+-------------+
                    Outlier Events

There is no limit to what one can monitor through DNS statistical analysis. The following examples outline some basic use cases that apply to most organizations.

To create additional models and fully understand the following samples, take some time to familiarize yourself with the ee-outliers configuration.

Rare Entry

In a typical company, any record change is worth investigating, as most DNS records rarely change. Notable exceptions are the widely used RRSIG records, which must be regenerated regularly, and the SOA record, whose serial number should be incremented at each change.

The following statistical model aggregates all records expected to be stable based on the event.dataset field, which matches the zone’s domain (honeypot.local. in our case). It then flags any record whose event.original field, which contains the raw zone record, has been observed fewer than 10% as often as the most frequently observed record.

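Because Dnsbeat reports every record of the zone at each polling period, long-standing records accumulate many observations while a newly added record only has a few. In short, this model will trigger on any new record that is not of the RRSIG or SOA type.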

[terms_rare_record]
es_query_filter=NOT dns.type: ("RRSIG" OR "SOA")

aggregator=event.dataset
target=event.original
target_count_method=within_aggregator
trigger_on=low
trigger_method=pct_of_max_value
trigger_sensitivity=10

outlier_type=Rare Value
outlier_reason=Rare Record
outlier_summary=Rare record for '{event.dataset}'

As this model flags any changed record, alerts generated from it should be considered low severity.
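To make the trigger logic concrete, here is a minimal Python sketch of our reading of target_count_method=within_aggregator combined with trigger_method=pct_of_max_value and trigger_on=low: count each target within its aggregator bucket, then flag targets whose count is below trigger_sensitivity percent of the bucket’s highest count. It is a simplified model for illustration, not ee-outliers’ actual implementation, and rare_within_aggregator is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def rare_within_aggregator(events: Iterable[Tuple[str, str]],
                           sensitivity_pct: float = 10) -> List[Tuple[str, str]]:
    """Flag (aggregator, target) pairs counted below sensitivity_pct% of the
    most frequent target within the same aggregator bucket."""
    buckets: Dict[str, Counter] = defaultdict(Counter)
    for aggregator, target in events:
        buckets[aggregator][target] += 1
    flagged = []
    for aggregator, counts in buckets.items():
        threshold = max(counts.values()) * sensitivity_pct / 100
        for target, count in counts.items():
            if count < threshold:  # trigger_on=low
                flagged.append((aggregator, target))
    return flagged


if __name__ == "__main__":
    zone = "honeypot.local."
    # Illustrative data: 30 polling periods of two long-standing records,
    # plus one record injected just before the last poll.
    observations = [(zone, "www A 192.0.2.10"), (zone, "lorem CNAME ipsum.example.net.")] * 30
    observations.append((zone, "honeypot.local. MX 1 malicious.example.com."))
    print(rare_within_aggregator(observations, sensitivity_pct=10))
```

With 30 observations as the maximum, the threshold is 3, so the record seen once is flagged while the others are not. Note that a bucket containing a single distinct target can never flag it, since its count is also the maximum.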

Rare CNAME Target

Companies often rely on a select set of hosting providers such as Amazon Web Services or Microsoft Azure. Detecting a variation or anomaly in hosting providers can be done by monitoring CNAME targets’ second-level domain (dns.rdata.sld) for rare values.

The following ee-outliers model creates buckets per monitored domain (dns.sld) and flags any canonical name record’s target (dns.rdata.sld) that has been observed fewer than 10% as often as the most frequently observed target.

[terms_rare_cname_target]
es_query_filter=dns.type: "CNAME" AND _exists_: dns.rdata.sld AND _exists_: dns.sld

aggregator=dns.sld
target=dns.rdata.sld
target_count_method=within_aggregator
trigger_on=low
trigger_method=pct_of_max_value
trigger_sensitivity=10

outlier_type=Rare Value
outlier_reason=Rare Target
outlier_summary=Rare target '{dns.rdata.sld}' for '{dns.sld}'

Alerts generated from this model should be of medium severity: legitimate variations are possible, but a malicious change could have a high impact.
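How Dnsbeat derives dns.sld and dns.rdata.sld is not detailed here. As an illustration of what these fields represent, the minimal sketch below keeps the last two labels of a name; naive_sld is a hypothetical helper, and the approach is deliberately naive, as the second example shows.

```python
def naive_sld(fqdn: str) -> str:
    """Return the last two labels of a name, e.g. 'a.b.example.com.' -> 'example.com'.

    Naive on purpose: multi-label public suffixes such as 'co.uk' require the
    Public Suffix List to be handled correctly.
    """
    labels = [label for label in fqdn.rstrip(".").lower().split(".") if label]
    return ".".join(labels[-2:])


if __name__ == "__main__":
    print(naive_sld("lorem.honeypot.local."))   # honeypot.local
    print(naive_sld("www.example.co.uk"))       # co.uk, not example.co.uk
```

With such a naive split, every target under a multi-label suffix would fall into the same bucket, which would hide exactly the kind of hosting-provider variation this model is meant to catch.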

Rare Mail Entry

Some entries change only exceptionally, such as mail exchanger (MX) records or text (TXT) records related to email authentication (SPF, DKIM and DMARC).

The following model aggregates all mail-related records based on the event.dataset field, which matches the zone’s domain. It then flags any record whose event.original field, which contains the raw zone record, has been observed fewer than 10% as often as the most frequently observed record.

In short, this model acts like the terms_rare_record model, except that it only monitors mail-related records.

[terms_rare_mail_record]
es_query_filter=dns.type: "MX" OR (dns.type: "TXT" AND dns.rdata.txt: ("*spf*" OR "*DKIM*" OR "*DMARC*"))

aggregator=event.dataset
target=event.original
target_count_method=within_aggregator
trigger_on=low
trigger_method=pct_of_max_value
trigger_sensitivity=10

outlier_type=Rare Value
outlier_reason=Rare Mail Record
outlier_summary=Rare '{dns.type}' mail-related record for '{event.dataset}'

Any alert resulting from this model should be treated as high severity and prioritized for investigation.
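To spell out which records the mail filter selects, here is a minimal Python predicate mirroring its logic: MX records, or TXT records whose content matches one of the three wildcard patterns. It assumes case-sensitive matching, as on a keyword-style dns.rdata.txt field; how that field is actually mapped is an assumption, and is_mail_record is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Patterns taken from the es_query_filter above; matching is case-sensitive
# here, assuming a keyword-style (non-analyzed) dns.rdata.txt field.
MAIL_TXT_PATTERNS = ("*spf*", "*DKIM*", "*DMARC*")


def is_mail_record(dns_type: str, txt: str = "") -> bool:
    """Mirror the filter: MX records, or TXT records mentioning SPF, DKIM or DMARC."""
    if dns_type == "MX":
        return True
    return dns_type == "TXT" and any(fnmatchcase(txt, p) for p in MAIL_TXT_PATTERNS)


if __name__ == "__main__":
    samples = [
        ("MX", ""),
        ("TXT", "v=spf1 mx -all"),
        ("TXT", "v=DMARC1; p=reject"),
        ("TXT", "v=SPF1 mx -all"),
        ("A", ""),
    ]
    for dns_type, txt in samples:
        print(dns_type, repr(txt), is_mail_record(dns_type, txt))
```

Under case-sensitive matching, the uppercase "v=SPF1" variant is missed; whether that also happens in Elasticsearch depends on how dns.rdata.txt is mapped.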

Sample Results

To exploit the vulnerability in our honeypot, we can use the nsupdate command, which launches the dynamic DNS update utility. From there, the server command lets us target the honeypot name server, which in our case listens on 127.0.0.1:5053.

nsupdate
server 127.0.0.1 5053

Let’s simulate the injection of an MX record of higher priority (that is, with a lower preference value) to hijack incoming mail.

update add honeypot.local 300 MX 1 malicious.example.com.
send
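Why does a preference of 1 hijack mail? Sending mail servers try MX hosts in increasing order of preference value, so the lowest value wins. The minimal sketch below illustrates that ordering; mail.honeypot.local. is a hypothetical pre-existing MX, since the honeypot zone’s actual content is not shown, and mx_delivery_order is a hypothetical helper.

```python
from typing import List, Tuple


def mx_delivery_order(mx_records: List[Tuple[int, str]]) -> List[str]:
    """Order MX hosts the way a sending server tries them: lowest preference first.

    Simplification: RFC 5321 asks senders to randomize hosts of equal
    preference, whereas sorted() keeps their original order.
    """
    return [host for _, host in sorted(mx_records, key=lambda record: record[0])]


if __name__ == "__main__":
    # Hypothetical legitimate MX plus the injected one from the nsupdate command.
    records = [(10, "mail.honeypot.local."), (1, "malicious.example.com.")]
    print(mx_delivery_order(records))  # ['malicious.example.com.', 'mail.honeypot.local.']
```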

Once ee-outliers has run, which takes up to one minute in our proof-of-concept, you can search for entries whose outliers.reason is “Rare Mail Record”. As the following image shows, our malicious MX record has been flagged both as “Rare Mail Record” and as “Rare Record”.

Figure 2: A Screenshot of the Kibana Rare Mail Record Results
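The same search can also be issued directly against Elasticsearch, for example from a script. A minimal sketch that only builds the request body, assuming outliers.reason is searchable as in Kibana; the index pattern mentioned in the comment (dnsbeat-*) is a hypothetical placeholder for wherever the Dnsbeat events are stored, and outlier_search_body is a hypothetical name.

```python
import json


def outlier_search_body(reason: str, size: int = 50) -> str:
    """Build an Elasticsearch _search body for events tagged with an outliers.reason."""
    body = {
        "size": size,
        # match_phrase works whether the field is mapped as text or keyword.
        "query": {"match_phrase": {"outliers.reason": reason}},
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Send this body to GET <index-pattern>/_search, e.g. a hypothetical "dnsbeat-*".
    print(outlier_search_body("Rare Mail Record"))
```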

Altering a CNAME Record

To test our “Rare CNAME Target” model, let’s alter the lorem.honeypot.local record to point to a previously unseen target.

update delete lorem.honeypot.local CNAME
update add lorem.honeypot.local 300 CNAME malicious.example.com.
send

After ee-outliers has processed the zone’s state, you can search for entries whose outliers.reason is “Rare Target”, the outlier_reason configured in the terms_rare_cname_target model. As the following screenshot shows, our malicious CNAME record has been flagged both as “Rare Target” and as “Rare Record”.

Figure 3: A Screenshot of the Kibana Rare Target Results

Additional Remarks

Throughout the proof-of-concept, we relied on a trigger_sensitivity of 10%. This value was chosen to work well with our zone configuration. If you apply the same architecture in production, do fine-tune it to reach the desired balance between false positives and false negatives: with trigger_on=low and pct_of_max_value, a lower value flags fewer records, while a higher value flags more.

The proof-of-concept does not cover alert generation, as automating its deployment is complex. Readers looking to alert on the outliers generated above should have a look at Yelp’s ElastAlert, which supports many alerting methods (Email, Jira, Microsoft Teams, Slack, TheHive, …).