JSON All the Logs!
My recent obsession has been creating all of my logs in JSON format. The reasons are pretty simple: I like to collect my logs in Elasticsearch, and JSON-formatted logs make working with Elasticsearch easier. Command-line tools like 'jq' also make parsing JSON logs on the command line simpler than "good old" standard Syslog format and a string of 'cut', 'sed', and 'awk' commands.
Before going into examples, first a few caveats when it comes to creating JSON logs:
Proper Output Encoding
Make sure you properly encode special characters in JSON. For example, characters like '{', '}', quotes, backslashes, and such need to be escaped. Some log generation tools will do this for you (see examples below).
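For illustration (a small Python sketch, not from the original post): letting a JSON library do the serialization takes care of the escaping automatically.

import json

# The request string contains quotes and braces that would break naive string concatenation
entry = {"srcip": "192.0.2.1", "request": 'GET /index.html?q="{test}" HTTP/1.1'}
print(json.dumps(entry))
# {"srcip": "192.0.2.1", "request": "GET /index.html?q=\"{test}\" HTTP/1.1"}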
Proper and Consistent Types
This comes down to quotes or no quotes around numbers. If you want a value to be treated as a number, do not enclose it in quotes. But if a value isn't always a number (consider what happens if the field doesn't exist), it should be enclosed in quotes. For example, port numbers are... usually... numbers. But what if it is an ICMP packet and there are no ports? Or a firewall log showing a blocked fragment that isn't the first fragment? I have seen people use "0" as a port, which works, but can lead to confusion.
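As a quick illustration (a hypothetical Python sketch; field names like 'proto' and 'dstport' are just examples), keep the port a bare number when it exists and let it become null when it does not:

import json

def log_packet(srcip, proto, dstport=None):
    # dstport stays an unquoted number when present; None serializes as JSON null (e.g., for ICMP)
    print(json.dumps({"srcip": srcip, "proto": proto, "dstport": dstport}))

log_packet("192.0.2.1", "tcp", 443)   # {"srcip": "192.0.2.1", "proto": "tcp", "dstport": 443}
log_packet("192.0.2.1", "icmp")       # {"srcip": "192.0.2.1", "proto": "icmp", "dstport": null}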
Consistent Names
Downstream processing is a lot easier if the source IP address is always called "srcip" (or whatever name you come up with). This will reduce the effort needed to normalize naming later. Of course, depending on the software originating the logs, you do not always have a choice.
Your Downstream Log Processing Pipeline
I mentioned Elasticsearch above, and it loves JSON. But there are many other log processing systems, and yours may not be ready to ingest JSON. Converting from JSON to your format of choice may still be more straightforward than writing hundreds of different Logstash rules, but if you are already set up for whatever format you use, the change may not be worth it.
Creating JSON logs can be pretty simple, depending on your software. Here are a few examples:
Nginx
The Nginx web server has a helper function to escape JSON properly. To output logs in JSON format, use something like this:
log_format main escape=json '{"host": "$http_host", "srcip": "$remote_addr", "time_local": "$time_local", "request": "$request" ...}';
Apache
Apache uses a similar log format directive, but there is no simple JSON escape option. I still have to test Apache a bit more to see whether it escapes various characters appropriately.
LogFormat "{ \"time\":\"%t\", \"srcip\":\"%a\", \"host\":\"%V\", \"request\":\"%U\", \"query\":\"%q\",...}" json
rsyslog
Recent versions of rsyslog support JSON-encoded logs. You will have to define a custom template like:
template(name="outfmt" type="list" option.jsonf="on") {
    property(outname="@timestamp" name="timereported" dateFormat="rfc3339" format="jsonf")
    property(outname="host" name="hostname" format="jsonf")
    property(outname="severity" name="syslogseverity-text" caseConversion="upper" format="jsonf")
    property(outname="facility" name="syslogfacility-text" format="jsonf")
    property(outname="syslog-tag" name="syslogtag" format="jsonf")
    property(outname="source" name="app-name" format="jsonf")
    property(outname="message" name="msg" format="jsonf")
}
see: https://rainer.gerhards.net/2018/02/simplifying-rsyslog-json-generation.html
Python
For any software written in Python, there is a json-logging module (see https://pypi.org/project/json-logging/).
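If you would rather stay within the standard library, a custom logging.Formatter works as well. This is just a minimal sketch (not the json-logging package's API; the field names simply mirror the rsyslog template above):

import json
import logging

class JSONFormatter(logging.Formatter):
    # Serialize every log record as one JSON object per line
    def format(self, record):
        return json.dumps({
            "@timestamp": self.formatTime(record),
            "severity": record.levelname,
            "source": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
logging.getLogger("myapp").info("user logged in")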
Bind
For the Bind name server, I only found limited JSON support in the statistics channel. It is an optional feature (with XML being another option).
see: https://bind9.readthedocs.io/en/latest/chapter10.html?highlight=json#optional-features
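For reference, enabling the statistics channel in named.conf looks roughly like this (address, port, and ACL are placeholders; JSON output also requires BIND to be built with the optional JSON support mentioned above):

statistics-channels {
    inet 127.0.0.1 port 8053 allow { 127.0.0.1; };
};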
Missing Pieces
So far, the two items I am missing the most are firewall logs (BSD or iptables/nftables) and Postfix or other mail servers. Got any other tricks? Let us know.
---
Johannes B. Ullrich, Ph.D., Dean of Research, SANS.edu