On monitoring logs
Like many people, I have a personal server (actually several). Until recently, I was not good at monitoring logs. I was lazy, and it was never very easy. I recently set up a new physical server, and I decided I should be more diligent about monitoring it. In this post I talk about the system I built to monitor logs.
I started out with these basic requirements:
- I want to see log messages since the last time I looked at the log.
- I want to be able to ignore log messages based on patterns.
- I want to know what logs I'm monitoring and what lines get skipped.
Essentially I wanted to see only messages I might care about. There are a lot of routine status messages that are not interesting for me to look at.
I decided to write a program in Go. I knew solutions for this type of thing already exist, such as Logwatch, but I did not try them because of the last requirement: I wanted a complete understanding of the log gathering process.
My first version worked roughly like this (a code sketch follows the list):
- For each line in each log file:
  - Determine its time (from a timestamp on the line, from a prior line's timestamp, or, failing that, the file's mtime).
  - If it is before the requested date/time, skip it.
  - If it matches an exclusion pattern, skip it.
  - Otherwise, print out the log message.
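Here is a minimal sketch in Go of that per-line decision. The function names, the syslog-style timestamp parsing, and the sample exclusion pattern are illustrations of mine, not necessarily how the actual program does it:

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// parseTimestamp tries to read a syslog-style timestamp ("Jan  2 15:04:05")
// from the start of a line. A real program has to handle whatever formats
// its logs actually use.
func parseTimestamp(line string) (time.Time, bool) {
	if len(line) < 15 {
		return time.Time{}, false
	}
	t, err := time.Parse(time.Stamp, line[:15])
	if err != nil {
		return time.Time{}, false
	}
	// Syslog timestamps have no year; assume the current one.
	return t.AddDate(time.Now().Year(), 0, 0), true
}

// matchesAny reports whether the line matches any exclusion pattern.
func matchesAny(line string, excludes []*regexp.Regexp) bool {
	for _, re := range excludes {
		if re.MatchString(line) {
			return true
		}
	}
	return false
}

// reportLines prints the lines from one log file that are worth looking at:
// anything at or after the cutoff that matches no exclusion pattern.
func reportLines(lines []string, mtime, cutoff time.Time, excludes []*regexp.Regexp) {
	lineTime := mtime // fall back to the file's mtime if no timestamp is seen
	for _, line := range lines {
		if t, ok := parseTimestamp(line); ok {
			lineTime = t // also covers following lines without their own timestamp
		}
		if lineTime.Before(cutoff) {
			continue // older than the requested start time; skip
		}
		if matchesAny(line, excludes) {
			continue // matches an exclusion pattern; skip
		}
		fmt.Println(line)
	}
}

func main() {
	cutoff := time.Now().AddDate(0, 0, -1) // e.g. report the last day's messages
	excludes := []*regexp.Regexp{regexp.MustCompile(`session opened for user`)}

	now := time.Now().Format(time.Stamp)
	lines := []string{
		now + " host sshd[123]: pam_unix(sshd:session): session opened for user bob",
		now + " host kernel: something unexpected happened",
	}
	reportLines(lines, time.Now(), cutoff, excludes)
}
```

Any line without its own timestamp inherits the time of the line before it, which matches how multi-line messages tend to appear in logs.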
As input the program took:
- A date/time argument saying "include messages on or after this".
- A configuration file that defines which log files to examine (by filename pattern).
- A configuration file that defines which messages to exclude from each log file, using regular expressions (sketched below the list).
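I won't reproduce the actual configuration format, but conceptually the two files boil down to something like the following. The type, field names, and example exclusion patterns are invented for illustration; a glob identifies the log files, and regular expressions mark lines to skip:

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// logConfig is an illustrative stand-in for the parsed configuration:
// which log files to examine, and which lines within them to ignore.
type logConfig struct {
	filenamePattern string           // glob of log files to examine
	excludes        []*regexp.Regexp // lines matching any of these are skipped
}

func main() {
	configs := []logConfig{
		{
			filenamePattern: "/var/log/auth.log*",
			excludes: []*regexp.Regexp{
				regexp.MustCompile(`sshd\[\d+\]: Disconnected from`),
				regexp.MustCompile(`CRON\[\d+\]: pam_unix\(cron:session\)`),
			},
		},
	}

	for _, c := range configs {
		files, err := filepath.Glob(c.filenamePattern)
		if err != nil {
			fmt.Println("bad pattern:", err)
			continue
		}
		fmt.Println(c.filenamePattern, "->", files)
	}
}
```

A convenient side effect of matching files by pattern is that rotated copies (auth.log.1 and so on) can be picked up as well, assuming the pattern is written to match them.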
This let me see log messages I cared about since the last time I ran it. However, it was awkward to use:
- The log files needed to be available locally. In practice this meant I was copying over the contents of /var/log from the server.
- I needed to give the date/time argument manually.
There was also the up-front cost of adding patterns to exclude log messages I did not care to see.
This did work for me. I monitored logs on one server this way for a few weeks. But I wanted to monitor the logs on all of my servers, not just one.
I decided to start using this program to monitor logs on two more servers. I found this to involve more manual effort than I wanted.
I updated the program to automatically track when it ran, so it knew the date/time to start collecting logs from without me having to tell it. Then I ran the program from cron on each host.
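Tracking the run time is simple enough to sketch: read the previous run's timestamp from a small state file at startup, and record the current time once the run finishes. The file location and format here are arbitrary choices for the sketch, not necessarily what my program does:

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

// stateFile is an arbitrary location chosen for this sketch.
const stateFile = "/var/lib/logaudit/last-run"

// lastRunTime returns the recorded time of the previous run, or the zero
// time (meaning "report everything") if there is no record yet.
func lastRunTime() time.Time {
	data, err := os.ReadFile(stateFile)
	if err != nil {
		return time.Time{}
	}
	t, err := time.Parse(time.RFC3339, strings.TrimSpace(string(data)))
	if err != nil {
		return time.Time{}
	}
	return t
}

// recordRunTime stores the given time for the next run to pick up.
func recordRunTime(t time.Time) error {
	return os.WriteFile(stateFile, []byte(t.Format(time.RFC3339)+"\n"), 0644)
}

func main() {
	start := time.Now()
	cutoff := lastRunTime()
	fmt.Println("reporting log lines at or after", cutoff)

	// ... read, filter, and report the logs here ...

	if err := recordRunTime(start); err != nil {
		fmt.Fprintln(os.Stderr, "unable to record run time:", err)
		os.Exit(1)
	}
}
```

Recording the time captured at startup, rather than the time the run finishes, means lines written while the report was being generated are not skipped by the next run.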
It was time-consuming: I was receiving emails from three hosts that I needed to look at. I had to take each report and add patterns to exclude messages I didn't care about, and once I had added the patterns, I had to update the configuration file on each host.
At this point I started looking at solutions others had built. I read an interesting post about Fluentd, so I looked into whether it could work the way I wanted. That is, could I make it monitor arbitrary raw log files? At first I thought it was not able to do this, but today I found it has a core plugin called in_tail that would work.
I also looked into Logstash. There is a product from the same company, Filebeat, which I thought would do what I needed. Today I found there is also a core Logstash plugin called file that would work, so in theory Filebeat would not be needed; I could run a Logstash instance on each host.
There are drawbacks to these approaches, though:
- They rely on a daemon running all the time to tail the logs.
- There are the complexities of log rotation.
- There is the time investment and risk of setting up new daemons that I am not an expert in.
Both have capabilities to deal with log rotation, but it is still a worry. My program avoids the log rotation problem by looking at all log files each time it runs, so it does not need to track a file position or whether a log rotated. Instead, it decides based on each log line's time whether to report the line. The drawback, of course, is that it usually must read and parse messages that the tail approach could skip entirely.
I tried out Logstash with the file input plugin and the Elasticsearch output plugin to gather logs to a central location. It worked well, but I thought to myself: if I can write something similar that works exactly how I want, why not?
I decided to improve how my program worked to reduce the workload on me. I split it up into three programs:
- logauditsubmit is a log submission client and runs via cron (daily). It has a configuration file that tells it which files to monitor. It reads them in, skips any lines older than the last run time, and then submits the rest via an HTTP request to logauditd (there's a rough sketch of this request after the list).
- logauditd acts as a central log ingestion point. It receives log lines from logauditsubmit clients over HTTP and inserts them into a PostgreSQL database.
- logaudit queries the PostgreSQL database for logs. It asks for lines since its last run time and then applies regular expressions to each line to decide whether to report it.
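To give a feel for the submission step, here is a rough sketch of the kind of request logauditsubmit might make to logauditd. The payload shape, field names, and endpoint URL are invented for this sketch; the real programs define their own:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// submission is an invented payload shape: which host the lines came from,
// which log file, when they were collected, and the lines themselves.
type submission struct {
	Hostname string    `json:"hostname"`
	LogFile  string    `json:"log_file"`
	Time     time.Time `json:"time"`
	Lines    []string  `json:"lines"`
}

// submit POSTs one log file's new lines to logauditd as JSON.
func submit(url string, s submission) error {
	body, err := json.Marshal(s)
	if err != nil {
		return err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("logauditd returned %s", resp.Status)
	}
	return nil
}

func main() {
	s := submission{
		Hostname: "web1.example.com",
		LogFile:  "/var/log/auth.log",
		Time:     time.Now(),
		Lines:    []string{"example log line"},
	}
	// The endpoint path here is made up.
	if err := submit("https://logauditd.example.com/submit", s); err != nil {
		fmt.Println("submit failed:", err)
	}
}
```

On the other end, logauditd only has to decode the same structure and insert each line into PostgreSQL, and logaudit then queries that table rather than touching any host's /var/log.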
This means I don't need to make regular changes on each host (assuming there are no new log files to configure). Each day, each host reports its new logs to the logauditd server. Afterwards, I run a single report to see those logs. If there are lines I don't want to see any more, I have a single configuration file to update (logaudit's).
This system is fairly rudimentary, but it is working well for me so far since I don't need real-time logs. I only care to look at the logs periodically, so having them once a day works fine.
I have the project up on GitHub.