Log Insight Agent: Text vs. Glob vs. Regex

Recently someone reached out to me because the Log Insight agent was not collecting files that it should have been. I quickly uncovered the cause, and since others may run into the same thing, I figured I would discuss it.

Problem

The user had log files like the following:

$ ls
http_access_log
http_error_log

The user had a liagent.ini like the following:

[filelog|httpd]
directory=/var/log/httpd
include=*.*

Do you see the problem? No? OK, let’s look at the log file:

$ grep httpd /var/log/loginsight-agent/liagent_2015_07-24_01.log
2015-07-24 16:09:54.880433 0x00007f5867aa2700  FLogCollector:660 | Currently there are no log files passing through the 'include'/'exclude' file name filter for channel .
2015-07-24 16:09:54.880467 0x00007f5867aa2700  FLogCollector:205 | Subscribed to channel

Wait, why were no files found? If you do not know the answer, read on!

Agent Configuration

The Log Insight agent supports a variety of configuration options. Given the above problem, I will focus on the filelog section. Each filelog section mandates two things:

  1. A name that is unique. In the above example, the name is “httpd” and as long as no other configuration sections are called “httpd” then you are good. Note that you should check the liagent-effective.ini file as that contains a combination of client-side and server-side configuration and may point out a configuration issue.
  2. A directory to collect from, set with the directory key.

Unless overridden, each filelog section uses the following defaults:

  • include=* (all files except .zip and .gz)
  • event_marker=\n (newline)
  • charset=UTF-8
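
To make these rules concrete, here is a minimal Python sketch of how the unique name, the required directory, and the defaults fit together. It uses Python's configparser purely as a stand-in, since the agent's real parser is not shown in this post. The names DEFAULTS and load_filelog_sections are hypothetical, and the sketch does not model the .zip/.gz exclusion.

```python
import configparser

# Defaults as listed above (assumption: stored as plain text values).
DEFAULTS = {"include": "*", "event_marker": r"\n", "charset": "UTF-8"}


def load_filelog_sections(ini_text):
    """Return {name: settings} for every [filelog|name] section."""
    # strict=True (the default) rejects duplicate section names,
    # which mirrors the "name must be unique" rule.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)

    sections = {}
    for section in parser.sections():
        kind, _, name = section.partition("|")
        if kind != "filelog":
            continue
        if not parser.has_option(section, "directory"):
            raise ValueError(f"[{section}] is missing the directory key")
        settings = dict(DEFAULTS)          # start from the defaults
        settings.update(parser[section])   # explicit keys override them
        sections[name] = settings
    return sections


example = "[filelog|httpd]\ndirectory=/var/log/httpd\ninclude=*.*\n"
print(load_filelog_sections(example))
```

Note that an explicit include replaces the default rather than adding to it.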

Do you see the issue now?

Text vs. Glob vs. Regex

The Log Insight agent configuration is made up of sections in square brackets, like [this], which contain key/value pairs, like key=value. The type of value a key accepts depends on the key, and is some combination of:

  • Text: Some string that provides literal information
  • Glob: Just like on the Log Insight server, a pattern that can use an asterisk (*), meaning zero or more characters, or a question mark (?), meaning exactly one character
  • Regex: Perl-based regular expression
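
The same characters mean different things under glob and regex rules, which the following Python sketch tries to illustrate. It is only an approximation: Python's fnmatch stands in for the agent's glob matching (fnmatch also supports [seq] classes, which are not mentioned here), and Python's re stands in for Perl-style regex. The helper names are hypothetical.

```python
import fnmatch
import re


def glob_match(pattern, name):
    """Glob rules: * is zero or more characters, ? is exactly one."""
    return fnmatch.fnmatchcase(name, pattern)


def regex_match(pattern, name):
    """Regex rules: ? makes the previous item optional, . is any character."""
    return re.fullmatch(pattern, name) is not None


def is_valid_regex(pattern):
    """True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# The same pattern text, read two different ways.
print(glob_match("app?.log", "app1.log"))   # glob: ? stands for the "1"
print(regex_match("app?.log", "app1.log"))  # regex: reads the pattern differently
print(is_valid_regex("*.log"))              # a leading * has nothing to repeat
```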

Continuing to focus on the filelog configuration section, the applicable value combinations for the available keys are:

  • directory: text
  • include: text, glob (comma separated)
  • exclude: text, glob (comma separated)
  • event_marker: text, regex

All other options are text.

Resolution

Based on all the above information, do you see the resolution to the initial problem? While the include key defaults to all files in the specified directory, the user had explicitly defined the include as:

include=*.*

Those with regex experience may raise an eyebrow at the leading asterisk, but still assume the configuration reads as “match zero or more characters”, which is basically “match all files”. However, *.* is not valid regex, and the include key supports only text and glob, so the configuration actually reads as “match any file name that contains a period”. Literally: zero or more characters, followed by a period, followed by zero or more characters. This means test.log, test.1.log, .test.log, and .test would match, but test and test_log would not. Neither http_access_log nor http_error_log contains a period, so neither file passed the filter.
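
The matching behavior described above can be checked with a short Python sketch. It assumes Python's fnmatch behaves like the agent's glob matching for this simple pattern, which is an approximation rather than a statement about the agent's implementation.

```python
import fnmatch

# Sample names from the paragraph above, plus the two files from the ls output.
names = [
    "test.log", "test.1.log", ".test.log", ".test",
    "test", "test_log", "http_access_log", "http_error_log",
]

# "*.*" as a glob: anything, then a literal period, then anything.
matched = [n for n in names if fnmatch.fnmatchcase(n, "*.*")]
skipped = [n for n in names if n not in matched]

print("matched:", matched)
print("skipped:", skipped)
```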
Long story short, if you change the include to one of the following, log collection will work as expected:

  • <nothing> (default to include=*)
  • include=*
  • include=http_*
  • include=http_*_log
  • include=*_log
  • include=http_access_log,http_error_log
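
Before changing an agent configuration, a candidate include value can be tried against known file names with a sketch like the one below. It uses the two file names from the ls output earlier, approximates the agent's glob matching with Python's fnmatch, and assumes whitespace around commas is ignored. The helpers is_included and collected are hypothetical.

```python
import fnmatch

# File names taken from the ls output in the Problem section.
FILES = ["http_access_log", "http_error_log"]


def is_included(file_name, include="*"):
    """True if file_name matches any comma-separated text/glob pattern."""
    patterns = [p.strip() for p in include.split(",") if p.strip()]
    return any(fnmatch.fnmatchcase(file_name, p) for p in patterns)


def collected(include):
    """Return the files that pass the include filter."""
    return [f for f in FILES if is_included(f, include)]


for candidate in ["*.*", "*", "*_log", "http_access_log,http_error_log"]:
    print(f"include={candidate!r} -> {collected(candidate)}")
```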

Summary

It is important to understand the differences between text, glob, and regex, and to know which keys accept which value types. When troubleshooting agent issues, be sure to check the agent log file, as it should clearly indicate where the problem is.

© 2015, Steve Flanders. All rights reserved.
