Log Insight 3.0 Agents: CSV Parser

The first agent parser I want to take a look at is the CSV parser. Read on to learn how it works!

How the Parser Works

The CSV parser is for events that follow a known delimiter-based structure. By default the delimiter is a comma, but the “delimiter” option can be set to change it to a different character. While the delimiter option is not mandatory, the “fields” option is. Parsing a known delimiter-separated event is easy; determining the fields — the names of each value — is not. If an actual CSV file with a header row were ingested this would be easier, but when receiving arbitrary log events the field names cannot be assumed.
The most important thing to note about the CSV parser and the fields option is that you MUST specify EXACTLY the same number of fields as the event contains. If you do not, NO fields will be parsed from the event and the CSV parser is effectively useless. This means either every event in the file must have the same number of fields or nested parsers need to be leveraged.

Basic Example

Let’s start with a basic example of how the CSV parser works. Let’s assume I have a log file in /var/log/test called basic.log that contains the following:

2015-10-11 15:32:48.701+0000,vm01.matrix,This is a test message

I could parse this message using the following configuration:

[filelog|test]
directory=/var/log/test
parser=test_csv
[parser|test_csv]
base_parser=csv
fields=timestamp,hostname,text

While the above configuration is syntactically correct, it actually produces no static fields; the Log Insight server will show only the hostname it already collects for every event. Do you know why? As you may remember, timestamp, hostname, and text are all reserved field names that cannot be used. I could rename the fields if I really wanted to parse this message, but given that all three fields are reserved, that is a good indication that I probably should not be parsing this event.
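If I did want to parse this event anyway, one option is to rename the fields so they no longer collide with the reserved names. A minimal sketch, where event_time, source_host, and message_text are placeholder names of my own choosing, not required names:

```ini
[parser|test_csv_renamed]
base_parser=csv
; Renamed to avoid the reserved names timestamp, hostname, and text.
fields=event_time,source_host,message_text
```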
Let’s look at another example:

2015-10-11 15:32:48.701+0000|vm01.matrix|200|put|/category/vmware|This is a test message

I could parse this message using the following configuration:

[parser|test_csv2]
base_parser=csv
fields= , , http_status, http_request, http_uri, http_message
delimiter="|"

In this example, I had to specify the delimiter because the event was not separated by commas. In addition, since the event contained values mapping to reserved field names, I had to ignore them by leaving their positions blank. Because the fields option MUST match the number of fields within the event EXACTLY, even the ignored fields need to be represented. Note that I could instead have named the reserved fields and used the exclude_fields option; both approaches yield the same result.
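As a sketch of the exclude_fields alternative mentioned above, the reserved positions are named and then dropped before the event is sent. The exact list separator for exclude_fields is an assumption on my part, so check the agent documentation:

```ini
[parser|test_csv2_alt]
base_parser=csv
delimiter="|"
; Name every position, including the reserved ones...
fields=timestamp,hostname,http_status,http_request,http_uri,http_message
; ...then drop the reserved fields (separator shown here is an assumption).
exclude_fields=timestamp;hostname
```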

Advanced Example

Now let’s look at a more complicated example:

2015-10-11 15:32:48.701+0000|vm01.matrix|200|put|/category/vmware|This is a test message
2015-10-11 15:32:48.701+0000|vm01.matrix|200|get|/category/vmware

The problem here is that the same log file contains events with different field counts. As a result, multiple parsers need to be leveraged:

[filelog|test]
directory=/var/log/test
parser=test_csv
next_parser=test_csv2
[parser|test_csv]
base_parser=csv
fields= , , http_status, http_request, http_uri
delimiter="|"
[parser|test_csv2]
base_parser=csv
fields= , , http_status, http_request, http_uri, http_message
delimiter="|"

You may be wondering whether the order of the parsers matters. As long as the field names are the same, no; if the field names differ, the answer may be yes. Remember that later parsers can override the settings of earlier parsers. As you can see, the above example uses the next_parser option. This is just one example of how the common parsing options can assist.
IMPORTANT: The CSV parser follows RFC 4180. This means multiline CSV is supported, but multiline values must be enclosed in double quotation marks.
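To illustrate the RFC 4180 quoting rule, here is a small sketch using Python's csv module, which follows the same convention. This is purely an illustration of the quoting behavior, not part of the agent:

```python
import csv
import io

# A single logical record whose last value contains an embedded newline.
# Per RFC 4180, the multiline value must be wrapped in double quotes.
raw = '2015-10-11 15:32:48.701+0000|vm01.matrix|"line one\nline two"\n'

rows = list(csv.reader(io.StringIO(raw), delimiter="|"))
print(len(rows))   # one record, despite the raw text spanning two lines
print(rows[0][2])  # quotes are stripped; the embedded newline is preserved
```

Without the surrounding double quotes, the same text would be read as two separate, truncated records.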

Summary

As you can see, the CSV parser is flexible, but it has some strict configuration requirements. Be sure the fields option lists every field the event contains or else no fields will be parsed. The CSV parser is only for events that follow a known delimiter-based pattern; for other log formats, see my following posts on other agent parsing options. Do you have a need for the CSV parser?

© 2015, Steve Flanders. All rights reserved.

4 comments on “Log Insight 3.0 Agents: CSV Parser”

Alex says:

Hi!
Do you know the program RVTools? It allows you to get excellent reports on the virtualization platform, exporting them as CSV files. Could you give an example of configuring the Log Insight Agent to collect these CSV files and send them to Log Insight?
For example: “rvtools.exe -passthroughAuth -s %vCenter% -c ExportAll2csv -d %Path%” —> RVTools_tabvInfo.csv, RVTools_tabvNetwork.csv, … RVTools_XXXX.csv (23 files) —> Log Insight = Cool! 🙂

Hey Alex — Thanks for the comment! You would need something like the following:
[filelog|rvtools]
directory=/var/log/rvtools
parser=rvtools-parser
[parser|rvtools-parser]
base_parser=csv
fields=field1,field2,field3,…

Alex says:

Hey Steve!
Thanks for the answer, I figured out how to do it! Everything works fine; it’s a great product!

Awesome, glad you got it working!
