Log Insight: Ingestion API versus Syslog Protocol Part 1/2

As you know, Log Insight introduced an ingestion API with the 2.0 release. This ingestion API can be used by anyone, and it is what the Log Insight agent (available for Windows as of 2.0 and for Linux as of 2.5) uses by default. The ingestion API is powerful because it provides functionality beyond what the syslog RFC defines, but it is important to note that events received over each protocol may look different. Read on to learn more.

Syslog RFCs

There are several syslog RFCs. The base syslog RFCs, 3164 and 5424, define the parts of a syslog event. RFC 3164, for example, states:

The full format of a syslog message seen on the wire has three
   discernable parts.  The first part is called the PRI, the second part
   is the HEADER, and the third part is the MSG.  The total length of
   the packet MUST be 1024 bytes or less.  There is no minimum length of
   the syslog message although sending a syslog packet with no contents
   is worthless and SHOULD NOT be transmitted.

In addition, the base syslog RFCs clearly define the format of each part such as the header:

The HEADER contains two fields called the TIMESTAMP and the HOSTNAME.
   The TIMESTAMP will immediately follow the trailing ">" from the PRI
   part and single space characters MUST follow each of the TIMESTAMP
   and HOSTNAME fields.
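
To make the PRI, HEADER, and MSG layout quoted above concrete, here is a minimal Python sketch that splits an RFC 3164 style packet into its parts. The function name parse_3164 and the regular expression are illustrative assumptions, not part of Log Insight or any syslog library, and the sketch only handles the classic "Mmm dd hh:mm:ss" timestamp.

```python
import re

# RFC 3164 layout: <PRI>TIMESTAMP HOSTNAME MSG
# TIMESTAMP is "Mmm dd hh:mm:ss"; a single-digit day is padded with a space.
SYSLOG_3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<msg>.*)$",
    re.DOTALL,
)


def parse_3164(packet: str) -> dict:
    """Split a packet into PRI (as facility/severity), HEADER fields, and MSG.

    Hypothetical helper for illustration only.
    """
    match = SYSLOG_3164.match(packet)
    if match is None:
        raise ValueError("packet does not follow the RFC 3164 layout")
    pri = int(match.group("pri"))
    return {
        "facility": pri // 8,   # PRI = facility * 8 + severity
        "severity": pri % 8,
        "timestamp": match.group("timestamp"),
        "hostname": match.group("hostname"),
        "msg": match.group("msg"),
    }


# Example message taken from RFC 3164 itself.
example = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
parsed = parse_3164(example)
```

A raw application log line, such as a web server access log entry, does not match this layout, which is why it has to be modified before it can travel over the syslog protocol.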

The base syslog RFCs even mandate what to do if an event does not follow the standard format:

If a relay does not find a valid TIMESTAMP in a received syslog
   packet, then it MUST add a TIMESTAMP and a space character
   immediately after the closing angle bracket of the PRI part.  It
   SHOULD additionally add a HOSTNAME and a space character after the
   TIMESTAMP.  These fields are described here and detailed in Section
   4.1.2.  The remainder of the received packet MUST be treated as the
   CONTENT field of the MSG and appended.  Since the relay would have no
   way to determine the originating process from the device that
   originated the message, the TAG value cannot be determined and will
   not be included.
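
The relay rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relay_fixup is a hypothetical name, the sender's hostname is passed in by the caller, and the sketch covers only the quoted case of a packet that has a PRI but no valid TIMESTAMP.

```python
import re
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
PRI_RE = re.compile(r"^<\d{1,3}>")
TIMESTAMP_RE = re.compile(r"[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")


def relay_fixup(packet: str, sender_hostname: str, now: datetime) -> str:
    """Add TIMESTAMP and HOSTNAME after the PRI when no valid TIMESTAMP is found.

    Hypothetical helper; the rest of the packet is kept as the CONTENT of MSG.
    """
    pri_match = PRI_RE.match(packet)
    if pri_match is None:
        raise ValueError("packets without a PRI are outside this sketch")
    pri, rest = pri_match.group(0), packet[pri_match.end():]
    if TIMESTAMP_RE.match(rest):
        return packet  # already has a valid TIMESTAMP, leave it alone
    stamp = f"{MONTHS[now.month - 1]} {now.day:2d} {now:%H:%M:%S}"
    return f"{pri}{stamp} {sender_hostname} {rest}"


# Illustrative input: a PRI followed directly by an application log line.
received = '<13>sflanders.net:80 182.118.53.86 - - [16/Jan/2015:23:34:42 +0000] "GET / HTTP/1.1" 200 485'
fixed = relay_fixup(received, "192.168.1.80", datetime(2015, 1, 16, 23, 34, 42))
```

Note that the original text survives intact, but it now sits behind a header it did not have on the client.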

Ingestion API (CFAPI)

The Log Insight agent defaults to the cfapi protocol, which leverages the Log Insight ingestion API. The ingestion API provides several advantages over the syslog protocol, including the ability to collect statistical and operational information about the agents directly in the server UI and the ability to push server-side configurations to agents (see this post for other advantages).

Events: Client vs. Server

Besides features, there is an important difference between the cfapi and syslog protocols. In the syslog protocol, events must follow a particular pattern as defined in the syslog RFC, and if they do not, they must be modified before being sent or upon being received. This means the event you see on the client and the event you receive on the server may not look the same with the syslog protocol. For example, let's say you had a client log on the filesystem that looked like:

sflanders.net:80 182.118.53.86 - - [16/Jan/2015:23:34:42 +0000] "GET / HTTP/1.1" 200 485 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2251.0 Safari/537.36"

If this event were collected and sent over the syslog protocol, it would look like this when received:

2015-01-16T23:34:42.000Z 192.168.1.80 sflanders.net:80 182.118.53.86 - - [16/Jan/2015:23:34:42 +0000] "GET / HTTP/1.1" 200 485 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2251.0 Safari/537.36"

You will notice that the event received over the syslog protocol is prefixed with a timestamp and a hostname. The original event already contains a timestamp, but that timestamp is neither in the proper place within the event nor in the proper format. The syslog RFC mandate for modifying nonconforming events can therefore cause confusion when analyzing log events.

In the case of the cfapi protocol, your event is never modified. This means the event looks the same on the client and the server, though the metadata (e.g. hostname) might be different. The difference in metadata is important because queries may need to be adjusted. For example, with the syslog protocol, you can query for events by a hostname that appears within the event (i.e. use the search bar on the Interactive Analytics page of Log Insight). With the ingestion API, however, the hostname is sent as a static field and may not appear in the event text. As such, an ingestion API event may require a filter on the hostname field (below the search bar of the Interactive Analytics page of Log Insight) instead of a keyword query. Regardless of which protocol you use, the hostname filter will always return results. However, it is common for people to rely on keyword queries via the search bar, and this may result in some events not being returned.
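
The query difference can be illustrated with a toy model in Python. This is not how Log Insight implements search: the Event class, keyword_query, and field_filter are hypothetical names, the keyword match is a plain substring test, and the access log line is shortened for readability.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    text: str                                    # what the search bar matches against
    fields: dict = field(default_factory=dict)   # static metadata such as hostname


def keyword_query(events, keyword):
    """Toy stand-in for a search bar query: match on the event text only."""
    return [e for e in events if keyword in e.text]


def field_filter(events, name, value):
    """Toy stand-in for a field filter: match on metadata only."""
    return [e for e in events if e.fields.get(name) == value]


raw = 'sflanders.net:80 182.118.53.86 - - [16/Jan/2015:23:34:42 +0000] "GET / HTTP/1.1" 200 485'

# Over syslog the text is prefixed with a timestamp and the hostname.
syslog_event = Event(
    text=f"2015-01-16T23:34:42.000Z 192.168.1.80 {raw}",
    fields={"hostname": "192.168.1.80"},
)
# Over cfapi the text is unchanged and the hostname travels only as a field.
cfapi_event = Event(text=raw, fields={"hostname": "192.168.1.80"})

events = [syslog_event, cfapi_event]
keyword_hits = keyword_query(events, "192.168.1.80")
filter_hits = field_filter(events, "hostname", "192.168.1.80")
```

In this model the keyword query finds only the syslog event, while the hostname filter finds both, which mirrors the behavior described above.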

Summary

The Log Insight ingestion API is a powerful way to ingest data, with capabilities beyond those provided by the syslog protocol. It is important to understand the differences between the ingestion API and syslog when constructing queries in Log Insight. It is always recommended to use the ingestion API over syslog, but at the very least you should stick to either the ingestion API or the syslog protocol and not mix them, so that query issues do not arise.

© 2015, Steve Flanders. All rights reserved.
