Log Insight has always supported the notion of extracted fields. The feature makes it possible to extract useful bits of information from within an event so those bits can be used in other ways such as filtering, aggregation and grouping.
Problems with Extracted Fields
As the volume of logs a Log Insight instance receives grows, extracted fields have the potential to impact performance. The reason for this is twofold:
- Extracted fields are regular expressions: Regular expressions are always slower than keyword or even glob queries. In addition, because users define extracted fields themselves, inefficient regular expressions may be used.
- Extracted fields are applied at query time: Some extracted fields may be part of a query, meaning a regular expression must be applied in order to gather your results, but all extracted fields are always applied against your results because they are computed at query time. Log Insight does have a sophisticated caching mechanism, but extracted fields still come with a cost.
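To illustrate the difference, here is a minimal Python sketch (purely illustrative, not Log Insight's implementation) contrasting a regex extraction, which must scan each event's text, with a keyword lookup against fields that were already parsed at ingest:

```python
import re

# An "extracted field" is effectively a regex run against every
# matching event at query time.
pattern = re.compile(r'status=(?P<status>\d{3})')

event = 'GET /index.html status=404 bytes=512'
match = pattern.search(event)
print(match.group('status'))  # 404

# A static field, by contrast, is already key/value metadata at
# ingest time, so a query becomes a keyword lookup instead of a
# regex scan over the raw event text.
static_fields = {'status': '404', 'bytes': '512'}
print(static_fields['status'])  # 404
```

Both approaches yield the same value here, but the regex scan is repeated per event per query, while the keyword lookup hits an index built once at ingestion.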
Another problem with extracted fields is that they require at least some users to understand regular expressions.
In addition to extracted fields, Log Insight supports static fields. Static fields are sent to Log Insight at ingestion time and are stored in an extremely efficient keyword index. The result is that static fields are far faster to query than extracted fields and do not require an understanding of regular expressions.
Ingesting Static Fields
So how do you get static fields into Log Insight? Well, the ingestion API (cfapi) has always supported the notion of tags. Tags are metadata applied to ingested events, and all tags are sent as static fields. The problem with tags is that they cannot dynamically extract useful bits from within events the way extracted fields can.
To address this, the Log Insight 3.0 agents now support agent parsers. Parsers provide a flexible way to extract useful bits of information from within events and send those bits to Log Insight as static fields. As of Log Insight 3.0, the following parsers are supported:
- CSV: Comma Separated Value — though the comma can be changed into a different character
- KVP: Key/Value Pair — metadata separated by an equal sign
- CLF: Apache’s Common Log Format — the most flexible of the parsing options today and for more than just Apache logs
- Timestamp: Uses the timestamp found within the log event instead of the client or server time
- Automatic: Combination of timestamp and KVP parsers
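Parsers are enabled through the agent's configuration file. The sketch below shows roughly what this looks like for Apache access logs using the CLF parser; the section and option names here (`filelog`, `parser`, `base_parser`, `format`) reflect my understanding of the 3.0 agent configuration format, and the paths and names are examples, so check the agent documentation for the exact syntax:

```ini
; Collect Apache access logs and run them through a CLF parser
[filelog|apache-access]
directory=/var/log/httpd
include=access_log
parser=myclf

; Define the parser: CLF with Apache's common log format string
[parser|myclf]
base_parser=clf
format=%h %l %u %t "%r" %s %b
```

With a configuration like this, fields such as the status code and response size arrive at the server as static fields rather than needing extracted fields on the query side.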
Note: Agent parsers work for both the cfapi and syslog protocols. With cfapi, the parsed information is passed to Log Insight as static fields. With syslog, the parsed information is passed within the STRUCTURED-DATA part of syslog RFC 5424 — meaning the event is modified to include the parsed information.
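To make the syslog case concrete, here is a minimal Python sketch of how parsed fields could be rendered as an RFC 5424 STRUCTURED-DATA element. This is illustrative only: the SD-ID shown is hypothetical, and the actual agent uses its own identifier.

```python
def to_structured_data(sd_id, fields):
    """Render parsed fields as one RFC 5424 STRUCTURED-DATA element."""
    def escape(value):
        # RFC 5424 requires escaping backslash, double quote, and
        # closing bracket inside a PARAM-VALUE.
        return (value.replace('\\', '\\\\')
                     .replace('"', '\\"')
                     .replace(']', '\\]'))
    params = ' '.join('{}="{}"'.format(k, escape(v))
                      for k, v in fields.items())
    return '[{} {}]'.format(sd_id, params)

# Hypothetical SD-ID for illustration only.
print(to_structured_data('fields@12345', {'status': '200', 'bytes': '2326'}))
# [fields@12345 status="200" bytes="2326"]
```

The resulting element is carried in the STRUCTURED-DATA part of the syslog message, which is how a syslog receiver sees the parsed information.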
One of the most common first questions about the agent parsers feature is: What impact does it have on the clients? Given that parsing capabilities are being pushed down from the server to the client, clearly this must come with some cost. Agent parsers do require more resources, but here is why you should not be concerned:
- The agent is extremely efficient and so are the agent parsers: The agent typically consumes less than 2% of a single CPU and 20MB of memory under normal operating conditions and parsers typically introduce no more than 20% additional overhead under normal operating conditions. Of course these are not official numbers, but in my experience the agent significantly outperforms third-party syslog agents like rsyslog and syslog-ng even with parsers configured.
- A client generates only a small subset of the events that the server processes: Parsing events requires resources somewhere. If events were parsed server-side, then server resources would need to increase rather drastically given the number of events the server would need to parse. Clients typically generate a very small number of events, making the parsing overhead minimal.
- Parsers should only be used selectively and when they fit one of the built-in parser capabilities: Just like with extracted fields, a user can cause issues with agent parsers. While agent parsers are really cool and really powerful, I would not recommend using them to parse events they were not designed for. As an example, I would not use any of the parsers for JSON logs today — a future version of the agent may include this parsing capability.
Log Insight 3.0 agents support parsing capabilities making it possible to extract useful bits of information from events and send them as static fields to Log Insight. The result is more performant queries and no need to know regular expressions. In my next series of posts I will walk through the parsing options available today. Are you using agent parsers yet?
© 2015, Steve Flanders. All rights reserved.