Next up in the blog series about time in Log Insight I would like to answer the question, what time is used when you issue a query to Log Insight? You might say the time the originator of the message indicated in the message itself, but this is not the case. Read on to learn more!
NOTE: This is a rather long post. If you plan on skimming, be sure to fully read and understand the Queries section below.
Events
Events in Log Insight can be ingested in two primary ways:
- Syslog protocol
- Ingestion API
I will cover both of these in-depth.
Syslog protocol
While messages sent over the syslog protocol did not have to be in syslog format, they should have key attributes as defined in the syslog RFC including a timestamp and a hostname (if the event does not supply one the syslog agent should add one). The problem with timestamps sent by a client is that you cannot guarantee it is reliable. For example, how do you know that all clients within an environment have the same time?
Also, what happens if a device sends an event without a timestamp? For example, here is an actual syslog message sent over the syslog protocol from a Riverbed Stingray Traffic Manager load balancer:
stingray traffic manager: INFO [Monitor: Connect] [Machine: 10.145.133.161:9000] Succeeded
It is also possible that a syslog aggregator or syslog server adds or modifies a timestamp in an event. In this case, the timestamp would be when the syslog aggregator or syslog server receives/processes the message, which means the timestamp will be at least slightly later than when the message was actually generated (assuming time is set properly in the environment).
Ingestion API
When using Log Insight’s ingestion API, you are not bound to any formatting constraints and can send the event any way you want. If you are using Log Insight’s agent over the ingestion API then the agent will send information such as a timestamp automatically, but this is not a requirement of the ingestion API. So how do you query for an event over a time range if the event does not contain a timestamp?
NOTE: For more information on using the Log Insight ingestion API see this post.
Timestamps
Application logs
Application logs are not bound to any formatting constraints and are notorious for not including a timestamp. For example, take a look at a Tomcat’s catalina.out file:
# tail catalina.out log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. log4j:WARN No appenders could be found for logger (com.vmware.loginsight.vsphere.events.VimEventMonitor). log4j:WARN Please initialize the log4j system properly.
Again, how do you query for an event over a time range if the event does not contain a timestamp?
Configuration files
Log Insight can handle any text-based data including configuration files and structured data such as JSON, HTML, and XML. It is highly likely that configuration files and structured data will not include timestamps. For example, take a look at a Tomcat’s catalina.properties configuration file:
# tail catalina.properties # Please note that for single jars, e.g. bar.jar, you need the URL form # starting with file:. shared.loader= # # String cache configuration. tomcat.util.buf.StringCache.byte.enabled=true #tomcat.util.buf.StringCache.char.enabled=true #tomcat.util.buf.StringCache.trainThreshold=500000 #tomcat.util.buf.StringCache.cacheSize=5000
Again, how do you query for an event over a time range if the event does not contain a timestamp?
Timestamp formats
Even if the files you wish to ingest do contain a timestamp, it is unlikely that the timestamps are in the same format. The syslog RFC defines valid timestamp formats, but not all syslog agents follow this format and if not using the syslog protocol the timestamp format is often not defined. The result is a wide variety of timestamp formats, which makes parsing difficult and possibly impossible if not all the required information is available in the provided timestamp.
Some timestamp formats you may see include:
- 2005-10-30 T 10:45 UTC
- Sat Jul 23 02:16:57 2005
- 12569537329
- 07:38, 11 December 2012 (UTC)
- Tue Dec 24 20:42:39 UTC 2013
- 6/10/14 2:38 AM
- 2014-07-13 12:54:22.464+0000
- 2014-06-10 02:36:01,977
- 2014-05-09T12:22:01.858Z
- 2014-07-10 15:47:06
- 13 Jul 06:40:48
- and the list goes on…
Queries
UI
When you issue a query in Log Insight you need to specify a time range:
What time does Log Insight use to apply against the events? It could use the timestamp in the message assuming a timestamp exists, but even if the timestamp exists it may not be accurate or complete. Instead of hoping the originator of the message sent at timestamp, hoping it contains all the necessary information, and trusting it is accurate, Log Insight instead trusts itself. When messages are ingested in Log Insight, a Log Insight specific timestamp is added to the event. The actual event is not modified, but a Log Insight specific timestamp can be seen when viewing events on the Interactive Analytics page:
The timestamp in gray is the timestamp Log Insight added when the message was ingested.
NOTE: The timestamp in gray may have a small discrepancy from the timestamp in the event. This discrepancy should always be later chronologically of the timestamp in the event. If the discrepancy is large or behind the timestamp in the event then this points to a time synchronization problem in the environment.
The timestamp at the start of the event, if present, was either part of the original event or added by a syslog aggregator. When a query is run against events in Log Insight, the timestamp that Log Insight added (the one in gray) is the one used by the query. This ensures a guaranteed timestamp that is consistent and reliable.
NOTE: I covered how the time range option works on the Interactive Analytics page under the Textual Representation section of this post.
Agent
The Log Insight agent when using the ingestion API (cfapi) can ensure that a timestamp is sent from the client in a specific format. As such, the Log Insight server trusts the timestamp sent by the Log Insight agent when the agent is using the ingestion API as long as the time between the Log Insight server and the client is within 10 minutes. This means that events ingested by the Log Insight agent when the agent is configured to use the ingestion API will get a Log Insight specific timestamp (the one in gray) that is set by the Log Insight agent and not by the Log Insight server’s time.
NOTE: If the Log Insight agent is configured to use the syslog protocol or if the time different between the Log Insight agent and the Log Insight server is greater than 10 minutes then the Log Insight server will set the Log Insight specific timestamp (the one in gray) to the time of the Log Insight server when the message is ingested.
Importer
The Log Insight importer by default works the same way as the Log Insight agent. It does feature an –honor-timestamp flag which when used with the –username and –password flags can be used to trust any timestamp within a log event (i.e. 10-minute requirements between client and server is removed)
Ingestion API
If the time difference between the timestamp in an event sent over the ingestion API and the Log Insight server is greater than 10 minutes then the Log Insight server will set the Log Insight specific timestamp (the one in gray) to the time of the Log Insight server when the message is ingested.
Time
The reason why time is so important in Log Insight is that Log Insight is a troubleshooting tool. In order for Log Insight to be used successfully it is critical to know when events occurred and to correlate events across systems. If time is not in sync across devices then performing queries against unstructured data would be very challenging and getting to a root cause analysis might be impossible. With a consistent time range across all events, troubleshooting and root cause analysis become much easier.
Summary
As you can see, time is very important in Log Insight, but it is not easily defined by clients sending events. As such, Log Insight adds its own ingestion timestamp when a new event is received. The ingestion timestamp is what Log Insight uses for queries issues against the system to ensure a consistent format from a reliable source.
UPDATE: Added agent and importer information to the Queries section.
© 2014 – 2021, Steve Flanders. All rights reserved.
Steve – please clarify your statement “This discrepancy should always be ahead of the timestamp in the event. If the discrepancy is large or behind the timestamp in the event then this points to a time synchronization problem in the environment.”
It would make sense to me that the timestamp in gray should be later chronologically than the timestamp in the event since Log Insight will have processed the message after it was created on the original system. Is this correct? Your wording “ahead” suggests to me the notion of earlier, not later.
Also, your example is showing the Log Insight (gray) timestamp to be later than the event timestamp which reinforces my confusion (something that’s not hard to do).
Hey Gregg, you are correct. I have updated the wording accordingly. Thanks for the feedback!
Hi Steve,
If I post to insight ingestion api it blows off the timestamp in the event. Is it not possible to set this (e.g., I am parsing some older logs and want to spit the data into insight for visualization).
Events show up but only with a time stamp from when I made the call.
e.g., a submit with a timestamp indicating a few days ago shows up as “now”:
{“messages”:[ {“text”:”this is to be 11 aaaaaaa more stuff “, “timestamp”:1407878459000}]}
posted to my test site:
192.168.2.30:9000/api/v1/messages/ingest/demo1
thanks.
Hey Chris – I did not make it clear, but the agent and the ingestion API have the following note: “NOTE: If the Log Insight Windows agent is configured to use the syslog protocol or if the time different between the Log Insight Windows agent and the Log Insight server is greater than 10 minutes then the Log Insight server will set the Log Insight specific timestamp (the one in gray) to the time of the Log Insight server when the message is ingested.” I will update the post.
Hi Steve,
I’m deploying Log Insight (LI) for a client where they are trying to retrieve logs from jboss applications running on red hat. I’ve added the correct filelog parameters to the liagent.ini file and restarted the agent.
However when looking at logs for the jboss application on the Interactive Analytics page of LI, each log event (28,000+) for the jboss application have the same timestamp! An example timestamp in the appliccation log is as follows: 12-JUN-2015 10:30:43.
Could that timestamp format in the application log be the issue?
Thanks in advance.
I’ve just come across your blog: Importing existing logs into Log Insight. I’ve got the answer to my query. Thanks for all the info on Log Insight!
Awesome!