Log Insight Alerts: Email and Returned Results

One of the options for enabling user alerts is to send an email. The format of the message is the same across all thresholds; however, the way in which data is returned differs. This difference matters because it determines how much information is contained in the email versus only available in Log Insight. I would like to walk through the email format as well as the different ways in which the data can be returned.


Email format

Subject

[Log Insight] [<1-9>|10+] new [events|groups] found for alert <Alert_Name>
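If you route these emails into a mail filter or ticketing system, the subject template is regular enough to parse. The following Python sketch assumes the subject matches the template exactly as written (including the plural "events"/"groups" even for a count of 1, which is an assumption); `parse_subject` and `AlertSubject` are hypothetical names, not part of any Log Insight API.

```python
import re
from typing import NamedTuple, Optional

# Subject template (assumed to be sent verbatim):
#   [Log Insight] [<1-9>|10+] new [events|groups] found for alert <Alert_Name>
SUBJECT_RE = re.compile(
    r"^\[Log Insight\] (?P<count>[1-9]|10\+) new (?P<kind>events|groups) "
    r"found for alert (?P<alert>.+)$"
)


class AlertSubject(NamedTuple):
    count: str       # "1" through "9", or "10+" when there are ten or more
    kind: str        # "events" (message query) or "groups" (aggregation query)
    alert_name: str


def parse_subject(subject: str) -> Optional[AlertSubject]:
    """Parse an alert subject line; return None if it does not match."""
    m = SUBJECT_RE.match(subject.strip())
    if m is None:
        return None
    return AlertSubject(m.group("count"), m.group("kind"), m.group("alert"))


print(parse_subject("[Log Insight] 10+ new events found for alert Test1"))
# AlertSubject(count='10+', kind='events', alert_name='Test1')
```

The `kind` field is the quickest way to tell whether an alert was raised from a message query or an aggregation query without opening the email body.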

Message

This alert is about your Log Insight installation on <Log_Insight>

Hi,

Log Insight just found the following [<1-9>|10+] [events|groups] matching the criteria for alert “<Alert_Name>”:

<Results>

[… and more. We only show the first 10 results in this e-mail.]

[Note: To avoid raising duplicate alerts, this alert will now be snoozed for the next <num> minutes (the search period for this alert).]

For more details, please view the search results.

To make changes to this alert, please visit the alert page.
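The bracketed lines in the message are optional: the "… and more" line appears only when results are truncated to the first 10, and the snooze note only when the alert is snoozed. A script consuming the email could detect both, as in this sketch, which assumes the wording matches the template verbatim; `parse_body` and `AlertBody` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Optional lines from the message template (assumed to appear verbatim).
TRUNCATION_TEXT = "We only show the first 10 results in this e-mail."
SNOOZE_RE = re.compile(r"snoozed for the next (\d+) minutes")


class AlertBody(NamedTuple):
    truncated: bool                # True if only the first 10 results are shown
    snooze_minutes: Optional[int]  # None when no snooze note is present


def parse_body(body: str) -> AlertBody:
    """Detect the optional truncation and snooze lines in an alert email body."""
    m = SNOOZE_RE.search(body)
    return AlertBody(
        truncated=TRUNCATION_TEXT in body,
        snooze_minutes=int(m.group(1)) if m else None,
    )
```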

Query types

Before jumping into the different ways in which data can be returned, it is important to understand the different query types available on the Interactive Analytics page of Log Insight. When it comes to alerts, two query types are relevant. The first is a message query, which is the result of using the search bar and/or filters only:

[Screenshot: Log Insight search bar]

The other type of query is an aggregation query, which consists of any combination of mathematical functions and/or groupings:

[Screenshot: Log Insight aggregation bar]

These query types matter because they determine how data is returned in email user alerts.

Thresholds

As you may remember from my previous post on user alerts and thresholds, there are three ways in which user alerts can be raised:

  • On any match
  • when <more than|less than> <number> matches are found in the last <time range>
  • when <more than|less than> <number> events occur in a single group in the last <time range>

What is important to understand is which query type each threshold uses:

  • On any match = message query
  • when <more than|less than> <number> matches are found in the last <time range> = message query
  • when <more than|less than> <number> events occur in a single group in the last <time range> = aggregation query

What does this mean? If you construct an aggregation query and create a user alert with either the first or second threshold option, only the message query is saved and used. Only the third threshold option saves both the message query and the aggregation query to the alert.
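The mapping between thresholds and saved queries can be modeled as a small lookup. This is a conceptual sketch, not Log Insight's data model: `Threshold`, `saved_queries`, and the plain-string query values are hypothetical stand-ins for whatever the product stores internally.

```python
from enum import Enum
from typing import Dict, Optional


class Threshold(Enum):
    ANY_MATCH = 1         # "On any match"
    MATCHES_IN_RANGE = 2  # "<more than|less than> <number> matches ... in the last <time range>"
    EVENTS_PER_GROUP = 3  # "... events occur in a single group in the last <time range>"


def saved_queries(threshold: Threshold, message_query: str,
                  aggregation_query: Optional[str]) -> Dict[str, Optional[str]]:
    """Model which parts of a query are stored with an alert for each threshold."""
    if threshold is Threshold.EVENTS_PER_GROUP:
        # Third threshold: both the message query and the aggregation are kept.
        return {"message": message_query, "aggregation": aggregation_query}
    # First and second thresholds: any math or grouping is dropped.
    return {"message": message_query, "aggregation": None}


# Hypothetical query strings, for illustration only.
print(saved_queries(Threshold.ANY_MATCH, "", "count, group by appname"))
# {'message': '', 'aggregation': None}
```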

Next, let’s see how the <Results> section of the email alert differs between thresholds that use message queries and those that use aggregation queries. As an example, let’s construct a blank message query (i.e., nothing in the search bar and no filters), group by appname (using the over time drop-down), enable an alert using the first threshold option, and look at an actual email alert received:

This alert is about your Log Insight installation on li.sflanders.net

Hi,

Log Insight just found the following 10+ events matching the criteria for alert “Test1”:

2014-09-27T04:24:33.234Z esx02.matrix Rhttpproxy: [2AC43B70 verbose 'Proxy Req 14015'] The client closed the stream, not unexpectedly.

2014-09-27T04:26:45.645Z esx01.matrix Hostd: --> }

2014-09-27T04:26:45.645Z esx01.matrix Hostd: --> msg = "",

2014-09-27T04:26:45.645Z esx01.matrix Hostd: --> faultCause = (vmodl.MethodFault) null,

2014-09-27T04:26:45.645Z esx01.matrix Hostd: --> dynamicType = ,

2014-09-27T04:26:45.645Z esx01.matrix Hostd: --> (vmodl.fault.RequestCanceled) {

2014-09-27T04:26:45.645Z esx01.matrix Hostd: [FFD70D20 info 'Vmomi'] Result:

2014-09-27T04:26:45.645Z esx01.matrix Hostd: [FFD70D20 info 'Vmomi'] Throw vmodl.fault.RequestCanceled

2014-09-27T04:26:45.645Z esx01.matrix Hostd: --> "317"

2014-09-27T04:26:45.645Z esx01.matrix Hostd: [FFD70D20 verbose 'Vmomi'] Arg version:

… and more. We only show the first 10 results in this e-mail.

For more details, please view the search results.

To make changes to this alert, please visit the alert page.

Notice how the <Results> contain actual events coming into the system. The only difference between the first and second thresholds is that the second threshold waits until <more than|less than> <number> matches are found over the given <time range> before alerting, and once the alert triggers, it is snoozed for that <time range>.
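The second threshold's behavior can be modeled as a count over a sliding window plus a snooze. The sketch below is a hypothetical model (class `CountThresholdAlert`), assuming the window is the alert's <time range> and the snooze lasts one search period, as the note in the email states; it is not how Log Insight evaluates alerts internally.

```python
from datetime import datetime, timedelta
from typing import List, Optional


class CountThresholdAlert:
    """Conceptual model of: when <more than|less than> <number> matches are
    found in the last <time range>. Not Log Insight's implementation."""

    def __init__(self, more_than: bool, number: int, window: timedelta):
        self.more_than = more_than
        self.number = number
        self.window = window
        self.snoozed_until: Optional[datetime] = None

    def evaluate(self, match_times: List[datetime], now: datetime) -> bool:
        """Return True if an alert would be sent at time `now`."""
        if self.snoozed_until is not None and now < self.snoozed_until:
            return False  # still snoozed after the previous alert
        # Count only matches inside the search period (now - window, now].
        recent = sum(1 for t in match_times if now - self.window < t <= now)
        fired = recent > self.number if self.more_than else recent < self.number
        if fired:
            # Snooze for one search period to avoid duplicate alerts.
            self.snoozed_until = now + self.window
        return fired


start = datetime(2014, 9, 27, 4, 0)
alert = CountThresholdAlert(more_than=True, number=3, window=timedelta(minutes=5))
matches = [start + timedelta(minutes=m) for m in (1, 2, 3, 4)]
print(alert.evaluate(matches, start + timedelta(minutes=5)))  # True: 4 > 3
print(alert.evaluate(matches, start + timedelta(minutes=6)))  # False: snoozed
```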

Now, let’s take the same query, but enable an alert using the third threshold option and look at the <Results> section of the email alert:

This alert is about your Log Insight installation on li.sflanders.net

Hi,

Log Insight just found the following 5 groups matching the criteria for alert “Test2”:

appname count
vpxa 2535
hostd 156
hostd-probe 38
rhttpproxy 17

Note: To avoid raising duplicate alerts, this alert will now be snoozed for the next 5 minutes (the search period for this alert).

For more details, please view the search results.

To make changes to this alert, please visit the alert page.

As you can see, with this threshold the <Results> are returned in a table format and represent the information that would be seen from the aggregation query, not the message query. This difference is significant because it allows you to control what information is contained within the email alert. For example, let’s edit the query to group by hostname in addition to appname and look at an alert email:

This alert is about your Log Insight installation on li.sflanders.net

Hi,

Log Insight just found the following 7 groups matching the criteria for alert “Test3”:

hostname appname count
esx03.matrix vpxa 2371
esx02.matrix vpxa 164
esx03.matrix hostd 114
esx02.matrix hostd 42
esx03.matrix hostd-probe 24
esx03.matrix rhttpproxy 17
esx02.matrix hostd-probe 14

Note: To avoid raising duplicate alerts, this alert will now be snoozed for the next 5 minutes (the search period for this alert).

For more details, please view the search results.

To make changes to this alert, please visit the alert page.

Now the <Results> section contains the hostname in addition to the appname and count. This is powerful because an engineer can use it immediately to start troubleshooting, since they now know which system(s) are impacted.

Summary

User alerts in Log Insight provide a powerful way to troubleshoot problems within an environment and allow you to be proactive instead of only reactive to issues. The way in which results are returned in user alerts is critical to reducing the RTO when issues do arise. To save aggregation queries into an alert and/or to include information about fields within a user email alert, be sure to use the third threshold option.

© 2014 – 2015, Steve Flanders. All rights reserved.
