Log Insight 2.5: Backup and Recovery Part 2/3

In my first post about Log Insight 2.5 backup and recovery, I talked about deployment types, backup considerations and architecture of the different Log Insight components. With that knowledge, I would like to continue by discussing how to back up the different components of Log Insight 2.5.

IMPORTANT: Again, this information is specific to Log Insight 2.5. The architecture of Log Insight changes from release to release, so if you are running an older or newer version then the process may differ. For example, prior to Log Insight 2.5, Postgres was used instead of Cassandra, so the Configuration section below would be different for older versions of Log Insight.

Server

The recommended way to back up a Log Insight server (remember, when I say server I mean server or forwarder), whether it is standalone or clustered, is to back up the entire virtual appliance. Such a backup could be done via a storage array, from backup software that makes API calls to vSphere/vCD/vRA/vRO or from a backup agent running on the Log Insight virtual appliance.

IMPORTANT: Installing backup agents on the Log Insight virtual appliance is technically unsupported. If a server issue is discovered and an agent has been installed, then VMware support ***MAY*** request that the agent be uninstalled and the issue reproduced. For more information, see http://kb.vmware.com/kb/2090839.

When backing up Log Insight, it is not the backup product that matters. What matters is that the entire virtual appliance is backed up. This means that file-level backups should be avoided. Either block-level backups or backups of all virtual disks should be used. The reason is that the majority of the data on the Log Insight virtual appliance consists of events that Log Insight has ingested.

If you look at the mounts on Log Insight, you will see that 8GB is used by the root filesystem, 20GB is used for /storage/var and everything else is used by /storage/core. The root filesystem information is not important and could be discarded.
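
If you want to check these numbers on your own appliance, here is a minimal Python sketch that reports usage for the three mount points named above. It assumes Python 3 is available wherever you run it; the function names are mine and purely illustrative.

```python
import shutil

# Mount points named in the post; adjust if your appliance differs.
MOUNTS = ["/", "/storage/var", "/storage/core"]


def usage_gib(path):
    """Return (total, used) in GiB for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return usage.total / gib, usage.used / gib


def report(mounts=MOUNTS):
    """Build one summary line per mount point."""
    lines = []
    for mount in mounts:
        total, used = usage_gib(mount)
        lines.append(f"{mount}: {used:.1f} GiB used of {total:.1f} GiB")
    return lines
```

Calling print("\n".join(report())) on the appliance should show /storage/core dominating, which is why full virtual disk backups are the sensible choice.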

IMPORTANT: Remember the SSL certificate is stored on the root filesystem and may need to be backed up. For more information on backing up the SSL certificate see this post.

The other two mounts are used by Log Insight. The /storage/var mount contains log messages that Log Insight has generated and technically is not necessary, though it is good to back up this information for historical reasons. Everything that is critical to Log Insight is in /storage/core.

Now, you may be wondering about the backup procedure for a Log Insight cluster. Let me try to address the most common questions:

  1. Do all the nodes need to be backed up at the same time? Ideally yes, but this is not a requirement. Log Insight can recover if nodes are backed up separately though the restore procedure may take longer.
  2. Do the virtual machines need to be quiesced during backup? Ideally yes, but this is not a requirement. Again, Log Insight can recover, but it may take longer if memory state is not preserved.
  3. Is there an optimal time to perform backups? Not really. In most environments, you will be ingesting events 24×7. It is possible that during off hours you will have reduced query load on the system. Note that even if users are not actively using the UI, it is likely they have enabled alerts which are queries that run on a schedule.
  4. Can I use a vCenter/vCD snapshot as a backup? Use of a snapshot is recommended prior to performing an upgrade, but otherwise snapshots should typically not be used and definitely not kept for a long period of time. Again, Log Insight is constantly ingesting data so snapshot sizes can grow rapidly and could cause performance as well as capacity issues. In addition, snapshots are not backups. If something happens to the parent VMDK it is unlikely you will be able to recover.
  5. Do I really need to back up the entire virtual machine? As mentioned, technically the root filesystem and /storage/var partitions are not needed, but these partitions make up a mere 28GB of space. By default, the virtual machine allocates 256GB of space for /storage/core. In most environments additional virtual disks are added so a longer retention can be kept. Saving 28GB of raw space is nothing and backing up the entire virtual machine makes the restore procedure very straightforward.
  6. Can I use Log Insight archives as backups? Log Insight archives frequently and retains data on the virtual appliance for as long as there is available space. This means that data can exist on the virtual appliance and in the archive at the same time. While archives could be used to provide some amount of backup, there are a couple of important limitations to be aware of. First, importing archives can be time-consuming and may impact the ingestion of new incoming events. Second, the most recent data on all nodes would not have been archived yet, so that data would be lost in the process. The purpose of archiving is not to provide backup, but rather to keep history that could later be imported on a different Log Insight instance as needed for auditing or compliance reasons. In general, the use of archives as a backup mechanism is not recommended.

Configuration

There are two primary types of Log Insight configuration information:

  • Log Insight configuration – information required for Log Insight to function properly
  • User configuration – information about what users can access the system and any saved queries they have constructed

Log Insight configuration

As mentioned in my first post, the Log Insight configuration is replicated to all nodes in a cluster today. Even though the configuration is “backed up” on all the nodes, one may desire to back up the configuration for historic purposes and to revert changes. The Log Insight configuration is saved in:

  • /storage/core/loginsight/config/loginsight-config.xml#<N>

You will notice multiple XML files in this directory. The XML file with the largest <N> is the current configuration file. As you will see, Log Insight already keeps the last three configurations:

This means you do not technically need to back up the Log Insight configuration. Assuming you are performing a full virtual machine backup, the Log Insight configuration will be backed up anyway. If you want to back up all Log Insight configuration changes, then backing up the /storage/core/loginsight/config directory on any single node in the cluster is sufficient. You could, for example, write a cronjob that monitors for new configuration files and copies them to an NFS mount.
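
As a rough idea of what such a job could look like, here is a minimal Python sketch. It finds every loginsight-config.xml#<N> file, picks the one with the largest <N> as current, and copies any file not yet present to a backup directory. The backup directory (for example, a path on your NFS mount) and the function names are assumptions of mine, and scheduling the script from cron is left to you.

```python
import re
import shutil
from pathlib import Path

# File name pattern from the post: loginsight-config.xml#<N>
CONFIG_RE = re.compile(r"^loginsight-config\.xml#(\d+)$")


def config_versions(config_dir):
    """Return {N: path} for every versioned configuration file in config_dir."""
    versions = {}
    for path in Path(config_dir).iterdir():
        match = CONFIG_RE.match(path.name)
        if match and path.is_file():
            versions[int(match.group(1))] = path
    return versions


def current_config(config_dir):
    """Return the path with the largest <N>, or None if there is none."""
    versions = config_versions(config_dir)
    return versions[max(versions)] if versions else None


def sync_new_configs(config_dir, backup_dir):
    """Copy configuration files missing from backup_dir; return copied names."""
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    copied = []
    # Sort numerically so that #10 comes after #9.
    for _, path in sorted(config_versions(config_dir).items()):
        target = backup / path.name
        if not target.exists():
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(path.name)
    return copied
```

Because files already present in the backup directory are skipped, the job can run as often as you like without rewriting old configurations.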

User configuration

Any users added to Log Insight are stored in the Cassandra database. In addition, any queries saved by the user get saved in the Cassandra database. If a user is deleted from Log Insight then all saved queries from that user are deleted as well. Because of this, it may be desirable to back up Cassandra to ensure users and users' saved queries can be restored.

IMPORTANT: Cassandra is used for more than user data including machine learning data and the hosts overview table. If non-user data is lost, Log Insight can recreate it.

Before I talk about the options for backing up Cassandra and the recommended approach, let me discuss some important information about the Cassandra configuration:

  • All Cassandra data is stored in /storage/core/loginsight/cidata/cassandra
  • If you are looking for the standard cassandra.yaml file, it can be found in the config subdirectory
  • From the cassandra.yaml file you will see a single keyspace is used: logdb
  • In addition, you will see that all data is stored in the data subdirectory

Cassandra offers a couple of ways to back up data:

  • snapshots
  • incremental backup

For more information on the backup and restore procedure see this link.

Given that Log Insight only stores a very small amount of data in Cassandra and the best practice is to back up the entire Log Insight virtual appliance, the use of snapshots is recommended if you wish to back up Cassandra more frequently than the virtual appliance. It is important to note that the snapshot command only applies to the data on the system on which the snapshot command is run. Given that Log Insight does not replicate all Cassandra data to all nodes, this means you must run the snapshot command on all nodes (see my li_rexec post for one way to handle this).

To create a snapshot, simply run the nodetool snapshot command against the logdb keyspace:

Note: The nodetool command is not in the default path of the Log Insight virtual appliance so you must specify the absolute path of the command.
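
If you would rather script this than type it by hand, here is a minimal Python sketch that builds and runs nodetool snapshot for the logdb keyspace. The NODETOOL value is a placeholder and not the real location on the appliance, so you need to substitute the absolute path yourself. The optional -t flag names the snapshot; the function names are my own.

```python
import subprocess

# ASSUMPTION: placeholder only. Replace with the absolute path to nodetool
# on your appliance.
NODETOOL = "/path/to/nodetool"
KEYSPACE = "logdb"  # keyspace name from the post


def snapshot_command(nodetool=NODETOOL, keyspace=KEYSPACE, tag=None):
    """Build the argument list for `nodetool snapshot`."""
    cmd = [nodetool, "snapshot"]
    if tag:
        cmd += ["-t", tag]  # optional snapshot name; default is a timestamp
    cmd.append(keyspace)
    return cmd


def take_snapshot(**kwargs):
    """Run the snapshot on the local node and return nodetool's output."""
    result = subprocess.run(
        snapshot_command(**kwargs),
        check=True,            # raise if nodetool exits with an error
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Remember that this only snapshots the node it runs on, so it has to be executed on every node in the cluster.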

You will see that for every Cassandra snapshot, a directory is created whose name is a number representing the epoch time at which the snapshot was taken. To confirm that a snapshot was created successfully, look for a snapshots subdirectory under any table from the logdb keyspace:
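
The same check can be scripted. The minimal Python sketch below walks the logdb keyspace and lists the snapshot directories found under each table. It assumes the usual Cassandra layout of <data>/<keyspace>/<table>/snapshots/<name> beneath the data directory mentioned earlier; verify that against your own appliance before relying on it. The function names are illustrative.

```python
from pathlib import Path

# Data directory and keyspace as described earlier in the post.
# ASSUMPTION: tables keep snapshots in <table>/snapshots/<snapshot name>.
DATA_DIR = "/storage/core/loginsight/cidata/cassandra/data"
KEYSPACE = "logdb"


def snapshots_by_table(data_dir=DATA_DIR, keyspace=KEYSPACE):
    """Map each table directory to the sorted snapshot names found under it."""
    found = {}
    keyspace_dir = Path(data_dir) / keyspace
    for snap_root in keyspace_dir.glob("*/snapshots"):
        names = sorted(p.name for p in snap_root.iterdir() if p.is_dir())
        if names:
            found[snap_root.parent.name] = names
    return found


def snapshot_exists(name, **kwargs):
    """True if at least one table has a snapshot directory called `name`."""
    tables = snapshots_by_table(**kwargs)
    return any(name in names for names in tables.values())
```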

Just like VM snapshots, Cassandra snapshots should not be kept for long periods of time. Cassandra does not maintain snapshots, so it is up to the user to perform cleanups. Cassandra offers the ability to delete individual snapshots or all snapshots as follows:

IMPORTANT: Log Insight does not keep a lot of data in Cassandra so having multiple snapshots and keeping them for several days should have minimal impact. With that said, I would typically not recommend more than a dozen snapshots spanning no more than a week at any given time as snapshots prevent deleted data from being removed.
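
To keep the number of snapshots in check, a cleanup job could decide which snapshots are too old and clear only those. Here is a minimal Python sketch under a few assumptions: snapshot names are the default millisecond timestamps, NODETOOL is a placeholder you must replace with the real absolute path, and the one-day limit is just an example. The function names are mine.

```python
import time

NODETOOL = "/path/to/nodetool"  # ASSUMPTION: placeholder, not the real path
KEYSPACE = "logdb"              # keyspace name from the post
MAX_AGE_SECONDS = 24 * 60 * 60  # example retention of one day


def expired_snapshots(names, now=None, max_age=MAX_AGE_SECONDS):
    """Return snapshot names whose millisecond timestamp is older than max_age."""
    now = time.time() if now is None else now
    expired = []
    for name in names:
        if not name.isdigit():
            continue  # custom-named snapshot: age unknown, leave it alone
        if now - int(name) / 1000 > max_age:
            expired.append(name)
    return sorted(expired)


def clear_command(name=None, nodetool=NODETOOL, keyspace=KEYSPACE):
    """Build `nodetool clearsnapshot` for one snapshot, or all if name is None."""
    cmd = [nodetool, "clearsnapshot"]
    if name:
        cmd += ["-t", name]  # restrict the cleanup to a single snapshot
    cmd.append(keyspace)
    return cmd
```

Each command returned by clear_command can be handed to subprocess.run on the node in question. Newer Cassandra releases may expect an explicit flag to clear every snapshot, so check nodetool help clearsnapshot on your version before using the form without -t.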

In addition to snapshots, Cassandra supports incremental backups. To be clear, incremental backups are used IN ADDITION to snapshots, not instead of them. The advantage of incremental backups is that they consume less space. Since Cassandra on Log Insight is typically only a couple hundred MB, incremental backups are not required.

Agent

The Log Insight agent is made up of three key components:

  • Configuration files (e.g. liagent.ini)
  • Log files
  • Database – for storing events that have not been sent yet

In the case of Windows, all three key components are located in the same directory. For Linux, the logs are in a separate directory from the rest. In either case, the only component you may need to worry about is the liagent.ini (not to be confused with the liagent-effective.ini). As you should know, the liagent.ini is the client-side configuration of the agent. It contains the hostname of the remote destination to forward events to. In addition, it may contain configuration that a user has appended to it. If all agent configuration is done server-side then the contents of the liagent.ini are unimportant. This means that while you can back up all components, the only information needed that cannot be recovered would be in the liagent.ini file.
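
To decide whether a given liagent.ini is worth backing up, it can help to see what it actually contains. Here is a minimal Python sketch that reads the file and reports the destination hostname plus every section present. It assumes the destination is stored as hostname under a [server] section, so adjust that if your file differs; the path to liagent.ini is a parameter because it varies by platform.

```python
import configparser


def summarize_agent_ini(ini_path):
    """Return the destination hostname and the section names in liagent.ini."""
    # strict=False tolerates repeated sections; interpolation=None keeps
    # literal '%' characters in values from being treated specially.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read(ini_path)
    return {
        # ASSUMPTION: the destination is `hostname` under [server].
        "hostname": parser.get("server", "hostname", fallback=None),
        "sections": parser.sections(),
    }
```

If the only section reported is the one holding the destination, the file carries little beyond what server-side configuration would restore anyway.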

Summary

Overall

  • Back up (do not snapshot) all nodes once a day.
  • Only if you require complete configuration auditing should you manually back up the Log Insight configuration.
  • If you wish to back up user configuration, consider taking a Cassandra snapshot every three hours, keeping snapshots for one day and backing up snapshots to an NFS mount.
  • If you set custom client-side configuration then you should back up the liagent.ini on clients. Consider using server-side configuration instead; with proper virtual appliance backups you would not need to back up clients.

Server

  • The recommended way to back up a Log Insight server is to perform a full virtual appliance backup of all nodes in the cluster.
  • While backing up all nodes at the same time (e.g. storage array snapshot) is ideal, it is not required. Note that the longer the time between backups of individual nodes, the longer the recovery time.
  • The backup software used to back up Log Insight is irrelevant.
  • Snapshots should not be used as backups.
  • Log Insight archives are not meant to be used as backups.
  • How frequently you back up dictates how much data, both ingested and saved, you lose should you need to perform a restore.
  • One approach would be to perform daily backups of all Log Insight nodes.

Configuration

  • Log Insight saves the last three Log Insight configuration files. If you wish to retain all Log Insight configuration files you will need to monitor for new files in the config directory and save them to an external location.
  • Unless complete auditing is required, backing up the Log Insight configuration is not necessary.
  • If you wish to back up user configuration information separately from or more frequently than the virtual appliance backup, then you can use Cassandra’s snapshot feature.
  • A Cassandra snapshot needs to be taken on every node in the cluster and, once completed, the snapshots can be backed up to an external location like an NFS mount.
  • Note that Cassandra does not clean up snapshots and, like virtual machine snapshots, you should ensure snapshots are not kept around too long as they will consume disk space.
  • For most environments it is not critical that Log Insight or user configuration be backed up.
  • One approach would be to create a Cassandra snapshot on each node every three hours and delete snapshots older than a day. This coupled with daily virtual appliance backups should be sufficient for most environments.

Agent

  • The liagent.ini file may contain configuration information that you wish to back up.
  • All other data for the agent could be discarded.
  • If you are using only server-side configuration then technically nothing would need to be backed up on the client-side.
  • If you do wish to back up the agent anyway, ensure you get both directories for Linux.
  • Assuming you have an automated way of rolling out the agent, use server-side configuration and back up the Log Insight virtual appliance, there is no reason to back up the agent on the clients.

UPDATE: Added information about backing up the SSL certificate.

© 2015, Steve Flanders. All rights reserved.
