Log Insight 2.5: Backup and Recovery Part 3/3

In my first post about Log Insight 2.5 backup and recovery, I talked about deployment types, backup considerations and architecture of the different Log Insight components. In my second post, I talked about how to backup the various components. To conclude the series, I would like to discuss how to restore the various components.

IMPORTANT: Again, this information is specific to Log Insight 2.5. The architecture of Log Insight changes from release to release, so if you are running an older or newer version the process may differ. For example, prior to Log Insight 2.5, Postgres was used instead of Cassandra, so the Configuration section below would be different for older versions of Log Insight.

Server

Standard Restore

Given that the recommendation for Log Insight server backup is to perform a virtual appliance backup, the restore procedure is very straightforward. The only potential complexity would be for clustered environments, but the procedure remains the same whether standalone or clustered:

  1. Bring the master node online first, wait at least two minutes after the Log Insight service has started
  2. Bring any single worker online next, wait at least two minutes after the Log Insight service has started
  3. Continue step #2 until all workers are online
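Scripted, the power-on sequence above might look like the following sketch; `power_on` is a placeholder for however you start the VM (vSphere client, PowerCLI, etc.), and the pause is the two-minute settle time described below.

```shell
# Hypothetical sketch of the restore power-on order; power_on is a
# placeholder, not a real command.
WAIT=${WAIT:-120}   # seconds to wait after each node's service starts
bring_up() {
  for node in "$@"; do
    power_on "$node"   # placeholder: power on the VM, wait for the service
    sleep "$WAIT"      # give Cassandra time to settle before the next node
  done
}
# Usage: bring_up master.example.com worker1.example.com worker2.example.com
```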

Easy, right? Waiting two minutes between each step is recommended to give Cassandra time to rejoin the cluster and recover. If you do not wait two minutes, Cassandra should still recover, but the recovery time may be extended, performance may be impacted, and parts of the system may not work properly until Cassandra gets ahold of the situation.

SSL Certificate

Remember the SSL certificate is stored on the root filesystem and may need to be restored or replaced independently. For more information on restoring or replacing the SSL certificate see this post.

IP Address Changes

Now, there is another potential complexity to be aware of and that is disaster recovery. In some cases, you may wish to restore Log Insight in a different location. In this situation, it is likely that the IP addresses of the nodes will change. So how do you address this?

  • Make the appropriate IP change on the master BEFORE powering it up
    • Bring the master node online
    • Run: cd /storage/core/loginsight/config
    • Run: cp loginsight-config.xml#<largestNumber> temp.xml
    • Edit temp.xml and find the distributed section. It will either look like:
      <distributed overwrite-children="true">
        <daemon host="li01.matrix" port="16520" token="a6c2714c-ab5f-4dba-8106-ce216b07954d">
          <service-group name="standalone" />
        </daemon>
        <daemon host="192.168.1.31" port="16520" token="d5b5e269-88d4-4069-a300-2346aed4bedf">
          <service-group name="workernode" />
        </daemon>
      </distributed>

      or:

      <distributed overwrite-children="true">
        <daemon host="192.168.1.29" port="16520" token="a6c2714c-ab5f-4dba-8106-ce216b07954d">
          <service-group name="standalone" />
        </daemon>
        <daemon host="192.168.1.31" port="16520" token="d5b5e269-88d4-4069-a300-2346aed4bedf">
          <service-group name="workernode" />
        </daemon>
      </distributed>

      If one of the entries contains a FQDN see Process 1 below. If all entries contain IP addresses see Process 2 below.

  • Process 1
    • Update DNS so the FQDN of the standalone daemon host points to the new IP address — if this is not possible, see Process 2 below.
    • Change the IP addresses of all workernode daemon hosts as appropriate
    • Change the IP address of the host value within the database section
    • Change the IP address of the remotehost value within the logging > appenders > appender section
    • Save and close temp.xml
    • Run: mv temp.xml loginsight-config.xml#<largestNumber+1>
    • Make the appropriate IP change on the first worker BEFORE powering it up
    • Bring the single worker online, wait at least two minutes after the Log Insight service has started
    • Repeat the last two steps until all workers are online
  • Process 2 
    • Change the standalone and workernode daemon host entries to the new IP addresses to be used
    • Change the IP address of the host value within the database section
    • Change the IP address of the remotehost value within the logging > appenders > appender section
    • Save and close temp.xml
    • Run: mv temp.xml loginsight-config.xml#<largestNumber+1>
    • Make the appropriate IP change on the first worker BEFORE powering it up
      • Run: service loginsight stop
      • SCP the new loginsight-config.xml from the master to this worker
      • Run: service loginsight start, wait at least two minutes after the Log Insight service has started
    • Repeat the last step until all workers are online
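For the find-and-replace edits in both processes, a small sed helper can cut down on typos. This is just a sketch to run against the temp.xml working copy described above; the function name and the addresses are examples, not part of Log Insight.

```shell
# Sketch: rewrite one IP address throughout a config file.
rewrite_ip() {
  file="$1"; old="$2"; new="$3"
  # escape the dots so sed matches the old address literally
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  sed -i "s/$old_re/$new/g" "$file"
}
# Usage: rewrite_ip temp.xml 192.168.1.31 10.0.0.31
```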

Configuration

Log Insight configuration

If you backed up the Log Insight configuration and you wish to restore to a previous version then you should:

  • Log into the master node
  • Run: cd /storage/core/loginsight/config
  • Copy the Log Insight configuration backup to loginsight-config.xml#<largestNumber+1>
  • On every node, starting with the master node, run: service loginsight restart

Note: Not all configuration changes require a restart, but in order to see and change UI settings after following this restore procedure you must at least restart the master node service. As such, it is recommended that you restart the Log Insight service on all nodes to be safe.
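The copy step above can be sketched as follows, assuming the numbered loginsight-config.xml#N convention described earlier; the backup path and directory are parameters, so treat the names here as examples.

```shell
# Sketch: install a configuration backup as the next-highest config number
# (loginsight-config.xml#<largestNumber+1>). Paths are parameters; the
# naming convention is as described above.
restore_config() {
  backup="$1"; dir="$2"
  # find the largest existing config number (0 if none)
  n=$(ls "$dir"/loginsight-config.xml#* 2>/dev/null | sed 's/.*#//' | sort -n | tail -1)
  n=${n:-0}
  cp "$backup" "$dir/loginsight-config.xml#$((n + 1))"
}
# Usage: restore_config /tmp/config-backup.xml /storage/core/loginsight/config
# then:  service loginsight restart (on every node, master first)
```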

User configuration

Restoring a Cassandra snapshot can be tricky and risky, so I will start with a large disclaimer:

WARNING: Improperly restoring Cassandra data can result in the permanent loss of all Cassandra data. It is highly recommended that you have a complete backup of ALL Log Insight nodes before attempting a Cassandra restore. It is also highly recommended that you use virtual appliance backups over the Cassandra snapshot restore procedure to reduce the risk of losing data or causing an outage. You have been warned.

Cassandra has three ways of restoring a snapshot. The simplest, least risky, and therefore recommended approach is the node restart method. Note that this method requires that the entire cluster be brought down. The procedure to run on all nodes, starting with the master node, would be:

  • Run (see my li_rexec post for more details):
    snapshot=; # which snapshot you wish to restore \
    li_rexec "\
    service loginsight stop; \
    rm -rf /storage/core/loginsight/cidata/cassandra/commit/*; \
    rm -rf /storage/core/loginsight/cidata/cassandra/data/logdb/*/*.db; \
    for s in $(find /storage/core/loginsight/cidata/cassandra/data/logdb -name snapshots); do \
      cd $s/$snapshot; cp ./* ../../; \
    done; \
    /usr/lib/loginsight/application/lib/apache-cassandra-2.0.10/bin/nodetool -h localhost -p 7199 repair; \
    service loginsight start; \
    sleep 120"

For more information on the restore procedures see this link.

Agent

If you performed a complete agent backup, then simply restoring the files and possibly restarting the agent is all that should be required. If you backed up just the liagent.ini file, then simply replace the existing liagent.ini file. While agent version 2.5 GA and newer automatically detects configuration changes, you will be overwriting an existing configuration file, so it is recommended that you restart the agent after performing the restore.
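As a sketch, the liagent.ini restore amounts to a copy plus a restart. The default destination below is the Linux agent's configuration location; treat both paths as assumptions to verify against your install.

```shell
# Sketch: put a saved liagent.ini back in place. The destination default is
# the Linux agent's config location; the backup path is an example.
restore_liagent() {
  backup="$1"
  dest="${2:-/var/lib/loginsight-agent/liagent.ini}"
  cp "$backup" "$dest"
}
# Usage: restore_liagent /tmp/liagent.ini.bak
# then:  service liagent restart
```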

Summary

  • Restoring the server is trivial given you have complete virtual appliance backups.
  • When restoring the server to a different network block, manual Log Insight configuration changes are required.
  • Restoring a backed up Log Insight configuration is as simple as copying the backup to the next greatest configuration number and restarting the Log Insight service on all nodes.
  • Restoring the agent requires putting the file(s) back in place and restarting the agent.

UPDATE: Added information about restoring the SSL certificate.

© 2015, Steve Flanders. All rights reserved.
