I have heard of a few people who have restarted the Log Insight 2.5 virtual appliance and found that the Log Insight service failed to start with a Cassandra error stating it cannot connect. In this post, I would like to discuss how you can address the issue.
Symptoms
If you restart the Log Insight service or restart the Log Insight virtual appliance then Log Insight may fail to start with an error similar to:
# service loginsight start Log Insight did not shutdown cleanly. Cleaning up. Starting Log Insight... Error message unavailable. StartupException(description:com.VMware.loginsight.daemon.LogInsightDaemon$StartupFailedException: Daemon startup failed: Failed to start Cassandra Server: StartupException(description:Unable to connect to Cassandra node at localhost:9042: com.VMware.loginsight.cassandra.CassandraException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1 (com.datastax.driver.core.TransportException: [localhost/127.0.0.1] Cannot connect))).) at com.VMware.loginsight.daemon.protocol.commands.DaemonCommands$waitUntilStarted_result.read(DaemonCommands.java:5818) at com.VMware.loginsight.daemon.protocol.commands.DaemonCommands$Client.recv_waitUntilStarted(DaemonCommands.java:1832) at com.VMware.loginsight.daemon.protocol.commands.DaemonCommands$Client.waitUntilStarted(DaemonCommands.java:1808) at com.VMware.loginsight.daemon.protocol.commands.DaemonCommands$PooledClient.waitUntilStarted(DaemonCommands.java:1476) at com.VMware.loginsight.daemon.protocol.commands.DaemonCommands$ClientStub.waitUntilStarted(DaemonCommands.java:375) at com.VMware.loginsight.daemon.protocol.commands.DaemonCommands$ClientStub.waitUntilStarted(DaemonCommands.java:389) at com.VMware.loginsight.daemon.DaemonClient.waitUntilStarted(DaemonClient.java:56) at com.VMware.loginsight.admintool.LICommandlineAdminTool.startDaemon(LICommandlineAdminTool.java:1287) at com.VMware.loginsight.admintool.LICommandlineAdminTool.access$300(LICommandlineAdminTool.java:85) at com.VMware.loginsight.admintool.LICommandlineAdminTool$StartTask.process(LICommandlineAdminTool.java:562) at com.VMware.loginsight.admintool.LICommandlineAdminTool$StartTask.process(LICommandlineAdminTool.java:539) at com.VMware.loginsight.commons.actions.commandline.AbstractCommandLineAction.handleOwnArgs(AbstractCommandLineAction.java:149) at com.VMware.loginsight.commons.actions.commandline.AbstractCommandLine.handle(AbstractCommandLine.java:42) at com.VMware.loginsight.commons.actions.commandline.AbstractCommandLineCategory.handleOwnArgs(AbstractCommandLineCategory.java:97) at com.VMware.loginsight.commons.actions.commandline.AbstractCommandLine.handle(AbstractCommandLine.java:42) at com.VMware.loginsight.admintool.LICommandlineAdminTool.run(LICommandlineAdminTool.java:467) at com.VMware.loginsight.admintool.LICommandlineAdminTool.main(LICommandlineAdminTool.java:335)
Note: This issue could impact standalone instances and/or one or more nodes in a cluster, but ONLY applies to Log Insight version 2.5.
Cause
The reason Log Insight fails to start is because the Cassandra database takes too long to start. There are several reasons why the database could take a long time to start including:
- Log Insight is improperly sized and/or Log Insight is resource constrained. Please refer to the Getting Started guide for sizing requirements.
- The database requires repair. While repair operations runs automatically they may fail for a variety of reasons.
Resolution
VMware has released a KB article with the steps required to manually perform a repair operation on the database, which should allow Log Insight to start. Given that the process requires multiple manual steps, I have written a quick script. To use the script, simply run the command and specify the repair flag:
# sh li-cassandra.sh USAGE: li-cassandra.sh [--status | --repair] # sh li-cassandra.sh --repair USAGE: li-cassandra.sh --repair --force WARNING: This command will stop the Log Insight service. # sh li-cassandra.sh --repair --force Stopping Log Insight...done Starting Cassandra.....done Waiting for node(s)....done Repairing Cassandra....done Stopping Cassandra.....done Starting Log Insight...done
That’s it! Of course, if you are experiencing a different issue the last step of starting Log Insight will likely fail. Note that there is no harm in running a repair operation so when in doubt try this script. With that, here is the script:
#!/usr/bin/env bash CASSANDRA_BIN=/usr/lib/loginsight/application/lib/apache-cassandra-*/bin CASSANDRA_CONF=/storage/core/loginsight/cidata/cassandra/config status(){ $CASSANDRA_BIN/nodetool status -k logdb exit } repair() { if [ "$1" != "--force" ]; then echo "USAGE: $0 --repair --force" echo "WARNING: This command will stop the Log Insight service." exit fi echo -n "Stopping Log Insight..." OUTPUT=$(service loginsight stop) if [ "$?" -ne "0" -a "$?" -ne "52" ]; then echo "FAILED" echo echo "$OUTPUT" exit else echo "done"; fi echo -n "Starting Cassandra....." OUTPUT=$($CASSANDRA_BIN/nodetool stopdaemon) OUTPUT+=$($CASSANDRA_BIN/cassandra) if [ "$?" -ne "0" ]; then echo "FAILED" echo echo "$OUTPUT" echo echo "Unable to start Cassandra, check the above output and cassandra.log for clues." exit else echo "done"; fi echo -n "Waiting for node(s)...." for (( i=1; i<5; ++i)); do sleep 30 if [ "$($CASSANDRA_BIN/nodetool status | grep rack1 | wc -l)" -eq "$($CASSANDRA_BIN/nodetool status | grep '^UN ' | wc -l)" ]; then echo "done" echo -n "Repairing Cassandra...." OUTPUT=$($CASSANDRA_BIN/nodetool flush) OUTPUT+=$($CASSANDRA_BIN/nodetool repair) echo "done" echo -n "Stopping Cassandra....." OUTPUT+=$($CASSANDRA_BIN/nodetool stopdaemon) if [ "$?" -ne "0" ]; then echo "FAILED" echo echo "$OUTPUT" exit else echo "done"; fi echo -n "Starting Log Insight..." OUTPUT=$(service loginsight start) if [ "$?" -ne "0" ]; then echo "FAILED" echo echo "$OUTPUT" exit else echo "done"; fi exit fi done echo "FAILED" echo echo "ERROR: Cluster is not fully up after two minutes" $CASSANDRA_BIN/nodetool status | grep rack1 echo echo -n "Shutting down Cassandra..." OUTPUT=$($CASSANDRA_BIN/nodetool stopdaemon) if [ "$?" -ne "0" ]; then echo "FAILED" echo echo "$OUTPUT" echo echo "WARNING: Do not start Log Insight until Cassandra has been stopped!" echo "In worst case, restart the virtual appliance." exit else echo "done"; fi echo "done" echo echo "Ensure all other nodes in the cluster are online. You can run this script again to attempt a repair" exit } func=$(echo $1 | awk '{split($0,a,"-"); print a[3]}') $func $2 2>/dev/null echo "USAGE: $0 [--status|--repair]" exit
© 2015, Steve Flanders. All rights reserved.
Know what? This just fixed my problem with vRLI 8.4. Crazy! Thank you!
It has been a few years, but the technology remains similar — I am glad this post helped!