Log Insight: Disconnected Node

If you are running a Log Insight cluster, you have probably seen the Status column of the node table on the Administration > Cluster page. Under normal operating conditions, all nodes should be shown as Connected. In this post, I would like to focus on the Disconnected status, including what it means and the logic that triggers it.

Background

Here is a screenshot of the node table available from Administration > Cluster:
[Screenshot: li-25-worker-status]

IMPORTANT: Log Insight requires a minimum of three nodes in a cluster.

Possible statuses include:

  • Connected
  • Disconnected
  • Maintenance

What does Disconnected mean?

Disconnected means the node is not accessible by the cluster. As a result, it cannot participate in ingestion or queries, and it cannot receive cluster configuration. You can think of it as a temporary removal state that can be recovered from.

IMPORTANT: If you remove a disconnected node, you cannot join it back to the cluster!

What would cause a node to be Disconnected?

A variety of reasons, including:

  • VA crash/shutdown
  • ESXi crash/shutdown
  • Network isolation/partitioning
  • Severe resource contention
  • DoS attack

How is Disconnected determined?

The logic is as follows:

  • A health check is run every minute
  • If the health check fails, the node is marked as suspect (not visible in the UI)
  • If a node fails three consecutive health checks, it is marked Disconnected (visible in the UI)
  • If a node is marked suspect but passes the second or third check, the suspect mark is removed
  • If a node is in Maintenance, then health checks are skipped
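
To make the rules above concrete, here is a minimal Python sketch of the same logic. It is not Log Insight's actual implementation: the names Node, Status, and record_health_check are hypothetical and only mirror the bullets in this post. The post does not describe how a Disconnected node returns to Connected, so the sketch assumes that a passing health check reconnects it.

```python
from dataclasses import dataclass
from enum import Enum

# From the post: three consecutive failed checks mark a node Disconnected.
FAILURES_BEFORE_DISCONNECT = 3


class Status(Enum):
    CONNECTED = "Connected"
    DISCONNECTED = "Disconnected"
    MAINTENANCE = "Maintenance"


@dataclass
class Node:
    """Hypothetical model of one cluster node (illustration only)."""
    name: str
    status: Status = Status.CONNECTED
    consecutive_failures: int = 0

    @property
    def suspect(self) -> bool:
        # Internal flag only; the post notes it is not visible in the UI.
        return self.status is Status.CONNECTED and self.consecutive_failures > 0


def record_health_check(node: Node, passed: bool) -> None:
    """Apply the result of one per-minute health check to a node."""
    if node.status is Status.MAINTENANCE:
        return  # health checks are skipped for nodes in Maintenance
    if passed:
        # A pass clears the suspect mark. ASSUMPTION: it also reconnects a
        # Disconnected node; the post does not say how recovery happens.
        node.consecutive_failures = 0
        node.status = Status.CONNECTED
        return
    node.consecutive_failures += 1
    if node.consecutive_failures >= FAILURES_BEFORE_DISCONNECT:
        node.status = Status.DISCONNECTED


# Two failures, a pass that clears the suspect mark, then three failures.
node = Node("worker-1")
for passed in (False, False, True, False, False, False):
    record_health_check(node, passed)
    print(passed, node.suspect, node.status.value)
```

Tracking a single counter of consecutive failures keeps the suspect mark and the Disconnected status consistent: any pass resets the counter, so only an unbroken run of three failures changes what the UI shows.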

This means it takes a minimum of three minutes and a maximum of four minutes for a node to be marked as Disconnected: 1 minute per check x 3 failed checks, plus the time at which the issue happens, which must be at least 1 additional second but less than 60 additional seconds.
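
The arithmetic in the paragraph above can be restated as a small Python sketch. It only encodes the formula given in this post; the function name seconds_until_disconnected and the offset_s parameter are hypothetical, and whole seconds are assumed for simplicity.

```python
CHECK_INTERVAL_S = 60      # one health check per minute (from the post)
FAILED_CHECKS_NEEDED = 3   # consecutive failures before Disconnected (from the post)


def seconds_until_disconnected(offset_s: int) -> int:
    """Seconds until a node is marked Disconnected, per the post's formula.

    offset_s is the additional time described in the post: at least
    1 second and less than 60 seconds.
    """
    if not 1 <= offset_s < CHECK_INTERVAL_S:
        raise ValueError("offset_s must be between 1 and 59 seconds")
    return CHECK_INTERVAL_S * FAILED_CHECKS_NEEDED + offset_s


print(seconds_until_disconnected(1))   # 181 seconds, just over three minutes
print(seconds_until_disconnected(59))  # 239 seconds, just under four minutes
```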

IMPORTANT: You should not attempt to upgrade a cluster with one or more disconnected nodes!

Summary

A disconnected node indicates a temporary state where the node is not currently participating in the cluster. There are many potential causes for a disconnected node, but given the current health checks, seeing a disconnected node indicates a real issue that needs to be investigated.

© 2015, Steve Flanders. All rights reserved.
