Log Insight: Upgrade Failed

While Log Insight strives to make administration easy, issues can come up from time to time. On rare occasions, I have heard of a Log Insight upgrade failing. The question becomes: what should you do if the upgrade fails? Read on to learn more!

Prevention

The best way to ensure a Log Insight upgrade goes smoothly is to prevent any issues in the first place. Before you upgrade your production environment, please validate the following:

You are running a supported configuration
You have read and understand the release notes
You have read and understand the documentation around upgrading
You have tested the upgrade in a non-production environment
You have taken a snapshot of your Log Insight instance/cluster before attempting to upgrade
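
If you track upgrade readiness in a script or runbook, the checklist above can be encoded as a simple gate. The following is a minimal sketch, not a Log Insight feature: the `Preflight` class, its field names, and `missing_items` are all hypothetical names chosen for illustration, and each flag is set by a person after doing the check by hand.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Preflight:
    # Hypothetical flags, one per checklist item above; set manually.
    supported_configuration: bool = False
    release_notes_read: bool = False
    upgrade_docs_read: bool = False
    tested_in_non_production: bool = False
    snapshot_taken: bool = False


def missing_items(check: Preflight) -> List[str]:
    """Return the names of checklist items that are not yet satisfied."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


def ready_to_upgrade(check: Preflight) -> bool:
    """Only proceed when every checklist item has been confirmed."""
    return not missing_items(check)


if __name__ == "__main__":
    check = Preflight(supported_configuration=True, release_notes_read=True)
    print("Ready:", ready_to_upgrade(check))
    print("Still to do:", missing_items(check))
```

The point of returning the missing items rather than a bare yes/no is that the output doubles as the to-do list before the production upgrade.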

Success!

The default behavior you should experience when performing a Log Insight upgrade is success. This, of course, assumes you took care of all the prevention steps above.

Failed, now what?

If the upgrade fails, Log Insight should give you an error message to point you in the right direction. In addition, if the upgrade had started and failed at some stage, Log Insight should roll back the upgrade and get the cluster back online in a healthy state. As for what to do, I would recommend:

Ensure Log Insight is online and accepting traffic: assuming your production environment was being upgraded, the first goal should be to restore service
- If not, determine whether a roll-back is underway by checking the /admin/cluster page; if so, wait
- If down and not rolling back (or finished rolling back), see if anything obvious can be done to get you back online
- If down, not rolling back, and nothing obvious can be done, then restore from backup
Determine what step of the upgrade failed, what the current state of the cluster is, and what the error message indicates
- Prevalidation: if the upgrade failed here, the cluster will automatically recover; address the reported error and try again
- Failed on a node and rolled back: see the error message to determine what can be done to address the reported issue, then try the upgrade again
- Failed on a node and did not roll back: see the error message to determine what can be done to address the reported issue, then try to upgrade the failed node again
If unable to resolve the issue, check for KBs
- Log Insight 3.0 doesn’t start after a failed upgrade to 3.3
- Upgrade to vRealize Log Insight 4.0.0 fails with error “State mismatch on bucket”
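
The recommendations above amount to a small decision flow, which can be sketched in code for use in a runbook. This is an illustration only and assumes nothing about Log Insight's own tooling: the enum names, the function names, and the three boolean inputs are hypothetical, and you would determine each input by hand (for example, by looking at the /admin/cluster page).

```python
from enum import Enum


class Action(Enum):
    DIAGNOSE = "service is up; move on to diagnosing the failed step"
    WAIT = "roll-back underway; wait for it to finish"
    TRY_OBVIOUS_FIX = "try the obvious fix to get back online"
    RESTORE_FROM_BACKUP = "restore from backup"


class FailedStage(Enum):
    PREVALIDATION = "prevalidation"
    NODE_ROLLED_BACK = "failed on a node and rolled back"
    NODE_NOT_ROLLED_BACK = "failed on a node and did not roll back"


def restore_service_action(online: bool, rolling_back: bool, obvious_fix: bool) -> Action:
    """Step 1: restoring service comes first. All inputs are observed manually."""
    if online:
        return Action.DIAGNOSE
    if rolling_back:
        return Action.WAIT
    if obvious_fix:
        return Action.TRY_OBVIOUS_FIX
    return Action.RESTORE_FROM_BACKUP


def retry_target(stage: FailedStage) -> str:
    """Step 2: what to retry once the reported error has been addressed."""
    if stage is FailedStage.NODE_NOT_ROLLED_BACK:
        return "failed node"
    # Prevalidation failures and rolled-back failures both retry the upgrade.
    return "whole upgrade"


if __name__ == "__main__":
    action = restore_service_action(online=False, rolling_back=True, obvious_fix=False)
    print(action.value)
    print(retry_target(FailedStage.NODE_NOT_ROLLED_BACK))
```

The order of the checks mirrors the order of the bullets: restoring from backup is the last resort, reached only when the cluster is down, no roll-back is in progress, and nothing obvious can be done.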

Failed, what NOT to do…

If the upgrade failed and did not automatically recover (i.e., roll back), then you have entered an error state. You should attempt to get out of the error state as quickly as possible, as the system may not behave as expected. With that said, you do not want to make the situation worse. Below is a list of things I strongly advise you DO NOT DO in an error state.

Make any configuration changes — this includes any operation in the Administration UI or on /internal/config
- Any change made will likely be lost upon addressing the underlying issue
- Any change may make the situation worse
Make any changes to content — this includes content packs as well as user content
- Any change made will likely be lost upon addressing the underlying issue

The first recommendation should not be a surprise, while the second one may be. To reiterate the first one: DO NOT attempt to remove or replace nodes in the cluster while in this error state. Check the release notes, check the KBs, check the logs, and open a support case with GSS, but DO NOT under any circumstances attempt to remove or replace nodes.

Steve Flanders
