For environments where a load balancer is not feasible because of, say, cost or complexity, I often use Corosync to provide similar functionality. Corosync is cluster software that, together with a resource manager such as Pacemaker, lets you cluster an application for high availability. One issue I have often run into with it is that its error messages are not very descriptive, which makes troubleshooting difficult.
For example, I have used Corosync to cluster syslog servers in the past. In one such environment I had a pair of syslog servers in an active-standby configuration sharing a virtual IP (VIP). While the VIP came up as expected, the syslog service resource failed to start and reported an unknown error, as shown below.
    $ sudo crm status
    ============
    Last updated: Wed Jan 23 00:00:47 2013
    Last change: Tue Jan 22 23:49:38 2013 via cibadmin on log01
    Stack: openais
    Current DC: log01 - partition WITHOUT quorum
    Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
    2 Nodes configured, 2 expected votes
    4 Resources configured.
    ============

    Online: [ log01 ]
    OFFLINE: [ log02 ]

     Resource Group: log_svr
         vip    (ocf::heartbeat:IPaddr2):    Started log01

    Failed actions:
        log_svc:0_start_0 (node=log01, call=6, rc=1, status=complete): unknown error
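The failed-action line packs several fields together. These are the resource (log_svc:0, where the :0 suffix marks a clone instance), the operation (start), and the operation's recurring interval in milliseconds (0 for a one-off start). They are followed by the node and the resource agent's return code. Pacemaker prints rc=1 as "unknown error". In the OCF resource-agent conventions that code is OCF_ERR_GENERIC, a catch-all returned when nothing more specific applies, which is why the message says so little. The bash sketch below is a hypothetical helper that splits such a line and names the code; the function names ocf_rc_name and decode_failed_action are made up for illustration. It assumes the Pacemaker 1.1-style layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Pacemaker) for reading a "Failed actions"
# line from `crm status`. Assumes the Pacemaker 1.1.x layout:
#   <resource>_<operation>_<interval> (node=..., call=..., rc=..., status=...): <text>

# Map an OCF resource-agent exit code to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;        # shown by Pacemaker as "unknown error"
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "UNKNOWN_RC_$1" ;;
    esac
}

# Split one failed-action line into its fields; returns 1 if it does not match.
decode_failed_action() {
    local line="$1"
    # Capture groups: resource, operation, interval (ms), node, call id, rc, status.
    local re='^[[:space:]]*(.+)_([a-z]+)_([0-9]+) \(node=([^,]+), call=([0-9]+), rc=([0-9]+), status=([^)]+)\)'
    if [[ $line =~ $re ]]; then
        local rc="${BASH_REMATCH[6]}"
        echo "resource=${BASH_REMATCH[1]}"
        echo "operation=${BASH_REMATCH[2]}"
        echo "interval=${BASH_REMATCH[3]}"
        echo "node=${BASH_REMATCH[4]}"
        echo "rc=${rc} ($(ocf_rc_name "$rc"))"
    else
        echo "unrecognised line: $line" >&2
        return 1
    fi
}
```

Passing it the failed-action line from the output above prints resource=log_svc:0, operation=start, interval=0, node=log01 and rc=1 (OCF_ERR_GENERIC). This confirms that the agent gave Pacemaker nothing more specific than a generic failure.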
So what was causing the error and how can you clear it up?