When using the scalable redundant version of DDP, failover can be performed between the two DDP heads.
Performing a failover swaps the states of the two DDP heads: the first head goes from master to slave, and the second from slave to master.
A failover can be triggered automatically in case of a problem on the master head, or manually for maintenance purposes.
Please note that two interconnected heads cannot both be master; one always needs to be in slave mode until the master goes down for some reason.
You can check the current state of your DDP heads by logging into each via SSH and entering the following command:
$ runlevel
The last digit of the output indicates the head's state:
- Runlevel 2 stands for master mode
- Runlevel 3 stands for slave mode
- Runlevel 4 indicates that the DDP is in maintenance mode
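For scripting purposes, the mapping above can be sketched as a small shell helper. Note that `head_state` is our own name, not part of the DDP tooling:

```shell
#!/bin/sh
# Map a runlevel digit to the corresponding DDP head state.
# head_state is a hypothetical helper, not a DDP command.
head_state() {
    case "$1" in
        2) echo "master" ;;
        3) echo "slave" ;;
        4) echo "maintenance" ;;
        *) echo "unknown" ;;
    esac
}

# Example: feed it the last field of the `runlevel` output (e.g. "N 2"):
#   head_state "$(runlevel | awk '{print $NF}')"
```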
A very simple way to perform a manual failover is to power off the master head and let the slave head automatically switch to master mode. Once the state change is complete, the second head will report runlevel 2. Please note that it might take several minutes for the slave head to become master, so the output will still show "runlevel 3" until the procedure finishes.
If you need to perform a software failover for some reason, you can manually assign the master head to runlevel 3 with the following command:
$ telinit 3
This puts the head in slave mode and brings the other head to master mode after a few minutes.
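Since the transition takes a few minutes, a script performing a software failover typically needs to poll the runlevel until the expected state is reached. A minimal sketch, where `wait_for_state` is our own helper (not a DDP command) and the timeout values are only illustrative:

```shell
#!/bin/sh
# Wait until a state command reports the expected runlevel digit.
# $1 = command printing runlevel output (e.g. "runlevel"),
# $2 = expected digit, $3 = max seconds to wait.
wait_for_state() {
    cmd="$1"; want="$2"; timeout="${3:-600}"; elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        # Take the last field of the output, e.g. "N 3" -> "3".
        cur="$($cmd | awk '{print $NF}')"
        [ "$cur" = "$want" ] && return 0
        sleep 5; elapsed=$((elapsed + 5))
    done
    return 1
}

# On the master head:
#   telinit 3
#   wait_for_state runlevel 3 600 && echo "head is now in slave mode"
```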
You can also put a head in maintenance mode with the following command:
$ telinit 4
Putting a head in maintenance mode breaks the normal failover mechanism until it is reassigned to runlevel 2 or 3; therefore you won't be able to use it as a master if the other head goes down for some reason.
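Because a head stuck in maintenance mode silently disables failover, it can be worth checking for this state before relying on the redundancy. A minimal sketch, where `maintenance_check` is our own helper name:

```shell
#!/bin/sh
# Warn if a head reports runlevel 4 (maintenance mode), since such a
# head will not take over as master. maintenance_check is our own
# helper; feed it the output of the `runlevel` command.
maintenance_check() {
    if [ "$(echo "$1" | awk '{print $NF}')" = "4" ]; then
        echo "maintenance"
    else
        echo "ok"
    fi
}

# Usage on each head:
#   maintenance_check "$(runlevel)"
```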
Let's assume you have a redundant system with two DDP heads connected to two DDP16EXs or JBODs:
The picture below shows an example of how the DDP heads are typically plugged together:
Note that on each DDP head, the first onboard 1GbE port is not called "NIC1" but "fallback 1", as the DDP heads are connected to each other using that port. Hence the first NIC you can use for connecting your client machines is not the first port, but the second one from the left, as shown below:
Also note that each head contains an internal RAID card which doesn't have any SAS port. These internal RAID cards don't need to be connected together.
On a non-redundant DDP system, NIC1 is the first onboard port on the left, whereas NIC2 is the second one and is generally dedicated to service connection, as shown below:
On a redundant DDP, the first onboard port on the left is called fallback 1 and is dedicated to failover, whereas the second port is called NIC1 and can be used either for connecting a client machine or for service connection, as shown below:
When the slave head detects that the master head no longer runs the DDP program, it automatically takes over and goes to master mode. That is the purpose of the fallback 1 port: it enables each DDP head to monitor the other and to trigger a failover, so that one head takes over in case of a software or hardware problem.
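A rough way to verify that the fallback 1 link is healthy is to ping the peer head over that port. This is only a sketch: `peer_reachable` is our own helper, and the peer's fallback 1 address depends on your installation (the value shown is a placeholder):

```shell
#!/bin/sh
# Check reachability of the peer head over the fallback 1 link.
# peer_reachable is a hypothetical helper, not a DDP command.
peer_reachable() {
    # One ping, two-second timeout; succeed silently if the host replies.
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

# PEER_FALLBACK_IP="10.0.0.2"   # placeholder; use your peer's address
# if peer_reachable "$PEER_FALLBACK_IP"; then
#     echo "fallback 1 link to peer head is up"
# else
#     echo "WARNING: peer head unreachable over fallback 1" >&2
# fi
```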
However, redundant DDP systems do not rely only on the fallback 1 ports. Failover is also based on the RAID cards.
This is why each master head's RAID card is linked to its corresponding slave head's RAID card, generally via a red 1GbE cable.
The first RAID card on the left of each DDP head is called fallback 2. If there is a second RAID card in each head, its Ethernet port is called fallback 3, and so on for additional RAID cards.
It is very important that each fallback cable is plugged into the relevant port, as the failover system relies on all of them. Hence, when a problem happens at the RAID card level, a failover will be triggered.
Since on a typical redundant system the DDP heads are plugged into the same JBODs, they have access to the same data. This is why, regardless of which head is the master, the DDP volumes' data stays the same.
In our example above, each DDP head is connected to two JBODs. There are two RAID cards per DDP head, so each RAID card is connected to one of the JBODs.