EMC storage, DAE failures, and vertical striping

EMC’s best practice for creating storage pools and RAID groups on mid- to low-end storage arrays (e.g. CLARiiON or VNX) has always been to create them using disks from a single Disk Array Enclosure (DAE). This configuration is sometimes referred to as horizontal striping. You may be wondering why this is the best practice, since a complete DAE failure would result in data unavailability for all pools/groups on the DAE. As such, you may be tempted to create pools/groups across multiple DAEs, which is sometimes referred to as vertical striping. To understand the reasoning, you first need to understand the physical configuration of EMC mid- to low-end storage arrays and then some DAE failure scenarios.

From a physical perspective, DAEs are daisy-chained together, with each chain originating from a bus on the back end of a Storage Processor (SP). The length of a chain depends on the number of buses on the SPs and the number of DAEs in the storage array. For example purposes, let’s assume you have five DAEs configured on a single bus (this example uses bus 2, as bus 0 would typically also house the Disk Processor Enclosure, or DPE).
[Figure: DAE and LCC connectivity]
With an understanding of how the storage array connects physically, let’s discuss some DAE failure scenarios that could occur. In the case of a DAE failure when using horizontal striping, all pools/groups on the DAE will be unavailable. Let’s assume in the five-DAE example above that every DAE has been carved up into three RAID 5 (4+1) groups. Let’s also assume enclosure 1 goes down, as illustrated below.
[Figure: DAE failure, horizontal striping]
As you can see in the above example, when enclosure 1 goes down all LUNs on enclosure 1 become unavailable. In addition, if any DAE except the last one in the chain (enclosure 4 in this example) goes down and does not enter a pass-through state, then everything above that DAE in the chain (the enclosures farther from the SP) will also be unavailable. This would be considered a worst-case scenario and is represented in the illustration below.
[Figure: DAE failure, horizontal striping, worst case]
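The two horizontal scenarios above can be sketched as a small model. This is only an illustration under the example's assumptions (five DAEs on one bus, three RAID 5 (4+1) groups per DAE, enclosure 0 closest to the SP); the function names and the `pass_through` flag are hypothetical and not part of any EMC tool.

```python
def lost_enclosures(num_daes, failed, pass_through):
    """Enclosures that become unreachable when one DAE on the chain fails.

    Enclosure 0 is assumed to be closest to the SP. With pass-through only
    the failed DAE is lost; without it, every DAE farther along the chain
    is cut off as well.
    """
    if pass_through:
        return {failed}
    return set(range(failed, num_daes))


def horizontal_groups_lost(num_daes, groups_per_dae, failed, pass_through):
    """Count unavailable RAID groups when each group lives inside one DAE."""
    lost = lost_enclosures(num_daes, failed, pass_through)
    return len(lost) * groups_per_dae


# The example above: 5 DAEs, 3 groups each, enclosure 1 fails.
print(horizontal_groups_lost(5, 3, failed=1, pass_through=True))   # 3 (of 15)
print(horizontal_groups_lost(5, 3, failed=1, pass_through=False))  # 12 (of 15)
```

In this model the damage under horizontal striping grows with how early in the chain the failure occurs, but the groups on enclosures below the failed DAE always stay available.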
In the case of a DAE failure when using vertical striping, the impact depends on where the DAE failure occurs. Leveraging the previous example, let’s assume the RAID 5 pools/groups are now configured vertically, with one drive of each group in each enclosure. Let’s again assume enclosure 1 goes down. Because each RAID 5 (4+1) group loses only a single drive, which RAID 5 can tolerate, no data availability issues are experienced. (While the illustration below only shows a single pool/group, assume every drive is configured and impacted the same way.)
[Figure: DAE failure, vertical striping]
In addition, if any DAE except the last in the chain (enclosure 4 in this example) goes down and does not enter a pass-through state, then everything on the bus becomes unavailable. This is because the RAID group is configured for RAID 5 (4+1), which tolerates the loss of only one drive, and at least two drives would be unavailable in this scenario. This would be considered a worst-case scenario and is represented in the illustration below. (While the illustration below only shows a single pool/group, assume every drive is configured and impacted the same way.)
[Figure: DAE failure, vertical striping, worst case]
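The vertical scenarios can be checked the same way. The sketch below assumes each RAID 5 (4+1) group places exactly one drive in each of the five enclosures and that enclosure 0 is closest to the SP; the function names are hypothetical and only model the reasoning above.

```python
def lost_enclosures(num_daes, failed, pass_through):
    """Enclosures unreachable after a DAE failure (enclosure 0 nearest the SP)."""
    if pass_through:
        return {failed}
    return set(range(failed, num_daes))


def group_available(group_enclosures, lost, tolerated_failures=1):
    """A group survives if it is missing no more drives than RAID tolerates.

    group_enclosures lists the enclosure holding each drive of the group.
    RAID 5 tolerates one missing drive, hence the default of 1.
    """
    missing = sum(1 for enc in group_enclosures if enc in lost)
    return missing <= tolerated_failures


group = [0, 1, 2, 3, 4]  # assumed layout: one drive per enclosure

# Enclosure 1 fails and passes through: one drive missing, group degraded but up.
print(group_available(group, lost_enclosures(5, 1, True)))    # True
# Enclosure 1 fails without pass-through: four drives missing.
print(group_available(group, lost_enclosures(5, 1, False)))   # False
# Last enclosure fails: only one drive missing either way.
print(group_available(group, lost_enclosures(5, 4, False)))   # True
```

Because every vertically striped group shares the same layout in this model, whatever happens to one group happens to all of them on the bus.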
As you can see, the impact of horizontal versus vertical striping depends on how pools/groups are configured and what RAID level is used. In most cases, the impact of horizontal striping is less than the impact of vertical striping. If vertical striping is configured appropriately, taking into account all EMC storage array failure scenarios, then it will increase redundancy and availability over horizontal striping.
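To make the comparison concrete, here is a rough side-by-side count of unavailable RAID groups for every failure position in the five-DAE example. It is a simplified sketch, not a sizing tool: it assumes 15-drive DAEs (so 15 groups in total under either layout), RAID 5 (4+1) everywhere, one drive per enclosure for vertical groups, and enclosure 0 closest to the SP. All names are hypothetical.

```python
NUM_DAES = 5
DISKS_PER_DAE = 15      # assumption: three (4+1) groups fit in one DAE
GROUP_WIDTH = 5         # RAID 5 (4+1)
GROUPS_PER_DAE = DISKS_PER_DAE // GROUP_WIDTH
TOTAL_GROUPS = NUM_DAES * GROUPS_PER_DAE


def groups_lost(layout, failed, pass_through):
    """Unavailable RAID groups for a single DAE failure on the bus."""
    # Without pass-through, the failed DAE and everything after it is lost.
    lost_daes = 1 if pass_through else NUM_DAES - failed
    if layout == "horizontal":
        # Each group sits entirely inside one DAE.
        return lost_daes * GROUPS_PER_DAE
    # Vertical: each group has one drive per DAE and survives one missing drive.
    return 0 if lost_daes <= 1 else TOTAL_GROUPS


print("failed  pass-through  horizontal  vertical")
for pass_through in (True, False):
    for failed in range(NUM_DAES):
        h = groups_lost("horizontal", failed, pass_through)
        v = groups_lost("vertical", failed, pass_through)
        print(f"{failed:>6}  {str(pass_through):>12}  {h:>10}  {v:>8}")
```

Under these assumptions, vertical striping loses nothing whenever the failed DAE passes through, while a failure without pass-through anywhere but the last enclosure takes down every vertical group, compared with only part of the horizontal ones.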
EMC has set the best practice of configuring pools/groups with horizontal striping since this configuration is optimal under most situations, is more easily understood, and typically provides adequate redundancy and availability. While vertical striping does work, it is considered a more advanced configuration, with additional complexity that may be overlooked.
From personal experience, I can say I have always used vertical striping, as I have seen failures on storage arrays and work in operations where downtime is not an option. One good example of this use case would be a mission-critical LUN that cannot go down. In a horizontal pool/group configuration, the LUN would be lost if the DAE hosting it was impacted. Of course, the probability is low, but it is possible.
My advice would be to stick to the best practice unless you have a good reason not to, and to be sure you fully understand how vertical striping works before considering it. If you are thinking about moving to vertical striping, be sure to discuss it with your local vSpecialist to ensure you are taking everything into account.

© 2011, Steve Flanders. All rights reserved.
