Comparison of HP 3PAR Online Import and Dell/EMC SANCopy

Each storage vendor in the market has its own technologies for data migration, for example Dell/EMC VPLEX encapsulation, MirrorView/S and /A, SANCopy, HP 3PAR Online Import and 3PAR Peer Motion. Today we will discuss the differences between Dell/EMC SANCopy and HP 3PAR Online Import, and list the advantages and disadvantages of each. The following diagrams show the detailed architecture for data migration with EMC SANCopy and HPE 3PAR Online Import.

 

The architecture for migration using EMC SANCopy:

  • Source Array – HP 3PAR StoreServ 7200 (OS 3.2.2)
  • Target Array – EMC VNX5200 (VNX OE 33)
  • SAN Switch – 2 x Brocade DS-300B
  • Migration Host – Microsoft Windows Server 2008 R2
  • Migration Method – EMC SANCopy (Push Mode)

 

[Image: FSM_Diagram.png – EMC SANCopy migration architecture]

 

Execute the data migration using the SAN Copy Create Session Wizard in EMC Unisphere.

 

[Image: untitled.png – SAN Copy Create Session Wizard in EMC Unisphere]

 

The architecture for migration using HP 3PAR Online Import:

 

  • Source Array – EMC VNX5200 (VNX OE 33)
  • Target Array – HPE 3PAR StoreServ 7200 (OS 3.2.2)
  • Migration Host – Microsoft Windows Server 2008 R2
  • Migration Management Host – HP 3PAR Online Import Utility 1.5 & EMC SMI-S Provider 4.6.2
  • SAN Switch – 2 x Brocade DS-300B
  • Migration Method – HP 3PAR Online Import

[Image: FSM_Diagram2.png – HP 3PAR Online Import migration architecture]

 

Execute the data migration using the HP 3PAR Online Import Utility CLI commands:

addsource -type CX -mgmtip x.x.x.x -user <admin> -password <password> -uid <Source Array's WWN>
adddestination -mgmtip x.x.x.x -user <admin> -password <password>
createmigration -sourceuid <Source Array's WWN> -srchost <Source host> -destcpg <Target CPG> -destprov thin -migtype MDM -persona "WINDOWS_2008_R2"

 

[Image: 10-.png]

 

The table below compares EMC SANCopy and HP 3PAR Online Import:

 

[Image: table.png – comparison of EMC SANCopy and HP 3PAR Online Import]

 

The following are the pros and cons of each migration method.

 

EMC SANCopy

Pros:

  • Each source LUN can be migrated to the target array one at a time.
  • Any FC port on each storage controller can be configured as a SANCopy port, and a SANCopy port and a host port can run at the same time.
  • All migration operations can be executed from EMC Unisphere (the VNX management server); no additional migration server installation is required.
  • The SANCopy license is bundled with VNX storage.

Cons:

  • SANCopy does not support incremental mode if the source array is a third-party model.

 

HP 3PAR Online Import

Pros:

  • The destination HP 3PAR StoreServ storage system must have a valid HP 3PAR Online Import or HP 3PAR Peer Motion license installed; by default a 180-day temporary Peer Motion license is included.

Cons:

  • A migration definition cannot migrate each source LUN to the target array one at a time; for example, if the EMC Storage Group contains three LUNs, all three will be migrated to the target array when the migration session starts.
  • All migration definitions can only be executed from the 3PAR Online Import Utility, which runs on a separate management host used for the data migration.

 

https://community.emc.com/message/964379#964379

 

Storage Basics

File-based storage organizes data into files and directories within a hierarchical, directory-based structure.

Block-based storage organizes data as a sequence of bits or bytes of a fixed length or size. To use block storage, you must first create a volume or logical unit number (LUN), a logical abstraction or virtualization layer between a physical storage device/volume and applications.

A hard disk drive (HDD) stores data using different magnetic patterns on a spinning disk. HDD performance is typically hundreds of input/output operations per second (IOPS) with a latency of milliseconds.

A solid-state drive (SSD) stores data electrically in non-volatile memory using NAND flash technology. SSD storage performance is typically tens of thousands to millions of IOPS with a latency of microseconds.

Data management refers to various storage operations, including clones, replication, and snapshots. A clone creates a copy of a storage volume. Replication duplicates data in real time to another physical storage device. A snapshot consists of an initial point‐in‐time clone of a storage volume and subsequent copies of only the changes that have occurred in the storage volume since the initial clone.

SSD vs. HDD: What’s the Difference?

http://www.pcmag.com/article2/0,2817,2404258,00.asp

 

Definition and Backup Rotation

One of the key elements of every data backup is the definition of a rotation scheme, so that protection is guaranteed at least one day back. The best media rotation scheme is the one that can guarantee data copies that are as long-lived, extensive and varied as possible.

Backing up data and keeping the copies for more than one day is necessary.

Nevertheless, the cost or time required for a full backup every day can be impractical, especially for companies with huge amounts of data. That is why many users run either differential or incremental backups on most workdays.

Types of Backup

Full Backup – during a full backup the selected files are backed up and their Archive attribute is cleared at the same time. The attribute is used to distinguish backed-up from not-yet-backed-up data: when a file's content changes, the Archive attribute is set again. Full backups are usually a prerequisite for incremental and differential backups, which help save backup time. If a full backup has been performed, restoring that one backup is enough to return to the original state.


Incremental Backup – during this type of backup only the files with the Archive attribute set are backed up, and the attribute is then cleared. Thus only the files that have changed (or whose Archive attribute has been set manually) since the last backup, full or incremental, are copied. The backup is significantly shorter than a full backup, which is why it is usually used during the workweek. Restoring an incremental backup alone is not enough to return to the original state: in case of a server or disk array breakdown it is necessary to restore the last full backup first and then all incremental backups in chronological order, from the oldest to the newest one created after the last full backup. Incremental backups are therefore faster to create but slower to restore, because a number of backups have to be restored. Since a major breakdown is relatively unlikely, the incremental backup is usually a good trade-off.

 

Differential Backup – during a differential backup only the files with the Archive attribute set are backed up, but the attribute is not cleared afterwards. Thus only the files that have changed (or whose Archive attribute has been set manually) since the last full backup are copied. The backup is significantly shorter than a full backup, so it is also used during the workweek. Restoring a differential backup alone is again not enough to return the system to its original state: in case of a server or disk array breakdown it is necessary to restore the last full backup first and then the last differential backup created after it. Differential backups are therefore comparable to incremental backups in creation time, while restoring data is faster because only one differential backup has to be restored. On the first day after a full backup an incremental and a differential backup take the same time; on the following days the differential backup grows, but the restoration time stays relatively short compared with restoring a chain of incrementals.
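To make the difference concrete, here is a minimal Python sketch of how the Archive attribute drives file selection in the three backup types just described. The file names and attribute handling are purely illustrative assumptions, not tied to any specific backup product.

# Illustrative sketch: how the Archive attribute drives file selection.
# "archive": True means the file has changed since it was last backed up.

files = {
    "report.docx": {"archive": True},
    "db.bak":      {"archive": True},
    "notes.txt":   {"archive": False},   # unchanged since the last backup
}

def full_backup(files):
    # Back up every selected file and clear the Archive attribute.
    backed_up = list(files)
    for meta in files.values():
        meta["archive"] = False
    return backed_up

def incremental_backup(files):
    # Back up only files with the Archive attribute set, then clear it.
    backed_up = [name for name, meta in files.items() if meta["archive"]]
    for name in backed_up:
        files[name]["archive"] = False
    return backed_up

def differential_backup(files):
    # Back up only files with the Archive attribute set, but leave it set,
    # so the next differential again captures everything since the last full.
    return [name for name, meta in files.items() if meta["archive"]]

print(full_backup(files))                 # all files; Archive cleared everywhere
files["report.docx"]["archive"] = True    # a file changes after the full backup
print(differential_backup(files))         # ['report.docx'] (attribute stays set)
print(differential_backup(files))         # ['report.docx'] again the next day
print(incremental_backup(files))          # ['report.docx'] (attribute now cleared)
print(incremental_backup(files))          # [] until another change occurs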

Whether you choose incremental or differential backup as a complement to the full backup depends on your company's environment. In these cases it is better to have the analysis and policy proposals made by specialists, so that you avoid the mistakes of a badly designed backup policy, which can lead to resources being wrongly invested in storage.

After this short introduction to the available types of backup, we can briefly describe the most widely used tape rotation methods, which build on the backup types mentioned above.

Tape Rotations

Round Robin (scheme with one tape per each day)

The simplest tape rotation scheme reserves one tape for every day of the workweek. The tapes are labelled Monday, Tuesday, Wednesday, Thursday and Friday, and each day a full backup of the selected data is made on the relevant tape. This rotation allows data to be restored with a maximum look-back of one week. The scheme is suitable for small companies using an internal or external tape drive, or a NAS device with a virtual disk library (VDL) configured, which can serve as primary storage. It is appropriate wherever a full backup can be performed every day and a one-week history is sufficient.

Grandfather-Father-Son (GFS)

The 'Grandfather-Father-Son' backup scheme is among the most widely used. It uses daily (Son), weekly (Father) and monthly (Grandfather) media sets. Four media sets are labelled for the daily backups of the workweek (i.e. Monday to Thursday); on these media sets (the Son sets in the GFS scheme) the incremental backups take place, and they are overwritten again during the following week. Another group of five media sets in the GFS scheme is labelled Week 1, Week 2, and so on (Father).

On these media sets (Father) a full backup takes place every week; the Son sets are not used that day, and the Father group expires after one month, after which its sets are overwritten. The final 'Grandfather' group consists of three media sets (a media set may be composed of one or many tapes) labelled Month 1, Month 2, Month 3, and so on. These sets are overwritten once every three months or more, depending on how many sets are devoted to the Grandfather group; their expiration (i.e. when they may be overwritten again) is set according to the number of media sets in the group. Each media set in the Son, Father or Grandfather group is either a single tape or a set of tapes, depending on the amount of data backed up. The total number of media sets used in the GFS backup scheme is twelve. Because of tape wear, and in order to keep a longer history (archiving), it is recommended to replace the media sets with new ones after a certain period.
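A minimal Python sketch of how such a GFS calendar might be resolved follows. It assumes Monday-to-Thursday incrementals on the Son sets, a Friday full backup on the Father sets, and the last Friday of each month promoted to a Grandfather set; the Friday rule, the month-end rule and the set labels are illustrative assumptions, not part of any particular product.

import calendar
from datetime import date

# Illustrative GFS media-set selection (assumptions: Mon-Thu incrementals on
# "Son" sets, Friday full backups on "Father" sets, and the last Friday of the
# month promoted to a "Grandfather" set kept for three months).

def gfs_media_set(d: date) -> str:
    weekday = d.weekday()                 # Monday = 0 ... Sunday = 6
    if weekday <= 3:                      # Monday-Thursday: daily incrementals
        return f"Son: {calendar.day_name[weekday]}"
    if weekday == 4:                      # Friday: weekly or monthly full backup
        days_in_month = calendar.monthrange(d.year, d.month)[1]
        last_friday = max(day for day in range(1, days_in_month + 1)
                          if date(d.year, d.month, day).weekday() == 4)
        if d.day == last_friday:
            return f"Grandfather: Month {(d.month - 1) % 3 + 1}"
        return f"Father: Week {(d.day - 1) // 7 + 1}"
    return "no backup scheduled"          # weekend, in this simple example

print(gfs_media_set(date(2023, 3, 14)))   # Son: Tuesday
print(gfs_media_set(date(2023, 3, 17)))   # Father: Week 3
print(gfs_media_set(date(2023, 3, 31)))   # Grandfather: Month 3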

Tower of Hanoi 

The Tower of Hanoi scheme takes its name from the classic logic puzzle popularized by the French mathematician Édouard Lucas. The objective of the puzzle is to move five disks from one rod to another in the minimal number of moves; only one disk may be moved at a time and no disk may be placed on top of a smaller disk. It can be shown that the fewest number of moves for five disks is 31. The Tower of Hanoi backup method uses five media sets:

  • Media set A is used every other day
  • Media set B is used every fourth day
  • Media set C is used every eighth day
  • Media sets D and E are used in turn every sixteenth day

The Tower of Hanoi schedule works as follows:

The backups start on media set 'A', which is then reused every other day. The next backup goes to media set 'B' (on a day when 'A' is not used) and is then repeated every fourth backup. Media set 'C' starts on a day when neither 'A' nor 'B' is used and repeats every eighth backup. The policy for sets 'D' and 'E' is set up the same way: their first backups fall on days when sets A, B and C are not used, and they repeat every sixteenth backup.
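One compact way to encode the schedule just described is to pick the media set for the n-th backup run from the lowest set bit of the run number. The following Python sketch is an illustration of that encoding (the lowest-set-bit rule is an assumption about how to express the "every other / fourth / eighth / sixteenth" pattern; the labels A to E follow the list above).

# Tower of Hanoi rotation: the media set for backup run n (1, 2, 3, ...) is
# chosen by the lowest set bit of n, which reproduces the pattern described
# above: A every other run, B every fourth, C every eighth, D and E in turn
# every sixteenth run.

def hanoi_media_set(n: int) -> str:
    sets = ["A", "B", "C", "D", "E"]
    level = (n & -n).bit_length() - 1     # index of the lowest set bit of n
    return sets[min(level, len(sets) - 1)]

print("".join(hanoi_media_set(n) for n in range(1, 17)))
# -> ABACABADABACABAE  (A on odd runs, B on runs 2, 6, 10, ..., and so on)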

The main advantage of this scheme is the possibility of adding a new media set and thereby gaining a longer backup history (as with GFS). The more frequently used media sets contain the newest copies of files, whereas the less frequently used media sets contain older file versions.

This scheme is quite difficult to administer manually. It is therefore highly recommended to use backup software that can schedule the whole process (e.g. NetVault 7.1), especially when using a tape autoloader (e.g. the Tandberg SLR140 autoloader), or, more suitably, a solution with more slots such as a tape library (e.g. ADIC Scalar 24, ADIC Scalar 100) so that there are enough media sets, including tapes for backup, archiving and a disaster recovery solution. Like the Grandfather-Father-Son scheme, the Tower of Hanoi scheme makes it possible to periodically take a media set out of the rotation for archiving.

In Conclusion

Current backup trends bring both primary and secondary data storage into the backup scheme for higher safety. In most companies a NAS (Network Attached Storage) device works as the primary storage: the workweek backups are performed to disk and are then migrated to tape drives or tape libraries.

When designing the whole solution, it is useful to start from the equations below to calculate the number of tapes needed for safe backup, data archiving and a disaster recovery solution. Next time we will talk about using primary storage built on NAS devices (e.g. Iomega NAS p400/p800) and the subsequent migration of data to secondary storage based on the backup schemes described above.

Calculation of number of tapes needed for backup including archiving and Disaster Recovery:

Tapes dedicated to backups
Xs = D * T * S * R + N
Xs = number of tapes needed for backup for a period of one year
D = number of backup drives
T = number of tapes in a media set
S = number of media sets in a backup scheme
R = number of backup scheme rotations per year

Tapes dedicated to archiving
Xa = T * S * A
Xa = number of tapes needed for archiving
T = number of tapes needed for backup duplication of each server
S = number of servers
A = number of archiving sets per year

Tapes dedicated to Disaster Recovery
Xr = T * S * R
Xr = number of tapes needed for recovery
T = number of tapes needed for Disaster Recovery backup of one server (see archiving)
S = number of servers (see archiving)
R = number of required disaster recovery rotations per year

Total annual consumption of data tapes
X = Xs + Xa + Xr + R
X = total number of tapes needed for a period of one year
Xs = number of tapes needed for backup for a period of one year
Xa = number of tapes needed for archiving
Xr = number of tapes needed for recovery
R = approximate number of tapes that will be necessary to replace by new ones
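As a short worked example of the equations above, the Python sketch below computes the totals for a hypothetical site. All numbers are assumptions for demonstration only; the term N in the backup equation and the replacement count R in the total are left as plain inputs, since the source leaves them site-specific.

# Worked example of the tape-count equations above. All numbers are
# illustrative assumptions, not recommendations.

def backup_tapes(D, T, S, R, N):
    # Xs = D * T * S * R + N  (N is left undefined by the source; kept as an input)
    return D * T * S * R + N

def archive_tapes(T, S, A):
    # Xa = T * S * A
    return T * S * A

def dr_tapes(T, S, R):
    # Xr = T * S * R
    return T * S * R

# Hypothetical site: 1 backup drive, GFS scheme with 12 media sets of 2 tapes,
# scheme rotated 4 times per year, 2 servers, quarterly archives, 1 DR rotation.
Xs = backup_tapes(D=1, T=2, S=12, R=4, N=0)     # 96 tapes for backup
Xa = archive_tapes(T=2, S=2, A=4)               # 16 tapes for archiving
Xr = dr_tapes(T=2, S=2, R=1)                    #  4 tapes for disaster recovery
replacements = 5                                # worn tapes replaced per year (assumption)

X = Xs + Xa + Xr + replacements                 # total annual consumption
print(X)                                        # 121 tapes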

http://www.storage.cz/en/specialized-section/detail/id/46-definition-and-backup-rotation

Understanding RAID Penalty

Determining which type of RAID to use when building a storage solution largely depends on two things: capacity and performance. Performance is the topic of this post.

We measure disk performance in IOPS, or input/output operations per second. One read request or one write request = 1 IO. Each disk in your storage system can provide a certain number of IOPS based on its rotational speed, average latency and average seek time. I've listed some averages for each type of disk below.

Disk Speed (RPM)    IOPS

15,000              175

10,000              125

7,200                75

5,400                50

sources: 

http://www.techrepublic.com/blog/datacenter/calculate-iops-in-a-storage-array/2182

http://www.yellow-bricks.com/2009/12/23/iops/

http://en.wikipedia.org/wiki/IOPS

For some basic IOPS calculations, assume we have three JBOD disks at 5,400 RPM; that gives us a maximum of 150 IOPS, calculated by multiplying the number of disks by the IOPS each disk can provide.

But now assume these disks are in a RAID setup. We can't get this maximum number of IOPS, because some calculation has to be done when writing data to disk so that we can recover from a drive failure. To illustrate, let's look at an example of how parity is calculated.

Let's assume we have a RAID 4 system with four disks. Three of these disks hold data, and the last disk holds parity information. We use an XOR calculation to determine the parity. As seen below, three disks have had data written to them, and we then have to calculate the parity for the fourth disk. The write cannot complete until both the data and the parity have been fully written to disk, in case one of the operations fails. Waiting the extra time for the parity to be written is the RAID penalty.

Disk1               Disk2              Disk3              Parity Disk

10101010         00001101       00011110       10111001
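The parity row above can be verified with a quick XOR, for example in Python (a small illustrative check, not part of the original post):

# XOR parity check for the example above: parity = Disk1 XOR Disk2 XOR Disk3.
d1, d2, d3 = 0b10101010, 0b00001101, 0b00011110
parity = d1 ^ d2 ^ d3
print(format(parity, "08b"))            # 10111001, matching the parity disk

# The same property lets any single lost disk be rebuilt from the others:
rebuilt_d2 = d1 ^ d3 ^ parity
print(format(rebuilt_d2, "08b"))        # 00001101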

Notice that since we don't have to calculate parity for a read operation, there is no penalty associated with that type of IO; only writes to disk incur the RAID penalty. A RAID 0 stripe also has no write penalty, since there is no parity to calculate. No RAID penalty is expressed as a penalty of 1.

RAID      Write Penalty

0         1

1         2

5         4

6         6

DP        2

10        2

RAID 1

It is fairly simple to calculate the penalty for RAID 1, since it is a mirror: the write penalty is 2 because two writes take place, one to each disk.

RAID 5

RAID 5 takes quite a hit on the write penalty because of how data is laid out on disk. RAID 5 is used instead of RAID 4 in most cases because it distributes the parity over all the disks. In a RAID 4 setup, one disk is responsible for all of the parity, so every write must touch that single parity disk while the data is spread out over the other three disks. RAID 5 changes this by striping both the data and the parity over different disks.

Disk1           Disk2             Disk3            Disk4

Data             Data               Data              Parity

Data             Data               Parity            Data

Data             Parity            Data               Data

Parity           Data              Data               Data

The write penalty in a RAID 5 scenario nevertheless ends up being 4, because for each change to the disk we read the data, read the parity, then write the data and write the parity before the operation is complete.

RAID 6

RAID 6 is almost identical to RAID 5 except that instead of calculating parity once, it has to do it twice; therefore we have three reads and then three writes, giving us a penalty of 6.

RAID DP

RAID DP is the tricky one. Since RAID DP also has two sets of parity, just like RAID 6, you would expect the penalty to be the same. The penalty for RAID DP is actually very low, probably because of how the Write Anywhere File Layout (WAFL) writes data to disk. WAFL basically writes the new data to a new location on disk and then moves pointers to the new data, eliminating the reads that would otherwise have to take place. These writes are also staged in NVRAM first and then flushed to disk, which speeds up the process. I welcome any NetApp experts to post comments explaining in more detail how this process cuts down the write penalties.

Calculating the IOPS

Now that we know the penalties, we can work out how many IOPS our storage solution will be able to handle. Keep in mind that other factors could limit the IOPS, such as network congestion for iSCSI or FCoE, or hitting the maximum throughput of your Fibre Channel card.

Raw IOPS = Disk Speed IOPS * Number of disks

Functional IOPS = (Raw IOPS * Write % / RAID Penalty) + (RAW IOPS * Read %)

To put this in a real-world example, let's say we have five 5,400 RPM disks. That gives us a total of 250 raw IOPS (50 IOPS * 5 disks = 250 IOPS).

If we put these disks in a RAID 5 setup, we have no penalty for reads, but writes carry a penalty of four. Let's assume 50% reads and 50% writes.

(250 Raw IOPS * .5 / 4) + (250 * .5) = 156.25 IOPS
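The same arithmetic can be wrapped in a small Python helper; the penalty table and the 50/50 workload split below are the values used in the example above, and the function itself is just an illustrative sketch.

# Functional IOPS = (Raw IOPS * write% / RAID penalty) + (Raw IOPS * read%)

RAID_WRITE_PENALTY = {"0": 1, "1": 2, "5": 4, "6": 6, "DP": 2, "10": 2}

def functional_iops(disk_iops, disk_count, read_fraction, raid_level):
    raw = disk_iops * disk_count                  # raw IOPS of the disk group
    write_fraction = 1.0 - read_fraction
    penalty = RAID_WRITE_PENALTY[raid_level]
    return (raw * write_fraction / penalty) + (raw * read_fraction)

# Five 5,400 RPM disks (~50 IOPS each) in RAID 5 with a 50/50 read/write mix:
print(functional_iops(disk_iops=50, disk_count=5, read_fraction=0.5, raid_level="5"))
# -> 156.25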

 

http://theithollow.com/2012/03/21/understanding-raid-penalty/