RAID disk backup options: deep dive

Tags: ACG

In "RAID snapshots on EC2" we talk about what RAID is and why you need to take special care when backing it up. In this article we explore the practical options for doing this.

Applied RAID backup

The key is to flush the caches and ensure that all the disks in the set are idle and consistent with each other (i.e. none are still having data written to them out of sync with the others).

There are two parts to the snapshot:

  1. A scan is done of all the blocks to determine which have changed since the last snapshot. These are the ones that need to be copied. This is a fast process taking seconds or maybe up to a minute for very large volumes. It is during this process that the disks / volumes need to be idle, and as noted it is not normally a long period.
  2. The blocks are copied from the disk to the snapshot volume. This process can take hours to complete. The good news is that it does not need exclusive access; the bad news is that it requires a lot of I/O and will compete with your normal business I/O.

    This is important to remember when calculating the disk I/O needed for your app. You need to allow for your maintenance load as well as your business load.
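
The sizing point above can be written down as simple arithmetic. This is a minimal sketch, not a sizing rule: the function name, the optional headroom percentage and all the numbers in the example are assumptions made up for illustration.

```python
def required_iops(business_iops: int, maintenance_iops: int, headroom_pct: int = 0) -> int:
    """Return the disk I/O to plan for: business load plus maintenance load.

    headroom_pct is an optional safety margin in percent (an assumption of
    this sketch, not something the article prescribes).
    """
    if business_iops < 0 or maintenance_iops < 0 or headroom_pct < 0:
        raise ValueError("loads and headroom must be non-negative")
    total = business_iops + maintenance_iops
    # Integer ceiling division, so the result is never rounded down.
    return (total * (100 + headroom_pct) + 99) // 100


# Hypothetical figures: 3000 IOPS of business load and 1000 IOPS
# consumed while snapshot blocks are being copied.
print(required_iops(3000, 1000))      # 4000
print(required_iops(3000, 1000, 20))  # 4800
```

Sizing for the business load alone (3000 in this example) would leave the app short of I/O for as long as the copy phase runs.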

The normal options

  1. Shut the server down. This absolutely assures the caches are flushed and the disks are idle; however, it takes a long time to shut a server down and restart it: minutes to tens of minutes if it is a large interactive system like an enterprise database or mail service installation.
  2. Stop the application that is using those disks and wait for the caches to time out, or flush them manually. If the volume is only used by that app and the app is shut down, you can have high confidence that the disks are idle and in sync. You can detach the disks if you want to be 100% certain, but then you need to add time to reattach them. Stopping and starting the application can still take quite a while, but will be faster than stopping and starting the whole server.
  3. Most applications that deal with large data volumes, like SQL, Oracle and Exchange, have built-in services to 'quiesce' the volume (fancy tech speak for stopping the app from writing to the disk and flushing the cache). These often have hooks that call the backup program to start the snapshot/backup and wait for an OK signal back. This is fast because the app knows exactly what it needs to do to prepare the volume for a backup and can do it in seconds to maybe a minute.
  4. Finally, there are backup programs that work at a very low level with the application, disk and caches. In many cases these simply call the quiesce service mentioned above; however, they have the advantage that they also talk to the backup software and know exactly when it is safe to release the disk back to the app.
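
All four options share the same shape: make the disks idle, start the backup, then release the disks again. A minimal sketch of that shape is shown here. The `quiesced` helper and its `suspend` / `resume` callables are hypothetical names; what they do in practice (stopping a service, calling an application's quiesce command, freezing a filesystem) is left to the caller.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def quiesced(suspend: Callable[[], None], resume: Callable[[], None]) -> Iterator[None]:
    """Keep the disks idle for the duration of the with-block.

    suspend() and resume() are supplied by the caller (hypothetical hooks).
    resume() runs even if starting the backup fails, so the application is
    never left without its disks.
    """
    suspend()
    try:
        yield
    finally:
        resume()


# Stand-in hooks that only record the order of events.
log = []
with quiesced(lambda: log.append("suspend"), lambda: log.append("resume")):
    log.append("start backup")

print(log)  # ['suspend', 'start backup', 'resume']
```

The `try` / `finally` is the design choice that matters: whichever option you pick, a failed backup start should not leave the application stopped or its I/O suspended.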

Exam questions about RAID Backup

Many students will never need to do this in the real world, but are concerned about how to assess the options in an exam question. A key point to remember is that it is not necessary to wait for the snapshot to finish. You only need to wait until the disk is scanned for changed blocks and the redirection process is started. So anything that says 'wait for snapshot to finish' is wrong.

  1. 1. Detach EBS volumes, 2. Start EBS snapshot of volumes, 3. Re-attach EBS volumes

    Plausible. Not fast, but not the slowest.

  2. 1. Stop the EC2 instance, 2. Snapshot the EBS volumes

    The slowest but highest confidence factor.

  3. 1. Suspend disk I/O, 2. Create an image of the EC2 Instance, 3. Resume disk I/O

    Fast, but 'Create Image' has special meaning and may not be what you are looking for.

  4. 1. Suspend disk I/O, 2. Start EBS snapshot of volumes, 3. Resume disk I/O

    Fast, and 'start EBS snapshot' sounds right. 

  5. 1. Suspend disk I/O, 2. Start EBS snapshot of volumes, 3. Wait for snapshots to complete, 4. Resume disk I/O

    Starts fast, but waiting for the snapshot to finish could take hours, so not correct in my opinion.
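
The difference between options 4 and 5 can be put into numbers using the two snapshot phases described earlier. This is a rough model only: the class and function names are hypothetical, and the durations (a one-minute scan and a two-hour copy) are illustrative values chosen to match the "seconds to a minute" and "hours" ranges mentioned above, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotTimings:
    scan_seconds: int  # phase 1: scan for changed blocks (disks must be idle)
    copy_seconds: int  # phase 2: copy the blocks (no exclusive access needed)


def io_suspended_seconds(t: SnapshotTimings, wait_for_completion: bool) -> int:
    """Return how long disk I/O stays suspended for a given procedure."""
    suspended = t.scan_seconds
    if wait_for_completion:
        # Option 5 keeps I/O suspended for the whole copy phase as well.
        suspended += t.copy_seconds
    return suspended


# Hypothetical durations: 1 minute to scan, 2 hours to copy.
timings = SnapshotTimings(scan_seconds=60, copy_seconds=2 * 60 * 60)
option_4 = io_suspended_seconds(timings, wait_for_completion=False)
option_5 = io_suspended_seconds(timings, wait_for_completion=True)

print(option_4, option_5)  # 60 7260
```

With these assumed figures, option 4 suspends I/O for about a minute, while option 5 suspends it for just over two hours for no gain in consistency.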

The AWS documentation has lots of good information. Start with "Create Amazon EBS snapshots".
