Saving and backing up your work is one of the most basic best practices in information technology. As systems have become increasingly intertwined and complex, backups have become vitally important. With so many things that can go wrong, the chance you’ll need those backups at some point is quite high.
When it comes to the cloud, there are numerous events that could endanger the integrity of your infrastructure – making it vitally important to have a solid strategy that ensures your continuity and resilience if something bad happens.
From cyberattacks to natural disasters – you must be prepared for any (and all) potential disruptions. Backing up your data and infrastructure, and having a plan to resurrect your cloud, is essential.
Backup vs. Cloud Disaster Recovery – what’s the difference?
It’s important to note that backups are different to disaster recovery – but there’s a clear overlap.
Backups typically only save a copy of your data and certain resources to a secure location, whereas cloud disaster recovery is a complete solution (including backups for all resources, alerts, and failovers) that can bring all of your cloud back online within a defined timeframe.
This means that you can roll back to a previous point-in-time (PIT) should a ransomware attack occur, or to the most recent backed-up state of your systems.
How to make backups with AWS Backup
It’s quite straightforward to make backups with AWS Backup, which is found in the AWS Management Console. From here, you can use the settings to define whether you want to back up resources based on tags, or all of your (supported) resources.
When you start with AWS Backup, you’ll need to make an initial on-demand backup by going to the AWS Backup console, and then selecting ‘Create on demand backup’ from ‘Protected resources’ in the navigation pane. This will take you to the ‘Create on demand backup’ page, where you can select the resources you want to back up and select the ‘Create backup now’ option.
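The same on-demand backup can be started from a script. The following is a minimal sketch using boto3's AWS Backup client, not a definitive implementation: the helper names, the example ARNs, and the 35-day retention are assumptions for illustration, and the IAM role is assumed to already have permission to back up the resource.

```python
def build_on_demand_backup_request(resource_arn, vault_name, role_arn,
                                   delete_after_days=35):
    """Build the keyword arguments for the StartBackupJob API call.

    The 35-day default retention is an illustrative assumption.
    """
    if delete_after_days < 1:
        raise ValueError("delete_after_days must be at least 1")
    return {
        "BackupVaultName": vault_name,
        "ResourceArn": resource_arn,
        "IamRoleArn": role_arn,
        "Lifecycle": {"DeleteAfterDays": delete_after_days},
    }


def start_on_demand_backup(request):
    """Submit the request and return the ID of the new backup job."""
    # Imported here so the request can be built and checked without the SDK.
    import boto3

    client = boto3.client("backup")
    response = client.start_backup_job(**request)
    return response["BackupJobId"]


if __name__ == "__main__":
    # Hypothetical ARNs and vault name; replace them with your own.
    request = build_on_demand_backup_request(
        resource_arn="arn:aws:ec2:eu-west-1:111122223333:volume/vol-0abc1234",
        vault_name="Default",
        role_arn="arn:aws:iam::111122223333:role/ExampleBackupRole",
    )
    print(start_on_demand_backup(request))
```

Keeping the request builder separate from the API call makes it easy to review what will be sent before any job is started.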
You can also specify lifecycle rules, IAM roles, cold/warm storage, and backup locations from the AWS Backup console. Lifecycle rules and storage locations can make a significant difference to the cost of backups, as you can move older recovery points to lower-cost cold storage and keep only the most recent ones in warm storage.
Once your first on-demand backup is complete, the backed-up resources will appear on the ‘Protected resources’ page.
With the AWS Backup console, you can further specify scheduled backups and create a backup plan using templates.
And if you have an Infrastructure-as-Code setup, we have an open-source Terraform module to set up AWS Backup.
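As a rough illustration of tag-based selection and lifecycle rules, here is a sketch that builds a scheduled backup plan with boto3. The plan name, rule name, schedule, tag key, and day counts are all hypothetical. It also encodes the AWS Backup rule that retention must be at least 90 days longer than the transition to cold storage. Note that not every resource type supports cold storage.

```python
def build_backup_plan(plan_name, vault_name,
                      cold_after_days=30, delete_after_days=365):
    """Build a BackupPlan document with one daily rule (illustrative values)."""
    # AWS Backup keeps cold-storage recovery points for at least 90 days.
    if delete_after_days < cold_after_days + 90:
        raise ValueError(
            "delete_after_days must be at least 90 days after cold_after_days"
        )
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": vault_name,
                # Every day at 05:00 UTC.
                "ScheduleExpression": "cron(0 5 ? * * *)",
                "Lifecycle": {
                    "MoveToColdStorageAfterDays": cold_after_days,
                    "DeleteAfterDays": delete_after_days,
                },
            }
        ],
    }


def build_tag_selection(selection_name, role_arn, tag_key, tag_value):
    """Select every supported resource that carries the given tag."""
    return {
        "SelectionName": selection_name,
        "IamRoleArn": role_arn,
        "ListOfTags": [
            {
                "ConditionType": "STRINGEQUALS",
                "ConditionKey": tag_key,
                "ConditionValue": tag_value,
            }
        ],
    }


def create_plan_with_selection(plan, selection):
    """Create the plan, attach the tag selection, and return the plan ID."""
    import boto3  # imported lazily so the builders work without the SDK

    client = boto3.client("backup")
    plan_id = client.create_backup_plan(BackupPlan=plan)["BackupPlanId"]
    client.create_backup_selection(BackupPlanId=plan_id,
                                   BackupSelection=selection)
    return plan_id
```

With this in place, any supported resource tagged with the chosen key and value would be picked up by the plan at its next scheduled run.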
Disaster strikes! Now what?
If your cloud went down, how would you know?
This is the first step to consider when setting up your backup or cloud recovery plan. You must be aware as soon as there’s a problem, so it’s important to set up an alert with Amazon CloudWatch that informs you as soon as something is amiss with your workloads or if instances can’t be reached.
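One way to set up such an alert is a CloudWatch alarm on the EC2 `StatusCheckFailed` metric that notifies an SNS topic. The sketch below assumes the SNS topic already exists. The alarm name, the two-minute evaluation window, and the choice to treat missing data as a failure are illustrative assumptions rather than recommendations.

```python
def build_status_check_alarm(instance_id, sns_topic_arn):
    """Build PutMetricAlarm arguments for a failed EC2 status check."""
    return {
        "AlarmName": f"status-check-failed-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,             # evaluate one-minute data points
        "EvaluationPeriods": 2,   # alarm after two consecutive failures
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # An instance that stops reporting is treated as failing.
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(alarm):
    """Create or update the alarm in CloudWatch."""
    import boto3  # imported lazily so the builder works without the SDK

    boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Because missing data counts as breaching here, an intentionally stopped instance would also raise the alarm, so this setting is a trade-off to review for your own workloads.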
The next step is recovery of your data or full cloud. There are many sides to this, and a full Cloud Disaster Recovery Team should be assembled, with clear roles. This involves a lot of additional activities, including communication and liaising with legal counsel – but you can read more about that here.
For now, let’s just focus on the technical side of recovery and how to restore your cloud from backups.
Restoring a backup from AWS Backup
You can quickly restore a backup manually in the AWS Backup console. For individual resources, you just need to find the resources you want to restore, select a recovery point ID for each resource, and select Restore.
For an EFS file system, you can select a ‘Full restore’ instead, and this will restore the entire file system from your backup.
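The same restore can be scripted. This is a hedged sketch with boto3 that picks the newest completed recovery point for a resource and starts a restore job with the metadata stored alongside it. The helper names are hypothetical, only the first page of recovery points is read, and in practice the restore metadata often needs adjusting per resource type before the job will succeed.

```python
def latest_completed_recovery_point(recovery_points):
    """Return the newest recovery point with status COMPLETED, or None."""
    completed = [p for p in recovery_points if p.get("Status") == "COMPLETED"]
    if not completed:
        return None
    return max(completed, key=lambda p: p["CreationDate"])


def restore_latest(resource_arn, role_arn):
    """Start a restore job from the newest completed recovery point."""
    import boto3  # imported lazily so the selection logic works without the SDK

    client = boto3.client("backup")
    # Only the first page of results is inspected in this sketch.
    points = client.list_recovery_points_by_resource(
        ResourceArn=resource_arn
    )["RecoveryPoints"]
    point = latest_completed_recovery_point(points)
    if point is None:
        raise RuntimeError("no completed recovery point found")

    # Stored metadata is a starting point; some resource types require
    # extra or changed keys before a restore is accepted.
    metadata = client.get_recovery_point_restore_metadata(
        BackupVaultName=point["BackupVaultName"],
        RecoveryPointArn=point["RecoveryPointArn"],
    )["RestoreMetadata"]

    job = client.start_restore_job(
        RecoveryPointArn=point["RecoveryPointArn"],
        Metadata=metadata,
        IamRoleArn=role_arn,
    )
    return job["RestoreJobId"]
```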
AWS Backup is an economical option that allows you to easily restore from backups for data stored in S3, virtual machines, certain file systems, databases, clusters, and EC2 instances.
However, AWS Backup requires a piecemeal approach and may not cover all your resources. For a more complete backup and restore solution, AWS Elastic Disaster Recovery (DRS) is a better option.
Making a backup with AWS Elastic Disaster Recovery
In many cases, the easiest option is to automate backup as well as the failover and failback processes in AWS Elastic Disaster Recovery (DRS). However, this isn’t possible for all resources, such as EKS clusters. For these, something like Velero from VMware can fill the gap.
Once it’s properly set up, AWS Elastic Disaster Recovery (DRS) can automatically replicate your applications, files, networks, EC2 instances, OS, databases, and system state configuration to a subnet that you specify – ideally located in a separate AWS Region from the source server. This means that it’s ready to fail over and spin up a full version of your cloud from a recent point-in-time, when needed.
Using this solution, applications can be restored within minutes, and can be failed back to the source server as soon as you’re ready.
How restoring a backup is different with AWS Elastic Disaster Recovery
The best part of AWS Elastic Disaster Recovery is that you can easily launch a recovery from the console. This is controlled with the launch settings in DRS.
There are two components to the launch settings: DRS launch settings, and EC2 launch templates. These are configured individually for each source server to be restored.
DRS launch settings allow you to:
- Define which subnets are to be used for EC2 recovery instances
- Create security groups in these subnets, and define them in the EC2 launch template
- Note: if you don’t define these, default values/subnets will be assigned
While the settings give you fine-grained control over the recovery instance behavior, the templates ensure that tests and recovery instances comply with the latest security best practices and available technologies.
As soon as you add a source server to DRS, it will create an EC2 launch template and launch settings, with default values.
To adjust the recovery launch settings, you need to head to the ‘Source servers’ page in the AWS DRS console. Then you just need to click on the required source server, and open the ‘Launch settings’ tab on the far right – or select the source server and then select ‘Edit launch settings’ from the drop-down ‘Actions’ menu.
From here you can manage recovery and failback instances, and select options for things like server tags, private IP, automated instance behaviors, and licensing.
EC2 templates can also be customized from the ‘Launch settings’ tab in the ‘Source servers’ page, or by selecting the source server, and then ‘Edit EC2 launch settings’ from the drop-down menu. These templates allow you to customize which subnet is used, security groups, and instance types, as well as advanced settings for things like IAM, public IP, and tenancy.
In many cases, you may not need to deviate from the default settings – but this will depend on your specific needs.
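If you do need to change the DRS launch settings for many servers, it may be easier to script them. The sketch below builds arguments for the DRS `update_launch_configuration` call in boto3. The default values chosen here, the helper names, and the set of accepted right-sizing values reflect my understanding of the API and should be checked against the current documentation.

```python
VALID_DISPOSITIONS = {"STARTED", "STOPPED"}
VALID_RIGHT_SIZING = {"NONE", "BASIC", "IN_AWS"}


def build_launch_configuration(source_server_id, copy_private_ip=False,
                               copy_tags=True, launch_disposition="STARTED",
                               right_sizing="BASIC", os_byol=False):
    """Build UpdateLaunchConfiguration arguments for one source server."""
    if launch_disposition not in VALID_DISPOSITIONS:
        raise ValueError(f"unknown launch disposition: {launch_disposition}")
    if right_sizing not in VALID_RIGHT_SIZING:
        raise ValueError(f"unknown right-sizing method: {right_sizing}")
    return {
        "sourceServerID": source_server_id,
        "copyPrivateIp": copy_private_ip,    # reuse the source private IP
        "copyTags": copy_tags,               # copy server tags to the instance
        "launchDisposition": launch_disposition,
        "targetInstanceTypeRightSizingMethod": right_sizing,
        "licensing": {"osByol": os_byol},    # bring-your-own-license for the OS
    }


def apply_launch_configuration(config):
    """Send the launch settings to AWS Elastic Disaster Recovery."""
    import boto3  # imported lazily so the builder works without the SDK

    return boto3.client("drs").update_launch_configuration(**config)
```

Subnets, security groups, and instance types are not part of this call, since they live in the EC2 launch template that DRS creates for each source server.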
Launching recovery, failover, and failback in AWS DRS
To launch a recovery instance, you need to head to the ‘Source servers’ page in the AWS DRS Console again. It should indicate here that the source server is ready to recover, and that the data replication is ‘healthy.’
Next, you need to select the source server(s) you want to restore and select ‘Initiate recovery’ from the ‘Initiate recovery job’ menu at the top right of the page. You will need to define a point-in-time (PIT) to restore from or select the most recent data.
The option you choose will depend on what disaster you’re dealing with. If something like a cyberattack or ransomware is the issue, then you’ll want to select a PIT that mitigates the problem instead of re-creating it.
It’s also a good idea to test your recovery plan by conducting a drill first.
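To make the point-in-time choice concrete, here is a sketch that picks the newest DRS recovery snapshot taken before a known incident time and builds a `start_recovery` request with boto3, defaulting to a drill. The helper names are hypothetical, snapshot timestamps are assumed to be ISO 8601 strings, and the incident time must be timezone-aware.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp, treating naive values as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def pick_snapshot_before(snapshots, incident_time):
    """Return the ID of the newest snapshot taken before the incident."""
    candidates = [
        s for s in snapshots
        if parse_timestamp(s["timestamp"]) < incident_time
    ]
    if not candidates:
        return None
    newest = max(candidates, key=lambda s: parse_timestamp(s["timestamp"]))
    return newest["snapshotID"]


def build_recovery_request(source_server_id, snapshot_id=None, is_drill=True):
    """Build StartRecovery arguments; no snapshot ID means most recent data."""
    server = {"sourceServerID": source_server_id}
    if snapshot_id is not None:
        server["recoverySnapshotID"] = snapshot_id
    return {"sourceServers": [server], "isDrill": is_drill}


def start_recovery(request):
    """Launch the recovery (or drill) job and return its job ID."""
    import boto3  # imported lazily so the selection logic works without the SDK

    return boto3.client("drs").start_recovery(**request)["job"]["jobID"]


def list_snapshots(source_server_id):
    """Fetch the first page of recovery snapshots for a source server."""
    import boto3

    response = boto3.client("drs").describe_recovery_snapshots(
        sourceServerID=source_server_id
    )
    return response["items"]
```

Defaulting `is_drill` to `True` means an accidental run launches a drill rather than a real recovery. A real failover would pass `is_drill=False` explicitly.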
Time to failback?
Provided you’ve solved the issue with your source server, you’ll want to fail back as soon as possible. To do this you need to install the AWS DRS Failback Client on your (non-AWS) source server, and generate failback credentials for each recovery. The Failback Client makes sure that your source server picks up where the disaster recovery left off.
Failing back to another AWS Region is somewhat easier than failing back to on-prem or non-AWS source servers, and can be managed from the ‘Recovery instances’ page in the AWS DRS Console.
However, before failing back to the source server you may want to replicate the recovered instances (by selecting ‘Start reversed replication’) as soon as you can.
As soon as the Reversed direction launch state shows as ‘Ready,’ you can select the source server in the ‘Source servers’ page, and click ‘Launch for failback’ in the ‘Initiate recovery job’ menu. Once you’ve failed back to the source server, you should select ‘Start reversed replication’ again to protect the restored instances. Traffic redirection will need to be managed separately and you’ll still need to clean up unnecessary resources, but that’s about it.
Backups are part of a complete plan
As important as backups are, they only have real value when they’re part of a cohesive cloud disaster recovery plan, executed by a team that knows their roles and tasks – and how to complete them without panicking!
For many situations, AWS Elastic Disaster Recovery offers the ideal balance of convenience, cost, and speed of recovery.
However, it may not match your needs if you require instant recovery or if your resources aren’t supported by it. Also, you might not need it – in which case AWS Backup could be enough.
If you’re unsure, it’s always better to seek advice about the best architecture and solutions for your growing cloud. Contact us.