Disaster Recovery (3.0)

If disaster strikes, it might be necessary to restore the system from backup. The steps below describe the recovery procedure, with a time estimate for each step.

  1. Recreate any lost IaC-managed infrastructure resources and the EKS cluster
    An example of this process, based on the provided Terraform and eksctl templates, is described in Setup (3.0).
    Time estimate: 1.5 hours
  2. Restore EFS data from AWS Backup, as described in https://docs.aws.amazon.com/aws-backup/latest/devguide/restoring-efs.html
    Time estimate: 1 hour
  3. Restore the MemoryDB database from a snapshot, if necessary
    As MemoryDB supports replication across availability zones by default, it is usually not necessary to recover the database from a snapshot. In case of a multi-availability zone failure, or accidental deletion or corruption of the database, it could be necessary to restore a database snapshot. This is described in https://docs.aws.amazon.com/memorydb/latest/devguide/snapshots-restoring.html.
    Time estimate: 0-30 minutes
  4. Restore the RDS database using one of the following:
    1. Point-in-time recovery (PITR), if available, see https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html
      Important: The point in time that the database is restored to must be earlier than the oldest snapshot restored in steps 2 and 3.
    2. If, for some reason (multi-AZ failure or manual error), the data needed to perform PITR is not available, the RDS database needs to be recovered from a snapshot. This is described in https://docs.aws.amazon.com/aws-backup/latest/devguide/restoring-rds.html.
    Time estimate: 30 minutes for PITR, 1 hour for snapshot recovery
  5. Reinstall the platform application and configure it to use the RDS database recovered in step 4
    For instructions on installing the platform application manually, see Installation (3.0).
    Note! It is recommended to automate platform installation using CI/CD pipelines.
    Time estimate: 30 minutes
  6. Re-deploy the solution resources
    Solution resources ("ECDs") need to be reinstalled. The recommendation is to store ECDs as Helm charts in version control and deploy them through an automated CI/CD pipeline. For a description of how to set up such CI/CD pipelines, see Install the Example CI/CD Pipeline (3.0).
    Time estimate: 30 minutes
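
The constraint in step 4 (the PITR target must predate the oldest snapshot used in steps 2 and 3) can be checked with a few lines of Python before starting the restore. The following is a minimal sketch, not part of the product tooling: the timestamps, the instance identifiers "platform-db" and "platform-db-restored", the one-minute safety margin, and the helper names choose_restore_time and build_pitr_request are all hypothetical. The dictionary keys follow the parameter names of the boto3 RDS call restore_db_instance_to_point_in_time.

```python
from datetime import datetime, timedelta, timezone


def choose_restore_time(snapshot_times, margin=timedelta(minutes=1)):
    """Return a point in time strictly earlier than the oldest snapshot."""
    if not snapshot_times:
        raise ValueError("at least one snapshot timestamp is required")
    if any(t.tzinfo is None for t in snapshot_times):
        raise ValueError("snapshot timestamps must be timezone-aware")
    return min(snapshot_times) - margin


def build_pitr_request(source_id, target_id, restore_time):
    """Build keyword arguments for rds.restore_db_instance_to_point_in_time."""
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
        "RestoreTime": restore_time,
    }


# Hypothetical completion times of the snapshots restored in steps 2 and 3.
efs_recovery_point = datetime(2024, 5, 1, 2, 0, tzinfo=timezone.utc)
memorydb_snapshot = datetime(2024, 5, 1, 3, 30, tzinfo=timezone.utc)

restore_time = choose_restore_time([efs_recovery_point, memorydb_snapshot])
request = build_pitr_request("platform-db", "platform-db-restored", restore_time)
print(request["RestoreTime"].isoformat())  # 2024-05-01T01:59:00+00:00
```

The resulting dictionary could be passed on as boto3.client("rds").restore_db_instance_to_point_in_time(**request). That call is left out here so the sketch runs without AWS credentials. If MemoryDB did not need to be restored in step 3, only the EFS recovery point would be passed in.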

Total estimated time for full recovery (RTO): 5 hours
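
The 5-hour figure matches the sum of the upper estimates for steps 1 to 6. The short Python sketch below reproduces that arithmetic and also shows the best case. It assumes that the steps are performed strictly one after another with no overlap, which the procedure above does not state explicitly. The shortened step labels and the names STEP_ESTIMATES and total_hours are illustrative only.

```python
# (step, best-case minutes, worst-case minutes), taken from the estimates above.
STEP_ESTIMATES = [
    ("1. Recreate infrastructure and EKS cluster", 90, 90),
    ("2. Restore EFS data", 60, 60),
    ("3. Restore MemoryDB snapshot (if needed)", 0, 30),
    ("4. Restore RDS (PITR vs. snapshot)", 30, 60),
    ("5. Reinstall platform application", 30, 30),
    ("6. Re-deploy solution resources", 30, 30),
]


def total_hours(estimates):
    """Sum best-case and worst-case minutes, assuming sequential steps."""
    best = sum(low for _, low, _ in estimates) / 60
    worst = sum(high for _, _, high in estimates) / 60
    return best, worst


best_case, worst_case = total_hours(STEP_ESTIMATES)
print(f"best case: {best_case} h, worst case: {worst_case} h")
# best case: 4.0 h, worst case: 5.0 h
```

The best case applies when MemoryDB does not need to be restored and PITR is available for RDS. The worst case applies when both databases have to be recovered from snapshots.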