Preparations

Before making any changes to the running installation, prepare the values file for the new version by following these steps:

  1. Retrieve the values.yaml file that you used previously or, if you want to start from scratch, extract the values from the installation by running this command:

    helm -n <namespace> get all <release name>
    For example:
    helm -n uepe get all uepe

    where uepe is the Helm release name you selected for your installation. If you are unsure of the release name, list the installed releases with helm list; the output will look similar to the following:

    helm list
    NAME         	NAMESPACE	REVISION	UPDATED                                 	STATUS  	CHART                             	APP VERSION
    external-dns 	uepe     	1       	2024-05-08 15:27:48.258978207 +0200 CEST	deployed	external-dns-7.2.0                	0.14.1     
    ingress-nginx	uepe     	1       	2024-05-08 16:18:43.919980224 +0200 CEST	deployed	ingress-nginx-4.10.0              	1.10.0     
    uepe         	uepe     	3       	2024-05-10 14:16:17.724426589 +0200 CEST	deployed	usage-engine-private-edition-4.0.0	4.0.0      
  2. Extract the values manually from the output: copy the lines that follow “USER-SUPPLIED VALUES:”, up to the first blank line, and save them to the config file valuesFromSystem.yaml.

  3. Update the Helm repository to get the latest Helm chart versions by running the following commands:

    helm repo list
    helm repo update
  4. Retrieve the new version from the repository by running the following command. Refer to Release Information for the Helm Chart version.

    helm fetch <repo name>/usage-engine-private-edition --version <version> --untar

    For example:

    helm fetch digitalroute/usage-engine-private-edition --version 4.0.0 --untar
  5. Next, check the file CHANGELOG.md in the created folder to find out what has changed in the values file in the new version.
    If you are uncertain about how to interpret the content of the file, see the examples of keys below and how to interpret them:

    The following values have been removed:
    * ```mzOperator.clusterWide```
    * ```mzOperator.experimental.performPeriodicWorkflowCleanup```
    * ```jmx.remote```
    * ```platform.debug.jmx```
    

    means that these keys appear in the values file as follows:

    mzOperator:
      clusterWide:
      experimental:
        performPeriodicWorkflowCleanup:
    jmx:
      remote:
    platform:
      debug:
        jmx:

    The parts of a key do not necessarily appear on consecutive lines, but each part always appears nested under the previous part, before the next key at the parent's level. For example, in this excerpt from a values.yaml file:

    debug:
      script:
        enabled: false
      log:
        level:
          codeserver: info
          jetty: 'off'
          others: warn

    the value 'off' is set by the key debug.log.level.jetty.

  6. Update any changed fields that you use in the valuesFromSystem.yaml file from the existing installation so that the file matches the new version.

  7. Take note of any fields that have been deprecated or removed since the previous version so that any configuration of those fields can be replaced.

  8. When you have updated the valuesFromSystem.yaml file, you can test it by running this command:

    helm upgrade --install uepe digitalroute/usage-engine-private-edition --atomic --cleanup-on-fail --version 4.0.0 -n uepe -f valuesFromSystem.yaml --dry-run=server
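The manual copy in step 2 can also be scripted. The sketch below is a hypothetical helper named extract_user_values (not part of Helm or Usage Engine); it reads helm get all output on stdin and prints the lines between “USER-SUPPLIED VALUES:” and the first blank line, exactly as described in step 2. Helm can also print only the user-supplied values directly with helm get values <release name> -n <namespace> -o yaml; whichever way you use, compare the result with the values file you used previously.

```shell
# Hypothetical helper: print the user-supplied values section from
# "helm get all" output read on stdin (the manual copy in step 2).
extract_user_values() {
  awk '
    /^USER-SUPPLIED VALUES:/ { found = 1; next }   # start after the header line
    found && /^[[:space:]]*$/ { exit }             # stop at the first blank line
    found { print }
  '
}

# Example usage, assuming release "uepe" in namespace "uepe":
#   helm -n uepe get all uepe | extract_user_values > valuesFromSystem.yaml
```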
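To check whether a key listed as removed in CHANGELOG.md (written in the dotted format explained in step 5) is set in valuesFromSystem.yaml, it can help to print every key in the file as a dotted path. The awk sketch below, a hypothetical helper named list_value_keys, does this for plain block-style YAML only: it assumes space indentation, skips list items, and does not understand flow-style maps or multi-line strings, so treat its output as a hint rather than a full YAML parse.

```shell
# Hypothetical helper: print each key of a block-style YAML file as a dotted
# path, e.g. "debug.log.level.jetty". Assumes space indentation; list items,
# flow-style maps and multi-line string contents are not handled.
list_value_keys() {
  awk '
    /^[[:space:]]*(#|$)/ { next }     # skip comments and blank lines
    /^[[:space:]]*-/ { next }         # skip list items
    {
      match($0, /^ */)                # RLENGTH = number of leading spaces
      indent = RLENGTH
      line = substr($0, indent + 1)
      colon = index(line, ":")
      if (colon == 0) next            # not a "key:" line
      key = substr(line, 1, colon - 1)
      # Drop keys at the same or deeper indentation, they are not ancestors.
      while (depth > 0 && ind[depth] >= indent) depth--
      depth++
      ind[depth] = indent
      name[depth] = key
      path = name[1]
      for (i = 2; i <= depth; i++) path = path "." name[i]
      print path
    }
  ' "$1"
}

# Example usage: check whether a removed key is still set.
#   list_value_keys valuesFromSystem.yaml | grep -x 'jmx.remote'
```

For the debug excerpt in step 5, the output includes debug.log.level.jetty and debug.script.enabled.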

Preparing ECDs

Before you start the actual upgrade, these steps are recommended to avoid issues in processing caused by the restarts during the upgrade:

  1. Disable any batch workflow groups and let any running batch workflows finish their runs.

  2. For real-time workflows, check which types of real-time workflows the ECs are running. If an ECD hosts workflows that allow for scaling and use an ingress for incoming traffic, the ECD will, by default, be upgraded through a rolling upgrade, which means that there will always be at least one workflow running even during the upgrade.

    However, if the real-time workflow does not support scaling, for example, because it uses fixed ports or storage that is not shared, the EC will become unavailable for a certain time during the upgrade. To gain control over when the EC becomes unavailable, you can edit the ECD by setting manualUpgrade to true before the upgrade. With this setting, the ECD will keep running on the old version until the upgrade has been performed and it can then be restarted on the new version in the EC Deployment Interface (4.2).

Example - Editing ECD to Manual Upgrade

Option 1

Run the following command:

kubectl edit ecd ggsn-ecd

And change manualUpgrade to true:

spec:
    .....
    manualUpgrade: true

Option 2

Run the following command:

kubectl patch ecdeployment ggsn-ecd --type=merge -p $'spec:\n  manualUpgrade: true'

When the upgrade is completed, upgrade these ECDs from the EC Deployment Interface (4.2) in Desktop Online, as described in Manual Upgrade of ECDs.
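If several ECDs need manual upgrade, the Option 2 patch can be applied in a loop. The sketch below is a hypothetical helper named set_manual_upgrade; it takes the namespace first and then the ECD names (ggsn-ecd is the example name used above, any other name is an assumption). The merge patch is written as JSON, which kubectl patch accepts as well as YAML. The KUBECTL override is only there so the commands can be printed instead of run.

```shell
# Hypothetical helper: set spec.manualUpgrade to true on each ECD named as an
# argument, in the namespace given first. KUBECTL may be overridden (for
# example KUBECTL=echo) to print the commands instead of running them.
set_manual_upgrade() {
  ns=$1
  shift
  for ecd in "$@"; do
    ${KUBECTL:-kubectl} patch ecdeployment "$ecd" -n "$ns" \
      --type=merge -p '{"spec":{"manualUpgrade":true}}' || return 1
  done
}

# Example usage:
#   set_manual_upgrade uepe ggsn-ecd
#   kubectl get ecdeployment -n uepe -o custom-columns=NAME:.metadata.name,MANUAL:.spec.manualUpgrade
```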

Backup and Database Upgrade

When all running batch workflows have stopped, make a backup so that the system can be restored if any issues occur during the upgrade.

Note!

Before proceeding with the backup, you must shut down the platform. This is very important, since the database backup may otherwise become corrupt.

The platform can be shut down in various ways, see examples below.

Examples - Shutting Down the Platform

Option 1

Reduce the number of replicas (under “spec”) to 0 by running the following command:

kubectl edit statefulset platform -n uepe

where uepe is the namespace used.

Option 2

Run this command:

kubectl scale --replicas=0 sts/platform -n uepe

and then this command:

kubectl get pods -n uepe

Ensure that the pod platform-0 is no longer listed.
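Option 2 can be combined into one step that blocks until the platform pod is gone, instead of checking kubectl get pods by hand. The sketch below is a hypothetical helper named stop_platform; the 300-second timeout is an assumption, so adjust it to your environment. Depending on the kubectl version, waiting for deletion of a pod that is already gone may report an error instead of succeeding, so confirm with kubectl get pods -n uepe if in doubt.

```shell
# Hypothetical helper: scale the platform StatefulSet to zero and block until
# the platform-0 pod is gone. Namespace defaults to uepe; KUBECTL may be
# overridden (for example KUBECTL=echo) for a dry run.
stop_platform() {
  ns=${1:-uepe}
  ${KUBECTL:-kubectl} scale --replicas=0 sts/platform -n "$ns" || return 1
  ${KUBECTL:-kubectl} wait --for=delete pod/platform-0 -n "$ns" --timeout=300s
}

# Example usage:
#   stop_platform uepe
```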

Note!

The database backup instructions below are only relevant if you use RDS as the platform database. If the platform database is Derby, the EFS backup covers the database as well (assuming that persistent storage is enabled for the platform).

  1. List the databases and locate the one used for Usage Engine with this command:

    aws rds describe-db-instances --query 'DBInstances[].DBInstanceIdentifier[]'
  2. Perform a backup of the RDS database with this command:

    aws rds create-db-snapshot --db-snapshot-identifier <database backup name> --db-instance-identifier <database instance name>

    for example:

    aws rds create-db-snapshot --db-snapshot-identifier uepe-eks-db-postgresql-backup --db-instance-identifier uepe-eks-db-postgresql
  3. Check if the backup was created successfully by running this command:

    aws rds describe-db-snapshots --snapshot-type manual --db-snapshot-identifier <database backup name>
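Instead of repeating step 3 until the snapshot is ready, the AWS CLI waiter aws rds wait db-snapshot-available can block until the snapshot status is available. The sketch below is a hypothetical helper named backup_rds that runs step 2 and then waits; the AWS override exists only for dry runs. The waiter gives up after a fixed number of attempts, so for a large database you may need to run the wait command again.

```shell
# Hypothetical helper: snapshot an RDS instance and block until the snapshot
# is available. AWS may be overridden (for example AWS=echo) for a dry run.
backup_rds() {
  instance=$1
  snapshot=$2
  ${AWS:-aws} rds create-db-snapshot \
    --db-snapshot-identifier "$snapshot" \
    --db-instance-identifier "$instance" || return 1
  # The waiter polls the snapshot until its status is "available".
  ${AWS:-aws} rds wait db-snapshot-available --db-snapshot-identifier "$snapshot"
}

# Example usage:
#   backup_rds uepe-eks-db-postgresql uepe-eks-db-postgresql-backup
```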

Next, make a backup of the file system.

Note!

If standalone ECs are still running and writing their logs to the same EFS, anything written after the backup has started will not be included in the backup.

To create an EFS backup using the console, see https://docs.aws.amazon.com/aws-backup/latest/devguide/recov-point-create-on-demand-backup.html for instructions.

The following example shows how to run an on-demand backup job from the command line. In this example, the recovery point is stored in the default backup vault.

export EFS_NAME=uepe-eks-efs-disk
export EFS_FILE_SYSTEM_ID=$(aws efs describe-file-systems --query "FileSystems[?Name==\`$EFS_NAME\`].FileSystemId" --output text)
export EFS_ARN=$(aws efs describe-file-systems --query "FileSystems[?Name==\`$EFS_NAME\`].FileSystemArn" --output text)
export VAULT_NAME=Default
export BACKUP_ROLE_ARN=$(aws iam get-role --role-name AWSBackupDefaultServiceRole --query "Role.Arn" --output text)

# Run on demand backup job
aws backup start-backup-job \
--backup-vault-name $VAULT_NAME \
--resource-arn $EFS_ARN \
--iam-role-arn $BACKUP_ROLE_ARN

# View backup job status
aws backup list-backup-jobs --by-resource-type EFS
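Rather than reading list-backup-jobs repeatedly, you can capture the job ID returned by start-backup-job (add --query BackupJobId --output text) and poll that job with aws backup describe-backup-job. The sketch below is a hypothetical helper named wait_for_backup_job; it treats COMPLETED as success and FAILED, ABORTED, EXPIRED and PARTIAL as failure, and the 30-second poll interval is an assumption.

```shell
# Hypothetical helper: poll an AWS Backup job until it reaches a final state.
# Returns 0 on COMPLETED and 1 on any other final state. AWS and
# POLL_SECONDS (default 30) may be overridden.
wait_for_backup_job() {
  job_id=$1
  while :; do
    state=$(${AWS:-aws} backup describe-backup-job --backup-job-id "$job_id" \
      --query State --output text) || return 1
    echo "backup job $job_id: $state"
    case $state in
      COMPLETED) return 0 ;;
      FAILED|ABORTED|EXPIRED|PARTIAL) return 1 ;;
    esac
    sleep "${POLL_SECONDS:-30}"
  done
}

# Example usage, with the variables exported above:
#   JOB_ID=$(aws backup start-backup-job --backup-vault-name $VAULT_NAME \
#     --resource-arn $EFS_ARN --iam-role-arn $BACKUP_ROLE_ARN \
#     --query BackupJobId --output text)
#   wait_for_backup_job "$JOB_ID"
```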

Upgrade

To perform the actual upgrade, run the same command as the test command above without the --dry-run=server flag, for example:

helm upgrade --install uepe digitalroute/usage-engine-private-edition --atomic --cleanup-on-fail --version 4.0.0 -n uepe -f valuesFromSystem.yaml

If the upgrade was successful, the output will look like this:

helm upgrade --install uepe digitalroute/usage-engine-private-edition --atomic --cleanup-on-fail --version 4.0.0 -n uepe -f valuesFromSystem.yaml
Release "uepe" has been upgraded. Happy Helming!
NAME: uepe
LAST DEPLOYED: Fri May 10 15:02:37 2024
NAMESPACE: uepe
STATUS: deployed
REVISION: 4
TEST SUITE: None
NOTES:
Usage Engine Private Edition 4.0.0 has been deployed successfully!
Check out the CHANGELOG.md in this chart for information about what has been changed, added and removed in this version.
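To check the result from a script rather than by reading the output, the STATUS line can be extracted from helm status, which prints the same NAME, NAMESPACE, STATUS and REVISION fields as shown above. The sketch below is a hypothetical helper named release_status that reads Helm output on stdin.

```shell
# Hypothetical helper: print the value of the "STATUS:" line from helm
# upgrade or helm status output read on stdin.
release_status() {
  awk -F': ' '$1 == "STATUS" { print $2; exit }'
}

# Example usage, assuming release "uepe" in namespace "uepe":
#   [ "$(helm status uepe -n uepe | release_status)" = deployed ] && echo "upgrade OK"
```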

Scale the platform StatefulSet back up so that the platform starts again:

kubectl scale --replicas=1 sts/platform -n uepe
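To know when the platform is back, the scale command can be followed by kubectl rollout status, which waits until the StatefulSet's pods are ready. The sketch below is a hypothetical helper named start_platform; the 600-second timeout is an assumption, and kubectl rollout status only supports StatefulSets that use the RollingUpdate update strategy, so with another strategy watch kubectl get pods -n uepe instead.

```shell
# Hypothetical helper: scale the platform StatefulSet to one replica and wait
# until it is ready. Namespace defaults to uepe; KUBECTL may be overridden
# (for example KUBECTL=echo) for a dry run.
start_platform() {
  ns=${1:-uepe}
  ${KUBECTL:-kubectl} scale --replicas=1 sts/platform -n "$ns" || return 1
  ${KUBECTL:-kubectl} rollout status sts/platform -n "$ns" --timeout=600s
}

# Example usage:
#   start_platform uepe
```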

After Upgrade

When the Usage Engine installation has been upgraded, ensure that any ECDs supporting rolling upgrade are still running as expected. If you configured any ECDs for manual upgrade before the upgrade, see the section below.

Also enable the batch workflow groups again so that batch processing can resume.

Manual Upgrade of ECDs

If you configured any ECDs to manual upgrade before the upgrade, follow these steps to upgrade these ECDs when the regular upgrade is completed:

  1. Log in to Desktop Online; see Desktop Online User Interface (4.2).

  2. Go to the EC Deployment Interface (4.2) in the Manage view in Desktop Online.
    You will see a warning symbol next to the relevant ECDs.

  3. Click the ECD(s) to view the warnings. For each ECD that needs to be upgraded, you will see a message saying that it needs to be upgraded.

  4. Go back to the list of ECDs, click the three dots at the far right of the ECD row, and select the Upgrade option in the pop-up menu.


Rollback

Only carry out the rollback procedure if you want to roll back to the previous version. A rollback consists of the following steps:

  1. Restore database backup

  2. Restore file system snapshot

  3. Rollback Usage Engine Private Edition to pre-upgrade version

Restore database backup

If restoring becomes necessary, you can restore the DB instance from a snapshot backup; see the AWS guide https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RestoreFromSnapshot.html for more information.

You can also restore the snapshot to a new DB instance and give it the identifier of the existing instance using the commands below:

export EXISTING_DB=uepe-eks-db-postgresql
export NEW_DB=uepe-eks-db-postgresql-2
export SNAPSHOT=uepe-eks-db-postgresql-backup
export INSTANCE_CLASS=db.t3.small
export SUBNET_GROUP_NAME=$(aws rds describe-db-instances --query "DBInstances[?DBInstanceIdentifier==\`$EXISTING_DB\`].DBSubnetGroup.DBSubnetGroupName" --output text)
export SECURITY_GROUP_ID=$(aws rds describe-db-instances --query "DBInstances[?DBInstanceIdentifier==\`$EXISTING_DB\`].VpcSecurityGroups[].VpcSecurityGroupId" --output text)

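# Restore snapshot to a new database
# $SECURITY_GROUP_ID is left unquoted so that several group IDs are passed as separate values
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier $NEW_DB \
--db-snapshot-identifier $SNAPSHOT \
--db-instance-class $INSTANCE_CLASS \
--db-subnet-group-name $SUBNET_GROUP_NAME \
--vpc-security-group-ids $SECURITY_GROUP_ID

# Wait until the restored instance is available before renaming anything
aws rds wait db-instance-available --db-instance-identifier $NEW_DB

# Rename existing DB instance to other name
aws rds modify-db-instance \
--db-instance-identifier $EXISTING_DB \
--new-db-instance-identifier $EXISTING_DB-old \
--apply-immediately

# Wait until the rename has completed so that the existing identifier is free
until [ "$(aws rds describe-db-instances --db-instance-identifier $EXISTING_DB-old --query 'DBInstances[0].DBInstanceStatus' --output text 2>/dev/null)" = "available" ]; do sleep 30; done

# Rename the new DB instance to use existing identifier name
aws rds modify-db-instance \
--db-instance-identifier $NEW_DB \
--new-db-instance-identifier $EXISTING_DB \
--apply-immediately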

If you use the console to do the RDS restore, remember to include the existing database security group so that the cluster can access the restored database.


Note!

The restored RDS instance is a new database instance and is not managed by Terraform. If you plan to destroy the cluster later, ensure that the new database instance is deleted first. This is necessary because the database instance may still reference the RDS subnet group.

Restore file system snapshot

To restore EFS, follow the instructions in https://docs.aws.amazon.com/aws-backup/latest/devguide/restore-resource.html and https://repost.aws/knowledge-center/aws-backup-restore-efs-file-system-cli.

The following example shows how to restore the EFS backup from the command line. In this example, the volume is mounted using the access point path /uepe, the recovery point is stored in the default vault, and the backup is restored to the existing file system.

export EFS_NAME=uepe-eks-efs-disk
export EFS_FILE_SYSTEM_ID=$(aws efs describe-file-systems --query "FileSystems[?Name==\`$EFS_NAME\`].FileSystemId" --output text)
export EFS_ARN=$(aws efs describe-file-systems --query "FileSystems[?Name==\`$EFS_NAME\`].FileSystemArn" --output text)
export VAULT_NAME=Default
export BACKUP_ROLE_ARN=$(aws iam get-role --role-name AWSBackupDefaultServiceRole --query "Role.Arn" --output text)

#################### Retrieve backup ARN id ####################
aws backup list-recovery-points-by-backup-vault --backup-vault-name $VAULT_NAME
# NOTE: Record the RecoveryPointArn that you wish to recover from
# e.g. arn:aws:backup:ap-southeast-1:027763730008:recovery-point:0a82d94c-3d56-481d-98e3-b810d3df363b

# To view the recovery point restore metadata
aws backup get-recovery-point-restore-metadata \
--backup-vault-name $VAULT_NAME \
--recovery-point-arn <RECOVERY_POINT_ARN>

#################### Restore from the backup ####################
# Prerequisites:
# 1) Generate a UUID, "uuidgen" (Mac) or "uuidgen -r" (Linux)
# 2) Create a metadata json file, property details are described in
# https://docs.aws.amazon.com/aws-backup/latest/devguide/restoring-efs.html#efs-restore-cli
# NOTE: If newFileSystem=true, file-system-id parameter will be ignored.
# 3) Substitute "CreationToken" value with the generated UUID.
# 4) If existing file system is encrypted, you may use the existing KMS key.
#
# Example metadata json:

cat <<-EOF > /path/to/metadata_json_file
{
  "file-system-id": "$EFS_FILE_SYSTEM_ID",
  "Encrypted": "true",
  "KmsKeyId": "arn:aws:kms:ap-southeast-1:027763730008:key/4859a845-3ef2-464d-80d2-16c1b2c58ff4",
  "PerformanceMode": "generalPurpose",
  "CreationToken": "FEC83B16-F43A-4D5A-A678-2D27FC6C7DBD",
  "newFileSystem": "false"
}
EOF

aws backup start-restore-job --recovery-point-arn <RECOVERY_POINT_ARN> --iam-role-arn "$BACKUP_ROLE_ARN" --metadata file:///path/to/metadata_json_file
watch aws backup list-restore-jobs --by-resource-type EFS

#################### Run a pod with command prompt ####################
kubectl run nfscli --rm --tty -i --restart='Never' --namespace uepe --image oraclelinux:8 --privileged=true --command -- bash

#################### Install NFS client ####################
[root@nfscli /]# yum -y install nfs-utils

#################### Make a folder for mounting purpose ####################
[root@nfscli /]# mkdir -p /mnt/efs

#################### Mount EFS volume root path ####################
# EFS DNS name in format <file-system-id>.efs.<aws-region>.amazonaws.com
[root@nfscli /]# mount -o nolock fs-0a3a60103ae00a5a1.efs.ap-southeast-1.amazonaws.com:/ /mnt/efs

#################### Locate the restored directory ####################
# Go to the mounted directory
[root@nfscli /]# cd /mnt/efs/

# List folders
# NOTE: Existing platform volume mount folder is 'uepe' folder
[root@nfscli efs]# ls -al
total 16
drwxr-xr-x 5 root  root  6144 Aug 13 06:35 .
drwxr-xr-x 1 root  root    18 Aug 14 10:37 ..
drwxr-xr-x 5 root  root  6144 Aug 13 06:35 aws-backup-restore_2024-08-13T17-58-42-978741167Z
drwxr-xr-x 9  6000  6000 6144 Aug 13 18:47 uepe

# The restored data folder, which is also called 'uepe', is located under the aws-backup-restore_<timestamp> folder.
[root@nfscli efs]# ls -al aws-backup-restore_2024-08-13T17-58-42-978741167Z/
total 20
drwxr-xr-x 5 root  root  6144 Aug 13 06:35 .
drwxr-xr-x 5 root  root  6144 Aug 13 06:35 ..
drw--w---- 2 root  root  6144 Aug 13 17:58 aws-backup-lost+found_2024-08-13T17-58-13-086602146Z
drwxr-xr-x 2  6000  6000 6144 Aug 13 18:57 uepe

#################### Cleanup existing platform volume mount folder ####################
[root@nfscli efs]# rm -rf uepe/*

#################### Copy restored data to platform volume mount folder ####################
# NOTE: Specify the '-p' flag in the cp command to preserve file permissions and timestamps.
[root@nfscli efs]# cp -rfp aws-backup-restore_2024-08-13T17-58-42-978741167Z/uepe/* uepe/

# Check that all data has been copied
[root@nfscli efs]# ls -al uepe/
total 48
drwxr-xr-x 9 6000 6000 6144 Aug 13 18:47 .
drwxr-xr-x 5 root root 6144 Aug 13 06:35 ..
drwxr-xr-x 2 6000 6000 6144 Aug 13 06:37 3pp
drwxr-xr-x 2 6000 6000 6144 Aug 13 06:37 backup
drwxr-xr-x 2 6000 6000 6144 Aug 13 06:37 jni
drwxr-xr-x 2 6000 6000 6144 Aug 13 06:37 keys
drwxr-xr-x 5 6000 6000 6144 Aug 13 17:13 log
drwxr-xr-x 3 6000 6000 6144 Aug 13 06:37 pico-cache
drwxr-xr-x 2 6000 6000 6144 Aug 13 06:37 storage

#################### Clean up the redundant restored data ####################
[root@nfscli efs]# rm -rf aws-backup-restore_2024-08-13T17-58-42-978741167Z/uepe/*

#################### Unmount volume and exit pod ####################
# Leave the mount point first; umount fails while it is the current directory
[root@nfscli efs]# cd /
[root@nfscli /]# umount /mnt/efs/
[root@nfscli /]# exit

#################### Restore completed ####################
# Backup data has been restored; proceed to the next section to roll back UEPE.
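The cleanup and copy steps inside the nfscli pod can be combined into one function that also checks the result. Note that rm -rf uepe/* and cp ... uepe/* do not match hidden files (names starting with a dot). The sketch below, a hypothetical helper named replace_with_restored, uses find and cp with "src/." so that hidden files are included, and then compares both trees with diff -r. It assumes both arguments are existing directories on the mounted EFS; remove the redundant restored data only after it reports no differences.

```shell
# Hypothetical helper: replace the contents of the platform folder with the
# restored copy and verify that the two trees are identical. Run as root in
# the nfscli pod, for example:
#   replace_with_restored /mnt/efs/aws-backup-restore_<timestamp>/uepe /mnt/efs/uepe
replace_with_restored() {
  src=$1
  dst=$2
  [ -d "$src" ] && [ -d "$dst" ] || { echo "missing directory" >&2; return 1; }
  # Delete everything inside dst, including hidden files that "rm -rf dst/*" misses.
  find "$dst" -mindepth 1 -delete || return 1
  # Copy the contents of src (hidden files included), preserving permissions
  # and timestamps, and ownership when run as root.
  cp -pR "$src/." "$dst/" || return 1
  # diff -r prints nothing and returns 0 when the trees are identical.
  diff -r "$src" "$dst"
}
```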

Rollback Usage Engine Private Edition to pre-upgrade version

To roll back to the pre-upgrade version, check the release history to find the revision numbers:

helm history uepe -n uepe

Then roll back to the pre-upgrade revision <pre-upgrade-revision-number>:

helm rollback uepe <pre-upgrade-revision-number> -n uepe
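Picking the right revision from helm history is easier if you record it before upgrading, since a failed attempt with --atomic may add revisions of its own. In the Preparations example, helm list shows revision 3 for uepe and the upgrade output shows REVISION: 4, so 3 is the pre-upgrade revision. The sketch below is a hypothetical helper named release_revision that reads helm list output on stdin; it relies on NAME and NAMESPACE never containing spaces, so REVISION is always the third whitespace-separated field.

```shell
# Hypothetical helper: print the REVISION column for a release from
# "helm list" output read on stdin (header line skipped).
release_revision() {
  awk -v rel="$1" 'NR > 1 && $1 == rel { print $3 }'
}

# Before the upgrade:
#   helm list -n uepe | release_revision uepe > pre-upgrade-revision.txt
# To roll back:
#   helm rollback uepe "$(cat pre-upgrade-revision.txt)" -n uepe
```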
