
Install EKS and Related Infrastructure and Resources


Contents

  1. Prerequisites
  2. Create EKS Cluster
  3. Configure Cluster Autoscaler
  4. Install Amazon EBS CSI Driver
  5. Install Amazon EFS CSI Driver
  6. Create NodeGroup


Detailed Steps

For detailed explanations of the {variables}, please refer to the Terminology page.


1. Prerequisites

Ensure the environment setup is complete. Refer to 1. Setup Deployment Environment.



2. Create EKS Cluster

  • Use the eksctl command to create a cluster.

  • The following command automatically creates an EKS cluster and its VPC.

    • zones: Place the VPC subnets in availability zones a and c of the region.
    • vpc-cidr: Set the IP range for the VPC.
    eksctl create cluster \
    --name ${AWS_CLUSTER_NAME} \
    --version ${AWS_CLUSTER_VERSION} \
    --region ${AWS_DEFAULT_REGION} \
    --zones ${AWS_DEFAULT_REGION}a,${AWS_DEFAULT_REGION}c \
    --vpc-cidr 10.0.0.0/16 \
    --without-nodegroup \
    --managed \
    --with-oidc
  • After creation, check the value of {AWS_VPC_NAME}.

    • {AWS_VPC_NAME} = eksctl-eks-{AWS_DEFAULT_REGION_ALIAS}-{CLUSTER_NAME}-{DEPLOY_ENV}-{AWS_CLUSTER_VERSION_STR}-eks-master-cluster/VPC
    # Use the command below to find the tag value whose "Key" is "Name", then export it as ${AWS_VPC_NAME}.
    aws ec2 describe-vpcs --query "Vpcs[-2].Tags"
    export AWS_VPC_NAME=
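  • (Optional) Sanity check. The commands below are one way to confirm that the cluster is active and that the exported ${AWS_VPC_NAME} resolves to exactly one VPC; they are not required by the rest of this guide.

    # The cluster status should be ACTIVE.
    aws eks describe-cluster --name ${AWS_CLUSTER_NAME} --query "cluster.status" --output text
    # Exactly one VPC ID should be returned for the Name tag exported above.
    aws ec2 describe-vpcs --filters "Name=tag:Name,Values=${AWS_VPC_NAME}" --query "Vpcs[].VpcId" --output text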


3. Configure Cluster Autoscaler

Reference page: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

  • Create the IAM policy and service account for the Cluster Autoscaler using the commands below.
    cat <<EOT > policy-cluster-autoscaler.json
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Action": [
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup"
          ],
          "Resource": "*",
          "Condition": {
            "StringEquals": {
              "aws:ResourceTag/k8s.io/cluster-autoscaler/$AWS_CLUSTER_NAME": "owned"
            }
          }
        },
        {
          "Sid": "VisualEditor1",
          "Effect": "Allow",
          "Action": [
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:DescribeAutoScalingGroups",
            "ec2:DescribeLaunchTemplateVersions",
            "autoscaling:DescribeTags",
            "autoscaling:DescribeLaunchConfigurations",
            "ec2:DescribeInstanceTypes"
          ],
          "Resource": "*"
        }
      ]
    }
    EOT
    aws iam create-policy --policy-name policy-cluster-autoscaler --policy-document file://policy-cluster-autoscaler.json
    eksctl create iamserviceaccount \
    --cluster=$AWS_CLUSTER_NAME \
    --region=$AWS_DEFAULT_REGION \
    --namespace=kube-system \
    --name=cluster-autoscaler \
    --attach-policy-arn=arn:aws:iam::$AWS_ACCOUNT_ID:policy/policy-cluster-autoscaler \
    --role-name=role-cluster-autoscaler \
    --override-existing-serviceaccounts \
    --approve
  • Deploy the Cluster Autoscaler.
    curl -O https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

    cp cluster-autoscaler-autodiscover.yaml cluster-autoscaler.yaml
    export CLUSTER_AUTOSCALER_VERSION=$(curl -sL https://registry.k8s.io/v2/autoscaling/cluster-autoscaler/tags/list | grep -o '"tags":\[.*\]' | sed 's/"tags":\[//;s/\]//' | tr ',' '\n' | tr -d '"' | grep "$AWS_CLUSTER_VERSION" | sort -V | tail -n 1)
    yq e 'select(documentIndex == 5)' cluster-autoscaler.yaml > deployment.yaml

    yq e -i '.spec.template.metadata.annotations."cluster-autoscaler.kubernetes.io/safe-to-evict" = "false"' deployment.yaml

    yq e -i '.spec.template.spec.containers[] |= select(.name == "cluster-autoscaler").image = "registry.k8s.io/autoscaling/cluster-autoscaler:" + env(CLUSTER_AUTOSCALER_VERSION)' deployment.yaml

    yq e -i '
    .spec.template.spec.containers[] |=
    select(.name == "cluster-autoscaler").command =
    (.command | map(select(. != "--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>")) +
    ["--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/" + env(AWS_CLUSTER_NAME),
    "--scale-down-unneeded-time=10m",
    "--scale-down-utilization-threshold=0.7"])' deployment.yaml

    yq eval 'select(documentIndex != 5)' cluster-autoscaler.yaml > temp.yaml
    echo "---" >> temp.yaml
    cat temp.yaml deployment.yaml > cluster-autoscaler.yaml
    rm temp.yaml deployment.yaml
    kubectl apply -f cluster-autoscaler.yaml
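  • (Optional) Verify the Cluster Autoscaler. The commands below are one way to confirm that the service account received the IAM role annotation and that the deployment rolled out cleanly; they are not part of the upstream procedure.

    # The service account should carry an eks.amazonaws.com/role-arn annotation pointing at role-cluster-autoscaler.
    kubectl -n kube-system get sa cluster-autoscaler -o yaml | grep role-arn
    # Wait for the rollout to finish, then skim the logs for scaling decisions or errors.
    kubectl -n kube-system rollout status deployment/cluster-autoscaler
    kubectl -n kube-system logs deployment/cluster-autoscaler --tail=20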


4. Install Amazon EBS CSI Driver

Reference page: https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html

  • Set the following environment variables.

    export EBS_CSI_SA_ROLE_NAME=role-ebs-csidriver-${AWS_CLUSTER_NAME}
    export EBS_CSI_SA_ROLE_ARN=arn:aws:iam::${AWS_ACCOUNT_ID}:role/${EBS_CSI_SA_ROLE_NAME}
  • Verify that OIDC is configured for the cluster.

    oidc_id=$(aws eks describe-cluster --name ${AWS_CLUSTER_NAME} --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5)
    echo $oidc_id
  • Create the IAM role for the EBS CSI Driver service account.

    eksctl create iamserviceaccount \
    --name ebs-csi-controller-sa \
    --namespace kube-system \
    --cluster ${AWS_CLUSTER_NAME} \
    --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
    --approve \
    --role-only \
    --role-name ${EBS_CSI_SA_ROLE_NAME}
  • Create the Amazon EBS CSI add-on.

    eksctl create addon --name aws-ebs-csi-driver --cluster ${AWS_CLUSTER_NAME} --service-account-role-arn ${EBS_CSI_SA_ROLE_ARN} --force
  • Verify that the ebs-csi-controller is created.

    kubectl get deploy -n kube-system

    Example output:

    NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
    ebs-csi-controller   2/2     2            2           30s
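  • (Optional) With the driver installed, workloads can request dynamically provisioned EBS volumes. The snippet below is a minimal sketch of a StorageClass for the ebs.csi.aws.com provisioner; the name sc-ebs-gp3 and the gp3 volume type are illustrative assumptions and are not referenced elsewhere in this guide.

    cat <<EOT | kubectl apply -f -
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: sc-ebs-gp3               # illustrative name, not used by later steps
    provisioner: ebs.csi.aws.com
    volumeBindingMode: WaitForFirstConsumer
    parameters:
      type: gp3                      # EBS volume type; gp2 also works
    EOT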


5. Install Amazon EFS CSI Driver (Skip if Edge Conductor is installed on-premises)

Reference page: https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html

  • Set the following environment variables.

    export EFS_CSI_SA_ROLE_NAME=role-efs-csidriver-${AWS_CLUSTER_NAME}
    export EFS_CSI_SA_ROLE_ARN=arn:aws:iam::${AWS_ACCOUNT_ID}:role/${EFS_CSI_SA_ROLE_NAME}
  • Create the IAM role for the EFS CSI Driver service account.

    eksctl create iamserviceaccount \
    --name efs-csi-controller-sa \
    --namespace kube-system \
    --cluster ${AWS_CLUSTER_NAME} \
    --role-name ${EFS_CSI_SA_ROLE_NAME} \
    --role-only \
    --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEFSCSIDriverPolicy \
    --override-existing-serviceaccounts \
    --approve
  • Broaden the trust policy of the EFS CSI role so that it matches all EFS CSI service accounts, then update the role.

    TRUST_POLICY=$(aws iam get-role --role-name ${EFS_CSI_SA_ROLE_NAME} --query 'Role.AssumeRolePolicyDocument' | \
    sed -e 's/efs-csi-controller-sa/efs-csi-*/' -e 's/StringEquals/StringLike/')
    aws iam update-assume-role-policy --role-name ${EFS_CSI_SA_ROLE_NAME} --policy-document "${TRUST_POLICY}"
  • Create the Amazon EFS CSI add-on.

    eksctl create addon --name aws-efs-csi-driver --cluster ${AWS_CLUSTER_NAME} --service-account-role-arn ${EFS_CSI_SA_ROLE_ARN} --force
  • Verify that the efs-csi-controller is created.

    kubectl get deploy -n kube-system

    Example output:

    NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
    efs-csi-controller   2/2     2            2           30s
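  • (Optional) Once an EFS file system exists, the driver can provision volumes from it through access points. The snippet below is a minimal sketch of a dynamic-provisioning StorageClass for the efs.csi.aws.com provisioner; sc-efs is an illustrative name and fs-xxxxxxxx is a hypothetical file system ID, neither of which is created by this guide.

    cat <<EOT | kubectl apply -f -
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: sc-efs                   # illustrative name
    provisioner: efs.csi.aws.com
    parameters:
      provisioningMode: efs-ap       # provision one EFS access point per volume
      fileSystemId: fs-xxxxxxxx      # hypothetical: replace with an existing EFS file system ID
      directoryPerms: "700"
    EOT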


6. Create NodeGroup

  • Add a node group for operating AI Conductor to the created cluster.

    • For a detailed guide on creating node groups, refer to AI Conductor Manage NodeGroup.
    • Create the nodegroup-aicond.yaml file that defines the node group.
    cat <<EOT > nodegroup-aicond.yaml
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig

    metadata:
      name: ${AWS_CLUSTER_NAME}
      region: ${AWS_DEFAULT_REGION}
    availabilityZones: ["${AWS_DEFAULT_REGION}a", "${AWS_DEFAULT_REGION}c"]
    managedNodeGroups:
      - name: 'ng-${AWS_DEFAULT_REGION_ALIAS}-aicond-${INFRA_NAME}-controller'
        instanceType: m5.2xlarge
        # autoscaling
        minSize: 1
        maxSize: 3
        desiredCapacity: 2
        # volume
        volumeType: gp2
        volumeSize: 200
        labels: {aic-role: controller}
        propagateASGTags: true
        iam:
          withAddonPolicies:
            autoScaler: true
    EOT
    • Add the node group to the cluster.
      eksctl create nodegroup --config-file=nodegroup-aicond.yaml
  • Skip this step if Edge Conductor is installed on-premises.

    • Add a node group for operating Edge Conductor to the created cluster.

      • Create the nodegroup-edgecond.yaml file that defines the node group.
      cat <<EOT > nodegroup-edgecond.yaml
      apiVersion: eksctl.io/v1alpha5
      kind: ClusterConfig

      metadata:
        name: ${AWS_CLUSTER_NAME}
        region: ${AWS_DEFAULT_REGION}
      availabilityZones: ["${AWS_DEFAULT_REGION}a", "${AWS_DEFAULT_REGION}c"]
      managedNodeGroups:
        - name: 'ng-${AWS_DEFAULT_REGION_ALIAS}-edgecond-${INFRA_NAME}-controller'
          instanceType: m5.2xlarge
          # autoscaling
          minSize: 1
          maxSize: 3
          desiredCapacity: 2
          # volume
          volumeType: gp2
          volumeSize: 200
          labels: {edgecond-role: 'ng-${AWS_DEFAULT_REGION_ALIAS}-edgecond-${INFRA_NAME}'}
          propagateASGTags: true
          iam:
            withAddonPolicies:
              autoScaler: true
      EOT
      • Add the node group to the cluster.
        eksctl create nodegroup --config-file=nodegroup-edgecond.yaml
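  • (Optional) Verify the node groups. The commands below are one way to confirm that the node groups exist and that their nodes joined the cluster with the labels defined in the ClusterConfig files above; they are not required by the rest of this guide.

    # List the node groups attached to the cluster.
    eksctl get nodegroup --cluster ${AWS_CLUSTER_NAME} --region ${AWS_DEFAULT_REGION}
    # Show the nodes together with their aic-role / edgecond-role labels.
    kubectl get nodes -L aic-role,edgecond-role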