
Kubernetes AWS EKS 1.30 Update Considerations (ENG)

bitofsky 2024. 6. 21. 08:52

There may be issues arising from changes not mentioned in the Kubernetes 1.30 release notes and the EKS 1.30 release notes.

In particular, if you use an EKS Managed Node Group, pods may stop working correctly after the update to 1.30: EKS add-ons can fail, AWS credentials can fail to be acquired, or access to IMDS (Instance Metadata Service) can fail.


Here is the official AWS EKS 1.30 Update document.

From 1.30, the default AMI changes from AL2 to AL2023, and the document links to a comparison between AL2 and AL2023. However, it does not say exactly what problems the AMI change might cause.

Let's also look at the document Comparing AL2 and AL2023.

It is a long and verbose change note and does not highlight any specific problematic areas. Since this is a base image change for EC2 node hosts, it should not significantly impact existing workloads running as Kubernetes pod images.


Under Security updates in that list of changes, there is an item called IMDSv2.

To briefly explain, IMDS stands for Instance Metadata Service, an AWS service that lets an EC2 instance read metadata about itself. This includes information such as the instance ID, the instance type, which VPC it is in, startup scripts, and so on, all reachable through a fixed address (for example http://169.254.169.254/latest/meta-data/).

It is therefore the usual way to look up information about the EC2 instance itself, and on AWS EKS in particular, add-ons use IMDS to discover their environment automatically.

The AWS Load Balancer Controller, a commonly used EKS add-on, is one example: in its default installation it looks up VPC information through IMDS when that information is not supplied explicitly.

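IMDS comes in two versions: v1 answers any plain request with no authentication, while v2 adds security measures.

v2 uses token authentication: the caller first obtains a session token with a PUT request and then sends that token with every metadata request. It also limits how many network hops the token response may travel on its way back to the caller (the hop limit), so a caller that sits further away than the configured number of hops never receives a token.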
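As a rough illustration of the token flow described above, the sketch below builds the two HTTP requests an IMDSv2 client sends, using only Python's standard library. The header names are the ones IMDSv2 expects; the helper names, the 21600-second token lifetime, and the 2-second timeout are choices made for this example only.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: a PUT to /api/token asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(token: str, path: str = "instance-id") -> urllib.request.Request:
    """Step 2: a GET that presents the session token in a request header."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path: str = "instance-id", timeout: float = 2.0) -> str:
    """Run both steps. Only meaningful on an EC2 instance or in a pod on one."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_metadata_request(token, path), timeout=timeout) as resp:
        return resp.read().decode()
```

Calling `fetch_metadata()` from inside a pod is a quick way to test a node: if the hop limit is too low, it is the token request in step 1 that should fail or time out, because the token response never makes it back to the container.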


Let's look at the Security updates - IMDSv2 section of this document.

It states that instances launched from the AL2023 image run in IMDSv2-only mode, and that the default hop limit is set to 2 to support container workloads.

When a request to IMDS comes from inside a container, it first crosses the container's virtual network to the host and only then reaches the IMDS endpoint, which adds up to two hops. So if the IMDSv2 hop limit is 1, the HTTP request fails.

For this reason, AL2023 sets the default hop limit to 2 so that EC2 instances running containers, such as EKS nodes, keep working.
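The hop arithmetic above can be written down as a tiny model. This is a deliberate simplification, not how the network stack is implemented: it assumes the token response starts with a budget equal to the hop limit and spends one unit per hop. The function name and the two constants are made up for the illustration.

```python
HOPS_FROM_HOST_PROCESS = 1  # a process running directly on the EC2 host
HOPS_FROM_CONTAINER = 2     # one extra hop through the container's virtual network


def token_response_arrives(hop_limit: int, hops_to_caller: int) -> bool:
    """Simplified model: the response survives only if the limit covers every hop."""
    return hop_limit >= hops_to_caller


# Hop limit 1: fine for the host itself, but too short to reach a container.
host_ok = token_response_arrives(1, HOPS_FROM_HOST_PROCESS)
pod_ok_with_limit_1 = token_response_arrives(1, HOPS_FROM_CONTAINER)

# Hop limit 2 (the AL2023 default described above): the container is reachable.
pod_ok_with_limit_2 = token_response_arrives(2, HOPS_FROM_CONTAINER)
```

Under this model a hop limit of 1 works for software on the node itself and fails for anything inside a pod, which matches the symptom pattern described in this post.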

However, the document leaves out a trap that appears when the EC2 instances for EKS nodes are launched by a Managed Node Group.

Managed Node Group includes IMDS settings in its Launch Template.

If you haven't modified the Metadata settings yourself, the AWS Default settings will be used.

So you might think the Default setting keeps you safe, but it's a trap. If neither the AMI nor the Metadata settings are specified in the Launch Template for the Managed Node Group, then after the update to EKS 1.30 the EC2 nodes start with the AMI automatically switched to AL2023 and the Metadata version set to IMDSv2 Required.

Up to this point everything matches the AL2023 document. The trap is the hop limit: it is set to 1. As described above, reaching IMDS from inside a container takes two hops because of the host network traversal, which is exactly why the AL2023 AMI default was changed to 2. But EC2 instances created by a Managed Node Group start with IMDSv2 required and a hop limit of 1.

You cannot see this hop count in the EC2 console. Only after upgrading to 1.30 and troubleshooting symptoms such as pods failing to get AWS credentials, the AWS Load Balancer Controller failing to get the vpcId, or pods restarting continuously do you realize that the hop limit is the problem.
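Since the value is hard to spot by eye, one option is to read it from the EC2 API. The sketch below is a minimal example under some assumptions: it uses boto3 (not mentioned in this post), it assumes AWS credentials are configured, and it assumes the nodes carry the `eks:cluster-name` tag. `find_low_hop_instances` and `scan_cluster_nodes` are hypothetical helper names; `MetadataOptions`, `HttpTokens`, and `HttpPutResponseHopLimit` are the field names returned by `describe_instances`.

```python
from typing import Any, Dict, List


def find_low_hop_instances(response: Dict[str, Any], min_hops: int = 2) -> List[str]:
    """Return IDs of instances that require IMDSv2 but allow fewer than min_hops."""
    flagged = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            opts = instance.get("MetadataOptions", {})
            hop_limit = opts.get("HttpPutResponseHopLimit")
            if (
                opts.get("HttpTokens") == "required"
                and hop_limit is not None
                and hop_limit < min_hops
            ):
                flagged.append(instance["InstanceId"])
    return flagged


def scan_cluster_nodes(cluster_name: str) -> List[str]:
    """Query EC2 for one cluster's nodes (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the pure helper above works without it

    ec2 = boto3.client("ec2")
    paginator = ec2.get_paginator("describe_instances")
    flagged: List[str] = []
    # Assumption: the nodes are tagged with eks:cluster-name.
    for page in paginator.paginate(
        Filters=[{"Name": "tag:eks:cluster-name", "Values": [cluster_name]}]
    ):
        flagged.extend(find_low_hop_instances(page))
    return flagged
```

Any instance ID this returns is a node where pods are likely to hit the IMDS failures described above.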

Let's see where this problem is being tracked.


First, there is an explanation in the GitHub issue announcing the release of AL2023.

Amazon Linux 2023 (AL2023) is now generally available for MNG, Karpenter, and self-managed nodes

You can also find a statement in the EKS Managed Node Group documentation.

Customizing managed nodes with launch templates - Amazon EKS

There is also an issue with the AWS Load Balancer Controller.

Does not work on Fresh EKS Cluster with Amazon Linux 2023 AMI Type Nodes

There is also a statement in the AL2023 release blog.

Amazon EKS optimized Amazon Linux 2023 AMI now generally available

The 1.30 release notes, the deprecation notes, and the major update checkpoints say nothing about problems caused by the hop limit with AL2023. It's ridiculous that they made even their own EKS add-on stop working right after the update because of the hop limit.

Only the Managed Node Group sets the hop limit to 1, while EC2 instances launched in other ways (by Karpenter, for example) are created with a hop limit of 2.

This means that in an environment where EC2 is launched in various ways, whether the pod works or not depends on who launched the EC2. Hahaha :) Are they idiots?

The solution is simple; it is the problem that is hard to identify. Just set the hop limit to 2 in the Launch Template used by the Managed Node Group.
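For anyone who scripts this instead of using the console, here is a minimal sketch of that change. It assumes boto3 and configured AWS credentials, neither of which is part of this post; `metadata_options` and `add_template_version` are hypothetical helper names, and the launch template ID and source version are values you supply. The `MetadataOptions` keys are the ones the EC2 launch template API accepts.

```python
from typing import Any, Dict


def metadata_options(hop_limit: int = 2) -> Dict[str, Any]:
    """Build the launch template fragment that keeps IMDSv2 required with a usable hop limit."""
    if not 1 <= hop_limit <= 64:
        raise ValueError("hop limit must be between 1 and 64")
    return {
        "MetadataOptions": {
            "HttpEndpoint": "enabled",
            "HttpTokens": "required",              # stay on IMDSv2-only
            "HttpPutResponseHopLimit": hop_limit,  # 2 lets pods reach IMDS
        }
    }


def add_template_version(launch_template_id: str, source_version: str) -> int:
    """Create a new template version based on source_version (needs boto3 and credentials)."""
    import boto3  # imported lazily so metadata_options() works without it

    ec2 = boto3.client("ec2")
    resp = ec2.create_launch_template_version(
        LaunchTemplateId=launch_template_id,
        SourceVersion=source_version,
        VersionDescription="IMDSv2 hop limit 2 for pods",
        LaunchTemplateData=metadata_options(),
    )
    return resp["LaunchTemplateVersion"]["VersionNumber"]
```

The node group still has to be updated to use the new template version, and existing nodes replaced, before pods see the new hop limit.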