
Karpenter Helm Chart

May 9, 2025
Christopher Fellowes
Neel Punatar

Overview of Karpenter Helm Charts

Kubernetes does great work keeping pods healthy, but it leaves cluster sizing in your hands. The classic answer is the Cluster Autoscaler (CA), but it feels slow and clunky on busy clusters. To address this, AWS wrote Karpenter to cut the wait time from minutes to seconds and trim your EC2 bill along the way.

In this guide you will learn

  • what problems Karpenter solves
  • how its control loop thinks about nodes
  • how to install it with Helm and a tight IAM role (IRSA)
  • what each Karpenter Custom Resource does
  • common recipes—spot pools, drift handling, and bin‑packing tweaks
  • a short hands‑on session to prove it works.

By the end you should be able to drop Karpenter into any EKS cluster and keep it tuned over time.

1  Why Karpenter & Karpenter Helm Charts?

Pain points with the Cluster Autoscaler

  • Slow scale‑up – CA waits on the Kubernetes scheduler, then calls the ASG API, then waits again for the node to join. Karpenter asks EC2 directly and shaves that path.
  • Fixed node groups – CA can only grow groups you have pre‑sized. Karpenter can launch any EC2 type that matches your rules.
  • Poor packing – CA targets a whole group, so you often pay for half‑empty nodes. Karpenter keeps pressure on the bin‑packing logic and removes spare nodes when they are idle.

If fast scale‑up, flexible instance choice, or lower cost shows up on your backlog, Karpenter is worth a test.

2  How does Karpenter work?

When a pod sits Pending, Karpenter:

  1. Simulates the schedule to see what CPU, memory, and features it needs.
  2. Picks one or more EC2 instance types that fit all current unscheduled pods.
  3. Calls the EC2 API (RunInstances) with a bootstrap script that points at your API server.
  4. Watches the new Node join the cluster.
  5. Later, if the node is empty, out of date, or can be replaced by a cheaper choice, it drains and deletes it.

Karpenter runs as a standard Kubernetes controller, so you interact with it through Custom Resources and normal kubectl commands.
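Because it is a normal controller, you can inspect its objects and decisions with the tools you already use. A minimal sketch, assuming the controller runs in the karpenter namespace:

# List the resources Karpenter manages
kubectl get nodepools,ec2nodeclasses,nodeclaims

# Provisioning and disruption decisions also show up as events
kubectl get events -A | grep -i karpenter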

3  Custom Resources in depth

Karpenter uses three CRDs (the examples below use the v1beta1 API). Think of them as how to build a node, when to build it, and what actually got built.

3.1  EC2NodeClass — how to build a node

| Field | Why it matters |
| --- | --- |
| amiFamily | Pick from Bottlerocket, AL2, Ubuntu, or Windows. The choice drives disk layout and user-data. |
| subnetSelectorTerms / securityGroupSelectorTerms | Karpenter picks subnets and security groups whose tags match the selector terms. |
| instanceProfile (or role) | IAM instance profile or role that gives nodes their cloud permissions. |
| Extra knobs | EBS volume size, tags, metadata options, block device mappings. |

Example use cases

  1. GPU workloads – Create a gpu-nodeclass that pins a GPU-ready AMI and instance profile, then run the NVIDIA driver DaemonSet on the nodes it launches.
  2. Staging vs prod – Two NodeClasses that point to different subnets and security groups so traffic stays isolated.
  3. Hardened nodes – Use Bottlerocket for prod, Ubuntu for dev without touching any NodePool.

To define an EC2 configuration for GPU workloads that guarantees a larger root volume for large AI models, you could use the following EC2NodeClass:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: gpu-ec2
spec:
  amiFamily: AL2
  subnetSelectorTerms:
    - tags:
        env: prod
  securityGroupSelectorTerms:
    - tags:
        env: prod
  instanceProfile: karpenter-gpu-profile
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 200Gi
  tags:
    workload: gpu

3.2  NodePool — when to build and when to tear down

| Field | Why it matters |
| --- | --- |
| template.spec.nodeClassRef | Points the pool at the EC2NodeClass that describes how its nodes are built. |
| template.spec.requirements | Constrain capacity type (spot vs on-demand), instance categories, sizes, and zones. |
| Template labels and taints | Stamped on every node the pool launches so you can steer (or repel) workloads. |
| disruption | Consolidation and expiry settings that decide when nodes are torn down. |
| limits | Caps the total CPU and memory the pool may provision. |

Use cases

  1. On‑demand baseline – A pool that allows any c/m/r families but only on‑demand.
  2. Spot cost‑saver – A pool restricted to spot with broad family list and low priority.
  3. Resource-intensive jobs – A pool requiring ≥ 16 vCPU and ≥ 64 GiB so big pods never share small boxes (see the sketch after this list).
  4. AZ isolation – Separate pools per AZ to guarantee even spread.
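For the resource-intensive case, the requirements block can filter on the well-known instance-size labels. A hedged sketch (the pool name big-jobs is illustrative; karpenter.k8s.aws/instance-memory is measured in MiB, and Gt compares against a single numeric value):

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: big-jobs
spec:
  template:
    spec:
      nodeClassRef:
        name: default-ec2
      requirements:
        # only instance types with at least 16 vCPUs
        - key: karpenter.k8s.aws/instance-cpu
          operator: Gt
          values: ["15"]
        # only instance types with at least 64 GiB of memory (value is in MiB)
        - key: karpenter.k8s.aws/instance-memory
          operator: Gt
          values: ["65535"]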


To create a node pool for fault-tolerant apps to run on spot instances, you could configure the following NodePool:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-burst
spec:
  template:
    metadata:
      labels:
        cost-tier: spot
    spec:
      nodeClassRef:
        name: default-ec2
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
  disruption:
    # Continuously consolidate under-utilized nodes; drift replacement is on by default.
    consolidationPolicy: WhenUnderutilized
    # Recycle every node after 30 days so AMIs do not go stale.
    expireAfter: 720h

Send eligible workloads here with:

nodeSelector:
 cost-tier: spot

3.3  NodeClaim — what actually got built 

NodeClaims are ephemeral and created by the controller. They track real EC2 instances. Key status fields:

  • status.nodeName – the Kubernetes Node name.
  • status.capacity – what the instance ended up with.
  • metadata.deletionTimestamp – set while the node is draining.

Why you care

  • Debugging – When a node sits NotReady, describe the NodeClaim to see EC2 errors (e.g., subnet filters).
  • Billing – The karpenter.sh/capacity-type label on the Node maps back to the pool, making kubecost reports cleaner.

You never write NodeClaims by hand.
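A quick way to trace a misbehaving Node back to the claim and pool that produced it (label names per the v1beta1 API):

# Which pool and capacity type produced each node?
kubectl get nodes -L karpenter.sh/nodepool,karpenter.sh/capacity-type

# EC2-level errors (bad subnet filters, capacity issues) surface in the claim's events
kubectl get nodeclaims
kubectl describe nodeclaim <name>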

4  Prerequisites and IAM (Terraform snippet)

Karpenter needs an IAM role with rights to launch and tag EC2 instances, read pricing, and manage ENIs. Bind that role to a ServiceAccount through IRSA to avoid managing AWS Access Keys.

module "karpenter" {
 source = "terraform-aws-modules/eks/aws//modules/karpenter"
 version = "20.34.0"

 cluster_name = var.eks_name

 irsa_oidc_provider_arn          = module.test-eks.oidc_provider_arn
 irsa_namespace_service_accounts = ["karpenter:karpenter"]
 queue_managed_sse_enabled = true
 iam_role_policies = {
   AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
 }

 enable_irsa             = true
 create_instance_profile = true
 iam_role_name          = "KarpenterIRSA-${var.eks_name}"
 iam_role_description   = "Karpenter IAM role for service account"
 iam_policy_name        = "KarpenterIRSA-${var.eks_name}"
 iam_policy_description = "Karpenter IAM policy for service account"

 tags = {
   Terraform   = "true"
 }
}

# controller (IRSA) role ARN used by the Helm install below
output "karpenter_irsa_role_arn" {
  value = module.karpenter.iam_role_arn
}

# node role name referenced by the EC2NodeClass in the upcoming demo
output "karpenter_node_role_name" {
  value = module.karpenter.node_iam_role_name
}

Tag your subnets:

resource "aws_ec2_tag" "karpenter_discovery" {
 resource_id = aws_subnet.private.id
 key         = "karpenter.sh/discovery"
 value       = var.cluster_name
}

Karpenter ignores subnets without that tag.
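The demo NodeClass in section 6 also selects security groups by the same tag, so tag the node security group too. A sketch, assuming the terraform-aws-modules EKS module exposes the cluster's primary security group:

resource "aws_ec2_tag" "karpenter_sg_discovery" {
  resource_id = module.test-eks.cluster_primary_security_group_id
  key         = "karpenter.sh/discovery"
  value       = var.cluster_name
}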

5  Installing Karpenter with Helm

Once your IAM role is provisioned, installing the Karpenter Helm chart is straightforward. Export CLUSTER_NAME, CLUSTER_ENDPOINT, and KARPENTER_ROLE_ARN (the IRSA controller role ARN from the Terraform output above), then run:

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
 --namespace karpenter --create-namespace \
 --set settings.clusterName=$CLUSTER_NAME \
 --set settings.clusterEndpoint=$CLUSTER_ENDPOINT \
 --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=$KARPENTER_ROLE_ARN

Once the controller pod is Running, you are ready to create NodeClasses and NodePools.
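A quick sanity check before moving on (the CRDs ship with the chart; they can also be installed separately via the karpenter-crd chart):

kubectl get pods -n karpenter
kubectl get crd nodepools.karpenter.sh nodeclaims.karpenter.sh ec2nodeclasses.karpenter.k8s.aws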

6 Demo

To see Karpenter react, deploy a small NGINX app and label it to run on spot.

# ==== NodeClass =========
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
 name: default-ec2
spec:
 amiFamily: AL2
 subnetSelectorTerms:
   - tags:
       karpenter.sh/discovery: "${CLUSTER_NAME}"
 securityGroupSelectorTerms:
   - tags:
       karpenter.sh/discovery: "${CLUSTER_NAME}"
 role: "${KARPENTER_ROLE_ARN}"
---
# ==== Spot Pool =========
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
 name: spot-general
spec:
 template:
   metadata:
     labels:
       capacity-type: spot
       spot: "true"
   spec:
     nodeClassRef:
       name: default-ec2
     requirements:
       - key: karpenter.sh/capacity-type
         operator: In
         values: ["spot"]
       - key: karpenter.k8s.aws/instance-category
         operator: In
         values: ["c","m","r"]
---
# ==== Namespace =========
apiVersion: v1
kind: Namespace
metadata:
 name: karpenter-demo
 labels:
   karpenter.sh/discovery: "${CLUSTER_NAME}"
---
# ==== Demo API Deployment =========
apiVersion: apps/v1
kind: Deployment
metadata:
 name: demo-api
 namespace: karpenter-demo
spec:
 replicas: 1
 selector:
   matchLabels:
     app: demo-api
 template:
   metadata:
     labels:
       app: demo-api
   spec:
     containers:
       - name: app
         image: nginx:latest
         resources:
           requests:
             cpu: "250m"
             memory: "256Mi"
           limits:
             cpu: "500m"
             memory: "512Mi"
         ports:
           - containerPort: 80
         readinessProbe:
           httpGet:
             path: /
             port: 80
           initialDelaySeconds: 3
           periodSeconds: 5
     affinity:
       nodeAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
           nodeSelectorTerms:
             - matchExpressions:
                 - key: "karpenter.sh/capacity-type"
                   operator: In
                   values:
                     - "spot"
---
apiVersion: v1
kind: Service
metadata:
 name: demo-api
 namespace: karpenter-demo
spec:
 type: ClusterIP
 selector:
   app: demo-api
 ports:
   - port: 80
     targetPort: 80

Apply the manifest:

kubectl apply -f demo-app.yaml

What to watch

  1. Node launch time – run kubectl get nodes -l spot=true -w. You should see a new node in < 90 s.
  2. Consolidation – delete the deployment with kubectl delete deployment demo-api -n karpenter-demo, then watch Karpenter remove the now-empty node once consolidation kicks in (usually within a few minutes).
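To drive the first check, scale the deployment until pods go Pending and Karpenter has to launch capacity; a sketch:

# ~20 replicas x 250m CPU requests is usually enough to outgrow a small cluster
kubectl scale deployment demo-api -n karpenter-demo --replicas=20

# Watch the claim appear, then the node register
kubectl get nodeclaims -w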

When both checks pass, your IAM role, subnet tags, NodeClass, and NodePool are wired correctly.

7  Best practices in one table

| Aim | Setting | Why |
| --- | --- | --- |
| Remove waste | consolidationPolicy: WhenUnderutilized on each NodePool | Karpenter continuously repacks pods and removes half-empty nodes |
| Keep AMIs fresh | Leave drift detection enabled (it is on by default) | Out-of-date or mis-tagged nodes are replaced automatically |
| Spread across AZs | Tag at least three subnets per cluster | Avoids single-AZ impact |
| Protect quota | Raise the Running On-Demand and Spot vCPU quotas before launch | Prevents InstanceLimitExceeded errors |
| Speedy scale-up | On bursty pools, pair consolidationPolicy: WhenEmpty with consolidateAfter: 10m | Recently emptied nodes stay warm for the next burst |
| Secure access | One NodeClass per IAM instance profile | Limits blast radius if a node is compromised |
| Spot safety | terminationGracePeriodSeconds: 120 on pods | Lets pods finish work within the 2-minute spot notice |
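For the spot-safety row, the relevant fields live in the pod template; a sketch (the preStop sleep is a placeholder for your app's own drain logic):

# fragment of a Deployment's pod template
spec:
  terminationGracePeriodSeconds: 120
  containers:
    - name: app
      image: nginx:latest
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 20"]  # let in-flight requests finish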

8  Observability and quick troubleshooting

Metrics to scrape

  • karpenter_nodes_launched_total
  • karpenter_nodes_terminated_total
  • karpenter_nodeclaims_disrupted_total{reason=~"drift|consolidation"}
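If you run the Prometheus Operator, the chart can expose these for scraping via a ServiceMonitor; a hedged sketch (check the value name against your chart version's values.yaml):

helm upgrade karpenter oci://public.ecr.aws/karpenter/karpenter \
 --namespace karpenter --reuse-values \
 --set serviceMonitor.enabled=true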

Handy kubectl commands

# Watch pending pods
kubectl get pods --all-namespaces --field-selector=status.phase=Pending -w
# List NodePools
kubectl get nodepools
# Inspect a stuck NodeClaim
kubectl describe nodeclaim <name>

Common errors


| Symptom or log message | Fix |
| --- | --- |
| AccessDenied on ec2:RunInstances | The controller's IRSA role is missing ec2:*Instances permissions |
| Node joins but stays NotReady | Check the bootstrap script; verify the API server endpoint and STS VPC endpoints |
| Pods remain Pending even after scale-up | Requirements are too strict (e.g., a GPU label on a CPU-only pool) |
| InsufficientInstanceCapacity | Allow more instance categories or fall back to on-demand |
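When none of these match your symptom, the controller log is the next stop; it typically names the failing AWS API call:

kubectl logs -n karpenter deployment/karpenter | grep -iE "error|unauthorized|insufficient"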

9  Wrap‑up and next steps

Karpenter brings faster scale‑up and lower cost to EKS with three building blocks:

  1. An IRSA role with EC2 rights.
  2. A Helm install.
  3. NodeClasses + NodePools tuned to your workloads.

Start with the YAML in sections 3 and 6, widen instance filters as you grow, and enable consolidation and drift handling everywhere. Your cluster will add nodes only when needed and cut them when idle, with no midnight page for scaling ever again.

Christopher Fellowes
Software Engineer @ Kapstan. Chris is passionate about all things Kubernetes, with an emphasis on observability and security. In his free time, he is an avid rock climber.
