Overview for Karpenter Helm Charts
Kubernetes does great work keeping pods healthy, but it leaves cluster sizing in your hands. The classic answer is the Cluster Autoscaler (CA), but it feels slow and clunky on busy clusters. To address this, AWS wrote Karpenter to cut the wait time from minutes to seconds and trim your EC2 bill along the way.
In this guide you will learn:
- what problems Karpenter solves
- how its control loop thinks about nodes
- how to install it with Helm and a tight IAM role (IRSA)
- what each Karpenter Custom Resource does
- common recipes—spot pools, drift handling, and bin‑packing tweaks
- a short hands‑on session to prove it works.
By the end you should be able to drop Karpenter into any EKS cluster and keep it tuned over time.
1 Why Karpenter & Karpenter Helm Charts?
Pain points with the Cluster Autoscaler
- Slow scale-up – CA waits on the Kubernetes scheduler, then calls the Auto Scaling group API, then waits again for the node to join. Karpenter calls EC2 directly and shortens that path.
- Fixed node groups – CA can only grow node groups you have pre-sized. Karpenter can launch any EC2 instance type that matches your rules.
- Poor packing – CA scales whole groups of identical instances, so you often pay for half-empty nodes. Karpenter sizes instances to fit the pending pods and removes spare nodes once they sit idle.
If fast scale‑up, flexible instance choice, or lower cost shows up on your backlog, Karpenter is worth a test.
2 How does Karpenter work?
When a pod sits Pending, Karpenter:
- Simulates scheduling to work out what CPU, memory, and other constraints (GPUs, zones, architectures) the pending pods need.
- Picks one or more EC2 instance types that can satisfy all current unscheduled pods.
- Calls the EC2 Fleet API (CreateFleet) with user data that bootstraps the node and points it at your API server.
- Watches the new Node join the cluster.
- Later, if the node is empty, out of date, or can be replaced by a cheaper choice, it drains and deletes it.
Karpenter runs as a standard Kubernetes controller, so you interact with it through Custom Resources and normal kubectl commands.
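You can watch that loop from the CLI. A minimal sketch, assuming Karpenter runs in the karpenter namespace under the chart's default deployment name:
# pods waiting for capacity
kubectl get pods -A --field-selector=status.phase=Pending
# NodeClaims appear while instances are being launched
kubectl get nodeclaims -w
# provisioning, consolidation, and drift decisions are logged by the controller
kubectl logs -n karpenter deploy/karpenter -f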
3 Custom Resources in depth
Karpenter's API is built around three CRDs; the examples in this guide use the v1beta1 versions (the v1 API keeps the same three kinds). Think of them as how to build a node, when to build it, and what actually got built.
3.1 EC2NodeClass — how to build a node
An EC2NodeClass holds the AWS-specific build instructions for a node: AMI family, subnet and security group selectors, node IAM role or instance profile, EBS volume size, tags, metadata options, and block device mappings.
Example use cases
- GPU workloads – create a gpu NodeClass that pins the GPU-enabled AMI variant and an instance profile for GPU nodes; the NVIDIA driver DaemonSet then targets only those nodes.
- Staging vs prod – Two NodeClasses that point to different subnets and security groups so traffic stays isolated.
- Hardened nodes – Use Bottlerocket for prod, Ubuntu for dev without touching any NodePool.
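For the hardened-nodes case the split can be as small as the amiFamily field. A minimal sketch with a hypothetical prod-nodes class (its dev twin would differ only in amiFamily: Ubuntu and the selector tags):
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: prod-nodes
spec:
  amiFamily: Bottlerocket
  subnetSelectorTerms:
    - tags:
        env: prod
  securityGroupSelectorTerms:
    - tags:
        env: prod
  role: "KarpenterNodeRole-prod"   # node IAM role name; substitute your own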
To define a specific EC2 configuration for a GPU workload that guarantees extra local storage for large AI models, you could use the following EC2NodeClass:
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: gpu-ec2
spec:
  amiFamily: AL2
  subnetSelectorTerms:
    - tags:
        env: prod
  securityGroupSelectorTerms:
    - tags:
        env: prod
  instanceProfile: karpenter-gpu-profile
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 200Gi
  tags:
    workload: gpu
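A NodeClass does nothing on its own; a NodePool (next section) has to reference it. A sketch of a matching gpu-pool (hypothetical name) that keeps only GPU instance families on these nodes and taints them for GPU workloads:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu-pool
spec:
  template:
    spec:
      nodeClassRef:
        name: gpu-ec2
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5", "p4d"]
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule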
3.2 NodePool — when to build and when to tear down
Use cases
- On‑demand baseline – A pool that allows any c/m/r families but only on‑demand.
- Spot cost‑saver – A pool restricted to spot with broad family list and low priority.
- Resource intensive jobs – A pool requiring ≥ 16 vCPU and ≥ 64 Gi so big pods never share small boxes.
- AZ isolation – Separate pools per AZ to guarantee even spread.
To create a NodePool for fault-tolerant apps that should run on spot instances, you could configure the following NodePool:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-burst
spec:
  template:
    metadata:
      labels:
        cost-tier: spot
    spec:
      nodeClassRef:
        name: default-ec2
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
  disruption:
    # drift detection is on by default in recent versions, so no extra field is needed
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h
Send eligible workloads here with:
nodeSelector:
  cost-tier: spot
3.3 NodeClaim — what actually got built
NodeClaims are ephemeral and created by the controller. They track real EC2 instances. Key status fields:
- status.nodeName – the Kubernetes Node name.
- status.capacity – what the instance ended up with.
- metadata.deletionTimestamp and status conditions – set while the node is draining and being terminated.
Why you care
- Debugging – when a node sits NotReady (or never appears), describe the NodeClaim to see EC2-level errors (e.g., no subnets matched the NodeClass selectors).
- Billing – the karpenter.sh/nodepool and karpenter.sh/capacity-type labels on the Node map it back to the pool and capacity type, making Kubecost reports cleaner.
You never write NodeClaims by hand.
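Two read-only commands cover most NodeClaim debugging; a quick sketch (the wide output columns vary slightly by version):
# instance type, capacity type, and zone Karpenter picked for each claim
kubectl get nodeclaims -o wide
# launch errors (unmatched subnets, IAM denials) surface in the status and events
kubectl describe nodeclaim <name>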
4 Prerequisites and IAM (Terraform snippet)
Karpenter needs two IAM roles: a controller role with rights to launch and tag EC2 instances, read pricing data, and manage ENIs, and a node role that the launched instances assume. Bind the controller role to a ServiceAccount through IRSA to avoid managing AWS access keys. The Terraform module below creates both.
module "karpenter" {
source = "terraform-aws-modules/eks/aws//modules/karpenter"
version = "20.34.0"
cluster_name = var.eks_name
irsa_oidc_provider_arn = module.test-eks.oidc_provider_arn
irsa_namespace_service_accounts = ["karpenter:karpenter"]
queue_managed_sse_enabled = true
iam_role_policies = {
AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
enable_irsa = true
create_instance_profile = true
iam_role_name = "KarpenterIRSA-${var.eks_name}"
iam_role_description = "Karpenter IAM role for service account"
iam_policy_name = "KarpenterIRSA-${var.eks_name}"
iam_policy_description = "Karpenter IAM policy for service account"
tags = {
Terraform = "true"
}
}
# controller IRSA role ARN, used in the Helm install below
output "karpenter_irsa_role_arn" {
  value = module.karpenter.iam_role_arn
}

# node IAM role name, referenced by the EC2NodeClass in the demo
output "karpenter_node_role_name" {
  value = module.karpenter.node_iam_role_name
}
Tag your subnets:
resource "aws_ec2_tag" "karpenter_discovery" {
resource_id = aws_subnet.private.id
key = "karpenter.sh/discovery"
value = var.cluster_name
}
The NodeClass selectors in the demo match on that tag, so Karpenter ignores subnets without it.
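The demo's securityGroupSelectorTerms match on the same tag, so the security group your nodes use needs it as well. A sketch assuming the EKS module exposes cluster_primary_security_group_id:
resource "aws_ec2_tag" "karpenter_sg_discovery" {
  resource_id = module.test-eks.cluster_primary_security_group_id
  key         = "karpenter.sh/discovery"
  value       = var.eks_name
}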
5 Installing Karpenter with Helm; Karpenter Helm Charts
Once your IAM roles are provisioned, installing the Karpenter Helm chart is straightforward. Recent chart versions are published as OCI artifacts in Amazon ECR Public (the old charts.karpenter.sh repo only carries legacy releases), so there is no helm repo add step. Export $CLUSTER_NAME, $CLUSTER_ENDPOINT, $KARPENTER_ROLE_ARN (the controller IRSA role ARN from the Terraform output above), and a $KARPENTER_VERSION that serves the v1beta1 API used in this guide, then run:
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version $KARPENTER_VERSION \
  --namespace karpenter --create-namespace \
  --set settings.clusterName=$CLUSTER_NAME \
  --set settings.clusterEndpoint=$CLUSTER_ENDPOINT \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=$KARPENTER_ROLE_ARN
Once the controller pod is Running, you are ready to create NodeClasses and NodePools.
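A quick sanity check before creating any custom resources; a sketch assuming the chart's default deployment and container names:
kubectl get pods -n karpenter
kubectl logs -n karpenter deploy/karpenter -c controller --tail=20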
6 Demo
To see Karpenter react, deploy a small NGINX app and label it to run on spot.
# ==== NodeClass =========
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default-ec2
spec:
  amiFamily: AL2
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  # node IAM role name (from the karpenter_node_role_name output), not the controller IRSA ARN
  role: "${KARPENTER_NODE_ROLE_NAME}"
---
# ==== Spot Pool =========
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    metadata:
      labels:
        capacity-type: spot
        spot: "true"
    spec:
      nodeClassRef:
        name: default-ec2
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
---
# ==== Namespace =========
apiVersion: v1
kind: Namespace
metadata:
  name: karpenter-demo
  labels:
    karpenter.sh/discovery: "${CLUSTER_NAME}"
---
# ==== Demo API Deployment =========
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-api
  namespace: karpenter-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-api
  template:
    metadata:
      labels:
        app: demo-api
    spec:
      containers:
        - name: app
          image: nginx:latest
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 3
            periodSeconds: 5
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "karpenter.sh/capacity-type"
                    operator: In
                    values:
                      - "spot"
---
apiVersion: v1
kind: Service
metadata:
  name: demo-api
  namespace: karpenter-demo
spec:
  type: ClusterIP
  selector:
    app: demo-api
  ports:
    - port: 80
      targetPort: 80
Substitute the environment variables and apply the manifest (it uses shell-style ${VAR} placeholders, so pipe it through envsubst):
envsubst < demo-app.yaml | kubectl apply -f -
What to watch
- Pod scheduling – kubectl get pods -n karpenter-demo -w shows the pod Pending until the new node is Ready, then Running.
- Node launch time – kubectl get nodes -l spot=true -w. You should see a new node in under 90 seconds.
- Consolidation – delete the deployment (kubectl delete deployment demo-api -n karpenter-demo) and watch Karpenter drain and remove the now-empty node once consolidation kicks in, usually within a few minutes depending on your disruption settings.
When all three checks pass, your IAM roles, subnet tags, NodeClass, and NodePool are wired correctly.
7 Best practices in one table

| Practice | Why it matters |
| --- | --- |
| Keep NodePool requirements broad (instance families, sizes, spot and on-demand) | More options mean faster, cheaper launches |
| Leave consolidation and drift detection enabled | Idle, underutilized, and outdated nodes get replaced automatically |
| Tag subnets and security groups with karpenter.sh/discovery | Karpenter only uses resources its selectors can find |
| Keep separate NodeClasses for prod and dev | Isolates subnets, security groups, and AMIs without touching NodePools |
| Set accurate pod resource requests | Karpenter sizes nodes from requests, not observed usage |
8 Observability and quick troubleshooting beyond just Karpenter Helm Charts
Metrics to scrape
- karpenter_nodes_launched_total
- karpenter_nodes_terminated_total
- karpenter_nodeclaims_disrupted_total{reason=~"drift|consolidation"}
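With those counters in Prometheus, a couple of increase() queries give a quick churn view; a sketch reusing the metric names above (label names such as reason may differ between Karpenter versions):
# nodes launched vs. terminated over the last hour
increase(karpenter_nodes_launched_total[1h])
increase(karpenter_nodes_terminated_total[1h])
# disruptions broken down by reason
sum by (reason) (increase(karpenter_nodeclaims_disrupted_total[1h]))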
Handy kubectl commands
# Watch pending pods
kubectl get pods --all-namespaces --field-selector=status.phase=Pending -w
# List NodePools
kubectl get nodepools
# Inspect a stuck NodeClaim
kubectl describe nodeclaim <name>
Common errors
- Pods stay Pending and no NodeClaim appears – the NodePool requirements cannot be satisfied (or its limits are hit); check kubectl describe nodepool and the controller logs.
- A NodeClaim exists but no Node joins – describe the NodeClaim; missing karpenter.sh/discovery tags on subnets or security groups and IAM denials from the IRSA role show up there.
- Controller pod CrashLooping – usually a wrong settings.clusterName/clusterEndpoint or a missing eks.amazonaws.com/role-arn annotation in the Helm values.
9 Wrap‑up and next steps; Kubernetes, Karpenter, and Karpenter Helm Charts
Karpenter brings faster scale‑up and lower cost to EKS with three building blocks:
- An IRSA role with EC2 rights.
- A Helm install.
- NodeClasses + NodePools tuned to your workloads.
Start with the YAML in sections 3 and 6, widen instance filters as you grow, and keep consolidation and drift detection enabled everywhere. Your cluster will add nodes only when needed and cut them when idle, with no more midnight pages for scaling.
Further reading
- Docs: https://karpenter.sh/docs/
- IAM policy: https://karpenter.sh/docs/reference/cloudformation/
- Upgrade guide: https://karpenter.sh/v0.32/upgrading/v1beta1-migration/