
Karpenter Helm Chart

May 9, 2025
Christopher Fellowes
Neel Punatar

Overview of Karpenter Helm Charts

Kubernetes does great work keeping pods healthy, but it leaves cluster sizing in your hands. The classic answer is the Cluster Autoscaler (CA), but it feels slow and clunky on busy clusters. To address this, AWS wrote Karpenter to cut the wait time from minutes to seconds and trim your EC2 bill along the way.

In this guide you will learn

  • what problems Karpenter solves
  • how its control loop thinks about nodes
  • how to install it with Helm and a tight IAM role (IRSA)
  • what each Karpenter Custom Resource does
  • common recipes—spot pools, drift handling, and bin‑packing tweaks
  • a short hands‑on session to prove it works.

By the end you should be able to drop Karpenter into any EKS cluster and keep it tuned over time.

1  Why Karpenter & Karpenter Helm Charts?

Pain points with the Cluster Autoscaler

  • Slow scale‑up – CA waits on the Kubernetes scheduler, then calls the ASG API, then waits again for the node to join. Karpenter asks EC2 directly and shaves that path.
  • Fixed node groups – CA can only grow groups you have pre‑sized. Karpenter can launch any EC2 type that matches your rules.
  • Poor packing – CA targets a whole group, so you often pay for half‑empty nodes. Karpenter keeps pressure on the bin‑packing logic and removes spare nodes when they are idle.

If fast scale‑up, flexible instance choice, or lower cost shows up on your backlog, Karpenter is worth a test.

2  How does Karpenter work?

When a pod sits Pending, Karpenter:

  1. Simulates the schedule to see what CPU, memory, and features it needs.
  2. Picks one or more EC2 instance types that fit all current unscheduled pods.
  3. Calls the EC2 API (RunInstances) with a bootstrap script that points at your API server.
  4. Watches the new Node join the cluster.
  5. Later, if the node is empty, out of date, or can be replaced by a cheaper choice, it drains and deletes it.

Karpenter runs as a standard Kubernetes controller, so you interact with it through Custom Resources and normal kubectl commands.
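Because it is a normal controller, you can inspect its objects and decisions with the tools you already use. A minimal sketch, assuming the controller runs in the karpenter namespace:

# List the resources Karpenter manages
kubectl get nodepools,ec2nodeclasses,nodeclaims

# Provisioning and disruption decisions also show up as events
kubectl get events -A | grep -i karpenter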

3  Custom Resources in depth

Karpenter uses three CRDs (the examples below use the v1beta1 API). Think of them as how to build a node, when to build it, and what actually got built.

3.1  EC2NodeClass — how to build a node

| Field | Why it matters |
| --- | --- |
| amiFamily | Pick from Bottlerocket, AL2, Ubuntu, or Windows. The choice drives disk layout and user-data. |
| subnetSelectorTerms / securityGroupSelectorTerms | Karpenter picks subnets and security groups whose tags match the selector terms. |
| instanceProfile (or role) | IAM instance profile or role that gives nodes their cloud permissions. |
| Extra knobs | EBS volume size, tags, metadata options, block device mappings. |

Example use cases

  1. GPU workloads – Create a gpu-nodeclass that pins a GPU-ready AMI and instance profile, then run the NVIDIA driver DaemonSet on the nodes it launches.
  2. Staging vs prod – Two NodeClasses that point to different subnets and security groups so traffic stays isolated.
  3. Hardened nodes – Use Bottlerocket for prod, Ubuntu for dev without touching any NodePool.

To define an EC2 configuration for GPU workloads that guarantees a larger root volume for large AI models, you could use the following EC2NodeClass:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: gpu-ec2
spec:
  amiFamily: AL2
  subnetSelectorTerms:
    - tags:
        env: prod
  securityGroupSelectorTerms:
    - tags:
        env: prod
  instanceProfile: karpenter-gpu-profile
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 200Gi
  tags:
    workload: gpu

3.2  NodePool — when to build and when to tear down

| Field | Why it matters |
| --- | --- |
| template.spec.nodeClassRef | Points the pool at the EC2NodeClass that describes how its nodes are built. |
| template.spec.requirements | Constrain capacity type (spot vs on-demand), instance categories, sizes, and zones. |
| Template labels and taints | Stamped on every node the pool launches so you can steer (or repel) workloads. |
| disruption | Consolidation and expiry settings that decide when nodes are torn down. |
| limits | Caps the total CPU and memory the pool may provision. |

Use cases

  1. On‑demand baseline – A pool that allows any c/m/r families but only on‑demand.
  2. Spot cost‑saver – A pool restricted to spot with broad family list and low priority.
  3. Resource-intensive jobs – A pool requiring ≥ 16 vCPU and ≥ 64 GiB so big pods never share small boxes (see the sketch after this list).
  4. AZ isolation – Separate pools per AZ to guarantee even spread.
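For the resource-intensive case, the requirements block can filter on the well-known instance-size labels. A hedged sketch (the pool name big-jobs is illustrative; karpenter.k8s.aws/instance-memory is measured in MiB, and Gt compares against a single numeric value):

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: big-jobs
spec:
  template:
    spec:
      nodeClassRef:
        name: default-ec2
      requirements:
        # only instance types with at least 16 vCPUs
        - key: karpenter.k8s.aws/instance-cpu
          operator: Gt
          values: ["15"]
        # only instance types with at least 64 GiB of memory (value is in MiB)
        - key: karpenter.k8s.aws/instance-memory
          operator: Gt
          values: ["65535"]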


To create a node pool for fault-tolerant apps to run on spot instances, you could configure the following NodePool:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-burst
spec:
  template:
    metadata:
      labels:
        cost-tier: spot
    spec:
      nodeClassRef:
        name: default-ec2
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
  disruption:
    # Continuously consolidate under-utilized nodes; drift replacement is on by default.
    consolidationPolicy: WhenUnderutilized
    # Recycle every node after 30 days so AMIs do not go stale.
    expireAfter: 720h

Send eligible workloads here with:

nodeSelector:
 cost-tier: spot

3.3  NodeClaim — what actually got built 

NodeClaims are ephemeral and created by the controller. They track real EC2 instances. Key status fields:

  • status.nodeName – the Kubernetes Node name.
  • status.capacity – what the instance ended up with.
  • metadata.deletionTimestamp – set while the node is draining.

Why you care

  • Debugging – When a node sits NotReady, describe the NodeClaim to see EC2 errors (e.g., subnet filters).
  • Billing – The karpenter.sh/capacity-type label on the Node maps back to the pool, making kubecost reports cleaner.

You never write NodeClaims by hand.
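A quick way to trace a misbehaving Node back to the claim and pool that produced it (label names per the v1beta1 API):

# Which pool and capacity type produced each node?
kubectl get nodes -L karpenter.sh/nodepool,karpenter.sh/capacity-type

# EC2-level errors (bad subnet filters, capacity issues) surface in the claim's events
kubectl get nodeclaims
kubectl describe nodeclaim <name>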

4  Prerequisites and IAM (Terraform snippet)

Karpenter needs an IAM role with rights to launch and tag EC2 instances, read pricing, and manage ENIs. Bind that role to a ServiceAccount through IRSA to avoid managing AWS Access Keys.

module "karpenter" {
 source = "terraform-aws-modules/eks/aws//modules/karpenter"
 version = "20.34.0"

 cluster_name = var.eks_name

 irsa_oidc_provider_arn          = module.test-eks.oidc_provider_arn
 irsa_namespace_service_accounts = ["karpenter:karpenter"]
 queue_managed_sse_enabled = true
 iam_role_policies = {
   AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
 }

 enable_irsa             = true
 create_instance_profile = true
 iam_role_name          = "KarpenterIRSA-${var.eks_name}"
 iam_role_description   = "Karpenter IAM role for service account"
 iam_policy_name        = "KarpenterIRSA-${var.eks_name}"
 iam_policy_description = "Karpenter IAM policy for service account"

 tags = {
   Terraform   = "true"
 }
}

# controller (IRSA) role ARN used by the Helm install below
output "karpenter_irsa_role_arn" {
  value = module.karpenter.iam_role_arn
}

# node role name referenced by the EC2NodeClass in the upcoming demo
output "karpenter_node_role_name" {
  value = module.karpenter.node_iam_role_name
}

Tag your subnets:

resource "aws_ec2_tag" "karpenter_discovery" {
 resource_id = aws_subnet.private.id
 key         = "karpenter.sh/discovery"
 value       = var.cluster_name
}

Karpenter ignores subnets without that tag.
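The demo NodeClass in section 6 also selects security groups by the same tag, so tag the node security group too. A sketch, assuming the terraform-aws-modules EKS module exposes the cluster's primary security group:

resource "aws_ec2_tag" "karpenter_sg_discovery" {
  resource_id = module.test-eks.cluster_primary_security_group_id
  key         = "karpenter.sh/discovery"
  value       = var.cluster_name
}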

5  Installing Karpenter with Helm

Once your IAM role is provisioned, installing the Karpenter Helm chart is straightforward. Export CLUSTER_NAME, CLUSTER_ENDPOINT, and KARPENTER_ROLE_ARN (the IRSA controller role ARN from the Terraform output above), then run:

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
 --namespace karpenter --create-namespace \
 --set settings.clusterName=$CLUSTER_NAME \
 --set settings.clusterEndpoint=$CLUSTER_ENDPOINT \
 --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=$KARPENTER_ROLE_ARN

Once the controller pod is Running, you are ready to create NodeClasses and NodePools.
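A quick sanity check before moving on (the CRDs ship with the chart; they can also be installed separately via the karpenter-crd chart):

kubectl get pods -n karpenter
kubectl get crd nodepools.karpenter.sh nodeclaims.karpenter.sh ec2nodeclasses.karpenter.k8s.aws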

6 Demo

To see Karpenter react, deploy a small NGINX app and label it to run on spot.

# ==== NodeClass =========
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
 name: default-ec2
spec:
 amiFamily: AL2
 subnetSelectorTerms:
   - tags:
       karpenter.sh/discovery: "${CLUSTER_NAME}"
 securityGroupSelectorTerms:
   - tags:
       karpenter.sh/discovery: "${CLUSTER_NAME}"
 role: "${KARPENTER_ROLE_ARN}"
---
# ==== Spot Pool =========
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
 name: spot-general
spec:
 template:
   metadata:
     labels:
       capacity-type: spot
       spot: "true"
   spec:
     nodeClassRef:
       name: default-ec2
     requirements:
       - key: karpenter.sh/capacity-type
         operator: In
         values: ["spot"]
       - key: karpenter.k8s.aws/instance-category
         operator: In
         values: ["c","m","r"]
---
# ==== Namespace =========
apiVersion: v1
kind: Namespace
metadata:
 name: karpenter-demo
 labels:
   karpenter.sh/discovery: "${CLUSTER_NAME}"
---
# ==== Demo API Deployment =========
apiVersion: apps/v1
kind: Deployment
metadata:
 name: demo-api
 namespace: karpenter-demo
spec:
 replicas: 1
 selector:
   matchLabels:
     app: demo-api
 template:
   metadata:
     labels:
       app: demo-api
   spec:
     containers:
       - name: app
         image: nginx:latest
         resources:
           requests:
             cpu: "250m"
             memory: "256Mi"
           limits:
             cpu: "500m"
             memory: "512Mi"
         ports:
           - containerPort: 80
         readinessProbe:
           httpGet:
             path: /
             port: 80
           initialDelaySeconds: 3
           periodSeconds: 5
     affinity:
       nodeAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
           nodeSelectorTerms:
             - matchExpressions:
                 - key: "karpenter.sh/capacity-type"
                   operator: In
                   values:
                     - "spot"
---
apiVersion: v1
kind: Service
metadata:
 name: demo-api
 namespace: karpenter-demo
spec:
 type: ClusterIP
 selector:
   app: demo-api
 ports:
   - port: 80
     targetPort: 80

Apply the manifest:

kubectl apply -f demo-app.yaml

What to watch

  1. Node launch time – run kubectl get nodes -l spot=true -w. You should see a new node in < 90 s.
  2. Consolidation – delete the deployment with kubectl delete deployment demo-api -n karpenter-demo, then watch Karpenter remove the now-empty node once consolidation kicks in (usually within a few minutes).
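To drive the first check, scale the deployment until pods go Pending and Karpenter has to launch capacity; a sketch:

# ~20 replicas x 250m CPU requests is usually enough to outgrow a small cluster
kubectl scale deployment demo-api -n karpenter-demo --replicas=20

# Watch the claim appear, then the node register
kubectl get nodeclaims -w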

When both checks pass, your IAM role, subnet tags, NodeClass, and NodePool are wired correctly.

7  Best practices in one table

| Aim | Setting | Why |
| --- | --- | --- |
| Remove waste | consolidationPolicy: WhenUnderutilized on each NodePool | Karpenter continuously repacks pods and removes half-empty nodes |
| Keep AMIs fresh | Leave drift detection enabled (it is on by default) | Out-of-date or mis-tagged nodes are replaced automatically |
| Spread across AZs | Tag at least three subnets per cluster | Avoids single-AZ impact |
| Protect quota | Raise the Running On-Demand and Spot vCPU quotas before launch | Prevents InstanceLimitExceeded errors |
| Speedy scale-up | On bursty pools, pair consolidationPolicy: WhenEmpty with consolidateAfter: 10m | Recently emptied nodes stay warm for the next burst |
| Secure access | One NodeClass per IAM instance profile | Limits blast radius if a node is compromised |
| Spot safety | terminationGracePeriodSeconds: 120 on pods | Lets pods finish work within the 2-minute spot notice |
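For the spot-safety row, the relevant fields live in the pod template; a sketch (the preStop sleep is a placeholder for your app's own drain logic):

# fragment of a Deployment's pod template
spec:
  terminationGracePeriodSeconds: 120
  containers:
    - name: app
      image: nginx:latest
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 20"]  # let in-flight requests finish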

8  Observability and quick troubleshooting

Metrics to scrape

  • karpenter_nodes_launched_total
  • karpenter_nodes_terminated_total
  • karpenter_nodeclaims_disrupted_total{reason=~"drift|consolidation"}
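If you run the Prometheus Operator, the chart can expose these for scraping via a ServiceMonitor; a hedged sketch (check the value name against your chart version's values.yaml):

helm upgrade karpenter oci://public.ecr.aws/karpenter/karpenter \
 --namespace karpenter --reuse-values \
 --set serviceMonitor.enabled=true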

Handy kubectl commands

# Watch pending pods
kubectl get pods --all-namespaces --field-selector=status.phase=Pending -w
# List NodePools
kubectl get nodepools
# Inspect a stuck NodeClaim
kubectl describe nodeclaim <name>

Common errors


| Symptom or log message | Fix |
| --- | --- |
| AccessDenied on ec2:RunInstances | The controller's IRSA role is missing ec2:*Instances permissions |
| Node joins but stays NotReady | Check the bootstrap script; verify the API server endpoint and STS VPC endpoints |
| Pods remain Pending even after scale-up | Requirements are too strict (e.g., a GPU label on a CPU-only pool) |
| InsufficientInstanceCapacity | Allow more instance categories or fall back to on-demand |
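When none of these match your symptom, the controller log is the next stop; it typically names the failing AWS API call:

kubectl logs -n karpenter deployment/karpenter | grep -iE "error|unauthorized|insufficient"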

9  Wrap‑up and next steps

Karpenter brings faster scale‑up and lower cost to EKS with three building blocks:

  1. An IRSA role with EC2 rights.
  2. A Helm install.
  3. NodeClasses + NodePools tuned to your workloads.

Start with the YAML in sections 3 and 6, widen instance filters as you grow, and enable consolidation and drift handling everywhere. Your cluster will add nodes only when needed and cut them when idle, with no midnight page for scaling ever again.

Christopher Fellowes
Software Engineer @ Kapstan. Chris is passionate about all things Kubernetes, with an emphasis on observability and security. In his free time, he is an avid rock climber.
