The Horizontal Pod Autoscaler (HPA) is a native Kubernetes object for autoscaling workloads based on metrics such as CPU and memory usage.
Core Components
- HorizontalPodAutoscaler resource: Defines scaling behavior.
- HPA controller: Monitors metrics and adjusts the replica count.
- Metrics Server: Collects resource usage metrics (CPU, memory) from kubelets and serves them via the Kubernetes API.
Key Settings
- minReplicas: Minimum number of pods.
- maxReplicas: Maximum number of pods.
- targetCPUUtilizationPercentage: Desired average CPU utilization threshold for scaling.
Metrics Flow
- Metrics Server collects resource usage data from kubelets.
- The HPA controller queries the API server for resource data every 15 seconds by default (configurable via the kube-controller-manager flag --horizontal-pod-autoscaler-sync-period).
- Based on the collected metrics, HPA adjusts the replica count of workloads, typically defined in Deployment resources.
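To make the flow above concrete, here is a minimal sketch (not the controller's actual code) of how raw Metrics Server readings become the utilization percentage the HPA compares against its target: each pod's CPU usage is divided by its CPU request, then averaged across pods.

```python
def average_utilization(pod_usages_millicores, request_millicores):
    """Sketch: turn per-pod CPU usage (millicores) and the pods' CPU
    request into the average utilization percentage the HPA controller
    compares against targetCPUUtilizationPercentage."""
    ratios = [usage / request_millicores for usage in pod_usages_millicores]
    return 100 * sum(ratios) / len(ratios)

# Two pods requesting 100m CPU, currently using 90m and 60m:
print(average_utilization([90, 60], 100))  # 75.0
```

With a 75% target, these two pods would sit exactly at the threshold, so no scaling would occur.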
Example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
        - name: sample
          image: sample-image:1.0
          resources:
            requests:
              cpu: "100m"
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: sample
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 75
Scaling Triggers
- HPA evaluates metrics like CPU, memory, or custom metrics individually.
- The highest calculated replica count among the metrics is applied.
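The evaluation described above can be sketched in a few lines. This is a simplification of the documented HPA algorithm, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), applied per metric with the highest result winning; the controller's tolerance band and stabilization windows are omitted.

```python
import math

def desired_replicas(current_replicas, metrics, min_replicas, max_replicas):
    """Simplified HPA replica calculation.

    `metrics` is a list of (current_value, target_value) pairs. For each
    metric, compute ceil(current_replicas * current / target); the highest
    proposal wins, clamped to [min_replicas, max_replicas].
    """
    proposals = [
        math.ceil(current_replicas * current / target)
        for current, target in metrics
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))

# 2 pods running at 150% CPU against a 75% target would want 4 replicas,
# but maxReplicas=3 caps the result:
print(desired_replicas(2, [(150, 75)], min_replicas=1, max_replicas=3))  # 3
```

Note how maxReplicas acts as a hard ceiling even when the per-metric calculation asks for more.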
Advantages
- Easy to implement for CPU/memory-based scaling.
- Widely applicable for stateless workloads.
- Supported natively within Kubernetes.
Limitations
- HPA Constraints:
- Not suitable for workloads that can't share load (e.g., some stateful or leader-elected applications).
- Vertical Pod Autoscaling may be more appropriate for these use cases.
- Cluster Size Dependency:
- Scaling is limited by available cluster capacity.
- May require cluster autoscaling or manual capacity provisioning.
- Metric Selection:
- CPU and memory may not always be ideal scaling metrics.
- Custom metrics may better reflect application-specific scaling needs.
Designing a High-Availability Kubernetes Cluster
- Run an odd number of control plane nodes (typically 3 or 5) so that etcd can maintain quorum.
- Distribute worker nodes across multiple Availability Zones or data centers.
- Implement automated backup and recovery procedures for etcd, the key-value store backing the Kubernetes control plane.
- Place a load balancer in front of the API servers to distribute traffic across control plane nodes.
- Employ monitoring and alerting to detect and respond to cluster issues promptly.
Summary
- HPA is the most common and straightforward autoscaling tool in Kubernetes.
- Useful for stateless applications with predictable resource usage patterns.
- Requires careful consideration of workload type, cluster capacity, and metric relevance to avoid limitations.