HPA (Horizontal Pod Autoscaler)

HPA is a native Kubernetes object that automatically scales the number of pod replicas for a workload based on observed metrics such as CPU and memory usage.
Core Components
  • HorizontalPodAutoscaler resource: Defines scaling behavior.
  • HPA controller: Monitors metrics and adjusts the replica count.
  • Metrics Server: Collects resource usage metrics (CPU, memory) from kubelets and serves them via the Kubernetes API.
Key Settings
  • minReplicas: Minimum number of pods.
  • maxReplicas: Maximum number of pods.
  • targetCPUUtilizationPercentage: Target average CPU utilization, expressed as a percentage of each pod's requested CPU, that the controller tries to maintain.
Metrics Flow
  • Metrics Server collects resource usage data from kubelets.
  • The HPA controller queries resource metrics through the API server every 15 seconds (the default sync period).
  • Based on the collected metrics, HPA adjusts the replica count of workloads, typically defined in Deployment resources.
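The replica calculation the controller performs is desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A minimal Python sketch of that arithmetic:

```python
import math

def desired_replicas(current_replicas, current_value, target_value):
    # HPA scaling rule: ceil(currentReplicas * currentMetricValue / targetMetricValue)
    return math.ceil(current_replicas * current_value / target_value)

# 3 pods averaging 90% CPU against a 75% target -> scale out to 4 pods
print(desired_replicas(3, 90, 75))  # prints 4
```

In practice the controller also applies a tolerance band and stabilization windows, so small metric fluctuations do not trigger scaling.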
Example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: sample
        image: sample-image:1.0
        resources:
          requests:
            cpu: "100m"
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: sample
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 75
 
Scaling Triggers
  • HPA evaluates each configured metric (CPU, memory, or a custom metric) individually.
  • The highest calculated replica count among the metrics is applied.
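Under the autoscaling/v2 API, several metrics can be listed on a single HPA and the controller applies the highest resulting replica count. A sketch (the memory target of 80% is illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization        # percentage of the pod's memory request
        averageUtilization: 80   # illustrative value
```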
Advantages
  • Easy to implement for CPU/memory-based scaling.
  • Widely applicable for stateless workloads.
  • Supported natively within Kubernetes.
Limitations
  • HPA Constraints:
    • Not suitable for workloads that can't share load (e.g., some stateful or leader-elected applications).
    • Vertical Pod Autoscaler (VPA), which adjusts resource requests instead of replica counts, may be more appropriate for these use cases.
  • Cluster Size Dependency:
    • Scaling is limited by available cluster capacity.
    • May require cluster autoscaling or manual capacity provisioning.
  • Metric Selection:
    • CPU and memory may not always be ideal scaling metrics.
    • Custom metrics may better reflect application-specific scaling needs.
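With a custom metrics adapter installed (e.g. Prometheus Adapter), an HPA can scale on an application-level metric instead. A sketch using the v2 Pods metric type; the metric name http_requests_per_second and the target of 100 are hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: "100"              # target average per pod, illustrative
```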
High-Availability Cluster Design
  • Run an odd number of control plane (master) nodes (3 or 5) so etcd can maintain quorum.
  • Distribute worker nodes across multiple Availability Zones or data centers.
  • Implement automated backup and recovery procedures for etcd, the Kubernetes control plane database.
  • Place a load balancer in front of the control plane nodes to distribute API server traffic.
  • Employ monitoring and alerting to detect and respond to cluster issues promptly.
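With kubeadm, the load-balancer endpoint in front of the API servers is set via controlPlaneEndpoint. A minimal sketch, assuming kubeadm is used; the address lb.example.com and the version are placeholders:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.0                   # placeholder version
controlPlaneEndpoint: "lb.example.com:6443"  # placeholder LB in front of the API servers
etcd:
  local:
    dataDir: /var/lib/etcd
```

All control plane nodes then join through the load balancer address, so clients and kubelets are not tied to any single API server.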

Summary

  • HPA is the most common and straightforward autoscaling tool in Kubernetes.
  • Useful for stateless applications with predictable resource usage patterns.
  • Requires careful consideration of workload type, cluster capacity, and metric relevance to avoid limitations.