The Horizontal Pod Autoscaler (HPA) is a native Kubernetes object for autoscaling workloads based on metrics such as CPU and memory usage.
Core Components
- HorizontalPodAutoscaler resource: Defines scaling behavior.
- HPA controller: Monitors metrics and adjusts the replica count.
- Metrics Server: Collects resource usage metrics (CPU, memory) from kubelets and serves them via the Kubernetes API.
Key Settings
- minReplicas: Minimum number of pods.
- maxReplicas: Maximum number of pods.
- targetCPUUtilizationPercentage: Desired average CPU utilization threshold for scaling.
Metrics Flow
- Metrics Server collects resource usage data from kubelets.
- The HPA controller queries the API server for resource data every 15 seconds by default (configurable via the kube-controller-manager flag --horizontal-pod-autoscaler-sync-period).
- Based on the collected metrics, HPA adjusts the replica count of workloads, typically defined in Deployment resources.
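To make the flow above concrete, here is a minimal sketch (not the controller's actual code) of how raw Metrics Server readings become the utilization percentage the HPA compares against its target: each pod's CPU usage is divided by its CPU request, then averaged across pods.

```python
def average_utilization(pod_usages_millicores, request_millicores):
    """Sketch: turn per-pod CPU usage (millicores) and the pods' CPU
    request into the average utilization percentage the HPA controller
    compares against targetCPUUtilizationPercentage."""
    ratios = [usage / request_millicores for usage in pod_usages_millicores]
    return 100 * sum(ratios) / len(ratios)

# Two pods requesting 100m CPU, currently using 90m and 60m:
print(average_utilization([90, 60], 100))  # 75.0
```

With a 75% target, these two pods would sit exactly at the threshold, so no scaling would occur.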
Example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
        - name: sample
          image: sample-image:1.0
          resources:
            requests:
              cpu: "100m"
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: sample
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 75
Scaling Triggers
- HPA evaluates metrics like CPU, memory, or custom metrics individually.
- The highest calculated replica count among the metrics is applied.
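The evaluation described above can be sketched in a few lines. This is a simplification of the documented HPA algorithm, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), applied per metric with the highest result winning; the controller's tolerance band and stabilization windows are omitted.

```python
import math

def desired_replicas(current_replicas, metrics, min_replicas, max_replicas):
    """Simplified HPA replica calculation.

    `metrics` is a list of (current_value, target_value) pairs. For each
    metric, compute ceil(current_replicas * current / target); the highest
    proposal wins, clamped to [min_replicas, max_replicas].
    """
    proposals = [
        math.ceil(current_replicas * current / target)
        for current, target in metrics
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))

# 2 pods running at 150% CPU against a 75% target would want 4 replicas,
# but maxReplicas=3 caps the result:
print(desired_replicas(2, [(150, 75)], min_replicas=1, max_replicas=3))  # 3
```

Note how maxReplicas acts as a hard ceiling even when the per-metric calculation asks for more.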
Advantages
- Easy to implement for CPU/memory-based scaling.
- Widely applicable for stateless workloads.
- Supported natively within Kubernetes.
Limitations
- HPA Constraints:
- Not suitable for workloads that can't share load (e.g., some stateful or leader-elected applications).
- Vertical Pod Autoscaling may be more appropriate for these use cases.
- Cluster Size Dependency:
- Scaling is limited by available cluster capacity.
- May require cluster autoscaling or manual capacity provisioning.
- Metric Selection:
- CPU and memory may not always be ideal scaling metrics.
- Custom metrics may better reflect application-specific scaling needs.
Designing a High-Availability Kubernetes Cluster
- Run an odd number of control plane nodes (typically 3 or 5) so that etcd can maintain quorum.
- Distribute worker nodes across multiple Availability Zones or data centers.
- Implement automated backup and recovery procedures for etcd, the key-value store backing the Kubernetes control plane.
- Place a load balancer in front of the API servers to distribute traffic across control plane nodes.
- Employ monitoring and alerting to detect and respond to cluster issues promptly.
Summary
- HPA is the most common and straightforward autoscaling tool in Kubernetes.
- Useful for stateless applications with predictable resource usage patterns.
- Requires careful consideration of workload type, cluster capacity, and metric relevance to avoid limitations.