1. Prerequisites
- EKS Cluster: Ensure you have an EKS cluster set up and configured with kubectl access.
- Helm Installed: Helm should be installed on your machine and configured to communicate with the EKS cluster.
- IAM Permissions: Ensure your worker nodes have permissions to access CloudWatch, EC2, and any other AWS services necessary for metrics collection and logging.
- kubectl Configured: kubectl should be configured to manage your EKS cluster.
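A quick way to sanity-check these prerequisites before continuing (standard kubectl, Helm, and AWS CLI calls; adjust profile and region to your environment):
# Confirm kubectl is pointed at the intended EKS cluster
kubectl config current-context
kubectl get nodes
# Confirm Helm is installed and can reach the cluster
helm version
# Confirm which AWS identity you are operating as
aws sts get-caller-identity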
2. Add the Prometheus Community Helm Repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
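You can verify the chart is now visible from the repository before installing it:
helm search repo prometheus-community/kube-prometheus-stack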
3. Create a Dedicated Namespace
kubectl create namespace monitoring
4. Deploy kube-prometheus-stack
Customize the Helm values for production so the stack matches your needs, such as longer retention, persistent storage, higher resource requests, and multiple replicas.
- Create a values.yaml file for production configuration:
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
    replicas: 2
grafana:
  persistence:
    enabled: true
    size: 10Gi
  adminPassword: "your-secure-password"
alertmanager:
  alertmanagerSpec:
    replicas: 2
- Deploy the stack using Helm:
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  -f values.yaml
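You can confirm the release installed successfully before continuing:
helm status kube-prometheus-stack -n monitoring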
5. Expose Prometheus and Grafana Using a Load Balancer (Production)
- Edit the Prometheus service:
kubectl edit svc kube-prometheus-stack-prometheus -n monitoring
Change type: ClusterIP to type: LoadBalancer.
- Edit the Grafana service similarly (a non-interactive alternative is sketched at the end of this step):
kubectl edit svc kube-prometheus-stack-grafana -n monitoring
- Verify the services:
kubectl get svc -n monitoring
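If you prefer not to hand-edit Helm-managed services (manual edits can be overwritten on the next helm upgrade), you can patch the service type non-interactively, or set the service type in values.yaml instead; a sketch of the patch approach:
# Switch both services to type LoadBalancer without opening an editor
kubectl patch svc kube-prometheus-stack-prometheus -n monitoring \
  -p '{"spec": {"type": "LoadBalancer"}}'
kubectl patch svc kube-prometheus-stack-grafana -n monitoring \
  -p '{"spec": {"type": "LoadBalancer"}}'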
6. Set up IAM Roles for Service Accounts (IRSA)
Enable secure communication between EKS and AWS services:
eksctl create iamserviceaccount \
  --name prometheus \
  --namespace monitoring \
  --cluster <cluster-name> \
  --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
  --approve \
  --override-existing-serviceaccounts
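If you want Prometheus to run under this IRSA-enabled service account rather than the one the chart creates for it, you can point the chart at it in values.yaml. This is a sketch; verify the exact value names against your chart version:
prometheus:
  serviceAccount:
    create: false
    name: prometheus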
7. Secure Grafana with Ingress and SSL
Create an Ingress resource for Grafana:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
spec:
  rules:
  - host: grafana.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kube-prometheus-stack-grafana
            port:
              number: 80
  tls:
  - hosts:
    - grafana.example.com
    secretName: grafana-tls
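This Ingress assumes an ingress controller (for example the AWS Load Balancer Controller or NGINX) is installed in the cluster and that the grafana-tls secret exists. A sketch of creating the secret from an existing certificate and applying the manifest (file names are placeholders):
# Create the TLS secret referenced by the Ingress
kubectl create secret tls grafana-tls \
  --cert=grafana.crt --key=grafana.key \
  -n monitoring
# Apply the Ingress manifest, assuming it was saved as grafana-ingress.yaml
kubectl apply -f grafana-ingress.yaml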
8. Set Up Resource Limits
Add resource requests and limits for Prometheus and Grafana in values.yaml (Prometheus resources are set under prometheusSpec):
prometheus:
  prometheusSpec:
    resources:
      limits:
        cpu: 2
        memory: 4Gi
      requests:
        cpu: 1
        memory: 2Gi
grafana:
  resources:
    limits:
      cpu: 1
      memory: 2Gi
    requests:
      cpu: 500m
      memory: 1Gi
9. Monitor the Deployment
- Check the status of your pods:
kubectl get pods -n monitoring
- Verify the services:
kubectl get svc -n monitoring
10. Access Grafana and Prometheus
- Access Grafana via the LoadBalancer DNS name or Ingress URL.
- Retrieve the Grafana admin password:
kubectl get secret --namespace monitoring kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode
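If you have not exposed Grafana externally yet, you can also reach it temporarily with a port-forward and log in as admin with the password above:
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring
# Then open http://localhost:3000 in your browser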
11. Set Up Alerting
Add the Alertmanager configuration to your values.yaml:
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      receiver: 'slack-notifications'
    receivers:
    - name: 'slack-notifications'
      slack_configs:
      - api_url: https://hooks.slack.com/services/xxxxxx/xxxxxx/xxxxxx
        channel: '#alerts'
12. Configure Horizontal Pod Autoscaling (HPA)
Prometheus itself is created by the operator as a StatefulSet and is scaled through prometheusSpec.replicas in values.yaml (set to 2 in step 4), so it is not a candidate for kubectl autoscale. For stateless components that run as Deployments, such as Grafana, you can add an HPA based on CPU usage (this requires metrics-server in the cluster, and note that with a ReadWriteOnce persistent volume additional Grafana replicas can only schedule on the node where the volume is attached):
kubectl autoscale deployment kube-prometheus-stack-grafana -n monitoring --cpu-percent=80 --min=1 --max=3
With these steps, you can deploy a robust, production-ready kube-prometheus-stack on AWS EKS, ensuring high availability, scalability, and secure monitoring.