HPA and VPA: Strategies for Automatically Scaling Pods Up and Down with Traffic

Three Pods are usually enough, but if traffic suddenly jumps tenfold, should you change replicas by hand?

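A static replicas setting cannot keep up with traffic swings. Kubernetes adjusts workloads automatically with the HPA (Horizontal Pod Autoscaler), which changes the number of Pods, and the VPA (Vertical Pod Autoscaler), which changes the resources each Pod requests.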

HPA (Horizontal Pod Autoscaler)

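The HPA automatically increases or decreases the number of Pods based on metrics. The controller re-evaluates those metrics periodically (every 15 seconds by default).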

Basic setup: CPU-based

YAML

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # target 70% average CPU utilization, measured against requests

SHELL

# Or create the same HPA with a single command
kubectl autoscale deployment web-app --min=2 --max=20 --cpu-percent=70

Prerequisite: Metrics Server

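For CPU and memory (Resource) metrics, the HPA needs Metrics Server to be installed. Utilization targets also require the target containers to declare resource requests.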

SHELL

# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify that it works
kubectl top nodes
kubectl top pods

Scaling algorithm

The HPA computes the desired number of replicas with the following formula.

PLAINTEXT

desiredReplicas = ceil(currentReplicas × (currentMetricValue / desiredMetricValue))

Example: 3 Pods currently, CPU utilization 90%, target 70%

PLAINTEXT

ceil(3 × (90 / 70)) = ceil(3.86) = 4
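The formula can be traced with a small bash function. This is a sketch, not the controller's code: it assumes integer metric values (such as CPU percentages) and the default 10% tolerance, under which the HPA leaves the replica count alone when the ratio of current to target metric is between 0.9 and 1.1. The function name `hpa_desired` is hypothetical.

SHELL

```shell
#!/usr/bin/env bash
# Sketch of the HPA core formula: desired = ceil(current * currentMetric / targetMetric).
# Assumptions: integer metric values (e.g., CPU %) and the default 10% tolerance.
# The real controller also handles missing metrics, unready Pods, and min/max clamping.

hpa_desired() {  # args: currentReplicas currentMetric targetMetric
  local current=$1 metric=$2 target=$3
  local tolerance_pct=10

  # Skip scaling when |metric/target - 1| <= 0.1, i.e. |metric - target| * 100 <= 10 * target
  local diff=$(( metric - target ))
  if (( diff < 0 )); then diff=$(( -diff )); fi
  if (( diff * 100 <= tolerance_pct * target )); then
    echo "$current"
    return
  fi

  # Integer ceil(a / b) = (a + b - 1) / b for a >= 0, b > 0
  echo $(( (current * metric + target - 1) / target ))
}

hpa_desired 3 90 70   # ceil(270 / 70) = ceil(3.86) -> 4
hpa_desired 3 75 70   # ratio 1.07 is within tolerance -> 3
hpa_desired 4 20 70   # ceil(80 / 70) = ceil(1.14) -> 2
```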

Multiple metrics

YAML

spec:
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"

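When multiple metrics are configured, the HPA computes a replica count for each metric and uses the largest. A `Pods` metric such as http_requests_per_second is served through the custom metrics API, so it also needs an adapter such as Prometheus Adapter.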
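Taking the largest per-metric recommendation can be sketched the same way. The function names and metric values here are hypothetical, and the tolerance check is left out for brevity.

SHELL

```shell
#!/usr/bin/env bash
# Sketch: with several metrics, compute a replica count per metric and keep the largest.
# Tolerance is omitted for brevity; all values are hypothetical integers.

ceil_div() {  # integer ceil(a / b) for a >= 0, b > 0
  echo $(( ($1 + $2 - 1) / $2 ))
}

hpa_multi_metric() {  # args: currentReplicas, then "currentValue targetValue" pairs
  local current=$1 best=0 r
  shift
  while (( $# >= 2 )); do
    r=$(ceil_div $(( current * $1 )) "$2")
    if (( r > best )); then best=$r; fi
    shift 2
  done
  echo "$best"
}

# 4 Pods: CPU 60% vs 70%, memory 85% vs 80%, RPS per Pod 1500 vs 1000
hpa_multi_metric 4 60 70 85 80 1500 1000   # CPU -> 4, memory -> 5, RPS -> 6; prints 6
```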

behavior: controlling scaling speed

The behavior field prevents rapid oscillation (flapping) and gives fine-grained control over how fast scaling happens.

YAML

spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # scale up immediately
      policies:
        - type: Percent
          value: 100                   # add up to 100% of the current count (double) per period
          periodSeconds: 60
        - type: Pods
          value: 4                     # or add up to 4 Pods per period
          periodSeconds: 60
      selectPolicy: Max                # use whichever policy allows the larger change
    scaleDown:
      stabilizationWindowSeconds: 300  # use the highest recommendation from the last 5 minutes
      policies:
        - type: Percent
          value: 10                    # remove at most 10% of Pods per period
          periodSeconds: 60

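A common strategy is to scale up quickly and scale down slowly. Even without a behavior field, the HPA defaults follow this pattern: no stabilization window for scale-up and a 300-second window for scale-down.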
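The policies above can be read as a cap on each scaling step. This sketch assumes the simplified case where only the replica count at the start of a 60-second period matters (the real controller also accounts for scaling already done within the period); `scale_up_limit`, `next_replicas`, and `stabilized_scale_down` are hypothetical names. The last function shows the scale-down stabilization window: within the last 300 seconds the highest recommendation wins, so a brief dip does not remove Pods.

SHELL

```shell
#!/usr/bin/env bash
# Sketch of the behavior block above. Simplification: only the replica count at the
# start of a 60s period is considered; scaling events inside the period are ignored.

scale_up_limit() {  # arg: replicas at the start of the period
  local current=$1
  local by_percent=$(( current * 2 ))  # Percent 100: add up to 100% of current
  local by_pods=$(( current + 4 ))     # Pods 4: add up to 4 Pods
  # selectPolicy: Max -> the policy that allows the larger change wins
  if (( by_percent > by_pods )); then echo "$by_percent"; else echo "$by_pods"; fi
}

next_replicas() {  # args: currentReplicas desiredReplicas maxReplicas (scale-up case)
  local limit next=$2
  limit=$(scale_up_limit "$1")
  if (( next > limit )); then next=$limit; fi
  if (( next > $3 )); then next=$3; fi
  echo "$next"
}

stabilized_scale_down() {  # args: recommendations recorded in the last 300s
  local best=$1 r
  for r in "$@"; do
    if (( r > best )); then best=$r; fi
  done
  echo "$best"  # the highest recent recommendation wins
}

next_replicas 2 15 20    # limit max(4, 6) = 6   -> 6
next_replicas 6 30 20    # limit max(12, 10) = 12 -> 12
next_replicas 12 30 20   # limit 24, then maxReplicas -> 20
stabilized_scale_down 8 6 5 7   # -> 8
```

Because selectPolicy is Max, the Pods policy dominates while the Deployment is small (fewer than 4 Pods) and the Percent policy takes over as it grows.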

HPA with custom metrics

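With Prometheus Adapter installed, Prometheus metrics are exposed through the custom metrics API, and the HPA can scale on them. The example below uses an `Object` metric, which describes a single Kubernetes object (here an Ingress) rather than each Pod.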

YAML

spec:
  metrics:
    - type: Object
      object:
        metric:
          name: requests_per_second
        describedObject:
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          name: web-ingress
        target:
          type: Value
          value: "2000"

VPA (Vertical Pod Autoscaler)

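The VPA automatically adjusts the CPU/memory requests of Pods (by default it also scales limits proportionally). Unlike the HPA, it is not built into Kubernetes; it has to be installed separately from the kubernetes/autoscaler project.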

YAML

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"    # apply recommendations automatically
  resourcePolicy:
    containerPolicies:
      - containerName: app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
        controlledResources: ["cpu", "memory"]
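The VPA caps its recommendation to the minAllowed/maxAllowed bounds in containerPolicies. A minimal sketch of that capping, with CPU in millicores and memory in MiB; the raw recommendation values passed in are hypothetical, and `clamp` is a hypothetical helper name.

SHELL

```shell
#!/usr/bin/env bash
# Sketch: cap a raw recommendation to minAllowed/maxAllowed from containerPolicies.
# Units: CPU in millicores, memory in MiB. Raw values below are hypothetical.
clamp() {  # args: value minAllowed maxAllowed
  local v=$1
  if (( v < $2 )); then v=$2; fi
  if (( v > $3 )); then v=$3; fi
  echo "$v"
}

clamp 50 100 2000     # CPU 50m -> raised to minAllowed 100m
clamp 3500 100 2000   # CPU 3500m -> capped at maxAllowed 2 (2000m)
clamp 512 128 4096    # memory 512Mi -> within 128Mi..4Gi, unchanged
```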

updateMode options

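Mode	Behavior	Use case
`Off`	Recommendations only; Pods are not changed	Finding the right resource sizes
`Initial`	Applied only when a Pod is created	Minimizing restarts
`Recreate`	Evicts running Pods so they are recreated with new requests	Workloads that tolerate restarts
`Auto`	Default; the VPA docs describe it as currently equivalent to `Recreate`	Default choice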

SHELL

# Check VPA recommendations
kubectl describe vpa web-vpa
# Recommendation:
#   Container: app
#     Lower Bound:  Cpu: 100m, Memory: 256Mi
#     Target:       Cpu: 250m, Memory: 512Mi
#     Upper Bound:  Cpu: 500m, Memory: 1Gi
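As a rough mental model (a simplification, not the updater's full logic), a running container becomes a candidate for update when its current request falls outside the Lower Bound to Upper Bound range; the real updater also weighs other conditions. The sketch reuses the CPU bounds shown above with hypothetical current requests; `needs_update` is a hypothetical name.

SHELL

```shell
#!/usr/bin/env bash
# Simplified model (assumption): a running container becomes an update candidate when its
# current CPU request (millicores) is outside [Lower Bound, Upper Bound].
needs_update() {  # args: currentRequest lowerBound upperBound
  if (( $1 < $2 || $1 > $3 )); then echo "update"; else echo "keep"; fi
}

# Bounds from the recommendation above: Lower 100m, Upper 500m
needs_update 80 100 500    # below Lower Bound -> update
needs_update 300 100 500   # inside the range -> keep
needs_update 800 100 500   # above Upper Bound -> update
```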

Limitations of VPA

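Pods must restart: in `Recreate`/`Auto` mode, the VPA applies new requests by evicting and recreating Pods
Limited use alongside HPA: running an HPA and a VPA on the same metric (CPU or memory) makes them fight each other
Caution with stateful workloads: frequent restarts can disrupt the service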

Combining HPA and VPA

PLAINTEXT

HPA: adjusts the number of Pods based on a custom metric (RPS)
VPA: adjusts resource sizes based on CPU/memory

As long as the two do not act on the same metric, they can be used together.

KEDA (Kubernetes Event-Driven Autoscaling)

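KEDA scales workloads based on external event sources (Kafka, RabbitMQ, Prometheus, and more). Because it can scale down to zero Pods, it saves resources when there are no events. KEDA itself handles the step between 0 and 1 replica, and it creates and manages an HPA for the target to scale between 1 and the maximum.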

YAML

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 0    # scale to 0 when there are no events
  maxReplicaCount: 50
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: my-group
        topic: orders
        lagThreshold: "100"  # target consumer lag per replica
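KEDA passes lagThreshold to the HPA it manages as a per-replica target, so the replica count comes out to roughly ceil(total lag / lagThreshold), clamped to minReplicaCount and maxReplicaCount. The sketch assumes the default activation threshold of 0 (any lag wakes the workload from zero) and does not model cooldownPeriod or the Kafka scaler's default cap at the topic's partition count (see `allowIdleConsumers` in the KEDA docs); `keda_kafka_replicas` is a hypothetical name.

SHELL

```shell
#!/usr/bin/env bash
# Sketch: replicas ~= ceil(totalLag / lagThreshold), clamped to min/max replica counts.
# Assumptions: activation threshold 0; partition-count cap and cooldownPeriod not modeled.
keda_kafka_replicas() {  # args: totalLag lagThreshold minReplicaCount maxReplicaCount
  local lag=$1 threshold=$2 min=$3 max=$4 r
  if (( lag <= 0 )); then echo "$min"; return; fi   # no lag: fall back to minReplicaCount
  r=$(( (lag + threshold - 1) / threshold ))
  if (( r < min )); then r=$min; fi
  if (( r > max )); then r=$max; fi
  echo "$r"
}

keda_kafka_replicas 0 100 0 50      # no lag -> 0
keda_kafka_replicas 250 100 0 50    # ceil(2.5) -> 3
keda_kafka_replicas 9000 100 0 50   # 90, capped by maxReplicaCount -> 50
```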

KEDA event sources (partial list)

Source	Metric
Kafka	Consumer lag
RabbitMQ	Queue length
Prometheus	Custom query
AWS SQS	Message count
Cron	Time schedule
HTTP	Request count (via the separate KEDA HTTP Add-on)

SHELL

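# Install KEDA (register the Helm repository first)
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace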

Cluster Autoscaler: node-level scaling

While HPA and VPA work at the Pod level, Cluster Autoscaler adds and removes nodes.

PLAINTEXT

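Traffic rises → HPA adds Pods → new Pods stay Pending because no node has enough free requested capacity → Cluster Autoscaler adds a node
Traffic falls → HPA removes Pods → nodes become underutilized → Cluster Autoscaler drains and removes a node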
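On the removal side, Cluster Autoscaler looks at how much of a node's allocatable capacity is requested by its Pods, not at live usage. A simplified sketch, assuming the default `--scale-down-utilization-threshold` of 0.5 and taking the higher of the CPU and memory ratios; the real autoscaler also requires that the node's Pods can be rescheduled elsewhere and that the node stays unneeded for a while (10 minutes by default). `is_scale_down_candidate` is a hypothetical name.

SHELL

```shell
#!/usr/bin/env bash
# Simplified scale-down candidate check. Assumption: default threshold 0.5 (50%).
# Utilization = requested / allocatable; the higher of CPU and memory is used.
# Units: CPU in millicores, memory in MiB.
is_scale_down_candidate() {  # args: cpuRequested cpuAllocatable memRequested memAllocatable
  local cpu_pct=$(( $1 * 100 / $2 ))
  local mem_pct=$(( $3 * 100 / $4 ))
  local util=$(( cpu_pct > mem_pct ? cpu_pct : mem_pct ))
  if (( util < 50 )); then echo "candidate"; else echo "keep"; fi
}

is_scale_down_candidate 1000 4000 2048 16384   # CPU 25%, memory 12% -> candidate
is_scale_down_candidate 3000 4000 2048 16384   # CPU 75% -> keep
```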

Debugging autoscaling

SHELL

# Check HPA status
kubectl get hpa web-hpa
# NAME      REFERENCE           TARGETS    MINPODS   MAXPODS   REPLICAS
# web-hpa   Deployment/web-app  45%/70%    2         20        3

# Check HPA events
kubectl describe hpa web-hpa

# Check metrics
kubectl top pods -l app=web

Common problems

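Symptom	Cause	Fix
TARGETS shows `<unknown>`	Metrics Server not installed, or resource requests not set	Install Metrics Server, add requests
Scale-up is slow	Long stabilizationWindowSeconds or restrictive scaleUp policies	Adjust behavior
Stuck at the maximum	maxReplicas reached	Raise maxReplicas and make sure nodes have capacity for the extra Pods
Repeated scale-up/scale-down (flapping)	Metric hovers around the target	Adjust the target or lengthen the stabilization window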

Summary

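HPA scales out horizontally by adding Pods, and VPA scales up vertically by giving each Pod more resources. KEDA enables scaling driven by external events such as consumer lag on a Kafka topic. The behavior field controls how fast scaling happens, and pairing these with Cluster Autoscaler lets the cluster manage nodes automatically as well.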