Chart version: 1.2.7
API version: v1
App version: v1.0.2
Set me up:
helm repo add center
Install Chart:
helm install kuberhealthy center/stable/kuberhealthy
Easy synthetic testing for Kubernetes clusters. Supplements other solutions like Prometheus nicely.

What is Kuberhealthy?

Kuberhealthy performs synthetic tests from within Kubernetes clusters in order to catch issues that would otherwise go unnoticed. Instead of trying to identify all the things that could potentially go wrong, Kuberhealthy replicates real workflows and watches carefully for the expected Kubernetes behavior to occur. Kuberhealthy serves both a JSON status page and a Prometheus metrics endpoint for integration into your choice of alerting solution. More checks will be added in future versions to better cover service provisioning, DNS resolution, disk provisioning, and more.
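
As a rough illustration of wiring the metrics endpoint into Prometheus, the sketch below shows a static scrape job. The service address, port, and the /metrics path are assumptions made for this example and are not stated in this document, so verify them against your own deployment before use.

```yaml
# prometheus.yml fragment (a sketch, not taken from the chart)
scrape_configs:
  - job_name: kuberhealthy          # hypothetical job name
    metrics_path: /metrics          # assumed path of the metrics endpoint
    static_configs:
      - targets:
          # assumed in-cluster address: <service>.<namespace>.svc.cluster.local:<port>
          - kuberhealthy.kuberhealthy.svc.cluster.local:80
```

If the chart's scrape annotation or a ServiceMonitor is already in use, a static job like this is likely redundant.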

Some examples of errors Kuberhealthy has detected in production:

  • Nodes where new pods get stuck in Terminating due to CNI communication failures
  • Nodes where new pods get stuck in ContainerCreating due to disk scheduler errors
  • Nodes where new pods get stuck in Pending due to Docker daemon errors
  • Nodes where Docker or Kubelet crashes or has restarted
  • A node that cannot provision or terminate pods quickly enough due to high IO wait
  • A pod in the kube-system namespace that is restarting too quickly
  • A Kubernetes component that is in a non-ready state
  • Intermittent failures to access or create custom resources
  • Kubernetes system services remaining technically “healthy” while their underlying pods are crashing too much
    • kube-scheduler
    • kube-apiserver
    • kube-dns

Helm Variables

Kuberhealthy’s deployment and its Prometheus integration can be configured with Helm variables. The variables are broken down below:

prometheus:
  enabled: true # do we deploy a ServiceMonitor spec?
  name: "prometheus" # the name of the Prometheus deployment in your environment.
  enableScraping: true # add the Prometheus scrape annotation to Kuberhealthy pods
  serviceMonitor: false # use a ServiceMonitor configuration when using Prometheus Operator
  enableAlerting: true # enable default Kuberhealthy alerts configuration
app:
  name: "kuberhealthy" # what to name the kuberhealthy deployment
image:
  tag: v1.0.2
resources:
  requests:
    cpu: 100m
    memory: 80Mi
  limits:
    cpu: 400m
    memory: 200Mi
tolerations:
  # change to true to tolerate and deploy to masters annotated with
  master: true
deployment:
  replicas: 2 # any number of replicas is supported, but they only act in a failover capacity
  maxSurge: 0
  maxUnavailable: 1
  imagePullPolicy: IfNotPresent
  namespace: kuberhealthy
  podAnnotations: {} # Annotations to be added to pods created by the deployment
  command:
  - /app/kuberhealthy
  # use this to override location of the test-image, see:
  # args:
  # - -dsPauseContainerImageOverride
  # - your-repo/google_containers/pause:0.8.0
securityContext: # default container security context
  runAsNonRoot: true
  runAsUser: 999
  fsGroup: 999
  allowPrivilegeEscalation: false
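
To change only a few of these settings, a small override file is usually enough. The sketch below assumes the chart nests the Prometheus settings under a top-level `prometheus` key and the replica count under `deployment`; the file name `my-values.yaml` is hypothetical. Check the chart's own values.yaml before relying on these paths.

```yaml
# my-values.yaml (hypothetical file name)
# Assumption: these settings live under `prometheus` and `deployment`
# in the chart's values.yaml.
prometheus:
  enabled: true
  serviceMonitor: true   # use a ServiceMonitor, for Prometheus Operator setups
deployment:
  replicas: 3            # extra replicas only act in a failover capacity
```

The file can then be passed at install time with Helm's `-f` flag, for example `helm install kuberhealthy center/stable/kuberhealthy -f my-values.yaml`. Values not listed in the override file keep whatever the chart defines.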

For more details, see the Kuberhealthy web site.

To report a bug, see the Kuberhealthy project issues.