Kubernetes Cluster

One Ubuntu host (k8s-worker-3090 at 192.168.1.253), bootstrapped with kubeadm, doubling as control plane and the only worker. Pod network is kube-flannel (default kubeadm-friendly CNI, no extras).

Why single-node

The cluster trades the ergonomics of HA for the cost and complexity floor of a single box. Most failure modes that an HA cluster absorbs through redundancy are covered here by Velero backups and a tested restore drill (see Backup & DR).

Single-node has two practical consequences worth knowing:

  1. Some chart-default alerts are wrong by construction. etcdMembersDown and etcdInsufficientMembers from kube-prometheus-stack assume a multi-member etcd quorum, so on a one-member cluster they fire after every reboot. Both are disabled via defaultRules.disabled in infrastructure/monitoring/kube-prometheus-stack/values.yaml.
  2. An etcd restart blips the apiserver for ~5–10 seconds. There is no second control-plane node to answer API requests while etcd is starting, so plan maintenance windows around it.
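
For the second point, a maintenance script can wait for the apiserver to answer again before it continues. The following is a minimal sketch, assuming kubectl is configured on the host. The function name wait_for_apiserver and the 60-second default are illustrative and not part of the cluster's tooling.

```shell
# Hypothetical helper: poll until the apiserver reports ready, or give up.
# Usage: wait_for_apiserver [timeout_seconds] [probe command...]
wait_for_apiserver() {
  local timeout="${1:-60}" waited=0
  if [ "$#" -gt 0 ]; then shift; fi
  # Default probe: the apiserver's /readyz endpoint, queried through kubectl.
  if [ "$#" -eq 0 ]; then set -- kubectl get --raw=/readyz; fi
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "apiserver not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}
```

Calling wait_for_apiserver 120 after an etcd change blocks for up to two minutes and returns non-zero if the API never comes back, which lets a script stop instead of running kubectl against a dead endpoint.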

Control-plane metric ports

By default, kubeadm binds the four metrics endpoints to localhost only:

Component                 Port    Where the flag lives
kube-controller-manager   10257   /etc/kubernetes/manifests/kube-controller-manager.yaml
kube-scheduler            10259   /etc/kubernetes/manifests/kube-scheduler.yaml
etcd                      2381    /etc/kubernetes/manifests/etcd.yaml
kube-proxy                10249   kube-proxy ConfigMap in kube-system

Until each is rebound to 0.0.0.0, Prometheus shows them as perpetual TargetDown. Static-pod manifests pick up changes automatically (kubelet watches /etc/kubernetes/manifests/); kube-proxy needs a DaemonSet restart after the ConfigMap edit.

# Backups (dotfiles so kubelet ignores them)
sudo cp -a /etc/kubernetes/manifests/etcd.yaml \
          /etc/kubernetes/manifests/.bak-etcd-$(date +%F).yaml
sudo sed -i 's|--listen-metrics-urls=http://127.0.0.1:2381|--listen-metrics-urls=http://0.0.0.0:2381|' \
          /etc/kubernetes/manifests/etcd.yaml
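
The other three components can be handled the same way. This is a sketch under two assumptions: the controller-manager and scheduler manifests carry the kubeadm default --bind-address=127.0.0.1, and the kube-proxy ConfigMap contains a metricsBindAddress line. Check both before running it. The function names and the SUDO variable are hypothetical helpers, not existing tooling.

```shell
# SUDO defaults to "sudo"; set SUDO= (empty) when already root or when
# trying the helper on a copy of a manifest.

# Back up a static-pod manifest as a dotfile (kubelet ignores dotfiles), then
# rebind --bind-address from 127.0.0.1 to 0.0.0.0.
rebind_bind_address() {
  local manifest="$1" dir base
  dir=$(dirname "$manifest")
  base=$(basename "$manifest" .yaml)
  ${SUDO-sudo} cp -a "$manifest" "$dir/.bak-$base-$(date +%F).yaml"
  ${SUDO-sudo} sed -i 's|--bind-address=127\.0\.0\.1|--bind-address=0.0.0.0|' "$manifest"
}

# Filter (stdin to stdout): set metricsBindAddress in a kube-proxy ConfigMap dump.
patch_kube_proxy_metrics() {
  sed 's|metricsBindAddress:.*|metricsBindAddress: 0.0.0.0:10249|'
}

# Example invocation (nothing runs until these are called):
#   rebind_bind_address /etc/kubernetes/manifests/kube-controller-manager.yaml
#   rebind_bind_address /etc/kubernetes/manifests/kube-scheduler.yaml
#   kubectl -n kube-system get configmap kube-proxy -o yaml \
#     | patch_kube_proxy_metrics | kubectl apply -f -
#   kubectl -n kube-system rollout restart daemonset kube-proxy
```

Keeping the kube-proxy edit as a plain text filter makes it easy to diff the patched ConfigMap against the live one before applying it.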

The kube-proxy inotify gotcha

A workstation-grade host (Docker Desktop + dev containers + VS Code) burns through inotify instances fast. kube-proxy exits on bounce with "command failed" err="failed complete: too many open files" if fs.inotify.max_user_instances is still at the Ubuntu default of 128.

sudo tee /etc/sysctl.d/99-kubernetes-inotify.conf <<'EOF'
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
EOF
sudo sysctl --system

After bumping, delete the failing kube-proxy pod so a fresh one starts under the new limits.
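
To see how close the host is to the limit and then bounce kube-proxy, something like the following can help. It is a sketch: the function names are hypothetical, the count relies on GNU find, and it covers all users only when run as root. The limit itself is enforced per user, so treat the total as a rough indicator.

```shell
# Rough count of inotify instances currently open on the host.
# Each instance appears as an fd symlink pointing at anon_inode:inotify.
inotify_instances_in_use() {
  { find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null || true; } | wc -l
}

# Delete the kube-proxy pod(s); the DaemonSet recreates them under the new limits.
# Assumes the kubeadm default label k8s-app=kube-proxy.
restart_kube_proxy() {
  kubectl -n kube-system delete pod -l k8s-app=kube-proxy
}

# Example invocation (nothing runs until these are called):
#   inotify_instances_in_use
#   sysctl -n fs.inotify.max_user_instances
#   restart_kube_proxy
```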

Certificate renewal

kubeadm issues client certs with a 1-year lifetime. The admin.conf cert is the one you’ll feel first when it expires (kubectl starts returning “credentials” errors).

sudo kubeadm certs check-expiration | grep admin.conf
sudo kubeadm certs renew admin.conf
sudo install -o $(id -un) -g $(id -gn) -m 600 /etc/kubernetes/admin.conf ~/.kube/config

A kubeadm-cert-renew.timer systemd unit runs weekly and auto-renews anything within 30 days of expiry; it also sends an ntfy alert once a cert is within 60 days of expiry.
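
The unit's script is not reproduced here. As an illustration of the 30/60-day thresholds only, the decision could be sketched as below, assuming GNU date. Both function names are hypothetical.

```shell
# Whole days from now_epoch (default: current time) until end_date.
# end_date is anything GNU `date -d` accepts, such as the notAfter value
# printed by `openssl x509 -noout -enddate`.
days_until_expiry() {
  local end_date="$1" now_epoch="${2:-$(date +%s)}" end_epoch
  end_epoch=$(date -u -d "$end_date" +%s) || return 1
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Map remaining days to the thresholds described above:
# renew at 30 days or fewer, notify at 60 days or fewer, otherwise do nothing.
renewal_action() {
  local days="$1"
  if [ "$days" -le 30 ]; then
    echo renew
  elif [ "$days" -le 60 ]; then
    echo notify
  else
    echo ok
  fi
}
```

For a certificate file on disk, the end date can be read with openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt | cut -d= -f2 and passed to days_until_expiry. For the authoritative view, kubeadm certs check-expiration remains the simpler check.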