Overview & Architecture

Sky Palette’s homelab is deliberately small — one box, one cluster — but wired with the same patterns a production cluster would use. The point is to keep the GitOps loop, the secret pipeline, and the observability stack honest, while staying cheap enough to run 24/7 in a closet.

Topology at a glance

                          internet
                              │  (Cloudflare anycast)
                              ▼
       ┌────────────────────────────────────────────────────┐
       │  Cloudflare tunnels (2 of them)                    │
       │   · *.skypalette.ai   →  in-cluster cloudflared    │
       │   · git.skypalette.ai →  Forgejo-side cloudflared  │
       └────────────────────────────────────────────────────┘
                              │
                              ▼
       ┌─────────────────── k8s-worker-3090 (192.168.1.253) ────────────────────┐
       │                                                                       │
       │  ┌────────── Kubernetes (kubeadm, single-node) ───────────┐            │
       │  │                                                        │            │
       │  │  ingress-nginx  ───►  app pods   ───►  Loki ◄── Alloy │            │
       │  │       │                  │                             │            │
       │  │       ▼                  ▼                             │            │
       │  │  cert-manager      Prometheus  ───►  Grafana  ───►  ntfy.sh         │
       │  │       │                  │                             │            │
       │  │       └─────► Let's Encrypt (HTTP-01 over the tunnel)  │            │
       │  │                                                        │            │
       │  │  ArgoCD ◄──── watches git.skypalette.ai/.../gitops     │            │
       │  │       │                                                │            │
       │  │       └──► SOPS + KSOPS plugin decrypts Secrets        │            │
       │  │                                                        │            │
       │  │  Velero ──► cluster backups (off-host, S3-compatible)  │            │
       │  └────────────────────────────────────────────────────────┘            │
       │                                                                       │
       │  ┌────────── Docker Compose ──────────┐                                │
       │  │  forgejo · forgejo-db · runner ·   │   (host network for SSH/2222,  │
       │  │  cloudflared · pr-agent            │    bridge net for the rest)    │
       │  └────────────────────────────────────┘                                │
       │                                                                       │
       │  chrony (NTP) · containerd · kubelet · systemd                         │
       └────────────────────────────────────────────────────────────────────────┘
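As a rough illustration of the first tunnel in the diagram, the in-cluster cloudflared could be pointed at ingress-nginx with a config file along these lines. This is a sketch only: the tunnel ID, credentials path, and the ingress-nginx Service name are assumptions, not values taken from this cluster.

```yaml
# cloudflared config.yaml (illustrative; all values below are placeholders)
tunnel: 00000000-0000-0000-0000-000000000000        # hypothetical tunnel ID
credentials-file: /etc/cloudflared/credentials.json # hypothetical mount path

ingress:
  # Send every *.skypalette.ai hostname to the ingress controller,
  # which then routes by Host header to the app pods.
  - hostname: "*.skypalette.ai"
    # Assumed Service name and namespace (common Helm defaults).
    service: http://ingress-nginx-controller.ingress-nginx.svc.cluster.local:80
  # cloudflared requires a final catch-all rule.
  - service: http_status:404
```

Because HTTP-01 challenges arrive over plain HTTP on the same hostnames, a rule like this would also carry the Let's Encrypt validation traffic shown in the diagram.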

Design principles

Everything declared in Git. No kubectl apply from a laptop. If live state in the cluster diverges from what is in gitops, ArgoCD will either revert it (selfHeal) or surface it as drift.
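A minimal sketch of what such an Application might look like, assuming automated sync with selfHeal and prune enabled. The app name, path, and namespaces are hypothetical, and the repository path is left elided exactly as it appears in the diagram.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app            # hypothetical app name
  namespace: argocd            # assumed ArgoCD install namespace
spec:
  project: default
  source:
    repoURL: https://git.skypalette.ai/.../gitops   # path elided in the source
    targetRevision: HEAD
    path: apps/example-app     # hypothetical directory layout
  destination:
    server: https://kubernetes.default.svc
    namespace: example-app     # hypothetical target namespace
  syncPolicy:
    automated:
      selfHeal: true   # revert manual changes to managed resources
      prune: true      # delete resources that were removed from Git
```

With selfHeal off, the same divergence would show up as an OutOfSync status instead of being reverted, which is the "surface it as drift" half of the principle.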

Secrets encrypted at rest, ciphertext in Git. Plaintext lives in the operator’s password manager. Git holds SOPS-encrypted YAML; KSOPS decrypts during ArgoCD’s manifest render. Nothing sensitive is in values.yaml.
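One common way to wire this up is a KSOPS generator referenced from a kustomization, sketched below. The file names are hypothetical, and the exec-plugin annotation assumes the ksops binary is available to ArgoCD's repo server.

```yaml
# secret-generator.yaml (hypothetical file name)
apiVersion: viaduct.ai/v1
kind: ksops
metadata:
  name: example-secret-generator
  annotations:
    # Run ksops as a kustomize exec plugin during manifest render.
    config.kubernetes.io/function: |
      exec:
        path: ksops
files:
  - ./secret.enc.yaml   # SOPS-encrypted Secret committed to Git
---
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
  - ./secret-generator.yaml
```

The decrypted Secret exists only in the rendered output that ArgoCD applies; the repository itself never holds plaintext.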

Self-demonstrating. The site you’re reading is built and served by the same pipeline it documents. If the registry breaks, this page goes stale. If ArgoCD stops syncing, this page stops updating.

Vendor-neutral primitives. Prometheus client libraries (not OTel metrics yet), structured JSON logs, and remote_write-compatible endpoints mean apps can be lifted to Grafana Cloud, Amazon Managed Service for Prometheus (AMP), or Google Cloud Managed Service for Prometheus without re-instrumentation.
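To illustrate the portability claim, moving metrics to a managed backend would mostly come down to a Prometheus remote_write stanza like the sketch below. The endpoint URL, username, and secret path are placeholders, and individual vendors may require different auth (AMP, for example, uses SigV4 rather than basic auth).

```yaml
# Fragment of prometheus.yml (illustrative values only)
remote_write:
  - url: https://metrics.example.com/api/v1/write   # hypothetical endpoint
    basic_auth:
      username: "000000"                             # hypothetical tenant ID
      password_file: /etc/prometheus/secrets/remote-write-password
```

The apps themselves would not change: they keep exposing the same /metrics endpoints, and only the Prometheus side learns about the new destination.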

Single-node, but HA-shaped where it matters. etcd alerts that assume quorum are disabled at the rule level; everything else is configured as if a second node could appear tomorrow.
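If the monitoring stack is installed from the kube-prometheus-stack Helm chart (an assumption; the source does not say), disabling individual alerts while keeping the rest of the etcd rule group could look like the sketch below. The alert names come from the upstream etcd mixin, but which ones are actually disabled in this cluster is not stated.

```yaml
# Helm values fragment for kube-prometheus-stack (assumed chart)
defaultRules:
  rules:
    etcd: true                      # keep the etcd rule group as a whole
  disabled:
    # Illustrative choices of quorum-oriented alerts to turn off.
    etcdInsufficientMembers: true
    etcdMembersDown: true
```

Disabling per alert rather than dropping the whole group keeps the remaining etcd alerts in place, so adding a second node later would only mean deleting these entries.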

What’s not here

  • Multi-cluster federation. One cluster is the goal, not a stepping stone.
  • A service mesh. The complexity isn’t justified for the scale.
  • Manually-managed certs. cert-manager owns every certificate.
  • Tempo / distributed tracing. The OTel SDK is wired into apps but exporters point at stdout — Tempo is on the roadmap, not deployed.
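For the tracing item in the list above, "exporters point at stdout" could be expressed through the standard OpenTelemetry environment variables on each workload, as in this sketch. The service name is hypothetical, and this assumes the SDK in use honors environment-based configuration.

```yaml
# Fragment of a container spec in a Deployment (illustrative)
env:
  - name: OTEL_SERVICE_NAME
    value: example-app          # hypothetical service name
  - name: OTEL_TRACES_EXPORTER
    value: console              # write spans to stdout instead of a backend
```

Under the same assumption, pointing at Tempo later would mean switching the exporter to otlp and setting OTEL_EXPORTER_OTLP_ENDPOINT, with no change to application code.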