RAM and Recklessness: Community Infrastructure
Part 8 of Building a Diskless Datacenter
At some point, a homelab stops being a personal experiment and starts being infrastructure that other people rely on. The srvlab community needed a blog, a status page, collaborative docs, and a way to access it all without VPN credentials or port forwarding. This is how we built that on top of everything we'd already built.
The srvlab-cluster
The community services run on the srvlab-cluster — the nested Kubernetes cluster from Part 5. This gives us complete isolation from the metal cluster. If someone finds a bug in Ghost and exploits it, they're inside a VM inside a pod inside a Kubernetes cluster that has no access to the metal control plane.
The stack:
- Flux CD for GitOps — everything in Git, nothing applied manually
- cert-manager with Let's Encrypt DNS-01 via DNSimple
- ingress-nginx as the ingress controller
- external-dns for automatic DNS record management
- external-secrets for pulling secrets from the metal cluster
- rds-csi for NVMe-oF persistent storage
- Tailscale for external access
Tailscale as the Front Door
None of the srvlab services are exposed to the public internet. Instead, they're accessible through Tailscale — our Headscale-powered mesh network.
A Tailscale proxy pod runs in the cluster:
env:
- name: TS_HOSTNAME
value: srvlab-ingress
- name: TS_SERVE_CONFIG
value: /config/serve.json # TCP forward 80/443 → ingress-nginx
This pod:
- Authenticates to Headscale (headscale.srvlab.io) with a non-ephemeral, reusable auth key
- Gets the Tailscale IP 100.64.0.16
- Forwards all HTTP/HTTPS traffic to ingress-nginx
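The serve.json referenced in the env block might look something like this — a sketch using Tailscale's serve config format; the exact ingress-nginx service DNS name is an assumption:

```json
{
  "TCP": {
    "80": {
      "TCPForward": "ingress-nginx-controller.ingress-nginx.svc.cluster.local:80"
    },
    "443": {
      "TCPForward": "ingress-nginx-controller.ingress-nginx.svc.cluster.local:443"
    }
  }
}
```

Plain TCP forwarding on 443 keeps TLS termination in ingress-nginx, so the Let's Encrypt certificates live in one place.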
DNS is handled by external-dns, which creates records in DNSimple:
blog.srvlab.whiskey.works → 100.64.0.16
docs.srvlab.whiskey.works → 100.64.0.16
status.srvlab.whiskey.works → 100.64.0.16
The result: anyone on the Tailscale network navigates to blog.srvlab.whiskey.works, gets routed through the mesh to the proxy pod, which forwards to ingress-nginx, which terminates TLS with a valid Let's Encrypt certificate and routes to the backend service.
The external-dns Override
There's a subtlety that cost us time. When ingress-nginx creates a LoadBalancer service, it gets IP 10.42.66.201 (from Cilium's LBIPPool). external-dns sees the ingress and creates an A record pointing to that IP.
But 10.42.66.201 is an internal VLAN 66 address — unreachable from outside the lab. We need DNS to point to 100.64.0.16 (the Tailscale IP).
The fix: every ingress needs this annotation:
external-dns.alpha.kubernetes.io/target: "100.64.0.16"
This tells external-dns to ignore the LoadBalancer IP and publish the Tailscale IP instead. Without it, resolution works at first (via the wildcard record), but external-dns eventually overwrites the wildcard with the unreachable internal IP.
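On a concrete ingress that looks like the following — a sketch where the resource names, issuer name, and service port are illustrative (2368 is Ghost's default port):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ghost
  annotations:
    # Publish the Tailscale IP, not the internal LoadBalancer IP
    external-dns.alpha.kubernetes.io/target: "100.64.0.16"
spec:
  ingressClassName: nginx
  rules:
    - host: blog.srvlab.whiskey.works
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ghost
                port:
                  number: 2368
```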
The Services
Ghost Blog
Ghost is the blog engine at blog.srvlab.whiskey.works. It's the classic Node.js blogging platform, running the Ghost 5 Alpine image.
Ghost only supports MySQL/MariaDB or SQLite for its database — no PostgreSQL. So we stood up a dedicated MariaDB instance:
# MariaDB init script
CREATE DATABASE IF NOT EXISTS ghost CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER IF NOT EXISTS 'ghost'@'%' IDENTIFIED BY 'ghost-srvlab';
GRANT ALL PRIVILEGES ON ghost.* TO 'ghost'@'%';
Ghost connects to mariadb.mariadb.svc.cluster.local:3306. The database and Ghost content directory (themes, images) are both on rds-csi PVCs — NVMe-oF backed, surviving pod restarts.
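The connection itself is configured through Ghost's double-underscore environment variables. A sketch, assuming the MariaDB service name from above; in practice the password would come from a Secret rather than a literal value:

```yaml
env:
  - name: database__client
    value: mysql
  - name: database__connection__host
    value: mariadb.mariadb.svc.cluster.local
  - name: database__connection__port
    value: "3306"
  - name: database__connection__user
    value: ghost
  - name: database__connection__database
    value: ghost
  # database__connection__password should reference a Secret via valueFrom
```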
HedgeDoc
HedgeDoc provides collaborative markdown editing at docs.srvlab.whiskey.works. Think Google Docs but self-hosted and markdown-native.
The HedgeDoc deployment went through several iterations:
- SQLite attempt: `CMD_DB_URL=sqlite:///data/hedgedoc.sqlite` — this should work according to the docs but doesn't. HedgeDoc's Sequelize integration doesn't parse SQLite URLs correctly.
- SQLite workaround: `CMD_DB_DIALECT=sqlite` + `CMD_DB_STORAGE=/data/hedgedoc.sqlite` — this works! But SQLite is not great for a service that multiple people use concurrently.
- PostgreSQL migration: we stood up a central PostgreSQL 16 instance and pointed HedgeDoc at it:
- name: CMD_DB_URL
value: postgres://hedgedoc:hedgedoc-srvlab@postgresql.postgresql.svc.cluster.local:5432/hedgedoc
The PostgreSQL instance serves as the central database for the cluster. Other services that support PostgreSQL can connect to it, each with their own database and credentials.
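The per-service provisioning mirrors the MariaDB init script shown earlier. A sketch of the equivalent PostgreSQL statements, using the credentials from the CMD_DB_URL above (how the init script is mounted is an assumption):

```sql
-- PostgreSQL init: one role + database per service
CREATE ROLE hedgedoc LOGIN PASSWORD 'hedgedoc-srvlab';
CREATE DATABASE hedgedoc OWNER hedgedoc;
```

Each new service gets its own role and database this way, so a compromised service credential can't read another service's data.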
Uptime Kuma
Uptime Kuma at status.srvlab.whiskey.works monitors our services and endpoints. It's SQLite-only — no PostgreSQL option — which is fine for a monitoring tool that mostly does reads.
It runs with a 1Gi PVC for its SQLite database and a simple deployment pinned to the control plane node.
The Ephemeral Worker Problem
The first deployment of these services failed spectacularly. Pods kept getting evicted with:
The node was low on resource: ephemeral-storage
The srvlab workers have a tmpfs /var (inherited from the metal cluster's diskless architecture). Container images count against the kubelet's ephemeral-storage budget, and on a tmpfs-backed worker that budget is the RAM allocated to /var.
When you pull a 200MB Ghost container image, that's 200MB of ephemeral storage consumed. Plus the container's writable layer. Plus any emptyDir volumes. On a worker with a 2GB ephemeral budget, you run out fast.
The fix: pin all stateful services to control plane nodes, which have real disk-backed storage:
nodeSelector:
node-role.kubernetes.io/control-plane: "true"
tolerations:
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule
The control plane VMs have 100Gi root PVCs (on rds-csi), so ephemeral storage is plentiful.
The Storage Path
Persistent storage on srvlab-cluster uses rds-csi, the same CSI driver as the metal cluster, but with a twist: both clusters currently share the same user and storage path on the RDS.
The RDS segregates clusters by path:
- /storage-pool/metal-csi — metal cluster PVCs
- /storage-pool/homelab-csi — homelab cluster PVCs (paused)
- /storage-pool/srvlab-csi — planned, not yet created
We wanted a dedicated srvlab-csi user and path, but the RDS SSH credentials are cached through a YubiKey, and the PIN cache expired while we were deploying. Rather than block on that, we reused the metal-csi credentials.
This is a TODO: when YubiKey access is restored, create the srvlab-csi user and migrate PVCs to the dedicated path. For now, it works — the PVCs just live under the metal-csi path with srvlab-specific volume names.
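From the cluster side, claiming that storage is ordinary PVC YAML. A sketch, assuming the StorageClass is named after the rds-csi driver; the claim name and size are illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ghost-content
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: rds-csi  # assumption: class name matches the driver
  resources:
    requests:
      storage: 10Gi
```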
GitOps: The Full Loop
Every resource in the srvlab-cluster is managed by Flux CD. The workflow:
- Edit YAML in the flux-repo (clusters/srvlab/)
- git push
- Flux detects the change and applies it
- If it fails, check kubectl get kustomization -A for error messages
The Flux dependency chain ensures ordered deployment:
sources (HelmRepositories, GitRepositories)
→ infrastructure (cert-manager, ingress, databases)
→ infrastructure-config (ClusterIssuers, SecretStores)
→ services (Ghost, HedgeDoc, Uptime Kuma)
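In Flux, that ordering is expressed with `dependsOn` on each Kustomization. A sketch of the services layer; the Kustomization names, path, and GitRepository name are assumptions:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: services
  namespace: flux-system
spec:
  dependsOn:
    - name: infrastructure-config  # won't reconcile until issuers/secrets are ready
  interval: 10m
  path: ./clusters/srvlab/services
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-repo
```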
This ordering was learned the hard way. Our first attempt had everything in one flat kustomization. cert-manager CRDs weren't installed when the ClusterIssuer was applied. Secrets didn't exist when deployments referenced them. Chaos.
The four-layer model means: by the time a service deploys, its CRDs exist, its certificates can be issued, its secrets are available, and its database is running.
What's Next
The srvlab community infrastructure is live and serving real users. But there's more to do:
- Proper KEDA autoscaling — currently pinned at 3 workers, needs Prometheus metrics for demand-based scaling
- Segregated storage — dedicated srvlab-csi path on the RDS
- More services — the platform can host anything that runs in a container
- AI inference — exposing the DGX Spark's vLLM through the srvlab-cluster for community use via LiteLLM
The homelab has gone from three servers and a dream to a multi-cluster platform serving a community. It boots from RAM, runs VMs inside Kubernetes, passes GPUs through to virtualized NixOS, and serves AI inference from a desktop computer. Every layer has bugs. Every bug has a story. And every story starts with "it should be simple."
This is the final post in the Building a Diskless Datacenter series. The infrastructure continues to evolve — follow the blog for updates as we add new capabilities and inevitably break things.