RAM and Recklessness: Three Servers and a Dream
Part 1 of Building a Diskless Datacenter
Most homelabs start with a Raspberry Pi or a refurbished Dell Optiplex. Mine started with three rack-mount servers, three NVIDIA BlueField-2 SmartNICs, and a vague notion that "it would be cool if none of these machines needed local disks."
This is the story of how that idea became an actual infrastructure platform — and everything that went wrong along the way.
The Hardware
The lab is built around three Dell servers that I picked up from various deals:
- Dell C4140 — A GPU-optimized 1U server with 4x Tesla V100 SXM2 16GB GPUs connected via NVLink. This is the compute beast. 384GB RAM, dual Xeon Gold. The SXM2 form factor means the GPUs communicate directly over an NVLink mesh at 300 GB/s aggregate, not through PCIe.
- Dell R640 — A standard 1U server, 256GB RAM. This one runs general workloads and, eventually, the nested virtual machines that host our community cluster.
- Dell R740XD — The storage node. 12x 3.5" bays up front, plus local ZFS pools. 256GB RAM. This one pulls double duty as a worker node and a local NFS provider.
Each server has an NVIDIA BlueField-2 DPU — a SmartNIC with dual 25GbE ports, 8 ARM cores, and 16GB of its own RAM. These cards run their own operating system (Ubuntu) and appear to the host as a separate computer connected via an internal PCIe bus.
The DPUs were the key to the whole architecture. Instead of treating them as fancy network cards, I wanted to use them as the Kubernetes control plane. The idea: the hosts run stateless worker nodes that boot from the network, while the DPUs — with their own persistent storage and independent power state — run the etcd cluster and K3s server processes.
The Network Backbone
Connecting everything is a MikroTik Rose Data Server (RDS) — a peculiar device that's part 100GbE switch, part NAS, part RouterOS appliance. It has 4x 25G SFP28 ports (perfect for the DPUs), 4x 10G SFP+ ports (for the hosts), and a 100G QSFP28 uplink.
Behind it sits a MikroTik CCR2116 router handling firewall and inter-VLAN routing, and an Arista 720XP for the rest of the network (WiFi APs, cameras, the boring stuff).
The RDS does something unusual in this setup: it's both the core switch for the lab and the PXE boot server. Its internal storage serves the NixOS boot images over HTTP, and its NVMe-oF target provides persistent volumes to the cluster. One device, three roles.
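To make the netboot flow concrete: the hosts chain-load iPXE, which fetches the NixOS kernel and initrd from the RDS over HTTP. The script below is an illustrative sketch, not the lab's actual configuration — the address and store path are placeholders.

```
#!ipxe
# Illustrative iPXE script served by the RDS via DHCP/TFTP.
# 10.42.0.1 stands in for the RDS's HTTP server address.
kernel http://10.42.0.1/nixos/bzImage init=/nix/store/<hash>-nixos-system/init
initrd http://10.42.0.1/nixos/initrd
boot
```

The whole OS image lands in RAM; nothing is written back, which is what makes a host fully disposable.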
The Vision
The architecture I wanted:
- Diskless hosts — Every server boots from the network into RAM. No local disks for the OS. If a host dies, you plug in a new one, it PXE boots, and it's back in the cluster. No state to migrate, no disks to clone.
- DPU control plane — The Kubernetes control plane runs on the BlueField-2 DPUs, which have their own storage and survive host reboots. This separates the control plane from the data plane at the hardware level.
- Storage over fabric — Persistent volumes provided via NVMe/TCP from the RDS, accessible from any node in the cluster. True shared-nothing compute with network-attached storage.
- Multi-cluster — Eventually, nested Kubernetes clusters running as KubeVirt VMs on the bare-metal cluster. Different clusters for different purposes, all sharing the same physical infrastructure.
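For a sense of what "storage over fabric" looks like from a node, this is roughly the NVMe/TCP attach sequence using nvme-cli; the address and NQN are made up for illustration.

```shell
# Load the NVMe/TCP transport module
modprobe nvme-tcp

# Discover subsystems exported by the RDS (address is illustrative)
nvme discover -t tcp -a 10.42.0.1 -s 4420

# Connect to one subsystem; the NQN here is a placeholder
nvme connect -t tcp -a 10.42.0.1 -s 4420 \
  -n nqn.2000-01.com.example:lab-volumes

# The namespace now shows up as a regular local block device
nvme list
```

From Kubernetes' perspective the result is just a block device, so any CSI driver that can consume one can sit on top.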
It sounded great on paper. The reality involved a lot more debugging than I anticipated.
The BlueField-2 Experience
Before any of this could work, I had to flash and configure the DPUs. This deserves its own horror story.
The BlueField-2 comes with a firmware called "SNAP" that licenses various offload features. In our case, we didn't need any of that — we just wanted the ARM cores and the network ports. But disabling SNAP and getting the DPU into a mode where it runs a standard Linux OS required:
- Flashing new firmware via the rshim interface (a virtual serial console over PCIe)
- Disabling Secure Boot in the DPU's UEFI
- Installing Ubuntu via PXE onto the DPU's eMMC storage
- Configuring the OVS bridge that connects host traffic to the DPU's physical ports
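Mode changes like this are driven from the host with mlxconfig. The option names vary by firmware release, so treat this as a sketch rather than a recipe; the device path is the usual MST path for a BlueField-2 but may differ on your system.

```shell
# Query the card's current configuration (device path varies by system)
mlxconfig -d /dev/mst/mt41686_pciconf0 q

# Put the card in DPU ("embedded CPU") mode so the Arm cores own the ports
mlxconfig -d /dev/mst/mt41686_pciconf0 s INTERNAL_CPU_MODEL=1

# Changes take effect after a firmware reset or a full power cycle
mlxfwreset -d /dev/mst/mt41686_pciconf0 reset
```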
The rshim interface is... temperamental. It appears on the host as a directory of virtual device files under /dev/rshim0/, and you interact with it by reading and writing those files. Sometimes it just stops responding. The kernel module failed to compile on newer kernels (a struct termio vs. struct termios issue), so we eventually had to disable it and use the userspace rshim daemon — which had issues of its own.
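In practice, most rshim interaction boils down to reading and writing those virtual files. A typical session looks something like this — the image filename is a placeholder:

```shell
# Push a boot-stream image (BFB) to the DPU over PCIe
cat bf-bootimage.bfb > /dev/rshim0/boot

# Attach to the DPU's serial console
minicom -D /dev/rshim0/console

# Force-reset the Arm cores when the card wedges
echo "SW_RESET 1" > /dev/rshim0/misc
```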
Each DPU has two physical 25G ports, but in our configuration we use one for the cluster network and the other sits idle. The DPU runs Open vSwitch internally, bridging the host's virtual function (VF) to the physical port, with VLAN trunking so the host can participate in multiple VLANs.
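On the DPU side, that bridge plumbing is a handful of ovs-vsctl calls. This sketch assumes the default BlueField port names (p0 for the physical port, pf0hpf for the host PF representor) and made-up VLAN IDs:

```shell
# Bridge tying the host-facing representor to the physical 25G port
ovs-vsctl add-br br-cluster
ovs-vsctl add-port br-cluster p0        # physical uplink
ovs-vsctl add-port br-cluster pf0hpf    # host PF representor

# Trunk the VLANs the host should participate in (IDs are illustrative)
ovs-vsctl set port pf0hpf trunks=10,20,30
ovs-vsctl set port p0 trunks=10,20,30
```

With hardware offload enabled, OVS pushes these flows down into the ConnectX datapath, so steady-state traffic never touches the Arm cores.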
What's Next
With the hardware racked, the DPUs flashed, and the network cabled, the next step was designing the VLAN architecture and getting the first bits flowing. That's where things got interesting — and where a MikroTik switch started doing things its designers probably didn't intend.