Friday, January 18, 2019

Creating a Proxmox NFS Root on a
ZFS-Backed File Server...
No DHCP or TFTP Needed


Background

I decided I wanted to make my homelab mostly diskless, consolidating my storage onto a single machine. This has a number of benefits.
  • It's cheaper. Not having underutilized drives across several machines makes storage more efficient.
  • Diskless booting lets you load different images at the boot screen...even install operating systems remotely. (The latter requires DHCP/TFTP.)
  • I can manage the files of every server from one server--backups, snapshots, and rollbacks are a snap. This also means I can manage and modify configuration files and easily push them to my servers without even crossing file system lines.
  • With ZFS I get a free root overlay. I can clone servers, run the clone, and then easily see any changes with zfs diff. If I want to look for changes to /bin or /etc I can do this with a single command...and reverse them with another.
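
For example, a minimal sketch of that workflow (the snapshot name is illustrative):

# zfs snapshot tank/nodes/deb-j41-1@known-good
# ...make changes on the running system...
# zfs diff tank/nodes/deb-j41-1@known-good
# zfs rollback tank/nodes/deb-j41-1@known-good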

However, some of my hardware doesn't play well with PXE booting, and Proxmox's constant writes would tend to wear out a flash drive if installed to one. So I am going to show how I installed grub and the boot files on a flash drive but used an NFS root for my Proxmox servers. This also means I don't need to use a DHCP or TFTP server.

I relied heavily on a number of sources while working this out. I encourage you to seek them out, because their authors are more knowledgeable than I am.

My NFS server is a Debian Stretch machine backed by ZFS on Linux. If you have a different OS or file system, then many of these steps are irrelevant or will have to be substantially modified.

My zpool, tank, is mounted at /srv/tank.

1. Prep the Server

Create a dataset for your remote client files and enable NFS sharing

# zfs create tank/nodes
# zfs set sharenfs="rw=@192.168.195.0/24,no_root_squash,no_subtree_check,sync" tank/nodes

I opted for sync (for now). Async is usually faster but could cause issues in the event of power loss.

From the exports(5) man page:

async

This option allows the NFS server to violate the NFS protocol and reply to requests before any changes made by that request have been committed to stable storage (e.g. disc drive).
Using this option usually improves performance, but at the cost that an unclean server restart (i.e. a crash) can cause data to be lost or corrupted.
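
Once sharing is enabled, you can sanity-check the export from the server (exportfs is part of nfs-kernel-server):

# exportfs -v

The dataset's mountpoint should be listed with the options set above.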

Create a dataset to hold your root

# zfs create tank/nodes/deb-j41-1

Check NFS sharing, etc. Make sure that any settings you configured are properly inherited.

# zfs get all tank/nodes/deb-j41-1

You can create a bunch of datasets in the target root. This allows you to selectively disable snapshots as well as modify zfs options individually. I opted for a simpler setup, but still created a tmp dataset. You could also include var/tmp, var/spool, var/log, etc.
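
For reference, a finer-grained layout might look something like this (illustrative only; I did not use it):

# zfs create tank/nodes/deb-j41-1/var
# zfs create -o com.sun:auto-snapshot=false tank/nodes/deb-j41-1/var/log
# zfs create -o com.sun:auto-snapshot=false tank/nodes/deb-j41-1/var/tmp
# chmod 1777 /srv/tank/nodes/deb-j41-1/var/tmp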

Either way, create the tmp dataset and open up its permissions.

# zfs create -o com.sun:auto-snapshot=false \
             -o setuid=off \
             tank/nodes/deb-j41-1/tmp
# chmod 1777 /srv/tank/nodes/deb-j41-1/tmp

2. Install Debian


Install the basic file system using debootstrap. If you are installing from a Debian host you could simply copy that host into the dataset.

# apt install --yes debootstrap
# debootstrap stretch /srv/tank/nodes/deb-j41-1/

Now we need the flash drive. I opted to use a UEFI boot process, so I partitioned my drive using gdisk.

# apt install --yes gdisk
# gdisk /dev/sdX
# ...create partitions. Hint: the partition type in gdisk for EFI is EF00.
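
If you would rather not drive gdisk interactively, sgdisk (from the same package) can create the equivalent layout in one shot (a sketch; the sizes match what I used):

# sgdisk --zap-all /dev/sdX
# sgdisk -n1:0:+550M -t1:EF00 -n2:0:0 -t2:8300 /dev/sdX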

After creating the partitions, format them.

# mkfs.fat -F32 /dev/sdX1
# mkfs.ext4 /dev/sdX2

This is what it looked like when I finished.

# lsblk -o name,size,uuid /dev/sdX

NAME    SIZE UUID
sdX    14.6G
├─sdX1  550M 8CHA-R1ID
└─sdX2 14.1G 1234what-ever-your-uuid-is5678901234

Now chroot into your new system. Mounting proc and dev like this works, but it is recommended to reboot when you get a chance to make sure everything is clean. Be warned: while working on this I overwrote the host's grub on two occasions, despite being in a chroot and naming the correct drives.

Warning:

Many of the following commands are executed in the chroot environment. If these commands are run on the host, damage will occur. When you `chroot`, make sure you are in the correct directory.


# cd /srv/tank/nodes/deb-j41-1
# mount /dev/sdX2 ./mnt
# mkdir -p ./mnt/boot/efi
# mount /dev/sdX1 ./mnt/boot/efi
# mount -t proc /proc proc/
# mount --rbind /sys sys/
# mount --rbind /dev dev/
# chroot . /bin/bash --login

3. Set up the client system


Edit /etc/apt/sources.list to look something like the following.

deb http://deb.debian.org/debian stretch main contrib non-free
deb http://deb.debian.org/debian-security/ stretch/updates main contrib non-free
deb http://deb.debian.org/debian stretch-updates main contrib non-free

Get your basic system set up

# apt update && apt upgrade
# dpkg-reconfigure tzdata
# apt install locales
# locale-gen en_US.UTF-8
# dpkg-reconfigure locales

Install the kernel

# apt search linux-image

Then install the kernel package of your choice using its package name. For example:

# apt install linux-image-4.9.0-8-amd64

If your underlying file system is ZFS, you are going to need the following to configure grub--otherwise grub-install fails with "grub-install: error: failed to get canonical path of tank/nodes/deb-j41-1"--and again later when mounting our file system. (The rm symlink is a known workaround for the Stretch spl/zfs-dkms build, which expects rm to live at /usr/bin/rm.)

# dpkg-reconfigure spl-dkms
# apt install dpkg-dev zfs-dkms zfs-initramfs
# ln -s /bin/rm /usr/bin/rm
# modprobe zfs

So we can get into our server

# apt install openssh-server

Edit /etc/ssh/sshd_config and change "#PermitRootLogin prohibit-password" to

PermitRootLogin yes

Add root password for login

# passwd

Needed this for my Realtek NICs:

# apt install firmware-realtek

Install other network packages as needed

# apt install bridge-utils
# apt install vlan
# modprobe 8021q
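
To have the VLAN module load automatically at every boot, you can also register it in /etc/modules:

# echo 8021q >> /etc/modules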

Optional

Install some standard packages:

# tasksel install standard

Lastly, clean up /var/cache/apt/archives/

# apt clean

Now some network configuration:

# echo 'deb-j41-1' > /etc/hostname

Edit /etc/network/interfaces. Make sure there is no allow-hotplug or auto line for this interface (the kernel brings it up for the NFS root before ifupdown runs, and reconfiguring it could drop your root file system) and use the correct interface name. You can also use dhcp if you want. Just make sure that your use of dhcp or static and the interface name are both consistent with grub.cfg later.

iface enp2s0 inet static
    address 192.168.195.11
    netmask 255.255.255.0

Modify /etc/fstab:

/dev/nfs   /            nfs     tcp,nolock 0 0
proc       /proc        proc    defaults   0 0
none       /media       tmpfs   defaults   0 0
none       /var/run     tmpfs   defaults   0 0
none       /var/lock    tmpfs   defaults   0 0

# Persistent storage on the flash drive
UUID=1234what-ever-your-uuid-is5678901234 /local ext4  noatime  0 0


Optional

Mount these in RAM rather than on NFS. I kept mine on the NFS server; this uses the network, but files and logs are persistent. A good option would be to mount /var/log in tmpfs and use a log server. If you have plenty of memory, feel free to put tmp into RAM as well. You can also cap the size of your tmpfs mounts if you want to keep tmp from using too much memory (see the sketch after this block).

none /tmp     tmpfs defaults 0 0
none /var/tmp tmpfs defaults 0 0
none /var/log tmpfs defaults 0 0
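
If you do cap a tmpfs, a minimal sketch (the size value is arbitrary):

none /tmp     tmpfs defaults,size=512m 0 0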

4. Make the boot files


Enable NFS in the initial ramdisk image configuration by editing /etc/initramfs-tools/initramfs.conf and setting:

BOOT=nfs

Create the image and save it in the boot folder.

# mkinitramfs -d /etc/initramfs-tools -o /boot/initrd.img-4.9.0-8-amd64 4.9.0-8-amd64
# apt-get install grub-efi-amd64
# grub-install --target=x86_64-efi --recheck --removable --efi-directory=/mnt/boot/efi --boot-directory=/mnt/boot
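
Because --removable installs grub to the firmware's fallback path, you can quickly verify that everything landed where the firmware will look:

# ls /mnt/boot/efi/EFI/BOOT

You should see BOOTX64.EFI there.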

Optional

Download ISO images for alternative or rescue booting

# mkdir /mnt/boot/iso/
# cd /mnt/boot/iso/

Then download. For example:

# wget http://releases.ubuntu.com/18.04/ubuntu-18.04.1-desktop-amd64.iso

Save the old grub scripts

# cp -r /etc/grub.d /etc/grub.old

Get rid of the OS scripts

# rm /etc/grub.d/{1*,2*,3*}

Edit /etc/grub.d/40_custom and add something like the following. Don't erase the existing contents of the file.

Important

Make sure that the linux line is all one line.

# Make the boot location persistent by setting root by UUID.
# Alternatively, use (hd0,gpt1) or similar notation if you don't plan to have any other storage.

insmod search_fs_uuid
search --no-floppy --set=root --fs-uuid 1234what-ever-your-uuid-is5678901234

menuentry "Debian deb-j41-1 4.9.0-8-amd64" {
   set client_ip='192.168.195.11'
   set server_ip='192.168.195.100'
   set gw_ip=''
   set netmask='255.255.255.0'
   set hostname='deb-j41-1'
   set domain='.caiuscorvus.net'
   set device='enp2s0'

   set server_root='/srv/tank/nodes/'

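   # kernel ip= fields: <client>:<server>:<gateway>:<netmask>:<hostname>:<device>(:<autoconf>)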
   linux /boot/vmlinuz-4.9.0-8-amd64 root=/dev/nfs ip=$client_ip:$server_ip:$gw_ip:$netmask:$hostname$domain:$device nfsroot=$server_ip:$server_root$hostname rw quiet
   initrd /boot/initrd.img-4.9.0-8-amd64
}

menuentry "Ubuntu 18.04 (LTS) Live Desktop amd64" --class ubuntu {
   set isofile='/boot/iso/ubuntu-18.04.1-desktop-amd64.iso'
   loopback loop $isofile
   linux (loop)/casper/vmlinuz boot=casper img_dev=$root iso-scan/filename=$isofile quiet splash
   initrd (loop)/casper/initrd.lz
}

When you have finished modifying the file, commit the changes to grub.cfg

# update-grub

Make sure all the files are where they are supposed to be. In particular, make sure the kernel, initrd, grub.cfg, and EFI are on the flash drive.

# cp -r /boot/* /mnt/boot/

# exit
# umount --recursive .

Insert the flash drive in the client machine. Test everything out, look around, then save your progress. If you built your root with extra datasets, make sure the snapshot is recursive (i.e. zfs snapshot -r ...). If you are only using a root dataset, then recursion isn't necessary.

nfsserver# zfs snapshot tank/nodes/deb-j41-1@today-debianinstalled

5. Install Proxmox

There are two ways to do this. One, you can connect the USB device to the server, chroot into your client, update, and copy the new boot files over as before. Two, you can just update it on the client device. I am opting for the latter just to demonstrate how kernel upgrades go when a remote root is involved.

The problem is that grub will be unable to find the canonical path to root, so we will be unable to run update-grub. This has two primary effects. First, the Proxmox installer will complain, repeatedly; as far as I can tell you can ignore this. Second, we will need to manually move the kernel, create the initial ramdisk, and modify grub.cfg directly. Every grub tutorial and answer on the internet says not to do the latter--but only because grub.cfg is overwritten every time you run update-grub, which happens whenever you install something that modifies your kernel. Since update-grub will not be able to run on the client, we are fairly safe.

Warning

This means that whenever you add a package that you need reflected in the initrd you will need to make a new image and move it to your usb/boot. Examples of this include using zfs, bridges, or vlans at boot time and installing these packages after you last updated the initrd image.

However, when installing software while chrooted on the server (with the flash drive connected) you will lose any modifications made to grub.cfg. So if you want to run update-grub in the future, make sure any and all changes are reflected in the 40_custom file. Since the way I have installed grub requires manually copying the config file to the flash drive, you would have to really want to overwrite your config in order to lose it.

Furthermore, you will need to keep the EFI files up to date with later kernel updates. Failure to do so could result in an unbootable system. (So reads the Arch Wiki.) Since we can't run grub-install from the client, if you update the kernel and are unable to boot, I would recommend moving the flash drive (or another one) to the root server, mounting the EFI partition in the client's .../boot/efi/, and running grub-install while chrooted. Remember to copy your grub.cfg modifications to the 40_custom file first.

On the server


First, let's clone our Debian root. This keeps Debian as a bootable option while we install Proxmox, and if there are any problems with installation then no harm, no foul.

Note

Creating a clone is not creating a copy. A ZFS clone uses the same blocks as the original dataset. This saves space, but if you decide to keep both the clone and the origin for a long time they will diverge while remaining inextricably linked. That is, you cannot destroy the origin while the clone exists (unless you first promote the clone). So use clones when you are testing a new configuration or creating an ephemeral dataset--not when you want to create a persistent dataset.

nfsserver# zfs clone tank/nodes/deb-j41-1@today-debianinstalled tank/nodes/pve-j41-1

Cloning may not keep all the same options, so check and make sure sharenfs and other options are on:

nfsserver# zfs get all tank/nodes/pve-j41-1
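
If anything did not carry over, simply set it again on the clone:

nfsserver# zfs set sharenfs="rw=@192.168.195.0/24,no_root_squash,no_subtree_check,sync" tank/nodes/pve-j41-1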

On the Client

There are two ways to do this: you could mount the clone via NFS at /mnt and chroot into that environment, or you could modify grub to load the new root, reboot, and install normally. I am doing the latter.

Configure grub to load the new root dataset

Make the changes inside the 40_custom section so you can easily copy them back to the 40_custom file if/when needed. I am adding the new entry at the top (above the other menuentries but below search --no-floppy...) because the first item is the default. (This is configurable before you run update-grub by modifying /etc/default/grub, or by finding the setting earlier in the generated file.) I retained the old entry so that we can still boot the Debian kernel if there are any issues.

Notice

We will use the same Debian kernel and initrd image for now.


Important

Make sure that the linux line is all one line.

Edit /local/boot/grub/grub.cfg and insert something like:

menuentry "Proxmox pve-j41-1 4.15.18-9-pve" {
   set client_ip='192.168.195.11'
   set server_ip='192.168.195.100'
   set gw_ip=''
   set netmask='255.255.255.0'
   set hostname='pve-j41-1'
   set domain='.caiuscorvus.net'
   set device='enp2s0'

   set server_root='/srv/tank/nodes/'

   linux /boot/vmlinuz-4.9.0-8-amd64 root=/dev/nfs ip=$client_ip:$server_ip:$gw_ip:$netmask:$hostname$domain:$device nfsroot=$server_ip:$server_root$hostname rw quiet
   initrd /boot/initrd.img-4.9.0-8-amd64
}

Modify the hostname to match the new entry

# echo 'pve-j41-1' > /etc/hostname

Now reboot and you should be on the clone. If you want to confirm this, modify a file and look for it on the server.

# touch /IAMHERE
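
Then, on the server:

nfsserver# ls -l /srv/tank/nodes/pve-j41-1/IAMHERE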

Add an /etc/hosts entry for your IP address

127.0.0.1        localhost.localdomain localhost
192.168.195.11   pve-j41-1.caiuscorvus.net pve-j41-1 pvelocalhost

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Modify the file /etc/kernel/postinst.d/zz-update-grub and comment out `exec update-grub`. This will help quiet some of the errors you get when you update your kernel. This may need to be repeated with future kernel updates.

###     exec update-grub

Add the Proxmox VE repository:

# echo "deb http://download.proxmox.com/debian/pve stretch pve-no-subscription" > /etc/apt/sources.list.d/pve-free-repo.list

Add the Proxmox VE repository key:

# wget http://download.proxmox.com/debian/proxmox-ve-release-5.x.gpg -O /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg

Update your repository and system by running:

# apt update && apt dist-upgrade

Like before, let's look at the kernels and select one. This time search for pve-kernel.

# apt search pve-kernel

Then install Proxmox along with the kernel package of your choice. For example:

# apt install proxmox-ve pve-firmware pve-kernel-4.15.18-9-pve
# apt install postfix open-iscsi

Proxmox may add the enterprise repo. If you will be using the community version of Proxmox, feel free to remove it.

# rm /etc/apt/sources.list.d/pve-enterprise.list

Clean up

# apt remove os-prober
# apt clean

6. Prepare the New Boot Files


The grub issue leaves us without a pve initrd image. So, let's make one.

# update-initramfs -c -v -k 4.15.18-9-pve
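
You can peek inside the new image to confirm the NFS boot support made it in (lsinitramfs ships with initramfs-tools):

# lsinitramfs /boot/initrd.img-4.15.18-9-pve | grep -i nfs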

Copy the new image and the new kernel to the flash drive

# cp /boot/*pve /local/boot/
# ls /local/boot

Lastly, we need to modify our grub entry to reflect the new kernel and initrd image. Just change the two lines in the Proxmox menuentry in /local/boot/grub/grub.cfg:

linux /boot/vmlinuz-4.15.18-9-pve ...
initrd /boot/initrd.img-4.15.18-9-pve

Now reboot and if everything works feel free to take another snapshot on the server

nfsserver# zfs snapshot -r tank/nodes/pve-j41-1@today-proxmoxinstalled

Additionally, unless you really want to keep the Debian system around, let's get rid of it. Be careful not to destroy the origin before you promote the clone.

nfsserver# zfs promote tank/nodes/pve-j41-1
nfsserver# zfs destroy -r tank/nodes/deb-j41-1

Another thing I did on the server was create a dataset with files I want to push to all Proxmox clients--like the hosts file.

This is the header I include in those files. Note the command I use to push updates.

# *** Warning! ***
# This file is updated on the root server. Changes made here will be
# overwritten by files updated there.
#
# Run the following command on the server to update all pve clients
#
# echo /srv/tank/nodes/pve*/etc/hosts | \
# xargs -n 1 cp -v /srv/tank/nodes/pve-common/hosts
#


7. Creating a second client

To make a new client: send|recv the dataset, set up a new flash drive, update grub, copy files, and update files like hostname and the postfix config. For example:

# zfs send -RDp tank/nodes/pve-j41-1@today-proxmoxinstalled | \
   zfs recv tank/nodes/pve-j41-2

When creating the new USB drive, you can dd the whole thing or just copy the files. I prefer to keep the UUIDs the same so there is no need to modify any UUIDs in grub.cfg. You can do this when you format the new partitions:

# mkfs.fat -F32 -i 8CHAR1ID /dev/sdX1
# mkfs.ext4 -U 1234what-ever-your-uuid-is5678901234 /dev/sdX2
# mount /dev/sdX2 /mnt
# cp -r /srv/tank/nodes/pve-j41-2/boot /mnt

Modify /mnt/boot/grub/grub.cfg to reflect the new hostname, IP address, and root. For example:

menuentry "Debian deb-j41-2 4.9.0-8-amd64" {
   set client_ip='192.168.195.12'
   set server_ip='192.168.195.100'
   set gw_ip=''
   set netmask='255.255.255.0'
   set hostname='deb-j41-2'
   set domain='.caiuscorvus.net'
   set device='enp2s0'

Note

You will have to pull the EFI directory from an existing USB drive, or copy it from a client's flash drive to its NFS-mounted directory and grab it from there. The alternative would be to chroot into the new system and run grub-install.

# umount /mnt
# mount /dev/sdX1 /mnt
# cp -r /???/EFI /mnt
# umount /mnt

To finish, modify your hostname, postfix, and any other config files containing the old hostname or IP. If you copied a client that was already in a cluster, you will need to make a number of other changes.