Monthly Archives: February 2023

The Proxmox Container Saga

My experiences moving to Proxmox and using Proxmox containers

Summary

I’ve been running Linux in my home lab for a while, and in the last few years I have used various flavours of virtualization.
I recently switched to Proxmox as my hypervisor, and have moved most of my services into separate VMs or containers.  Along the way I hit some gotchas that forced the choice between VM and container for particular services.

The services I run

My “home lab” server runs a number of services, some of which are accessible externally:

  • Two WordPress websites
  • A Nextcloud instance
  • An Emby media server
  • An SSH server
  • A database
  • Self-hosted Jitsi meetings
  • A photo gallery (Piwigo)
  • Various utility Python scripts run from crontab
  • FreePBX
  • Home Assistant (Hass.io)
  • iRedMail mail server
  • TrueNAS Scale file server
  • A web proxy and Let’s Encrypt client
  • A Subversion server

Server configurations

A brief history of my server configurations (somewhat simplified):

  • I started with Linux in 2005, running Red Hat on a 300MHz Compaq Deskpro SFF.
  • Then a Dell OptiPlex 170L running Fedora Core 3, with ReiserFS filesystems on LVM.
  • 2011 – a 1GB RAM “Vanilla” PC running Fedora Core 10.  Two 1TB hard disks mirrored, plus an SSD.  Separate NAS with two 1TB HDDs.
  • 2013 – IBM server with 5TB software RAID on LVM.
  • 2013 – Dell PowerEdge 2950.
  • 2018 – Supermicro Xeon, 2TB SSD, 4 x 8TB WD Red HDDs, 64 GB RAM.  Ubuntu root on ZFS.
    • 2021 – added a Fujitsu Intel i7 with 32 GB RAM, 8TB HDD and 1TB SSD to act as a backup target.

Some history unpacked

I’d been running Ubuntu for a while (since about 2013),  and was familiar with managing it.  I started with all my services running on the host.  Then I added a few VMs to support things like the PBX and Mail services using packaged installations (FreePBX, iRedMail).

At some point I added docker and converted all of my services into docker-compose stacks, so that my server did nothing except run docker and provide a ZFS filesystem.  In retrospect, some docker stacks were never meant to be.  One example was a complete mail server in a docker-compose stack of about six containers.  I had a lot of difficulty keeping the mail stack running across various upgrades, and it proved difficult to debug and correct complex interactions between docker networking and multiple containers when the internals of those containers are hard to access and understand.

In 2022,  I started a series of experiments to find a better server management solution.  Requirements: must be free,  must be fun to play with,  must be reliable,  must allow me to move my services to a backup server to minimise downtime.  I also wanted to revert some of my pathological docker-compose stacks to VMs.

I started with XCP-NG as a hypervisor.  I created a single VM to host docker and moved all my docker services into this VM.  Moving the services was relatively straightforward because each service had its own ZFS dataset, so a ZFS snapshot plus zfs send/recv and docker-compose down/up could move a service with minimal downtime.  I also added a VM for TrueNAS, with passthrough of the 8TB hard disks for TrueNAS to own.
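
For flavour, here’s roughly what one such move looked like – a minimal sketch, with the dataset name, host name and compose file path made up for illustration:

    # first pass: replicate the dataset while the service is still running
    zfs snapshot tank/services/emby@pass1
    zfs send tank/services/emby@pass1 | ssh newhost zfs recv tank/services/emby

    # short downtime window: stop the stack, send only what changed since pass1
    docker-compose -f /tank/services/emby/docker-compose.yml down
    zfs snapshot tank/services/emby@pass2
    zfs send -i @pass1 tank/services/emby@pass2 | ssh newhost zfs recv -F tank/services/emby

    # on the new host: bring the stack back up from the replicated dataset
    ssh newhost docker-compose -f /tank/services/emby/docker-compose.yml up -d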

I gave up on XCP-NG after a couple of months.  I didn’t like the fact that it doesn’t understand ZFS in any useful sense: it provides few tools to manage or protect the data it holds.  And it crashed far too easily.

Next I tried TrueNAS Scale as my hypervisor.  TrueNAS running on bare metal should give me the best performance as a file server, and it supports virtual machines and containers, with a huge range of templates.  Unfortunately it doesn’t support docker stacks, only LXC-style Linux containers.

The weakness I felt in TrueNAS Scale was its management of VMs.  The GUI doesn’t support all the options of the underlying QEMU/KVM.  The same is true of Proxmox (more on that later), but in TrueNAS’s case the gaps were in things I really wanted, related to hardware passthrough; there were workarounds, but they were clunky.  It’s a great file server, and a good hypervisor for simple VMs and pre-packaged containers, but not quite good enough for my application.

So then I moved to Proxmox.  Proxmox is a much more open system than TrueNAS Scale.  Where the GUI doesn’t support a feature of the underlying OS, there is a command-line tool to perform that function (such as hardware passthrough of disks), and really good documentation about how to use it.  It knows about ZFS and uses ZFS snapshotting and datasets intelligently.
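
For example, disk passthrough to a VM is a one-liner at the Proxmox shell – a sketch, with a made-up VM ID and disk serial:

    # hand a whole physical disk to VM 101; /dev/disk/by-id names are stable across reboots
    qm set 101 -scsi1 /dev/disk/by-id/ata-WDC_WD80EFAX-EXAMPLE_SERIAL
    # check it took
    qm config 101 | grep scsi1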

What I did in Proxmox

I started in Proxmox by bringing up my VM with all the docker services in it, and adding a VM for TrueNAS Scale – effectively replicating the configuration in XCP-NG.

Proxmox can also move a running VM or container from one server to another with minimal downtime (a few seconds).  I used this at various stages of setting up, but it’s not something I use day to day.  I hadn’t considered it valuable before using it, but it has proved useful.
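
The migration itself is a single command per guest – a sketch, with hypothetical IDs and a target node I’ll call backupnode:

    # move a running VM to another cluster node with only a brief pause
    qm migrate 101 backupnode --online

    # containers can’t truly live-migrate; --restart stops, moves and restarts them
    pct migrate 205 backupnode --restart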

I wanted to experiment with Proxmox containers (CTs).  I started off with a template container for Nextcloud.  Installation was easy.  Because my Nextcloud usage involves SMB/CIFS external storage, which Nextcloud handles internally, I didn’t need to make any changes to the container to provide access to my filestore.  Nextcloud provides a number of useful services to me – including moving my contacts database from my Google account to my Nextcloud server.  CardDAV access from Thunderbird is much nicer (no unexplained behaviour).

Moving the CTs

Then I went on a campaign (or was it a rampage?) to move *all* my remaining docker-compose stacks into CTs.

Why?  Well, firstly because I’m an inveterate tinkerer, and the tinkerability quotient of a new (to me) technology was high.  Secondly, because containers have a lower impact (CPU, memory, disk) than the equivalent VM.  I also wanted to break up my monolithic (from the hypervisor’s point of view) VM containing all the docker stacks, so that individual services could be managed at the hypervisor level.

I started by moving a couple of WordPress websites into CTs.  There is a WordPress CT template to make this easier.  So, using the migration tools available as WordPress plugins (e.g. All-in-One Migration, and Better Search and Replace), I was able to export the site from my docker-compose stack implementation and import it into the CT pretty easily.

My services VM included a proxy manager (NPM).  When the WordPress site was a docker stack, NPM used a dedicated docker network to communicate privately with the WordPress instance.  In the new configuration, I had to create a DNS naming convention for service stacks.  So xxx.chezstephens.org.uk (how it is named externally) became xxx.lan on the internal LAN, and external requests to xxx.chezstephens.org.uk were passed by NPM to xxx.lan.  I could choose to go through NPM or not by using one of these two addresses in an internal browser.
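
The internal names can be resolved in any number of ways; as one illustrative sketch only (dnsmasq, with made-up names and addresses):

    # /etc/dnsmasq.d/lan.conf – resolve each service CT by its internal name
    address=/wordpress.lan/192.168.1.41
    address=/nextcloud.lan/192.168.1.42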

There are relatively few templates matching my services.  I had used the Nextcloud and WordPress templates, but there were none for the other services.  So I decided to create a CT template based on the Debian 11 template, with support for docker and docker-compose added.  Each docker-compose stack could then be moved unchanged from inside a VM to inside a CT.
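
Building that template is only a handful of pct commands – a sketch, with the CT ID, storage names and template filename as examples only:

    # create a CT from the stock Debian 11 template
    pct create 900 local:vztmpl/debian-11-standard_11.3-1_amd64.tar.zst \
        --hostname docker-base --unprivileged 1 --features nesting=1,keyctl=1 \
        --cores 2 --memory 2048 --rootfs local-zfs:8 \
        --net0 name=eth0,bridge=vmbr0,ip=dhcp
    pct start 900

    # install docker and docker-compose inside it (Debian 11 packages)
    pct exec 900 -- bash -c "apt update && apt install -y docker.io docker-compose"

    # freeze it as a reusable template
    pct stop 900
    pct template 900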

But isn’t this just as bad as operating in a VM?  Doesn’t performance take a hit?  Well, no and yes, in that order.  As a docker stack in a VM, there are two levels of virtualization.  As a docker stack in a CT, there are also two levels of virtualization, but one of them is much less resource-heavy.  So the overall outcome is a net gain.

Moving Emby & Mountpoint woes

So I transferred the Emby (a media server) docker-compose stack into a dedicated CT running docker and docker-compose.

This is where my problems started.  You have to understand that a CT doesn’t have independent mount points the way a VM does.  And I needed to mount my media collections from my file server into the Emby stack.  Previously, in the VM, I had a docker volume that did an NFS mount.  So, how to replicate that in a CT?

The first option is to mount the resource in the Proxmox host (e.g. via fstab) and then do a bind mount into the container.  This kind-of works.  It has the advantage of not requiring any privilege in the container.  But it has the disadvantage of coupling the hypervisor to the NFS server.  In my case the NFS server was running in a VM on the hypervisor, and therefore dependent on it.  We have a circular dependency between the two, which is bad both in theory and in practice.  It is bad in theory because it just is, as every software engineer will tell you.  It was bad in practice because when I halted Proxmox, it would hang forever (or for a long timeout) trying to unmount the NFS mount, having already stopped the VM that was serving it.
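
In concrete terms the bind-mount approach looks something like this – server name, paths and CT ID are all made up:

    # on the Proxmox host, /etc/fstab mounts the export
    truenas.lan:/mnt/tank/media  /mnt/media  nfs  defaults,_netdev  0  0

    # then the host directory is handed to CT 205 as a mount point
    pct set 205 -mp0 /mnt/media,mp=/mnt/media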

The second option is to make the CT privileged and permit it to do the NFS mount itself.  This kind-of works too.  But the existence of the NFS mount in the CT makes the Proxmox hypervisor unreliable.  After a day of operation, the server wouldn’t complete a “df” (disk free) operation, and a clone operation from the GUI hung.  It appears that, while NFS mounts from a privileged container are explicitly supported by Proxmox, they are also capable of breaking the operating system.  There is plenty of evidence online that this is a bad thing to do.

So, having discovered that NFS mounts are dangerous in practice for a CT, what next?  I could mount the media resources using SMB/CIFS instead of NFS from a privileged container.  This appears to work reliably, even though there are occasional console log messages about duplicate cookies.
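
The CIFS version is just an fstab line inside the privileged CT (it needs the cifs-utils package installed); the server, share, uid and credentials path here are examples:

    # /etc/fstab inside the CT
    //truenas.lan/media  /mnt/media  cifs  credentials=/root/.smbcred,uid=1000,gid=1000,iocharset=utf8  0  0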

I ended up with a functional Emby CT, having determined that CIFS mounts were adequate for it.  I proceeded to attempt the same thing with my remaining services.

Services needing to be in a VM – cannot use an SMB mount

Some services just wouldn’t work properly with a CIFS mount.  These are Piwigo, NPM and a certificate update script.

Piwigo is a media gallery.  I use it to provide my family with access to our photos.  The site is public; it’s just not very interesting to any third party.  There is a docker-compose stack for it.  Moving the stack to a new server is slightly awkward, as it involves setting up the MySQL server with user, database and permissions, and exporting and importing a database dump.  I did manage to make this work, but performance was poor.  There was a startup delay every time the docker stack was started, while it did a “change owner” (chown) on each of the 40,000 photos in my collection across the SMB mount.  That took half an hour.  Also, its access to the photos seemed very slow, as was its generation of thumbnails.
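
The database part of the move is routine MySQL/MariaDB admin – a sketch with made-up names and password (from a docker stack you’d typically run the dump via docker exec):

    # on the old instance: dump the Piwigo database
    mysqldump -u root -p piwigo > piwigo.sql

    # on the new server: recreate the database and user, then import
    mysql -u root -p -e "CREATE DATABASE piwigo;
        CREATE USER 'piwigo'@'localhost' IDENTIFIED BY 'change-me';
        GRANT ALL PRIVILEGES ON piwigo.* TO 'piwigo'@'localhost';"
    mysql -u root -p piwigo < piwigo.sql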

So I created a VM for it based on Debian 11, and followed the various instructions online, installing nginx, PHP, php-fpm, a bunch of PHP modules, MariaDB and Piwigo, then repeating the same import steps as above.  The benefit of the docker-compose stack is that it saves you all that setup cost.  Anyhow, I was able to get the VM running with the media accessed via an NFS mount.  Everything ran a lot faster, and that’s how I left it.
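
For reference, the setup on Debian 11 amounts to roughly the following – the package list is the usual suspects and the NFS paths are examples, so treat it as a sketch rather than a recipe:

    # web server, PHP, database and NFS client
    apt install -y nginx mariadb-server php-fpm php-mysql php-gd php-xml php-mbstring php-zip nfs-common

    # /etc/fstab entry mounting the photo store from the file server
    truenas.lan:/mnt/tank/photos  /var/www/piwigo/galleries  nfs  defaults,_netdev  0  0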

I was very happy with my proxy manager (Nginx Proxy Manager – NPM), which is available only as a docker stack.  So I tried it in a CT, with an SMB mount to the directory where it stores the certificates, to make them accessible to other services (such as mail).  This didn’t work.  NPM includes a letsencrypt module that expects to see a certain file structure that includes links from live certificates into an archive of certificates.  That structure can be mounted OK with NFS (but we can’t use NFS in a CT).  It cannot be mounted using SMB, because the SMB server flattens the links before serving them.  So letsencrypt throws a wobbly and won’t renew certificates.
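
To see why, this is the shape of the structure the letsencrypt module expects (the domain name is an example): the “live” files are symlinks into a versioned archive, and those links are exactly what SMB loses.

    /etc/letsencrypt/live/example.org/fullchain.pem -> ../../archive/example.org/fullchain3.pem
    /etc/letsencrypt/live/example.org/privkey.pem   -> ../../archive/example.org/privkey3.pem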

Another issue is that letsencrypt creates private key files that are owned by root and readable only by root.  An SMB mount can map any user (at the server) into root (at the client).  So the client can write files it thinks are root-only readable, but the server’s view is different: it sees the file as owned by some other uid (you can’t SMB-mount as a root user).  Clients of the certificates, such as my mail server, expect to see root-owned, root-only-readable private keys.

So I had to move my NPM into a docker-compose stack in a VM, with an NFS mount of the letsencrypt file hierarchy.  It’s working just fine there.  Having it in a VM probably adds 300MB of RAM and 1GB of disk usage on top of what the CT would use.  This additional overhead is nothing in the context of my server.

So there we leave it.  I have 8 CTs and 4 VMs instead of the starting point of 2 VMs.

What will I find to tinker with next?