Dividing a single machine into multiple Linux environments (e.g. private and work-related) can be challenging, especially with the usual approaches such as dual-booting and virtualization. In this post I describe a different approach that lets the user run multiple graphical environments without sacrificing convenience (as with dual-booting) or performance (as with virtualization or local VNC connections).
Introduction
This specific setup has been achieved on an Arch Linux host with either Gentoo or Arch Linux as the containerized system. The configuration process should be fairly similar for other distributions and setups.
The target machine contains a 500G NVMe disk, partitioned into two GPT partitions:
- EFI System partition, around 2G in size, for the kernel and UEFI bootloader
- LVM partition spanning the remaining disk space, set up as LUKS on LVM:
  - swap logical volume, 16G in size
  - LUKS-encrypted work rootfs logical volume, around 230G in size
  - LUKS-encrypted private rootfs logical volume, taking up the remaining space
The LightDM display manager is started automatically when booting either Linux system and is used to log in to the i3 window manager environment.
Running a container using systemd-nspawn
It is relatively trivial to create and start a Linux container. The underlying technology, Linux namespaces, has enabled the proliferation of various containerization tools such as Docker, Kubernetes, Linux Containers, and others. For containerizing an entire root filesystem, however, the natural choices are Linux Containers (LXC) and systemd-nspawn.
Given that systemd is the init system of choice on Arch Linux, and that it ships its own containerization tool, systemd-nspawn(1), out of the box, it is the containerization tool used in this post.
We begin by decrypting the LUKS volume and mounting the underlying filesystem below the /var/lib/machines directory:
~ # cryptsetup open /dev/vgstrthinpad/work rootfs-work
Enter passphrase for /dev/vgstrthinpad/work:
~ # mount /dev/mapper/rootfs-work /var/lib/machines/work
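The decrypt-and-mount steps above, together with their reverse, can be wrapped in two small shell functions. This is a minimal sketch, assuming the volume group vgstrthinpad and the rootfs-<name> mapper naming used above; open_machine, close_machine and the VG/MACHINES_DIR variables are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: unlock a LUKS logical volume and mount it where
# systemd-nspawn@.service expects it, and undo both steps afterwards.
VG=${VG:-vgstrthinpad}                      # volume group from the example above
MACHINES_DIR=${MACHINES_DIR:-/var/lib/machines}

open_machine() {
    local name=$1
    # Prompts for the passphrase and creates /dev/mapper/rootfs-$name
    cryptsetup open "/dev/$VG/$name" "rootfs-$name" || return
    mkdir -p "$MACHINES_DIR/$name"
    mount "/dev/mapper/rootfs-$name" "$MACHINES_DIR/$name"
}

close_machine() {
    local name=$1
    # Stop the container first so nothing keeps the filesystem busy
    systemctl stop "systemd-nspawn@$name" || return
    umount "$MACHINES_DIR/$name" || return
    cryptsetup close "rootfs-$name"
}
```

Usage as root: open_machine work && systemctl start systemd-nspawn@work, and close_machine work once the container is no longer needed.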
A systemd installation usually provides the systemd-nspawn@.service template unit, which can be used to run containers as background services. Since it is a template unit, starting the systemd-nspawn@work service creates a new container based on the contents of /var/lib/machines/work. However, the default nspawn configuration performs certain destructive operations, so it is necessary to configure nspawn appropriately.
To do so, we create the /etc/systemd/nspawn directory (if it does not exist) and place the following systemd.nspawn(5) configuration file in it as work.nspawn:
[Exec]
Boot=yes
Parameters="systemd.legacy_systemd_cgroup_controller=0 systemd.mask=docker.service"
Ephemeral=no
ProcessTwo=no
PrivateUsers=no
Capability=all
SystemCallFilter=add_key keyctl openat
Hostname=strthinpad
[Files]
ReadOnly=no
Volatile=no
Bind=/sys/fs/cgroup/unified
[Network]
Private=yes
VirtualEthernet=yes
As mentioned, the PrivateUsers= option and its systemd-nspawn(1) equivalent --private-users= are destructive when combined with --private-users-chown, which is the default behaviour of the systemd-nspawn@.service unit file (it passes -U). Although the operation is destructive, it can be undone, as explained in the manpage:
--private-users-chown
    If specified, all files and directories in the container's directory tree will be adjusted so that they are owned by the appropriate UIDs/GIDs selected for the container (see above). This operation is potentially expensive, as it involves iterating through the full directory tree of the container. Besides actual file ownership, file ACLs are adjusted as well.
    This option is implied if --private-users=pick is used. This option has no effect if user namespacing is not used.
-U
    [...]
    Note: it is possible to undo the effect of --private-users-chown (or -U) on the file system by redoing the operation with the first UID of 0:
        systemd-nspawn ... --private-users=0 --private-users-chown
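Following the manpage note above, undoing an accidental ownership shift on this container could look like the sketch below. It assumes the container is stopped (the function refuses to run otherwise), passes --settings=no so that Boot=yes and PrivateUsers=no from work.nspawn are ignored for this one-off run, and uses /bin/true as the command so that nspawn exits right after the ownership pass; undo_private_users_chown is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reset file ownership in a container tree that was
# shifted by --private-users-chown, by re-running the chown pass with a
# UID base of 0 (see systemd-nspawn(1)).
undo_private_users_chown() {
    local name=$1 dir=${2:-/var/lib/machines/$1}
    # Never touch the tree while the container is running
    if systemctl is-active --quiet "systemd-nspawn@$name"; then
        echo "container $name is running; stop it first" >&2
        return 1
    fi
    # --settings=no: ignore /etc/systemd/nspawn/$name.nspawn for this run
    systemd-nspawn -D "$dir" --settings=no --private-users=0 --private-users-chown /bin/true
}
```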
Note that we have configured both the Bind=/sys/fs/cgroup/unified bind mount and the systemd.legacy_systemd_cgroup_controller=0 init process parameter. This is a recent development stemming from the fact that the systemd v233 release introduced a new hybrid control group mode. These options directly influence whether the container's systemd init process is able to start successfully. Here we concern ourselves exclusively with the default system configuration, i.e. “unified” cgroups-v2 mode.
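Because the right bind mount depends on the cgroup layout the host booted with, it can help to check that layout first. A minimal sketch, assuming GNU coreutils stat: in unified mode /sys/fs/cgroup itself is a cgroup2 filesystem, in hybrid mode a cgroup2 tree is mounted at /sys/fs/cgroup/unified, and in legacy mode neither is the case. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which cgroup layout the host uses, which
# decides whether /sys/fs/cgroup/unified exists at all.
classify_cgroup() {
    local root_fstype=$1 unified_fstype=$2
    if [ "$root_fstype" = cgroup2fs ]; then
        echo unified      # pure cgroups-v2
    elif [ "$unified_fstype" = cgroup2fs ]; then
        echo hybrid       # v1 controllers plus a v2 tree at .../unified
    else
        echo legacy       # cgroups-v1 only
    fi
}

cgroup_mode() {
    local base=${1:-/sys/fs/cgroup} unified=""
    # stat -f -c %T prints the filesystem type, e.g. "cgroup2fs" or "tmpfs"
    if [ -d "$base/unified" ]; then
        unified=$(stat -f -c %T "$base/unified")
    fi
    classify_cgroup "$(stat -f -c %T "$base")" "$unified"
}
```

Running cgroup_mode on the host prints unified, hybrid or legacy.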
To actually start the containerized system we use the systemctl(1) tool:
~ # systemctl start systemd-nspawn@work
~ # systemctl status systemd-nspawn@work
● systemd-nspawn@work.service - Container work
Loaded: loaded (/usr/lib/systemd/system/systemd-nspawn@.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2021-03-16 17:22:28 CET; 9s ago
Docs: man:systemd-nspawn(1)
Main PID: 11023 (systemd-nspawn)
Status: "Container running: Startup finished in 254ms."
Tasks: 21 (limit: 16384)
Memory: 41.3M
CGroup: /machine.slice/systemd-nspawn@work.service
├─payload
│ ├─init.scope
│ │ └─11025 /usr/lib/systemd/systemd systemd.legacy_systemd_cgroup_controller=0 systemd.mask=docker.service
Since it is a registered container, we may use the machinectl(1) tool to open a shell into the system:
nyqcd@private ~ $ sudo machinectl shell nyqcd@work /bin/bash
Connected to machine work. Press ^] three times within 1s to exit session.
[nyqcd@work ~]$
Excellent, now we can move on to configuring the target container system.
Running X11 on a different TTY
The LightDM instance starting inside the container attempts to spawn an X11 process on the vt7 virtual console:
root@work ~ # lightdm --debug
...
[+0.00s] DEBUG: Using VT 7
[+0.00s] DEBUG: Seat seat0: Starting local X display on VT 7
[+0.00s] DEBUG: XServer 0: Logging to /var/log/lightdm/x-0.log
[+0.00s] DEBUG: XServer 0: Writing X server authority to /run/lightdm/root/:0
[+0.00s] DEBUG: XServer 0: Launching X Server
[+0.00s] DEBUG: Launching process 314: /usr/bin/X :0 -seat seat0 -auth /run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
[+0.00s] DEBUG: XServer 0: Waiting for ready signal from X server :0
...
However, the tty7 console is not only inaccessible due to the nspawn configuration above, it is also used by the X11 server on the host system. To resolve this issue, we simply instruct LightDM to spawn X11 on another virtual terminal, in this case tty8, by changing the /etc/lightdm/lightdm.conf file:
#
# General configuration
#
...
[LightDM]
#start-default-seat=true
#greeter-user=lightdm
#minimum-display-number=0
minimum-vt=8 # Setting this to a value < 7 implies security issues, see FS#46799
...
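The same change can be made non-interactively inside the container. A minimal sketch, assuming GNU sed; set_minimum_vt is a hypothetical name, and it replaces either an active or a commented-out minimum-vt= line, or adds one after the [LightDM] header if none exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set minimum-vt in lightdm.conf, replacing either an
# active or a commented-out "minimum-vt=" line.
set_minimum_vt() {
    local conf=$1 vt=$2
    if grep -Eq '^#?minimum-vt=' "$conf"; then
        sed -Ei "s/^#?minimum-vt=.*/minimum-vt=$vt/" "$conf"
    else
        # No such line yet: add one right after the [LightDM] header
        sed -i "/^\[LightDM\]/a minimum-vt=$vt" "$conf"
    fi
}
```

Usage inside the container: set_minimum_vt /etc/lightdm/lightdm.conf 8.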
To actually give the container access to the /dev/tty8 virtual device, we must stop the container and make several changes to the nspawn setup. First, we must allow the systemd-nspawn process to access the virtual terminal and the DRM devices by creating the drop-in file /etc/systemd/system/systemd-nspawn@work.service.d/override.conf with the following contents:
[Service]
Environment=SYSTEMD_NSPAWN_USE_CGNS=0
DevicePolicy=auto
DeviceAllow=/dev/tty8 rwm
DeviceAllow=/dev/dri/card0 rwm
DeviceAllow=char-drm rwm
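Note that systemd only picks up a new or changed drop-in after systemctl daemon-reload. A sketch that writes the file shown above and reloads, assuming the container name and tty are passed as arguments; install_device_override and the SYSTEMD_ETC variable (overridable only to test against a scratch directory) are hypothetical. systemctl edit systemd-nspawn@work is an interactive alternative that reloads on its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install the device drop-in shown above and make
# systemd pick it up. SYSTEMD_ETC is overridable only for testing.
SYSTEMD_ETC=${SYSTEMD_ETC:-/etc/systemd/system}

install_device_override() {
    local name=$1 tty=$2
    local dir="$SYSTEMD_ETC/systemd-nspawn@$name.service.d"
    mkdir -p "$dir"
    cat > "$dir/override.conf" <<EOF
[Service]
Environment=SYSTEMD_NSPAWN_USE_CGNS=0
DevicePolicy=auto
DeviceAllow=/dev/$tty rwm
DeviceAllow=/dev/dri/card0 rwm
DeviceAllow=char-drm rwm
EOF
    # Drop-ins are only read after a reload
    systemctl daemon-reload
}
```

Usage as root: install_device_override work tty8, then systemctl cat systemd-nspawn@work to confirm the drop-in is listed.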
Second, the nspawn configuration file at /etc/systemd/nspawn/work.nspawn must be changed to expose the tty8 device, the framebuffer and the DRM devices using bind mounts:
diff --git a/etc/systemd/nspawn/work.nspawn b/etc/systemd/nspawn/work.nspawn
index c4a9e06..85b7b39 100644
--- a/etc/systemd/nspawn/work.nspawn
+++ b/etc/systemd/nspawn/work.nspawn
@@ -16,6 +16,10 @@ ReadOnly=no
Volatile=no
Bind=/sys/fs/cgroup/unified
+Bind=/dev/tty8
+Bind=/dev/fb0
+Bind=/dev/dri
+
[Network]
Private=yes
VirtualEthernet=yes
Starting the container should now spawn a new X11 environment and immediately switch to it. However, keyboard and mouse have not been configured yet, so it is not possible to interact with the system in this state. Moreover, switching back to the host graphical interface (or any other virtual terminal) is similarly difficult.
Enabling keyboard and mouse input
Although the keyboard is usually used to switch between virtual terminals, this can also be done on the command line with the chvt(1) tool. Obviously, if keyboard input on the host machine is currently unavailable, the chvt command has to be run through remote shell access such as OpenSSH. For example, to switch back to the host graphical interface, invoke the following command:
~ # chvt 7
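Switching back and forth can be scripted with fgconsole(1), which prints the number of the active virtual terminal, combined with chvt. A minimal sketch, assuming the host display runs on VT 7 and the container display on VT 8 as configured above; next_vt, toggle_vt and the HOST_VT/CONTAINER_VT variables are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flip between the host X11 display (VT 7) and the
# container X11 display (VT 8), e.g. when run as root over SSH.
HOST_VT=${HOST_VT:-7}
CONTAINER_VT=${CONTAINER_VT:-8}

next_vt() {
    # Any VT other than the host one leads back to the host display
    if [ "$1" -eq "$HOST_VT" ]; then echo "$CONTAINER_VT"; else echo "$HOST_VT"; fi
}

toggle_vt() {
    local current
    current=$(fgconsole) || return   # number of the active VT
    chvt "$(next_vt "$current")"
}
```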
To allow keyboard input in the container's X11 session, both the nspawn configuration file and the unit drop-in file need to be modified. The drop-in file must be changed as follows:
diff --git a/etc/systemd/system/systemd-nspawn@work.service.d/override.conf b/etc/systemd/system/systemd-nspawn@work.service.d/override.conf
index 52a01e4..4f97e5a 100644
--- a/etc/systemd/system/systemd-nspawn@work.service.d/override.conf
+++ b/etc/systemd/system/systemd-nspawn@work.service.d/override.conf
@@ -4,3 +4,4 @@ DevicePolicy=auto
DeviceAllow=/dev/tty8 rwm
DeviceAllow=/dev/dri/card0 rwm
DeviceAllow=char-drm rwm
+DeviceAllow=char-input rwm
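Intuitively, we must also expose the /dev/input directory inside the container. However, input devices are usually only made known to the system once udev(7) has scanned them, so the container's X server, which discovers input devices through udev, also needs read access to the host's udev database in /run/udev/data. Therefore we make the following modifications to the nspawn configuration: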
diff --git a/etc/systemd/nspawn/work.nspawn b/etc/systemd/nspawn/work.nspawn
index bec6a3c..5395d5d 100644
--- a/etc/systemd/nspawn/work.nspawn
+++ b/etc/systemd/nspawn/work.nspawn
@@ -20,6 +20,9 @@ Bind=/dev/tty8
Bind=/dev/fb0
Bind=/dev/dri
+BindReadOnly=/run/udev/data
+Bind=/dev/input
+
[Network]
Private=yes
VirtualEthernet=yes
Now both keyboard and mouse input should be enabled in the container graphical interface.
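To confirm from inside the container that both bind mounts are in effect, one can count the udev database entries and evdev nodes that are visible there. This sketch assumes udev's c<major>:<minor> naming of files in /run/udev/data and that input devices use character major 13; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical checks, run inside the container: count udev database
# entries for input character devices (major 13) and evdev nodes.
count_input_db_entries() {
    local dir=${1:-/run/udev/data}
    find "$dir" -maxdepth 1 -name 'c13:*' | wc -l
}

count_event_nodes() {
    local dir=${1:-/dev/input}
    find "$dir" -maxdepth 1 -name 'event*' | wc -l
}
```

If either function prints 0 inside the container, the X server there has no input devices to pick up.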
Enabling audio in the container using PulseAudio
Accessing audio peripherals directly inside the container would be quite challenging, since they are already in use by the host system. However, if PulseAudio is used on both the host and the container system, it is fairly trivial to connect the container's PulseAudio stack to the PulseAudio server on the host. This approach is not only easier to configure, it also allows transparent passthrough of, e.g., Bluetooth microphone input to container applications such as web browsers.
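The exact configuration is not spelled out here. One common way to do this, sketched as an assumption rather than as the setup used above, is to bind-mount the host user's PulseAudio socket directory into the container and point the container's clients at it through the PULSE_SERVER environment variable, so they talk to the host server directly. The sketch assumes the host user has UID 1000, a line such as Bind=/run/user/1000/pulse:/mnt/pulse-host under [Files] in work.nspawn (the container then only starts once that directory exists on the host), and the same UID inside the container, which PrivateUsers=no leaves unshifted. The mount point /mnt/pulse-host and the function name use_host_pulse are hypothetical. If the host server still refuses the connection, pointing PULSE_COOKIE at a copy of the host user's cookie is another option.

```shell
#!/usr/bin/env bash
# Hypothetical snippet, sourced inside the container (e.g. from ~/.profile).
# Assumes work.nspawn contains, under [Files]:
#   Bind=/run/user/1000/pulse:/mnt/pulse-host
PULSE_HOST_DIR=${PULSE_HOST_DIR:-/mnt/pulse-host}

use_host_pulse() {
    local dir=${1:-$PULSE_HOST_DIR}
    # Only redirect clients if the host socket is actually present
    if [ -S "$dir/native" ]; then
        export PULSE_SERVER="unix:$dir/native"
    else
        echo "no PulseAudio socket at $dir/native" >&2
        return 1
    fi
}
```

Usage inside the container: use_host_pulse && pactl info, which should then report the host's server.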
Network setup
The network should be configured automatically by systemd, both in the host network namespace and in the container network namespace.
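This relies on systemd-networkd running on both sides: systemd ships default network files, 80-container-ve.network on the host for the ve-* interface and 80-container-host0.network inside the container for host0, which configure the veth pair created by VirtualEthernet=yes. A sketch, assuming systemd-networkd is the network manager on both systems; host_veth_name and enable_container_networking are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the veth link created by VirtualEthernet=yes.
# Host side is named "ve-<machine>" (nspawn shortens very long names),
# container side is "host0".
host_veth_name() {
    echo "ve-$1"
}

enable_container_networking() {
    local name=$1
    # Host side: networkd applies 80-container-ve.network to ve-*
    systemctl enable --now systemd-networkd || return
    # Container side, via the container's own systemd instance
    systemctl -M "$name" enable --now systemd-networkd || return
    networkctl status "$(host_veth_name "$name")"
}
```

Usage as root on the host, with the container running: enable_container_networking work.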
Conclusion
Running multiple X11 servers as shown is indeed quite easy. I was surprised by just how simple it is to pass keyboard and mouse input on to the X11 display of the container. Switching between the environments seems to be quite convenient!
This setup has not proven problematic so far, although some bugs are quite interesting:
- After disconnecting and reconnecting the keyboard and mouse while in the container display environment, no further input is possible until chvt is used.
Further experiments could focus on Wayland and on more security-oriented setups.
2020-04-13 update issue
An update broke this configuration recently:
systemd-nspawn[156347]: Failed to stat /sys/fs/cgroup/unified: No such file or directory
systemd[1]: systemd-nspawn@work.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: systemd-nspawn@work.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Container work.
systemd[1]: Starting Container work...
systemd[1]: Started Container work.
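The error shows that /sys/fs/cgroup/unified, which only exists in the hybrid cgroup mode, is no longer present on the host. This can be resolved by dropping the unified component from the cgroup bind mount: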
diff --git a/etc/systemd/nspawn/work.nspawn b/etc/systemd/nspawn/work.nspawn
index 6ad7865..0420878 100644
--- a/etc/systemd/nspawn/work.nspawn
+++ b/etc/systemd/nspawn/work.nspawn
@@ -14,7 +14,7 @@ Hostname=strthinpad
[Files]
ReadOnly=no
Volatile=no
-Bind=/sys/fs/cgroup/unified
+Bind=/sys/fs/cgroup
Bind=/dev/tty8
Bind=/dev/fb0
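On an existing installation, the change above can be applied with GNU sed, followed by a restart of the container. A minimal sketch; fix_cgroup_bind is a hypothetical name and only rewrites an exact Bind=/sys/fs/cgroup/unified line, leaving all other bind mounts untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the cgroup bind change to an existing
# .nspawn file in place (requires GNU sed for -i without a suffix).
fix_cgroup_bind() {
    local file=$1
    sed -i 's|^Bind=/sys/fs/cgroup/unified$|Bind=/sys/fs/cgroup|' "$file"
}
```

Usage as root: fix_cgroup_bind /etc/systemd/nspawn/work.nspawn && systemctl restart systemd-nspawn@work.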