Systemd in a container will call sd_notify when it has finished
booting, so we can use that to signal that the container is
ready. This does require some fiddling with $NOTIFY_SOCKET.
Previously "machinectl reboot/poweroff" brutally killed the container,
as did "systemctl stop/restart". And reboot didn't actually work. Now
everything is fine.
By setting a line like
MACVLANS="eno1"
in /etc/containers/<name>.conf, the container will get an Ethernet
interface named mv-eno1, which represents an additional MAC address on
the physical eno1 interface. Thus the container has direct access to
the physical network. You can specify multiple interfaces in MACVLANS.
Unfortunately, you can't do this with wireless interfaces.
Note that dhcpcd is disabled in containers by default, so you'll
probably want to set
networking.useDHCP = true;
in the container, or configure a static IP address.
To do: add a containers.* option for this, and a flag for
"nixos-container create".
Using pkgs.lib on the spine of module evaluation is problematic
because the pkgs argument depends on the result of module
evaluation. To prevent an infinite recursion, pkgs and some of the
modules are evaluated twice, which is inefficient. Using ‘with lib’
prevents this problem.
The command nixos-container can now create containers. For instance,
the following creates and starts a container named ‘database’:
$ nixos-container create database
The configuration of the container is stored in
/var/lib/containers/<name>/etc/nixos/configuration.nix. After editing
the configuration, you can make the changes take effect by doing
$ nixos-container update database
The container can also be destroyed:
$ nixos-container destroy database
Containers are now executed using a template unit,
‘container@.service’, so the unit in this example would be
‘container@database.service’.
For example, the following sets up a container named ‘foo’. The
container will have a single network interface eth0, with IP address
10.231.136.2. The host will have an interface c-foo with IP address
10.231.136.1.
systemd.containers.foo =
{ privateNetwork = true;
hostAddress = "10.231.136.1";
localAddress = "10.231.136.2";
config =
{ services.openssh.enable = true; };
};
With ‘privateNetwork = true’, the container has the CAP_NET_ADMIN
capability, allowing it to do arbitrary network configuration, such as
setting up firewall rules. This is secure because it cannot touch the
interfaces of the host.
The helper program ‘run-in-netns’ is needed at the moment because ‘ip
netns exec’ doesn't quite do the right thing (it remounts /sys without
bind-mounting the original /sys/fs/cgroups).
These are stored on the host in
/nix/var/nix/{profiles,gcroots}/per-container/<container-name> to
ensure that container profiles/roots are not garbage-collected.
On the host, you can run
$ socat unix:<path-to-container>/var/lib/login.socket -,echo=0,raw
to get a login prompt. So this allows logging in even if the
container has no SSH access enabled.
You can also do
$ socat unix:<path-to-container>/var/lib/root-shell.socket -
to get a plain root shell. (This socket is only accessible by root,
obviously.) This makes it easy to execute commands in the container,
e.g.
$ echo reboot | socat unix:<path-to-container>/var/lib/root-shell.socket -
It is parameterized by a function that takes a name and evaluates to the
option type for the attribute of that name. Together with
submoduleWithExtraArgs, this subsumes nixosSubmodule.
You can now say:
systemd.containers.foo.config =
{ services.openssh.enable = true;
services.openssh.ports = [ 2022 ];
users.extraUsers.root.openssh.authorizedKeys.keys = [ "ssh-dss ..." ];
};
which defines a NixOS instance with the given configuration running
inside a lightweight container.
You can also manage the configuration of the container independently
from the host:
systemd.containers.foo.path = "/nix/var/nix/profiles/containers/foo";
where "path" is a NixOS system profile. It can be created/updated by
doing:
$ nix-env --set -p /nix/var/nix/profiles/containers/foo \
-f '<nixos>' -A system -I nixos-config=foo.nix
The container configuration (foo.nix) should define
boot.isContainer = true;
to optimise away the building of a kernel and initrd. This is done
automatically when using the "config" route.
On the host, a lightweight container appears as the service
"container-<name>.service". The container is like a regular NixOS
(virtual) machine, except that it doesn't have its own kernel. It has
its own root file system (by default /var/lib/containers/<name>), but
shares the Nix store of the host (as a read-only bind mount). It also
has access to the network devices of the host.
Currently, if the configuration of the container changes, running
"nixos-rebuild switch" on the host will cause the container to be
rebooted. In the future we may want to send some message to the
container so that it can activate the new container configuration
without rebooting.
Containers are not perfectly isolated yet. In particular, the host's
/sys/fs/cgroup is mounted (writable!) in the guest.