Monday, February 15, 2016

Adventures in Containerization - Part 2: Unprivileged LXC Containers in Fedora 23 (And Probably Some Other Distros Too)

Hey everyone - Last week, we talked about what containers are and why you might want to learn about them.  Today, we will take a look at getting LXC installed and a container spun up.  When I started writing this entry, I tried to get things running on a fully updated CentOS 7 VM.  Unfortunately, to get unprivileged containers working, we need a few things that CentOS 7 does not have, including a new enough version of shadow-utils.  Because we need to map user IDs and group IDs in the container to user and group IDs on the host system, we need a version of shadow-utils that lets us add subuids to our user.  The version of shadow-utils available for CentOS 7 as of this writing (4.1.5.1-18.el7) does not support subuids (we need 4.2.1).  There are a number of other dependencies (including a newer kernel version - at least 3.13), and unfortunately, as of this writing, not many distributions have everything that we need.  We could build all of the necessary dependencies from source, but I wanted to keep that to a minimum, and there are other packages we will need to build from source later (see below).

One of the rolling release distros like Arch, OpenSuSE Tumbleweed, or Fedora Rawhide might do the trick.  Today, I am going to explore unprivileged LXC containers with Fedora 23.  There are a few tutorials on how to get this working in Ubuntu (like this one), which we will draw from.  Most of the tutorials I have seen say to use Ubuntu because that is ready to go, but I wanted to try something different.  Maybe this will help someone who is used to RedHat / Fedora.

Most of these steps should work on any Linux distro with new enough packages:
  • LXC >= 1.0 (we are running 1.1.5)
  • Linux Kernel >= 3.13 (we are running 4.3.3)
  • Shadow-Utils >= 4.2.1 (we are running 4.2.1)
Once everything is installed, the lxc commands should be the same on any Linux distribution you decide to run this on.  More after the jump...

To get LXC installed on your system, use your distribution's package manager.  I started with a default installation of Fedora 23 Server Edition.  After updating everything with dnf, install LXC along with the templates:

sudo dnf install lxc lxc-templates


lxc-templates is a collection of default LXC templates that we can use to deploy containers.  Once they are installed, you can look in /usr/share/lxc/templates to see what templates are available.
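You can list them with (the exact set of template scripts depends on your LXC version):

ls /usr/share/lxc/templates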

We will build a Debian container, but before we get started, we want to make sure the kernel has all of the support that we need.  We can do this with the lxc-checkconfig command:
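lxc-checkconfig

The exact output depends on your kernel configuration, but every required feature should report enabled.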

Everything looks good.

We want to create an unprivileged container so that we do not have to run the container as root.  If an attacker manages to compromise the container and break out of it, we do not want them to have root access on the host.

To do that, as a user, we will run the following:
lxc-create -t debian -n DebLXCTest


lxc-create is the command to create a container.  -t specifies the template we want to use (we saw the list above) and -n allows us to specify a name for the container.

Mapping Container Users to System Users

Regardless of the system you are using, you will probably run into an error saying that your system is not configured with subuids.  That is because we have not set up a mapping of users in the container to users on the system.  We are going to reserve a range of subordinate user IDs (subuids) and subordinate group IDs (subgids) for our containers to use.  Because we are not going to have many containers or users on this system, we will use the range 65537 to 131072.  We want to be outside of the range of UIDs and GIDs already in use on our system, and we want a full range of 65536 IDs available to the containers.  Before we create the mapping, we need to let the system know which subordinate user IDs and group IDs are available to our user.  To do this, we have to create two files as root: /etc/subuid and /etc/subgid.  According to the man page for subuid, each line of the file looks like this:

username:first subordinate UID:number of subordinate UIDs


For subordinate group IDs, the file is similar.  In our case, we will define it this way:
containeruser:65537:65536


Substitute containeruser with the username of the user that will be running the containers, and adjust the first subuid and the number of subuids if you want something different from what I have.

Make a file with the same line and save it as /etc/subgid.

Now we will assign this range to our user:


sudo usermod --add-subuids 65537-131072 $USER
sudo usermod --add-subgids 65537-131072 $USER

These two commands will add the specified subuids and subgids to your user.  Bash (if you are using bash) will substitute your username for $USER when you run the command.  Make sure the numbers you specify here match the range in /etc/subuid and /etc/subgid that you created.
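To confirm, check the two files (assuming the user is named containeruser, each should now contain a line like containeruser:65537:65536):

grep containeruser /etc/subuid /etc/subgid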

Networking

Next, we are going to get networking for our containers set up.  We are going to set up a bridge that lets us talk to the containers (and gives the containers a network to talk amongst themselves).  To do this we need two packages: dnsmasq (to do DHCP for the containers) and bridge-utils (to set up the bridges).

You can install those with dnf:

sudo dnf install dnsmasq bridge-utils

On my installation, they were already there, but just in case they are not, the command above will add them.

Now, we need a systemd unit file to create the bridge when the computer boots up.  Credit for these unit files goes to this link.  So create /etc/systemd/system/lxc-net.service, and put the following in it:


[Unit]
Description=Bridge Interface for LXC Containers

[Service]
Type=oneshot

# Use brctl to make the bridge
ExecStart=/sbin/brctl addbr lxcbr0

# Give the bridge an IP address space.  You can change the subnet if you like
# We will use 10.0.1.1/24
ExecStart=/sbin/ip address add 10.0.1.1/24 dev lxcbr0

# Bring the bridge up
ExecStart=/sbin/ip link set lxcbr0 up

RemainAfterExit=yes

# Do these commands when we need to tear the interface down
# When we shut the computer down or reboot for example
ExecStop=/sbin/ip link set lxcbr0 down

# Delete the bridge
ExecStop=/sbin/brctl delbr lxcbr0


Since we want DHCP for our containers, we need another systemd unit file to get that set up.  Create the following as /etc/systemd/system/lxc-dhcp.service.


[Unit]
# This unit file depends on the bridge (lxcbr0) being up.
# If you called your bridge something different, then
# substitute the correct unit file here
Requires=lxc-net.service
Requires=sys-devices-virtual-net-lxcbr0.device

# Make sure this unit starts after the bridge device has been created
After=sys-devices-virtual-net-lxcbr0.device

# Substitute your LXC bridge's IP information
# for the listen address (the address that will
# act as the DHCP server) and the DHCP range
[Service]
ExecStart=/sbin/dnsmasq \
                    --dhcp-leasefile=/var/run/lxc-dhcp.leases \
                    --user=nobody \
                    --group=nobody \
                    --keep-in-foreground \
                    --listen-address=10.0.1.1 \
                    --except-interface=lo \
                    --bind-interfaces \
                    --dhcp-range=10.0.1.100,10.0.1.254

# Start with the default run level of the machine
[Install]
WantedBy=default.target

Now enable lxc-dhcp which will pull in lxc-net:

sudo systemctl enable lxc-dhcp.service
sudo systemctl start lxc-dhcp.service
Now we will verify that we have a bridge set up:

ip link
ip addr
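If everything worked, lxcbr0 should be listed and carry the 10.0.1.1/24 address we assigned.  For a view of just the bridge, you can also run:

ip addr show lxcbr0
brctl show lxcbr0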



Next, we need to set up firewall rules to allow traffic to and from our containers.

For that, we will add a new 'lxc' zone to our firewall and allow all traffic through it.  For your own use case, you will probably want to lock things down further.

sudo firewall-cmd --permanent --new-zone=lxc
sudo firewall-cmd --permanent --zone=lxc --add-interface=lxcbr0
sudo firewall-cmd --permanent --zone=lxc --set-target=ACCEPT

Now we need to enable NAT (masquerading) on the host's outbound interface.  This is done by adding masquerading to the zone that contains that interface.  To find out which zone that is, list the active zones:
sudo firewall-cmd --get-active-zones


Our active zone is called FedoraServer, so we will add masquerading to that:
sudo firewall-cmd --permanent --zone=FedoraServer --add-masquerade


Now reload the firewall:
sudo firewall-cmd --reload
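To double-check that the zone and masquerading settings took effect, you can query them (these are standard firewall-cmd queries, nothing LXC-specific):

sudo firewall-cmd --zone=lxc --list-all
sudo firewall-cmd --zone=FedoraServer --query-masquerade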


We also have to enable IP forwarding in the kernel so the host will route traffic for the containers.  We do this via sysctl.

Create (or edit) the file /etc/sysctl.d/99_sysctl.conf and add the following:
net.ipv4.ip_forward = 1


Reload with:
sudo sysctl --system
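You can verify that forwarding is now on:

sysctl net.ipv4.ip_forward

which should print net.ipv4.ip_forward = 1.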

We want to enable networking for our containers, so we will tell LXC to allow our user to create interfaces on our bridge.  We do this by creating the file /etc/lxc/lxc-usernet and adding the following line:

containeruser veth lxcbr0 10


Substitute the user name I am using (containeruser) with the user name you are using.  veth tells LXC we are going to use virtual ethernet interfaces.  The bridge we created is lxcbr0, and the final number allows our user to create up to ten virtual interfaces on that bridge.
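One way to create that file (substituting your own user name) is:

echo "containeruser veth lxcbr0 10" | sudo tee /etc/lxc/lxc-usernet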

Networking is done!

Cgroup Management

This next part is a bit tougher.  Since support for unprivileged containers is so new, not every distribution has the tools necessary to make it work.  To complicate things, systems using systemd need special tools to manage things like cgroups (since systemd controls cgroups).  We will need to install two packages: lxcfs (a FUSE filesystem that gives containers cgroup-aware views of /proc files such as meminfo and cpuinfo) and cgmanager (a daemon that manages cgroups on behalf of unprivileged users).  As of this writing, neither of those packages is available from any Fedora repository that I could find.  Therefore, we need to build them from source.  I normally do not like to build things from source that the system will depend on, because I like to have the convenience of a package manager to keep the packages up to date for me.  Hopefully these packages will be available in more distributions soon (they are already available in Arch and Ubuntu).

To get started, we need to install the build tools so we can build the packages.  If this box is going to be exposed to the internet or some other untrusted network, get rid of these tools when we are done.  We do not want to give an attacker a means of compiling arbitrary code on the box.  Sure, they could get root and install the tools, but hopefully you would notice that.

sudo dnf install make automake gcc gcc-c++


For lxcfs, we have the following dependencies, which also need to be installed:
sudo dnf install fuse fuse-devel


Download the source code from here.  I am using 0.16.  Extract the source code using tar:


tar xvf lxcfs-0.16.tar.gz
cd lxcfs-0.16
./configure
make
sudo make install

Next, we need to install cgmanager.  That has a few dependencies of its own.  We need to build one of them from source (libnih 1.0.3).  Fedora ships with libnih 1.0.2, but cgmanager needs 1.0.3.

To do this, we need to get a few more dependencies:
sudo dnf install dbus-devel expat-devel pam-devel


Then, we take the source code for libnih from here, and we do something similar to what we had to do for lxcfs:

tar xvf libnih-1.0.3.tar.gz
cd libnih-1.0.3
./configure
make
sudo make install

Unfortunately, libnih gets installed to a different location (/usr/local/lib) from where Fedora puts its packaged version.  I tried changing the prefix for the configure script, but the build complained that it had to be installed in /usr/local/lib.  Instead of fighting it, we will tell cgmanager where the nih libraries live when we build it.  Download the cgmanager source code from here, and build it similarly:

tar xvf cgmanager-0.3.9.tar.gz
cd cgmanager-0.3.9
NIH_LIBS=/usr/local/lib/libnih.so \
NIH_CFLAGS="-O2 -pipe" \
NIH_DBUS_LIBS=/usr/local/lib/libnih-dbus.so \
NIH_DBUS_CFLAGS="-O2 -pipe" \
./configure
make
sudo make install

Once this is done, to run cgmanager, we have to make sure it knows where the libraries it needs are (specifically nih and nih-dbus).  We will create symbolic links to them:

sudo ln -s /usr/local/lib/libnih.so.1.0.0 /lib/libnih.so.1
sudo ln -s /usr/local/lib/libnih.so.1.0.0 /lib64/libnih.so.1
sudo ln -s /usr/local/lib/libnih-dbus.so.1.0.0 /lib/libnih-dbus.so.1
sudo ln -s /usr/local/lib/libnih-dbus.so.1.0.0 /lib64/libnih-dbus.so.1

The cgmanager build installs a systemd unit file that we want to enable, but we first have to fix the path it points to.  The unit file is at:
/usr/local/lib/systemd/system/cgmanager.service

On my system, cgmanager was installed to /usr/local/sbin/cgmanager.  The systemd unit file is pointing at /sbin/cgmanager.  You can either make a symbolic link in /sbin/cgmanager or change the unit file to point at /usr/local/sbin/cgmanager.  I changed the unit file.
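Assuming the paths above, either approach looks like this:

# Option 1: create a symlink at the path the unit file expects
sudo ln -s /usr/local/sbin/cgmanager /sbin/cgmanager

# Option 2: point the unit file at the real binary
sudo sed -i 's|/sbin/cgmanager|/usr/local/sbin/cgmanager|' /usr/local/lib/systemd/system/cgmanager.service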

Then enable and start the service:

sudo systemctl enable cgmanager
sudo systemctl start cgmanager

Make sure it is running:
systemctl status cgmanager

Now we need to make the cgroups for our user using cgmanager (cgm):

# Create all cgroups for our user
sudo /usr/local/bin/cgm create all $USER

# Give our user ownership of the cgroups we just created
sudo /usr/local/bin/cgm chown all $USER $(id -u $USER) $(id -g $USER)

# Move those cgroups to our login shell ($$ is the PID of the process that we are running this command from)
# Notice this command is not run as root
/usr/local/bin/cgm movepid all $USER $$

Unfortunately, I have to run this every time a container is started.  Does anyone have a solution for that?  I tried freezing the configuration with cgm (see the bottom of the man page here), but that did not work.
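One workaround is to drop the three commands into a small script (lxc-cgroups.sh here is a made-up name) and source it from the shell you will start containers from:

# lxc-cgroups.sh - hypothetical helper wrapping the three cgm steps above.
# Source it (". lxc-cgroups.sh") rather than executing it, so that $$
# still refers to your interactive shell.
sudo /usr/local/bin/cgm create all "$USER"
sudo /usr/local/bin/cgm chown all "$USER" "$(id -u)" "$(id -g)"
/usr/local/bin/cgm movepid all "$USER" $$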

Now, let's get the user set up.  We need per-user versions of the LXC configuration files and directories.  At a minimum, we need to create the following:
  • Directory: ~/.config/lxc (where all of this user's LXC config files are held)
  • File: ~/.config/lxc/default.conf (Default options for containers that this user will create)
  • Directory: ~/.local/share/lxc (where the containers are stored)
  • Directory: ~/.cache/lxc (used to cache the container images we download; more on that in a bit)
Unprivileged containers are set up a bit differently from the normal LXC templates: instead of building a root filesystem locally (which requires root), pre-built images are downloaded from the LXC project.  The directory in .cache stores those images as they are downloaded so that we do not have to re-download the same image every time we want to create a container from it.

So to make the directories we need (-p makes parent directories if they do not exist):
mkdir -p ~/.config/lxc ~/.local/share/lxc ~/.cache/lxc

Creating the LXC Configuration

Now, we will make the default container configuration file.  We need to tell LXC what subordinate UIDs and GIDs to use, and we need to set up networking since we want our containers to have network access.

We also need to make our home directory searchable (executable) by everyone, because the container's root user (mapped to UID 65537 on the host in our case) needs to reach ~/.local/share/lxc.  To do that, we will run:


chmod +x /home/containeruser


If you do not like the idea of giving everyone execute permissions on the home directory, then you can enable ACLs on your system (this is what I am doing for this example):

Make sure ACLs are enabled for the filesystem that holds the home directory:


sudo tune2fs -l /dev/sdaX | grep acl


where sdaX is the partition that the home directory lives on (typically sda1 or sda2).  (This check applies to ext filesystems; XFS enables ACLs by default.)  If you see something like:

Default mount options: user_xattr acl


Then you have ACLs enabled.  If you do not, you need to add the option in /etc/fstab for your mount point.  Usually, the line in /etc/fstab ends with something like defaults 0 0.  Simply add acl to the comma-separated list of mount options (with no space, like defaults,acl) so the line ends with defaults,acl 0 0.  If you needed to change /etc/fstab, either unmount and remount your partition if it is not in use, or reboot.
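For example, a hypothetical /etc/fstab entry for a separate /home partition (your device or UUID and filesystem type will differ):

UUID=1234abcd-...  /home  ext4  defaults,acl  0 0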

When ACLs are enabled, you can set the following ACL to allow UID 65537 to traverse (execute) the directory:
setfacl -m u:65537:x /home/containeruser


Substitute 65537 with the first subordinate UID that you created, and /home/containeruser with the home directory of the user that will be running the containers.

The result will be something like this (verify by running getfacl on the home directory):
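It will look something like this (hypothetical output for a user named containeruser whose home directory is mode 0700; yours may differ):

getfacl /home/containeruser
# file: home/containeruser
# owner: containeruser
# group: containeruser
user::rwx
user:65537:--x
group::---
mask::--x
other::---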

LXC Configuration for Unprivileged Containers

Now, we can create our default LXC config file.  Here are the contents of ~/.config/lxc/default.conf:

lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:a1:b2:c3

lxc.id_map = u 0 65537 65536
lxc.id_map = g 0 65537 65536 
Let's take a look line by line:
  • lxc.network.type = veth: We are going to use virtual ethernet interfaces
  • lxc.network.link = lxcbr0: They will be linked to the host via bridge lxcbr0
  • lxc.network.flags = up: We want the interfaces to be brought up by default
  • lxc.network.hwaddr = 00:16:3e:a1:b2:c3: We want MAC addresses assigned from the pool starting with 00:16:3e, which is the OUI used by Xen.  We could use whatever we want here, but it is good to use something that is allocated for virtual interfaces.  (If you plan to run more than one container, you can use 00:16:3e:xx:xx:xx instead; the x characters are replaced with random values so each container gets a unique MAC.)
  • lxc.id_map = u 0 65537 65536: This tells LXC to map UID 0 in the container to UID 65537 on our host.  That number corresponds to the first subordinate UID we specified in /etc/subuid and the beginning of the range in the usermod command.  The last number tells LXC that it has 65536 subordinate UIDs to work with, which corresponds to the count we allocated in /etc/subuid and the size of the range in the usermod command.
  • lxc.id_map = g 0 65537 65536: This is the same as the last line but with subordinate GIDs instead of subordinate UIDs.
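To make the mapping concrete: with u 0 65537 65536, UID 0 (root) inside the container is UID 65537 on the host, UID 1000 inside the container is 65537 + 1000 = 66537 on the host, and the highest mappable container UID is 65535, which lands on host UID 131072.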

Creating and Launching the Container

Now, we should have everything we need to set up our first container.  We will use a Debian image.  Remember that we have to use a downloaded image because we are running unprivileged containers.  We should first check to see what containers are available:


lxc-create -t download -n test -- --list


This tells lxc-create (the command used to create LXC containers) to use the download template and make a container named (-n) test.  No actual container will be made here because of the template options we are using.  The two dashes (--) after the name are there because everything after them is passed to the template as template options.  The only option we want here is --list, which lists the images that are available.

Here are some of the images that are available as of this writing:
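The listing is a table with DIST, RELEASE, ARCH, VARIANT, and BUILD columns, along these lines (abridged and hypothetical; the available images and build dates will differ):

DIST    RELEASE  ARCH   VARIANT  BUILD
---
centos  7        amd64  default  ...
debian  jessie   amd64  default  ...
ubuntu  trusty   amd64  default  ...
...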

We will make a Debian Jessie container for the AMD64 architecture.  To see the options available for the download template, run the following command:
lxc-create -t download -h


This asks lxc-create to give us the help (-h) for the download template (-t download).

Here is the command to create our Debian Jessie container:

lxc-create -t download -n DebLXCTest -- -d debian -r jessie -a amd64


This will create a container called DebLXCTest from the download template.  We specified that we want the Debian distro (-d debian), the Jessie release (-r jessie), and AMD64 as the architecture (-a amd64).
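At this point, the container exists but is not yet running.  You can confirm that with:

lxc-ls --fancy

which lists your containers along with their state and IP addresses.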

To start the container, run the command:

lxc-start -n DebLXCTest -d

Once it is running, you can get a shell inside it with lxc-attach -n DebLXCTest.

Unfortunately when I went to start it, networking would not work.  I got a message that said the veth interface could not be created (Operation not permitted).  I have a feeling this has something to do with cgroups, and I spent several hours trying to diagnose the issue to no avail.  When I removed the networking options from the container, it started up just fine.  When I ran it as root with networking, it worked just fine.  I do not think it is an SELinux issue because I did not notice any SELinux warnings in the logs, and I even disabled SELinux just to see.  I was able to get things working in Ubuntu using the instructions found here, so it could be a systemd issue or a kernel issue.

Conclusions and Thoughts About LXC

I think that containers are a really interesting concept.  However, right now, I do not think they are going to replace VMs anytime soon.  While containers are lightweight and easy to spin up and destroy, I think the implementation is too new.  As you can see, it took us a lot of work to get unprivileged containers to work in anything other than Ubuntu.  Even then, it required very new packages that are not available everywhere.

I do not like the idea of running containers as root because if the container is compromised or misconfigured, it could lead to compromise of the system that the container is running on.  With VMs, the user running the VMs typically is not root.  This means that if an attacker can break out of a VM, they might compromise other VMs but it would take more work to compromise the system.  Unprivileged containers are a great step forward in terms of security with respect to containers.  When other Linux distributions catch up to have everything in place to support them, I think I could get behind them more.

So in summary, here is my pros and cons list for LXC and containers in general right now:

Pros

  • Containers are lightweight and do not require the overhead of a separate operating system installation.  That means you can get more containers on a given hardware configuration than VMs (depending on the VMs).
  • Containers are arguably more portable than a VM because they are typically smaller (because they do not have the OS install overhead) and you do not need to worry as much about converting them for use with different hypervisors.  For example, if you want to use a VMware VM on VirtualBox or KVM, you have to convert it and deal with any virtual hardware driver issues that may arise.
  • Containers can be spun up and destroyed very quickly which means that you could use them to serve customer requests to your web application for example.  Spinning up and destroying a VM takes more time and is more suitable for more permanent use.

Cons

  • By default, Docker and LXC containers run as root which can lead to system compromise if the container is compromised or misconfigured.
  • Getting unprivileged containers working on any Linux distribution other than Ubuntu may be more work than is worth the effort.  Hopefully this will change soon.
  • If you have a mixed operating system workload (i.e. Linux and anything other than Linux), you cannot put the non-Linux operating systems in containers (you would need a VM or bare metal for that).

What are your thoughts?  Do you use containers for anything?  Please let me know.

Thanks for reading!

References

https://www.stgraber.org/2014/01/17/lxc-1-0-unprivileged-containers/
http://www.linuxquestions.org/questions/linux-kernel-70/lxc-unprivileged-container-in-debian-jessie-cgroups-permissions-4175540174/
https://linuxcontainers.org/lxc/getting-started/
https://github.com/fgrehm/vagrant-lxc/wiki/Usage-on-fedora-hosts
http://stackoverflow.com/questions/17989306/what-does-docker-add-to-lxc-tools-the-userspace-lxc-tools
