Our first private cloud with OpenStack Kolla and CentOS Atomic

At Zoosk, each developer is given access to a virtual machine with the entire Zoosk stack installed on it for development and testing. Historically, these VMs ran on a cluster of 10–15 individual hypervisors, each running CentOS 6. We managed the configuration both inside VMs and on the hypervisors using our production configuration management system, CFEngine.

We decided to try an image-based workflow, and developed an image build process (more about this in a future blog post). As we worked on the VM image, we realized that we needed a more reliable system to host the VMs. We prototyped oVirt, Eucalyptus, and libvirtd before eventually settling on OpenStack. We call the combination of the OpenStack cloud and VM images DevCloud (for Development Cloud).

OpenStack

OpenStack is an open source cloud platform made up of many individual components, similar to the various services offered by vendors like Amazon Web Services.

We built our original prototype using Packstack, a Puppet-based utility to deploy OpenStack services. Packstack was rather painful and difficult to get working—we had to dig through its Puppet modules and patch them ourselves to solve some of the problems, but we were able to get our prototype cloud functional! While researching OpenStack, we stumbled on Kolla, which packages OpenStack services in Docker containers and provides an Ansible-based deployment tool.

At Zoosk we’ve been using Docker in production for well over a year, so we were excited to find a Docker-based solution for managing OpenStack. Kolla takes away a lot of the pain of installing, configuring, and running the various OpenStack services. It also supports Ceph, our storage solution. With a fairly limited amount of customization to the Ansible playbooks, we were able to get a solution running that works for us.
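
For reference, a basic Kolla deployment run looks roughly like the following (a sketch of the standard kolla-ansible workflow with a placeholder inventory path, not our exact playbook customizations):

# Prepare the hosts (installs Docker and other prerequisites)
kolla-ansible -i ./multinode bootstrap-servers

# Sanity-check the hosts and configuration before deploying
kolla-ansible -i ./multinode prechecks

# Deploy the OpenStack services as Docker containers
kolla-ansible -i ./multinode deploy

# Generate an admin openrc file once the services are up
kolla-ansible -i ./multinode post-deploy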

Atomic

[Ceph diagram]

Since this was a completely new environment that we were constructing, we decided to use the opportunity to experiment with different bare metal OSes for running Docker. We’ve been using CentOS 6 for several years and have built up a lot of knowledge around RPM-based distributions, so we had a preference for similar distros. After considering CentOS 7 and various other options, we found Project Atomic, a collection of tools to build distributions based on the idea of immutable infrastructure. CentOS and Fedora both release Atomic images.

With a traditional system, it’s difficult to roll back after applying updates, but with Atomic, we get completely atomic package upgrades! Atomic allows us to treat the entire OS like a software artifact that can be swapped out as a whole, and even built using traditional continuous integration tools like Jenkins. Admins don’t even have the option of making one-off modifications, because most of the filesystem is read-only! While this may seem difficult to deal with, it’s a great structure that forces more thought and promotes repeatability. Gone are the days when a system’s state would constantly drift due to modifications by configuration management systems and ad-hoc changes made by admins!
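
In practice, the update workflow on an Atomic host is driven by rpm-ostree; a rough sketch (not our exact runbook) looks like this:

# Show the currently booted OS tree and any pending one
rpm-ostree status

# Stage a new OS tree; it only takes effect on the next reboot
rpm-ostree upgrade

# If the new tree misbehaves, boot back into the previous one
rpm-ostree rollback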

We use NewRelic in our production environment, so we chose to use NewRelic’s server monitoring for the DevCloud hosts. This allows us to see host-level metrics like CPU, memory usage, and network statistics at a glance.

Network

[Network diagram]

DevCloud is intermingled with our production infrastructure. The physical hosts are connected via 2 physical links to each side of our redundant Juniper QFabric switching infrastructure.

We decided to separate the cloud network into three VLANs with corresponding subnets: a management network, a network for developer VMs, and a general VM network. The hosts are connected to trunk ports, so we’re able to carry multiple VLANs on the same port using VLAN tagging. Inside each host, we run Open vSwitch, with the two physical links bonded together into an OVS bond in active-backup mode. This has to be done before Docker starts, so it’s configured as part of the kickstart. With the proper configuration of Neutron (OpenStack’s networking component), traffic on the VM networks is tagged with the appropriate VLAN ID.
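
In rough terms, the Open vSwitch portion of the kickstart boils down to something like this (a simplified sketch; the bridge, bond, and NIC names are illustrative rather than our actual interface names):

# Bridge that the VLAN provider networks hang off of
ovs-vsctl add-br br-ex

# Bond the two physical uplinks together in active-backup mode
ovs-vsctl add-bond br-ex bond0 eth0 eth1 bond_mode=active-backup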

Using the OpenStack command line tools, we created two VLAN-type provider networks and two corresponding subnets, both configured to use a specific set of custom DNS servers.
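
The commands looked roughly like the following (a sketch with placeholder names, VLAN IDs, and addresses rather than our real values):

# VLAN provider network mapped onto the physical fabric
openstack network create dev-vms \
--provider-network-type vlan \
--provider-physical-network physnet1 \
--provider-segment 101 \
--share

# Matching subnet configured with our custom DNS servers
openstack subnet create dev-vms-subnet \
--network dev-vms \
--subnet-range 10.10.101.0/24 \
--dns-nameserver 10.10.0.53 \
--dns-nameserver 10.10.0.54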

Lessons Learned

Authentication

Authentication with OpenStack is more complicated than it initially appears. There are domains, projects, users, and various combinations of the three! Here’s my attempt at an explanation:

Domains are best thought of as authentication realms. We added a new domain that connects to our LDAP infrastructure so that we can take advantage of our existing authentication setup. This requires adding a new Keystone (OpenStack’s identity service) configuration file.
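
The domain-specific Keystone file (keystone.DOMAIN_NAME.conf) ends up looking roughly like this; the LDAP values below are placeholders, and the attribute mappings will vary with your directory schema:

[identity]
driver = ldap

[ldap]
url = ldap://ldap.example.com
user = cn=openstack,ou=services,dc=example,dc=com
password = LDAP_BIND_PASSWORD
suffix = dc=example,dc=com
user_tree_dn = ou=people,dc=example,dc=com
user_objectclass = inetOrgPerson
user_id_attribute = uid
user_name_attribute = uid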

Projects are a logical way to segregate tenants, which makes much more sense in a public cloud. Projects are also associated with domains. In DevCloud, we have one main project.

Users have a primary association with a domain, but also have a secondary association with projects. Because of this, we’re able to add users to the internal default domain (which is separate from our LDAP), and give them access to our project. This is really useful for creating service accounts that are only used for OpenStack access.

In order to authenticate to an OpenStack cloud, a combination of a user, project, and domain is necessary. It gets tricky when you want to authenticate using a user from the internal domain to a project associated with an LDAP domain. Here’s how we do it:

openstack role add \
--os-domain-name DOMAIN_NAME_FOR_PROJECT \
--project PROJECT_NAME \
--user-domain INTERNAL_DOMAIN_NAME \
--user USER_IN_INTERNAL_DOMAIN _member_

We can then use the OS_PROJECT_NAME, OS_PROJECT_DOMAIN_NAME, OS_USER_DOMAIN_NAME, and OS_USERNAME variables with an OpenStack command.
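
As an illustration (with the same placeholder names as above, and a placeholder auth URL), the resulting environment looks something like this:

export OS_AUTH_URL=https://devcloud.example.com:5000/v3
export OS_IDENTITY_API_VERSION=3
export OS_PROJECT_DOMAIN_NAME=DOMAIN_NAME_FOR_PROJECT
export OS_PROJECT_NAME=PROJECT_NAME
export OS_USER_DOMAIN_NAME=INTERNAL_DOMAIN_NAME
export OS_USERNAME=USER_IN_INTERNAL_DOMAIN
export OS_PASSWORD=SERVICE_ACCOUNT_PASSWORD

# Any OpenStack command now authenticates as the internal user,
# scoped to the project in the LDAP-backed domain
openstack server list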

Uploading images to Ceph

As we started launching instances in DevCloud, we grew concerned that the build/launch time was unreasonably long, and began to suspect the image format. Our image builder outputs images in the QCOW2 format. While QCOW2 works with OpenStack and Ceph, it is slow because the images have to be converted to RAW at instance launch time. After a few tests, we discovered that RAW images were much, much faster to launch! After a lot of research, we were able to script a process that uploads a RAW-converted image directly into our Ceph cluster as a new RBD image, and then creates the image in Glance (OpenStack’s image service).

We added a new user in Ceph and set up a Ceph configuration file for this process:

[global]
fsid = CEPH_CLUSTER_FSID
mon initial members = CEPH_MON_HOST
mon host = CEPH_MON_HOST
mon addr = CEPH_MON_HOST:6789
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
rbd default format = 2 # allows snapshotting and protecting images
keyring = /etc/ceph/ceph.client.CEPH_USERNAME.keyring

This configuration is then used in combination with the new user’s keyring in the following process:

# IMAGE_ID is generated by uuidgen
# CEPH_CLUSTER_FSID is the ID of the Ceph cluster
# /etc/ceph/ceph.conf should include the configuration shown above
# Convert the QCOW2 image to RAW and upload into Ceph
qemu-img convert \
-p \
-O raw \
${qcow2_image} \
rbd:images/${IMAGE_ID}:id=${CEPH_USERNAME}:conf=/etc/ceph/ceph.conf

# Create an RBD snapshot of the image
rbd -c /etc/ceph/ceph.conf \
-p images \
snap create \
--snap snap ${IMAGE_ID} \
--keyring /etc/ceph/ceph.client.${CEPH_USERNAME}.keyring \
-n client.${CEPH_USERNAME}

# Protect the RBD snapshot of the image (required for Glance)
rbd -p images \
-c /etc/ceph/ceph.conf \
snap protect ${IMAGE_ID} \
--snap snap \
--keyring /etc/ceph/ceph.client.${CEPH_USERNAME}.keyring \
-n client.${CEPH_USERNAME}

# Create the image in Glance using the V1 API - required to specify location
glance --os-image-api-version 1 \
image-create \
--id ${IMAGE_ID} \
--name ${IMAGE_NAME} \
--store rbd \
--disk-format raw \
--container-format bare \
--location rbd://${CEPH_CLUSTER_FSID}/images/${IMAGE_ID}/snap

Ceph

While Ceph is usually quite performant and reliable, we discovered that when adding new nodes to the cluster using Kolla’s Ansible deployment method, Ceph I/O slowed to a crawl, rendering instances nearly unusable. We’ve resorted to performing cluster expansions during low-usage periods, but we aim to experiment with initial OSD weights so that we can add new compute nodes at any time.
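
One approach we plan to test (a sketch, not something we have rolled out yet) is to bring new OSDs into the CRUSH map with zero weight and then ramp them up in small steps, letting the cluster settle between increments:

# In ceph.conf on the new nodes, so fresh OSDs start with no data:
# [osd]
# osd crush initial weight = 0

# Then raise the weight of each new OSD gradually (osd.42 is a placeholder)
ceph osd crush reweight osd.42 0.2
ceph osd crush reweight osd.42 0.5
ceph osd crush reweight osd.42 1.0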

Conclusion

With immutable images, containers, and host OSes, our test VM infrastructure is in a truly repeatable state. This particularly shines with our test automation, which currently recreates 20 instances from scratch each night. We’re currently running ~90 instances in DevCloud. And since DevCloud runs on datacenter hardware we already own, we save a significant amount by running it ourselves.

As we containerize more of our applications at Zoosk, we’re evaluating our current production Docker setup. Since Atomic is working so well as our bare metal OS for DevCloud, it’s likely we’ll adopt it for our production environment as well.