Engineering Blog - May 3, 2017

Self Service Deploys at Zoosk

Sandeep Raju

Once upon a time at Zoosk, deploying was a lengthy ritual that only a select few could perform. It required coordination and planning between multiple teams, and developers had to wait in line for the “next deploy window” to monitor their code as it shipped to production alongside a batch of other changes scheduled for the same window. Some changes waited up to two weeks before going out, and work slowed down during such big deploys. While the process did what it was supposed to do, it had its bottlenecks and drawbacks (more on the dev process change in a future blog post!).

We started using Docker here at Zoosk, first in dev environments and then in production. That made the time ripe to upgrade our deploy process as well.

We were inspired by Airbnb’s democratic deploys and GitHub’s Hubot-based deploys after hearing about them at a conference, and thought we should build something similar. We considered using Hubot, but it was not a good fit: it was written in CoffeeScript, while most of our back-end deployment code was in Python. So we ended up building a Slack bot from scratch, written entirely in Python. We call it “shippy”, and the main source file is conveniently named “ship.py”. Shippy uses the Slack RTM API to listen to events from Slack and interact with users.
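To make the event-listening idea concrete, here is a minimal sketch of how a bot might turn RTM-style events into commands. It is not shippy's actual code: the command grammar ("shippy <command> <args>") and the function name parse_command are hypothetical, and the events are hard-coded dictionaries rather than data read from a live Slack connection. Only the event fields (type, subtype, user, channel, text) follow the shape of Slack message events.

```python
from typing import List, Optional, Tuple

BOT_NAME = "shippy"  # bot name from the post; the command grammar is an assumption


def parse_command(event: dict) -> Optional[Tuple[str, str, List[str]]]:
    """Return (user, command, args) for a message addressed to the bot, else None."""
    # The event stream carries many event types; only plain user messages
    # (no "subtype" such as bot_message) are treated as potential commands.
    if event.get("type") != "message" or "subtype" in event:
        return None
    words = event.get("text", "").split()
    if len(words) < 2 or words[0].lower() != BOT_NAME:
        return None
    return event.get("user", ""), words[1].lower(), words[2:]


# Hard-coded sample events standing in for what a live connection would deliver.
events = [
    {"type": "hello"},
    {"type": "message", "user": "U123", "channel": "C456", "text": "shippy ship feature-x"},
    {"type": "message", "user": "U789", "channel": "C456", "text": "lunch anyone?"},
]
commands = [c for c in (parse_command(e) for e in events) if c is not None]
print(commands)
```

Keeping the parsing in a pure function like this makes it easy to test without a Slack connection; the networking loop only needs to feed it events.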

We also ended up using gevent, a coroutine-based networking library for Python. It provides lightweight pseudo-threads (greenlets) and uses “monkey patching” to make blocking standard-library calls cooperative, which lets shippy, for example, listen to Slack events while performing a back-end deployment. This gives the user the illusion of having threads while, under the covers, only one thread is ever executing. It also removes the need for most complicated locking mechanisms.
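The single-threaded, cooperative model can be illustrated without installing anything. The sketch below deliberately uses the standard library's asyncio as a stand-in for gevent, so the syntax differs from what shippy would use (gevent needs no async/await keywords), but the idea is the same: two tasks take turns on one thread, switching only at explicit yield points, so shared state can be updated without locks. The task names and deploy steps are made up for illustration.

```python
import asyncio

log = []  # shared state, appended to by both tasks without any lock


async def listen_for_events(count: int) -> None:
    for i in range(count):
        await asyncio.sleep(0)  # yield point: stands in for waiting on the Slack socket
        log.append(f"event {i}")


async def run_deploy(steps: list) -> None:
    for step in steps:
        await asyncio.sleep(0)  # yield point: stands in for a slow back-end call
        log.append(f"deploy: {step}")


async def main() -> None:
    # Both coroutines run on the same thread and interleave at their yield points.
    await asyncio.gather(listen_for_events(3), run_deploy(["pull", "start", "route"]))


asyncio.run(main())
print(log)
```

Because a task can only be interrupted where it yields, each append runs to completion before the other task gets a turn, which is why no locking is needed here.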

Our deploy workflow:

Developers push their changes to their feature branch and create a pull request in GitHub Enterprise
Developers also ensure that their branch is up to date with the master branch
Our CI builds generate containers on every push
Our QA team signs off on the commit (assuming the changes have passed QA):

The CI builds also create commit statuses in GitHub Enterprise on each commit that is being built indicating either a successful build or a failure:

Devs ship their changes to production using shippy:

Shippy merges the shipped code (now that it is known to work in production) into the master branch:

The deploy process itself:

While the UI is deceptively simple, the machinery behind it is quite thorough. The process starts with a series of pre-deploy checks (see screenshot above). The bot first looks at the GitHub commit statuses of the commit being shipped to confirm that the build and QA sign-off succeeded (this list of commit statuses could be extended to perform further checks). It then checks whether the feature branch being shipped is up to date with master. Next, shippy looks up the commit that’s currently in production (master, if there was no deviation from the process) and generates a link showing the “diff” for what’s being shipped.
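The pre-deploy checks described above can be sketched as a few small functions. This is an illustration under assumptions, not shippy's implementation: the status context names ("ci/build", "qa/signoff"), the repository URL, and the function names are hypothetical. The inputs are plain dictionaries shaped like GitHub API responses (one status per context with "context" and "state" fields, and a comparison result with a "behind_by" count); fetching them from GitHub Enterprise is left out.

```python
from typing import Dict, List

REQUIRED_CONTEXTS = ("ci/build", "qa/signoff")  # hypothetical status context names


def failed_checks(statuses: List[Dict[str, str]]) -> List[str]:
    """Return the required contexts that are missing or not in the 'success' state."""
    # Assumes one entry per context, as in a combined commit status response.
    state_by_context = {s["context"]: s["state"] for s in statuses}
    return [c for c in REQUIRED_CONTEXTS if state_by_context.get(c) != "success"]


def is_up_to_date(comparison: Dict[str, int]) -> bool:
    """True if the feature branch contains every commit on master."""
    # Assumes a master-to-branch comparison where "behind_by" counts the
    # commits on master that the branch is missing.
    return comparison.get("behind_by", 0) == 0


def compare_url(repo_url: str, production_sha: str, shipping_sha: str) -> str:
    """Build a link to the diff between what is in production and what is being shipped."""
    return f"{repo_url}/compare/{production_sha}...{shipping_sha}"


statuses = [
    {"context": "ci/build", "state": "success"},
    {"context": "qa/signoff", "state": "pending"},
]
print(failed_checks(statuses))
print(is_up_to_date({"ahead_by": 3, "behind_by": 0}))
print(compare_url("https://github.example.com/org/app", "abc123", "def456"))
```

Extending the checks, as the post suggests is possible, would amount to adding another context name to the required list.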

Shippy then begins the actual deploy by calling our back-end machinery, which, among other things, cleans up old containers, pulls new containers onto hosts, starts them, and enables load balancer rules to send the appropriate amount of traffic to the new containers. This back end really deserves a blog post of its own.

One of the cool things about our homebrew deploy process is that we’re able to do incremental roll-outs of traffic to the new containers. Once the containers are up on all the hosts (or a pre-specified minimum number of hosts), an internal VIP (virtual IP) is provided where the code can be verified in production.

The bot also provides various links to our monitoring tools (we use New Relic, Rollbar, and Splunk, among others). Once the user has verified that their code is working as expected, they can either increase the roll-out percentage or just “commit” the deploy. Committing the deploy takes the roll-out percentage to 100% and releases the lock so that other deployers can ship their code. The user can abort the deploy at any point without rolling it out, should that be necessary.
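The roll-out life cycle described above (verify, optionally increase the percentage, then commit or abort, with a lock ensuring one deploy at a time) can be modeled as a small state machine. The sketch below is a simplified in-memory model, not shippy's code: the class and method names are hypothetical, the lock is a plain object rather than anything shared between processes, and releasing the lock on abort is an assumption, since the post only states that committing releases it.

```python
from typing import Optional


class DeployLock:
    """Allows one deploy at a time (in-memory stand-in for a real shared lock)."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def acquire(self, user: str) -> bool:
        if self.holder is not None:
            return False
        self.holder = user
        return True

    def release(self, user: str) -> None:
        if self.holder == user:
            self.holder = None


class Deploy:
    def __init__(self, user: str, lock: DeployLock) -> None:
        if not lock.acquire(user):
            raise RuntimeError(f"deploy already in progress by {lock.holder}")
        self.user = user
        self.lock = lock
        self.percent = 0  # new containers are up but receive no public traffic yet
        self.state = "verifying"

    def _require_active(self) -> None:
        if self.state != "verifying":
            raise RuntimeError(f"deploy is already {self.state}")

    def increase(self, percent: int) -> None:
        self._require_active()
        if not self.percent < percent <= 100:
            raise ValueError("roll-out percentage must increase and stay at or below 100")
        self.percent = percent

    def commit(self) -> None:
        # Committing takes the roll-out to 100% and frees the lock for the next deployer.
        self._require_active()
        self.percent = 100
        self.state = "committed"
        self.lock.release(self.user)

    def abort(self) -> None:
        # Assumption: aborting also frees the lock so others are not blocked.
        self._require_active()
        self.percent = 0
        self.state = "aborted"
        self.lock.release(self.user)


lock = DeployLock()
first = Deploy("alice", lock)
first.increase(10)
first.increase(50)
first.commit()
second = Deploy("bob", lock)  # succeeds only because alice's commit released the lock
second.abort()
print(first.percent, first.state, second.state, lock.holder)
```

Trying to start a second deploy while one is still being verified raises an error in this model, which mirrors the one-deployer-at-a-time behavior implied by the lock.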

Conclusion:

Our SVP of Engineering, Mike Riccio, noticed an interesting statistic after we enabled “shippy” deploys at Zoosk: the number of deploys in March increased 90% year over year, which means we’re shipping code at an ever faster pace. We also feel that developers are happier with the deploy process! While Slack itself is a fun tool for team communication, making it part of the deploy process makes it even more fun for the engineers, and is something everyone should consider.