Six years ago, Zoosk was a web and desktop application built with a monolithic architecture. Today, the Zoosk franchise runs on multiple platforms (Android, iOS, web, and touch) with a public-facing API gateway and 17 micro-services in a hybrid data center / cloud environment. This blog summarizes Zoosk's engineering history of refactoring that monolith into micro-services, and the lessons we learned along the way.
We have already blogged about the photo service, which was one of the earliest efforts to split up our monolith. Version 2 of the photo service was responsible for the transcoding work. Version 3 is actually composed of three separate micro-services in the bounded contexts of gallery, transcoding, and fraud detection.
We have always fronted our SOLR data stores with micro-services. The earliest efforts used SOLR to optimize access to a socially broadcast feed of romantic moments and for user profile search results. I gave a presentation on these services during the 2012 Lucene Revolution conference. Later on, we added micro-services for accessing SOLR indexes on Facebook affinities, for the user location search that is part of the signup funnel, and to help our user support team quickly find users in our system when those users call in.
We also use micro-services to front our Cassandra cluster. We have already blogged about the notifications service that accessed Cassandra with Hector. More recently, we wrote another micro-service that accesses our Cassandra cluster via CQL for user event information.
We also use micro-services for service discovery, orchestration, and health monitoring. One of our services consumes web access logs to surface per-minute, percentile-based latency metrics. Another micro-service participates in the monitoring of our Real-Time Communications infrastructure. We have a micro-service that is responsible for monitoring photo service health and shedding load during degraded network incidents in order to accelerate recovery and stabilization time.
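The idea of surfacing per-minute percentile latency from access logs can be sketched in a few lines. This is a minimal illustration, not our production service: the `(epoch_seconds, latency_ms)` record format and the nearest-rank percentile method are assumptions for the example.

```python
import math
from collections import defaultdict

def per_minute_percentile(records, percentile=95):
    """Bucket (epoch_seconds, latency_ms) pairs by minute and report the
    requested percentile latency for each minute.

    Illustrative sketch only: the record shape and the nearest-rank
    percentile method are assumptions, not Zoosk's implementation."""
    by_minute = defaultdict(list)
    for ts, latency_ms in records:
        by_minute[ts // 60].append(latency_ms)
    result = {}
    for minute, vals in by_minute.items():
        vals.sort()
        # Nearest-rank percentile: the value at ceil(p/100 * n), 1-indexed.
        idx = max(0, math.ceil(percentile / 100 * len(vals)) - 1)
        result[minute] = vals[idx]
    return result
```

A production version would stream log lines continuously and emit each minute's value to a metrics store rather than returning a dict.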
Not all micro-services are for production. We also have a micro-service dedicated to helping developers troubleshoot and recover from problems encountered on their virtual machines.
Other micro-services include user authentication, phone validation, message queuing, and accessing the configuration repository.
What lessons have we learned regarding micro-services? First of all, this is not a blog about how to write well-designed micro-services. There is plenty of good advice online about that already; try searching for DDD aggregates or event sourcing. Nor is this a blog about sound computer engineering practices that apply equally well to monolithic applications and to micro-services. This blog is more about the engineering culture shift we experienced when transitioning from the monolith to micro-services.
Refactoring a monolith into micro-services is an expensive activity that takes a long time and introduces a lot of risk to your application and development process. You need plenty of justification before embarking on that adventure.
Are your sprint launches highly disruptive? Does QA frequently complain about misunderstood scope? Does it take a long time for engineers to fix relatively simple bugs? Do engineers have to fix a lot of bugs introduced by fixes to previous bugs? Is a lot of engineering time spent resolving merge conflicts and making hot fixes to bugs that got past QA? Is engineering hampered by having to develop with out-of-date technology that cannot be easily upgraded?
If you answered yes to two or more of these questions, then perhaps some degree of refactoring your monolith into micro-services makes sense for your organization.
Sometimes the best refactoring is no refactoring at all. If you have new requirements for a new feature, then consider creating a micro-service for that feature instead of just adding more code to the existing monolith.
If the data should be in a new database, then the code should be in a new micro-service. The reverse is also true: refactoring code from the monolith into micro-services also means migrating data from the old database to a new one. Why should each micro-service have exclusive access to its underlying data? Because a shared database means shared state, shared state means leaky abstractions, and leaky abstractions invite undocumented and unanticipated side effects. When a micro-service owns its own data, all access to that data goes through the micro-service, whose end points are formally documented as service-level contracts.
Here is another cultural difference between the monolith and micro-services. The monolithic programmer says, “Why do I have to make another curl or thrift call when I can just join over to that other table or spin up a new object and get what I want quickly?” The micro-service developer knows that no back doors means more controlled scope during testing and fewer surprises at launch time. Caching must be isolated too.
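The "no back doors" discipline can be illustrated with a toy sketch: the only way to read the data is through the service's documented endpoints, never by joining to its table. The `GalleryService` name, its methods, and its fields are all hypothetical; in production the endpoints would be reached over curl or thrift rather than in-process method calls.

```python
class GalleryService:
    """Toy sketch of exclusive data ownership (hypothetical names, not
    Zoosk's actual gallery service): all reads and writes of the photo
    data go through documented endpoints, and nothing else touches the
    underlying storage."""

    def __init__(self):
        # Private to the service: other services cannot "join over" to
        # this table; they must use the endpoints below.
        self._photos = {}

    def put_photo(self, user_id, url):
        """Documented write endpoint."""
        self._photos.setdefault(user_id, []).append(url)

    def get_photos(self, user_id):
        """Documented read endpoint; returns a copy, never the raw state."""
        return list(self._photos.get(user_id, []))
```

The point of the sketch is what is absent: there is no way for a caller to reach `_photos` except through the two documented endpoints, so testing scope stays controlled.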
When you propose a new micro-service, you also need to plan the type of underlying data storage. Be ready to justify the data storage technology (e.g. SQL vs NoSQL) that you choose. SOLR is best for keyword-based search but not for name=value lookups. Cassandra is best when you have more writes than reads on a large collection of atomically maintained lists.
Most of the benefits of micro-services lie in the fact that the scope of a micro-service is smaller than that of a monolith. When engineers don't have to focus as much on keeping new code changes from breaking old code, they can devote more time and attention to supporting equally important non-functional requirements.
Be sure to negotiate acceptable Service Level Agreements during the requirements capture phase for each service. Examples include per-minute error rate, throughput per server, and latency (both average and percentile-based). What is the appropriate escalation procedure when the service falls out of its acceptable SLA? How does the appropriate engineering group get notified with actionable content? How do we prevent alert fatigue?
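Checking a service against its negotiated SLA can be as simple as comparing each minute's metrics to the agreed thresholds. The function below is a sketch with hypothetical threshold values and a hypothetical per-minute stats format; it is not our monitoring system.

```python
def sla_breaches(minute_stats, max_error_rate=0.01, max_p95_ms=250):
    """Return the minutes whose metrics fall outside the negotiated SLA.

    Illustrative sketch: the stats layout ({minute: {"requests", "errors",
    "p95_ms"}}) and the thresholds are example assumptions."""
    breaches = []
    for minute, stats in sorted(minute_stats.items()):
        error_rate = stats["errors"] / max(stats["requests"], 1)
        if error_rate > max_error_rate or stats["p95_ms"] > max_p95_ms:
            breaches.append(minute)
    return breaches
```

In practice the list of breaching minutes would feed the escalation procedure, ideally with enough context (which threshold, by how much) to make the alert actionable and avoid fatigue.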
Monolithic applications tend to focus on supporting the data needs of the client software that calls them. In addition to that, micro-services also focus on system-wide resilience and on protecting the underlying database from abuse. You tend to find more effort on circuit breaking and bulkheading in micro-services than in monoliths.
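For readers unfamiliar with circuit breaking, here is a minimal sketch of the pattern (not Zoosk's implementation): after enough consecutive failures, calls to a struggling backend fail fast instead of piling on, and after a cooldown one trial call is allowed through. The class name, parameters, and injectable clock are assumptions for the example.

```python
import time

class CircuitBreaker:
    """Minimal circuit-breaker sketch. After `max_failures` consecutive
    failures the circuit opens and calls fail fast; after `reset_after`
    seconds, one trial call is allowed through (half-open state)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # Cooldown elapsed: half-open, permit one trial call.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

Failing fast protects the underlying database from abuse during an incident and gives the degraded dependency room to recover, which is exactly the kind of resilience work that shows up more in micro-services than in monoliths.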
Micro-service architectures tend to be more distributed than monolithic applications. This means that engineering will need to focus more attention on distributed computing related issues such as race conditions and the correct understanding of eventual consistency. There is less risk in deploying to hybrid cloud environments with micro-services than with a monolithic application.
In the monolith, there is only one application to build and deploy, so it is expected that each developer self-services their own environment. In a micro-service architecture, there may be dozens of applications to build and deploy. It is asking too much for each developer to deal with so many services that they are not responsible for and may not even know exist. Getting your Continuous Delivery story straight is more important with micro-services than with the monolith.
Developers are people too. Make sure that your micro-service architecture is conducive to a friendly developer experience. Be sure to provide good tooling support for creating, deploying, verifying, and testing new micro-services in the developer environment.
Here at Zoosk, we have used micro-services for SOLR, photos, Cassandra, orchestration, and various other areas. What we have learned when it comes to planning out micro-services is to be strategic when refactoring, to focus more on non-functional requirements, and to include release engineering in the process.