From a Ruby on Rails Monolith to Microservices and CQRS
This is the first of many posts from the Fiverr Dev Team, sharing their insights and early experiences building the Fiverr platform.
Like many startups, the dev team at Fiverr wanted to get its initial product rolled out quickly. This meant delivering value as fast as possible. To start things off, we used a classic Ruby on Rails application on a LAMP stack because of its simplicity and ease of bootstrapping.
Everything was fine for the first couple of years. New feature? No problem! Just add a DB migration, a new route to the controller, new scopes to the model, and some spaghetti code to the view, then run Capistrano from the console… that was it.
But then things started to get a bit nasty. Application responsiveness plummeted and downtime soared. The code base became so complex that even the smallest changes took a lot of time. We tried several quick wins. For example, we tried to keep more entities in the cache. It helped a little at first, but in the end it made things even worse because cache key management became so complex.
That’s when we fully understood that system scalability and long-term quality were every bit as important as delivery velocity.
Here’s the approach we took to ensure system scalability and long-term quality:
First, we mapped all problems to four major groups:
- Application performance
- System quality
- System scalability
- Delivery velocity
Next, we defined our targets:
- A failure in a single feature should not cause the whole system to fail.
- Slowness in a single feature should not slow down the whole system.
- Business flows should have clear implementation and deployment boundaries. For example, the flow responsible for communication between users should not be mixed with the flow responsible for payments.
- The system should be scalable at the lowest possible level, so that individual parts can be scaled independently.
- Polyglot programming should be supported.
Then, after exploring several options, we chose to make a major change in the system architecture by:
- Breaking the monolithic system into small modules
- Implementing separate pipes for read and write operations
Two paradigms became the basis for the architecture change:
- Microservices – We decided that microservices would be responsible for query operations, with the higher application layer interacting with them over a RESTful protocol.
- CQRS (Command and Query Responsibility Segregation) – We moved command operations (INSERT, UPDATE, DELETE) to an event-based asynchronous platform, using RabbitMQ as the message broker.
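To make the read/write split concrete, here is a minimal, single-process sketch of CQRS in Python. It assumes an in-memory `queue.Queue` standing in for RabbitMQ and a thread standing in for the events worker; `OrderStore`, `send_command`, `events_worker`, `get_order` and the event kinds are hypothetical names for illustration, not Fiverr's code.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Event:
    """A command (INSERT/UPDATE/DELETE intent) expressed as an event."""
    kind: str      # hypothetical kinds: "order.create", "order.cancel"
    payload: dict


class OrderStore:
    """Written only by the events worker, read by the query side."""

    def __init__(self):
        self._orders = {}
        self._lock = threading.Lock()  # worker thread and readers share this dict

    def apply(self, event):
        with self._lock:
            if event.kind == "order.create":
                self._orders[event.payload["id"]] = dict(event.payload, status="open")
            elif event.kind == "order.cancel":
                self._orders[event.payload["id"]]["status"] = "cancelled"

    def get(self, order_id):
        with self._lock:
            order = self._orders.get(order_id)
            return dict(order) if order is not None else None  # hand out a copy


# Command side: enqueue and return immediately; the caller gets no result back.
def send_command(bus, kind, payload):
    bus.put(Event(kind, payload))


# Events worker: consumes commands asynchronously and performs the writes.
def events_worker(bus, store):
    while True:
        event = bus.get()
        try:
            if event is None:  # sentinel: shut the worker down
                return
            store.apply(event)
        finally:
            bus.task_done()


# Query side: a synchronous read, what a RESTful GET handler would call.
def get_order(store, order_id):
    return store.get(order_id)


def demo():
    bus = queue.Queue()  # stand-in for a RabbitMQ queue
    store = OrderStore()
    worker = threading.Thread(target=events_worker, args=(bus, store), daemon=True)
    worker.start()

    send_command(bus, "order.create", {"id": 1, "gig": "logo design"})
    send_command(bus, "order.cancel", {"id": 1})
    bus.join()  # demo only: wait until the worker has applied both commands
    result = get_order(store, 1)

    bus.put(None)  # stop the worker
    worker.join()
    return result


if __name__ == "__main__":
    print(demo())  # {'id': 1, 'gig': 'logo design', 'status': 'cancelled'}
```

Because `send_command` returns before the write is applied, the query side is eventually consistent: a read issued immediately after a command may not see it yet. The demo calls `bus.join()` only to make its output deterministic; a real caller would not wait.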
To implement the two new paradigms, we introduced the following changes:
- Business domains were identified and defined (Users, Orders, Marketplace Content, etc.).
- A software module consisting of a RESTful microservice and an Events Worker was created for each business domain. We called it a Chimera module because it had two heads: one for Q (query) and one for C (command). Both heads rode on the same body, which provided code reuse and a common configuration and infrastructure layer.
- The microservice part of the Chimera module took responsibility for fetch operations, while the worker part took responsibility for asynchronous event processing.
- To orchestrate event processing, we implemented a Topology Manager responsible for propagating events between business domains.
- Relational databases running on MySQL were partitioned to follow the business domain boundaries.
- Most new additions to the business entities' data structures were implemented using NoSQL databases.
- The Chimera module was deployed as a single unit, but service and worker processes could be run separately, thus providing code reuse and process segregation at the same time.
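The Chimera module and the Topology Manager can be sketched the same way. This is a minimal in-process illustration, assuming synchronous fan-out in place of a message broker and plain dicts in place of each domain's database partition; `ChimeraModule`, `TopologyManager`, `build_demo` and the event names are hypothetical.

```python
from collections import defaultdict


class TopologyManager:
    """Routes each event kind to every domain module subscribed to it."""

    def __init__(self):
        self._routes = defaultdict(list)  # event kind -> subscribed handlers

    def subscribe(self, kind, handler):
        self._routes[kind].append(handler)

    def publish(self, kind, payload):
        # Synchronous fan-out for the sketch; a real system would go through a broker.
        for handler in list(self._routes[kind]):
            handler(payload)


class ChimeraModule:
    """One body (shared config and storage) carrying a query head and a command head."""

    def __init__(self, domain, topology, config):
        self.domain = domain
        self.topology = topology
        self.config = config  # shared configuration layer
        self.records = {}     # stand-in for this domain's own database partition

    # Q head: read-only fetch, what the RESTful microservice would expose.
    def query(self, record_id):
        return self.records.get(record_id)

    # C head: register an events-worker handler for one event kind.
    def on(self, kind):
        def register(fn):
            self.topology.subscribe(kind, lambda payload: fn(self, payload))
            return fn
        return register

    # Propagate a domain event to other domains via the Topology Manager.
    def emit(self, kind, payload):
        self.topology.publish(kind, payload)


def build_demo():
    topology = TopologyManager()
    config = {"env": "demo"}
    orders = ChimeraModule("orders", topology, config)
    users = ChimeraModule("users", topology, config)

    @orders.on("order.complete")
    def complete_order(module, payload):
        module.records[payload["order_id"]] = {"status": "completed"}
        # The Orders domain announces the fact; it does not touch Users data directly.
        module.emit("order.completed", {"seller_id": payload["seller_id"]})

    @users.on("order.completed")
    def count_completed_order(module, payload):
        seller = module.records.setdefault(payload["seller_id"], {"completed_orders": 0})
        seller["completed_orders"] += 1

    topology.publish("order.complete", {"order_id": 7, "seller_id": "alice"})
    return orders, users


if __name__ == "__main__":
    orders, users = build_demo()
    print(orders.query(7), users.query("alice"))
```

Each module writes only to its own `records`, and cross-domain effects travel as events, which mirrors the domain boundaries described above. In deployment, the same package would start either the RESTful service (serving `query`) or the events worker (running the registered handlers) as a separate process; that entry point is omitted from the sketch.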
The final architecture has the following characteristics:
- Each Chimera module can be implemented and tested separately.
- Each Chimera module can be deployed, monitored and scaled separately.
- Business domain modules can be isolated at run time, providing greater system resilience.
- The learning curve for new developers can be divided into stages, taking business domain modules one by one.
- Different modules can be implemented in different languages. For example, we use Node.js for high-throughput services.
- On the downside, more moving parts result in higher complexity.
- Some code duplication is inevitable.
Lessons learned (the hard way)
- Do not implement scalability from day one, but do design for scalability from day one.
We’ll continue to share what we’ve been doing to grow the world’s largest marketplace for services. Look out for our next post in the Tech category. Leave any comments or questions below.
This post was originally published on Fiverr.