While it's not something you would want to start with (well, unless you already have all the building blocks), a micro-service oriented architecture comes with more gains than pains. What you are looking at is the current state of the TT architecture. Yes, you can probably recognise the pattern: a backend monolith being split into more and more services. But that's not what I wanted to write about.
How long does it take for a new service to be in production?
Before we dive in: how long does it take you to build a new application and ship it to production? Let's say only a basic server with monitoring and logging, a CI/CD pipeline, and one test endpoint. How long does it take from the first command line call to the moment the app is available? Think about it, and we will come back to it at the end.
Highly aligned
The first thing we really wanted was to make sure that, as the number of newly created applications grows, we do not have to rebuild the same things over and over again. We agreed on a set of important items that every service needs to have:
- centralised logs - this is where DataDog comes into play
- each app is monitored and alerts when there are problems - this is a mix of Kubernetes features and DataDog alerts
- each build step is available as a command line tool - we decided on a Makefile
- each service defines its own running environment - Helm is our friend here
- a CI pipeline with defined steps - CircleCI is our tool of choice
- we get a CD pipeline out of the box - CI provides Docker images and ArgoCD takes care of deployments
In the end, we wanted to get as close to 12-factor apps as possible. As you can imagine, this level of alignment is not easy to achieve, even more difficult to implement for each service, and very hard to maintain.
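To make the checklist concrete, here is a minimal sketch of how such alignment could be checked automatically. It is not our actual tooling: the file locations (a `Makefile` at the repo root, a Helm chart under `helm/`, and `.circleci/config.yml`) and the `missing_items` helper are assumptions made for illustration only.

```python
from pathlib import Path

# Hypothetical mapping: agreed item -> file we would expect in a service repo.
# The paths are assumptions for this sketch, not a description of a real layout.
REQUIRED_FILES = {
    "build steps as command line targets": "Makefile",
    "running environment (Helm chart)": "helm/Chart.yaml",
    "CI pipeline definition": ".circleci/config.yml",
}


def missing_items(repo_root):
    """Return the agreed items whose expected file is absent from the repo."""
    root = Path(repo_root)
    return [
        item
        for item, relative_path in REQUIRED_FILES.items()
        if not (root / relative_path).is_file()
    ]


if __name__ == "__main__":
    # Check the current directory and list whatever is not in place yet.
    for item in missing_items("."):
        print(f"missing: {item}")
```

A check like this only proves that the files exist, not that they are any good, which is part of why keeping every service aligned is hard to maintain.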
So, on to the next step.
Single source of truth
We built a single source of truth project which we could use as a reference point. This is where a lot of the discussions around the "how" happened. We started with a simple Spring Boot application. The app exposed a sample endpoint and that's pretty much it. Then the work on CI added a .circleci/config.yml, the need for centralised logs pushed us towards DataDog agents deployed on our Kubernetes cluster, and the need to define the environments ended up as a bunch of Helm files in the project. You get the point. One app to have it all.
In the end, a developer could clone the repo, rename packages, and run the app locally.
Then just fiddle with CircleCI, register the app with our GitOps repo, add your DB lib, and add secrets to Kubernetes. Drink coffee and boom.
I know what you are thinking... such an effort. We thought the same. This was great but still... not good enough.
Ease of use
The next step was to apply one of the CD rules: automate everything. We created a command line tool which does the following:
- creates a new project out of the source of truth repository using the right name
- registers a new app with the CI pipeline (CircleCI)
- registers a new app with the GitOps continuous delivery (ArgoCD)
The only thing the developer needs to do is to push the code to a remote repo and they will see the app.
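The first of those steps can be sketched roughly as below. This is an illustration under assumptions, not the real tool: `create_project` and its parameters are hypothetical names, and the sketch assumes the template repo uses one placeholder name in both file contents and path names. The CircleCI and ArgoCD registration steps are left out on purpose, since they depend on details not covered here.

```python
import shutil
from pathlib import Path


def create_project(template_dir, target_dir, template_name, new_name):
    """Copy the template repo and replace the template name with the new one."""
    target = Path(target_dir)
    # Copy everything except the template's own git history.
    shutil.copytree(template_dir, target, ignore=shutil.ignore_patterns(".git"))

    # Rewrite file contents first; files that are not UTF-8 text are skipped.
    for path in [p for p in target.rglob("*") if p.is_file()]:
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue
        if template_name in text:
            path.write_text(text.replace(template_name, new_name), encoding="utf-8")

    # Rename paths deepest-first, so a parent is only renamed after its children.
    for path in sorted(target.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        if template_name in path.name:
            path.rename(path.with_name(path.name.replace(template_name, new_name)))
    return target


if __name__ == "__main__":
    # Hypothetical names: a local checkout of the template and a new service.
    create_project("template-service", "billing-service", "template-service", "billing-service")
```

Doing the content rewrite before the path rename keeps the loop simple: no file moves while its contents are still being edited.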
The mindset
I believe that was the hardest part. It always has been. How do we buy into something where the pros are not so clearly visible but the issues are obvious from the very beginning?
- why lift table joins into code when databases have been doing that successfully for decades?
- what about transactions?
- this will increase the cost of infra, each service comes with a memory footprint
- rather than having 1 place to look, we will have to chase the code around in case of problems
and many, many more which I will not mention here. I guess another article needs to be written to fully explain what was blocking us and how we worked around it. I must say that I am very proud of the team here.
How long does it take for a new service to be in production?
We can say this:
- 30 seconds to create a new service; it's just one command line call
- 5 mins to run the CI pipeline (the image creation could be improved)
- 40 seconds to deploy and have the service boot up
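For the record, the three timings above add up like this (a trivial sketch, with step names chosen just for the example):

```python
from datetime import timedelta

# The three measured steps, using the numbers from the list above.
steps = {
    "create the service": timedelta(seconds=30),
    "run the pipeline": timedelta(minutes=5),
    "deploy and boot": timedelta(seconds=40),
}

total = sum(steps.values(), timedelta())
print(total)  # 0:06:10
```

That is 6 minutes and 10 seconds, which rounds up to roughly 7 minutes.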
So all together, under 7 minutes. How long does it take you?