Greening the CI/CD pipeline
Have you ever wondered how running CI/CD pipelines impacts the environment? If so, this article offers some additional ideas and possible starting points. If you haven't, check it out anyway; I'm sure you'll find it useful.
Working as a DevOps engineer (to be honest, I don't quite like that term) has taught me the principles and practices of Continuous Integration and Continuous Delivery/Deployment, or CI/CD for short.
Going through those processes, from making a change in the code to deploying it to a different environment, led me to experiment with various tools and approaches. Spoiler alert: it was (and still is) bash all the way!
Jokes aside, working with pipelines every day got me thinking: is there a way to make them greener?
In this article, we will take a high-level approach and look at what to consider when making the whole CI/CD process more sustainable. We will not focus on the environmental impact of the application itself or the infrastructure it runs on, but on the process that happens before that: from the code commit until that commit is applied to a specific environment.
How to measure the current impact?
Okay, so we need to start somewhere. We can ask ourselves: what is the current impact of our CI/CD process?
Now, answering this can be a lot harder than you think. But it's not impossible.
Depending on where we run our CI/CD pipelines, we can use different ways to monitor the current setup.
- If running in the Cloud, you can opt for:
  - checking the cloud-provider-specific dashboards (not ideal)
  - using the Cloud Carbon Footprint tool
- If running on-prem, or to improve monitoring of your Cloud CI/CD agents, you can check out:
These tools can give you an overview of the current state of your CI/CD process, which makes them a good starting point.
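To get a feel for how such tools turn runtime into a footprint, here is a rough estimate that loosely follows the approach Cloud Carbon Footprint describes: interpolate power draw between idle and full load, multiply by compute time, scale by the data centre's power usage effectiveness (PUE), and apply the grid's carbon intensity. This is a minimal sketch. The function name and every coefficient in the example (watts per vCPU, PUE, grid intensity, run counts) are hypothetical placeholders, not measured values.

```python
def estimate_emissions(vcpu_hours, avg_utilization, min_watts, max_watts,
                       pue, grid_g_co2_per_kwh):
    """Rough energy (kWh) and emissions (g CO2e) estimate for CI/CD compute.

    All coefficients are inputs you have to supply; none are measured here.
    """
    # Interpolate per-vCPU power draw between idle and full load.
    avg_watts = min_watts + avg_utilization * (max_watts - min_watts)
    # Watt-hours -> kWh, scaled by the data centre's power usage effectiveness.
    energy_kwh = avg_watts * vcpu_hours * pue / 1000
    # Convert energy to emissions using the grid's carbon intensity.
    emissions_g = energy_kwh * grid_g_co2_per_kwh
    return energy_kwh, emissions_g


# Hypothetical month: 2 vCPUs x 0.25 h per run x 200 runs = 100 vCPU hours.
kwh, grams = estimate_emissions(
    vcpu_hours=2 * 0.25 * 200,
    avg_utilization=0.5,
    min_watts=1.0,
    max_watts=4.0,
    pue=1.2,
    grid_g_co2_per_kwh=400,
)
print(f"{kwh:.2f} kWh, {grams:.0f} g CO2e")  # prints "0.30 kWh, 120 g CO2e"
```

The absolute number matters less than the trend: re-running the same estimate after each change shows whether the pipeline is getting greener.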
What gets neglected?
The size of different artefacts matters. By artefacts in this context, I mean the following:
- single (or multiple) code repository
- used libraries and packages
- binaries
- container images (if any)
- all other artefacts not mentioned above.
For example, an application binary of 10 MB and one of 1 GB are not the same, from both a performance and an environmental perspective.
The environmental impact of artefact size comes from downloading and storing those artefacts, and this is what often gets neglected.
Measuring the impact of downloading and storing artefacts can be tricky. The tools above can help with the measurements, but they shouldn't be the only thing we rely on.
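A simple complement to those tools is to look at what actually ends up in your build output. The sketch below lists the largest files under a directory and sums their size using only the Python standard library. The function names are hypothetical, and it measures bytes on disk, not energy, which is still a useful proxy for how much gets stored and transferred.

```python
import os
from pathlib import Path


def largest_files(root, top_n=10):
    """Return (size_in_bytes, path) for the top_n largest regular files under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks so the same bytes are not counted twice.
            if path.is_symlink() or not path.is_file():
                continue
            found.append((path.stat().st_size, path))
    found.sort(key=lambda item: item[0], reverse=True)
    return found[:top_n]  # top_n=None returns every file


def total_size_mb(root):
    """Total size of all regular files under root, in megabytes (10^6 bytes)."""
    return sum(size for size, _ in largest_files(root, top_n=None)) / 1_000_000
```

For example, running `largest_files("dist")` against a (hypothetical) `dist` output directory in a pipeline step and logging the result makes artefact growth visible from one run to the next.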
How to improve the impact?
Okay, let's assume we are able to measure the impact of our CI/CD process with some of the tools above. We now see the numbers, and we don't like them. How can we improve them? There are a couple of things we can do.
Avoid bloatware
The hard truth is that our code, both application and infrastructure, contains a lot of bloatware.
Well, bigger doesn’t imply better. Bigger means someone has lost control. Bigger means we don’t know what’s going on. Bigger means complexity tax, performance tax, reliability tax.
You can deliver a lot of functionality even with a limited amount of code and dependencies.
https://spectrum.ieee.org/lean-software-development
Instead of pulling in that cool library or tool that solves your problem, consider solving it with a few more lines of code in your application, or, where applicable, a simple bash script in your container image. These are just a couple of examples off the top of my head.
This can noticeably reduce the size of our artefacts and, in turn, their impact on the environment.
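Before deciding which dependency to replace, it helps to know which ones weigh the most. If your project happens to be in Python, the standard library's `importlib.metadata` can list installed distributions and the files each one recorded. This is a sketch for one ecosystem only, assuming the packages recorded their file lists; it reports on-disk size, not download size, and the function name is hypothetical.

```python
from importlib.metadata import distributions


def dependency_sizes(top_n=10):
    """Return (package_name, size_in_bytes) for the largest installed distributions."""
    sizes = []
    for dist in distributions():
        total = 0
        # dist.files may be None if the package did not record its files.
        for file in dist.files or []:
            try:
                path = file.locate()
                if path.is_file():
                    total += path.stat().st_size
            except (OSError, AttributeError):
                # Missing or non-filesystem entries (e.g. inside a zip) are skipped.
                continue
        sizes.append((dist.metadata["Name"], total))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top_n]


for name, size in dependency_sizes():
    print(f"{size / 1_000_000:8.1f} MB  {name}")
```

A large package that is used for a single helper function is a good candidate for the "few more lines of code" approach.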
Use cache where you can
Caching libraries, packages, or even container images can improve both the execution time and the overall environmental impact of CI/CD pipelines.
There are numerous ways to do this: for example, caching locally on the CI/CD agent or, for container images, using a remote layer cache. The options are there; we just need to look for them.
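Whatever the cache backend, a cache is only useful if its key changes exactly when the cached content should change. A common pattern is to derive the key from a hash of the dependency lockfiles; GitLab's `cache:key:files` and GitHub Actions' `hashFiles()` follow a similar idea. The sketch below shows the principle in plain Python. The function name, the `deps` prefix and the lockfile name are hypothetical.

```python
import hashlib
from pathlib import Path


def cache_key(prefix, lockfiles):
    """Derive a cache key that changes only when a lockfile's contents change."""
    digest = hashlib.sha256()
    for lockfile in sorted(lockfiles):
        path = Path(lockfile)
        # Hash the name and a per-file digest so file boundaries stay unambiguous.
        digest.update(path.name.encode("utf-8"))
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return f"{prefix}-{digest.hexdigest()[:16]}"


# Hypothetical usage: restore the dependency cache only if this key already exists.
# key = cache_key("deps", ["package-lock.json"])
```

As long as the lockfile is unchanged, every run hits the same cache entry and skips the download entirely.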
Use temporary (spot) agents
If you are running the CI/CD pipeline in the Cloud, you can configure the CI/CD server to spin up temporary agents that run the job(s). After the pipeline has finished, the agent is shut down and destroyed.
For example, in GitLab, you can configure the runners (CI/CD agents) to run on AWS Spot instances. This lets you use the cloud provider's spare, already existing capacity instead of reserving new capacity.
This approach makes sense for a small, simple application without a large number of dependencies and binaries to download and store.
If your applications are measured in gigabytes, this approach might not be for you, since every fresh agent starts without a local cache and has to download everything again.
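The trade-off is easy to see with a bit of arithmetic. The sketch below compares how much data a month of pipeline runs pulls when agents start empty versus when most dependencies are already cached. It simplifies the cache hit rate to "fraction of the dependency volume already present", and all numbers (sizes, run count, hit rate) are hypothetical.

```python
def monthly_download_gb(dependencies_gb, runs_per_month, cache_hit_rate):
    """Data pulled per month if each run needs `dependencies_gb` and a fraction
    `cache_hit_rate` of it is already present on the agent."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return dependencies_gb * runs_per_month * (1.0 - cache_hit_rate)


# Hypothetical numbers: 400 runs per month.
# A fresh temporary agent starts with an empty cache (hit rate 0).
small_app = monthly_download_gb(0.05, 400, cache_hit_rate=0.0)
large_app_fresh = monthly_download_gb(5.0, 400, cache_hit_rate=0.0)
large_app_warm = monthly_download_gb(5.0, 400, cache_hit_rate=0.9)
print(f"small app, fresh agents: {small_app:.0f} GB")        # 20 GB
print(f"large app, fresh agents: {large_app_fresh:.0f} GB")  # 2000 GB
print(f"large app, warm cache:   {large_app_warm:.0f} GB")   # 200 GB
```

For the small application, fresh agents cost little extra. For the large one, the repeated downloads can outweigh the savings, unless the temporary agents can use a shared remote cache, which many CI systems allow.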
Leverage scheduled builds
If your CI/CD pipeline doesn't need to run on every commit, or some part of it takes too long, you might run it once a day instead, or, for example, at a time when the energy comes from renewable sources.
Green APIs can help here: you can check when your energy comes from renewables and trigger the CI/CD process to run at that time.
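Once a Green API gives you a carbon-intensity forecast, picking the run time is a small optimisation problem: find the contiguous window, as long as the job, with the lowest average intensity. The sketch below does this with a sliding window. It assumes the forecast has already been fetched and parsed into hourly `(start_time, g_co2_per_kwh)` pairs; the function name and the forecast values are hypothetical.

```python
from datetime import datetime, timedelta


def greenest_window(forecast, duration_hours):
    """Pick the start of the contiguous window with the lowest average carbon intensity.

    forecast: list of (slot_start, g_co2_per_kwh) tuples, one per hour, in time order.
    """
    if not 1 <= duration_hours <= len(forecast):
        raise ValueError("duration must fit inside the forecast")
    window_sum = sum(value for _, value in forecast[:duration_hours])
    best_sum, best_index = window_sum, 0
    # Slide the window one hour at a time: add the new slot, drop the oldest.
    for i in range(1, len(forecast) - duration_hours + 1):
        window_sum += forecast[i + duration_hours - 1][1] - forecast[i - 1][1]
        if window_sum < best_sum:
            best_sum, best_index = window_sum, i
    return forecast[best_index][0], best_sum / duration_hours


# Hypothetical hourly forecast, as a Green API might return it (values are made up).
start = datetime(2024, 1, 1, 0, 0)
intensities = [420, 390, 310, 180, 150, 170, 260, 400]
forecast = [(start + timedelta(hours=h), value) for h, value in enumerate(intensities)]

when, avg = greenest_window(forecast, duration_hours=2)
print(when, avg)  # 2024-01-01 04:00:00 160.0
```

A scheduler (for example a nightly job) can then trigger the pipeline at `when` instead of at a fixed hour.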
Leverage different regions
If your CI/CD pipeline is running in the Cloud, you can use the above-mentioned Green APIs to check which regions are getting their energy from renewables and spin up the agents there.
This, however, might not reduce the environmental impact if your build process takes a long time and/or the artefacts you download and upload are large.
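The caveat about large artefacts can be made concrete with a toy model: a greener region wins only if the extra energy spent moving artefacts there doesn't cancel out its lower carbon intensity. The sketch assumes artefacts live in a "home" region, charges transfer energy at the build region's intensity (a deliberate simplification), and uses a placeholder energy-per-GB figure. Region names, intensities and the 0.05 kWh/GB value are all hypothetical.

```python
def region_emissions_g(build_kwh, artefact_gb, kwh_per_gb, intensity, is_home):
    """Estimated grams of CO2e for one build in a region.

    Artefacts already live in the home region; any other region must pull them first.
    Transfer energy is (simplistically) charged at the build region's intensity.
    """
    transfer_kwh = 0.0 if is_home else artefact_gb * kwh_per_gb
    return (build_kwh + transfer_kwh) * intensity


def pick_region(build_kwh, artefact_gb, kwh_per_gb, intensities, home):
    """Return the region name with the lowest estimated emissions for one build."""
    return min(
        intensities,
        key=lambda name: region_emissions_g(
            build_kwh, artefact_gb, kwh_per_gb, intensities[name], name == home
        ),
    )


# Hypothetical regions and intensities (g CO2e/kWh) and a placeholder 0.05 kWh/GB.
regions = {"home-region": 400, "green-region": 50}
print(pick_region(0.5, 1, 0.05, regions, home="home-region"))    # green-region
print(pick_region(0.5, 200, 0.05, regions, home="home-region"))  # home-region
```

With 1 GB of artefacts the greener region wins easily; with 200 GB, moving the data costs more than the cleaner grid saves.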
Turn off agents when not used
Machines consume power even when idle. Why not turn them off when they are not in use, for example on weekends or outside working hours during the week? If turning off CI/CD agents is not an option, maybe you can reduce their number.
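The decision itself is simple enough to express as a function that a scheduled job (e.g. a cron entry) could evaluate every hour, passing the result to whatever scaling mechanism your CI/CD server or cloud provider offers. That scaling hook is not shown and would be specific to your setup; the function name, working hours and agent counts below are hypothetical defaults, and local time is assumed.

```python
from datetime import datetime, time


def desired_agent_count(now, working_count=4, off_hours_count=0,
                        day_start=time(7, 0), day_end=time(19, 0)):
    """How many CI/CD agents to keep running at `now` (local time assumed)."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    in_working_hours = day_start <= now.time() < day_end
    return working_count if is_weekday and in_working_hours else off_hours_count


print(desired_agent_count(datetime(2024, 1, 8, 10, 0)))  # Monday 10:00 -> 4
print(desired_agent_count(datetime(2024, 1, 6, 10, 0)))  # Saturday -> 0
print(desired_agent_count(datetime(2024, 1, 8, 22, 0), off_hours_count=1))  # weeknight -> 1
```

Setting `off_hours_count` to 1 instead of 0 covers the "decrease their number" option: a single agent stays available for urgent fixes.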
Summary
Running CI/CD pipelines sustainably can have a big impact on the environment. Taking this into consideration, rather than focusing only on the environmental impact of running the application, can be of great importance.
A couple of side effects of greening the pipelines could be:
- decreasing costs of:
  - infrastructure
  - data transfer
- improving performance and execution time, which leads to less power consumption.
How to start? As I've written above, you can:
- Measure your current state with the tools mentioned above.
- Leverage the different improvement approaches I've described.
I would love to hear from you on the topic! Have you found the things I've written about useful? Do you think there is something I'm missing? Add your thoughts in the comments below!
If you found this topic interesting, consider sharing it with a larger audience. It would mean a lot!
Thank you and see you in the next article!