From Kubernetes to Low Carbon-netes: Optimizing K8s Infrastructure and Workloads for Sustainability

Wondering what the environmental impact of your Kubernetes clusters is? How much energy they consume and how much carbon they emit? You are in the right place! Discover the steps that can help you measure and reduce the environmental footprint of your Kubernetes environments.


Listen to me read this post here (not an AI-generated voice!) or subscribe to the feed in your podcast app.


Before I start the main story, I want to quickly reflect on the current situation in Serbia. If you don't already know, since November last year students have been actively protesting against the corruption in the government that was the main cause of the tragedy at the main train station in Novi Sad. As with many people here, this situation has taken its toll on me and my writing as well. With this little paragraph I want to first thank the students for their persistence and to say this - although small and without many readers, this blog and its author stand with the students!

Now back to the main article.


Some time ago I listened to an episode of the Environment Variables podcast about New Research Horizons, in which the host Chris Adams used the phrase - from Kubernetes to Low Carbon-netes (around the 42nd minute). I jotted it down somewhere to be used in some future article. The time has finally come for that article to be published. So, thanks Chris for the great idea!

The Week in Green Software: New Research Horizons
This Week in Green Software has Dr. Daniel Schien from the University of Bristol, UK, joining host Chris Adams to talk about old research, recent news, and future prospects all revolving around digital sustainability. This conversation touches on some of the work Daniel has done in the past (and plans to do in the future) as well as their thoughts and reckons on this and how it can be used to steer our efforts towards a sustainable future. Together, they cover topics such as streaming being the new flying, and some ways in which new research has changed their perspectives on some problems in green tech.

We will start this article by describing how we can measure the energy usage of a Kubernetes cluster, and with it, the carbon footprint of the cluster(s).

How can we measure emissions?

If you can't measure it, you can't manage it.

Peter Drucker

The first step in any reduction effort, especially one targeting carbon emissions, is to measure our current usage - in this case, our carbon footprint.

Being the first step, it is surely one of the hardest, because we can't say with exact certainty whether the numbers are accurate. The overall carbon emissions of our infrastructure can mainly be calculated from:

  • the amount of CO2/carbon emitted in the production and delivery of the hardware to our premises (e.g. a data centre) - this is known as embodied carbon - and
  • the amount of electricity this hardware uses while running - known as operational emissions.

Calculating the embodied carbon can be really difficult, so we need to rely on data from our manufacturers and distributors. The embodied carbon has already been emitted to the atmosphere, so we cannot do much about it other than use our hardware for a longer period and opt for re-use rather than buying new hardware. To find out more about embodied or embedded carbon, check out one of my previous articles.

Why you don’t need that new and cool device everyone is talking about?
Buying a new device is not always a good option. This is because devices emit carbon long before we start using them. Find out why this happens and how we can help, in this article.

In this article, however, we'll focus on the second point: the things we can do while operating our hardware.

If most of the electricity we're using comes from renewable sources - that is great! However, a lot of the time the source of electricity is rather dynamic - we cannot be 100% certain which source we're getting the power from. What we can do is measure the power our infrastructure uses and, based on the number of watt-hours (Wh), make calculations and predictions.

For Kubernetes infrastructure, and other infrastructure for that matter, we can use one of the following tools:

  • Scaphandre - a monitoring agent that keeps track of energy consumption of your system.
  • Kepler - a Prometheus exporter that uses eBPF to probe energy-related system stats and exports them as metrics.

And guess what - I've written about both of these tools here! Check out the links below to find out more!

Demoing Scaphandre
Tracking power consumption of your computing devices is important. That’s why, in this article, we go through another tool that helps you do just that! Join me in the journey of learning about and exploring Scaphandre.
Demoing Kepler Exporter
Have you ever wondered if you can track the power consumption of your machines? The short answer is - yes! The “how?” question is the interesting part of it. And we dive into that, and many more, in this article!

These tools mainly export power usage metrics that we can later convert into CO2 emissions based on data from our power provider. You can check my article from a while back, where I discuss emission data sources, or Green APIs, and how you can use them.

Exploring the Green APIs
Have you ever wondered, is there an API you can call and get the carbon emission data of a specific location? It would be cool to have something like this, so we can make our applications carbon-aware, wouldn’t it? Look no further! In this article I’m exploring just that!
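
To give a rough idea of how such a conversion might look in practice, here is a minimal sketch of a Prometheus recording rule. It assumes Kepler's kepler_container_joules_total counter is available and uses a fixed grid intensity of 400 gCO2e/kWh - in a real setup you would fetch the intensity dynamically from one of the Green APIs.

    groups:
      - name: carbon-emissions
        rules:
          # Estimated grams of CO2e emitted per container over the last hour.
          # increase(...) returns joules; dividing by 3.6e6 converts J -> kWh,
          # and 400 is the assumed average grid intensity in gCO2e/kWh.
          - record: container:carbon_emissions_grams:1h
            expr: increase(kepler_container_joules_total[1h]) / 3.6e6 * 400
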
💡
There might be some other tools that also provide these kinds of metrics. I am aware of only these two. If you have something in mind, feel free to add your recommendations in the comment section below.

What can we do to reduce the emissions?

So, we have completed the first step - measuring. We now know where we stand, or at least have some idea. Next, let's go through some possible emission reduction steps, starting with the easier ones and moving towards the more complicated ones.

Adding resource requests and limits

By default, workloads in a Kubernetes cluster run without any resource limitations. Because of this, as a first step, we should always have requests and limits defined on them. It is good practice, as well as a recommended step, for two reasons:

  1. Adding them curbs uncontrolled resource usage by cluster workloads.
  2. Adding them allows Kubernetes to better apply its Quality of Service classes.

The YAML definition should look something like the snippet below (here wrapped in a minimal Pod so it can be applied as-is):

    apiVersion: v1
    kind: Pod
    metadata:
      name: app
    spec:
      containers:
        - name: app
          image: nginx:alpine
          resources:
            requests:
              memory: "64Mi"
              cpu: "250m"
              ephemeral-storage: "100M"
            limits:
              memory: "128Mi"
              cpu: "500m"
              ephemeral-storage: "500M"

More information can be found at the following link from the Kubernetes documentation.

Resource Management for Pods and Containers
When you specify a Pod, you can optionally specify how much of each resource a container needs. The most common resources to specify are CPU and memory (RAM); there are others. When you specify the resource request for containers in a Pod, the kube-scheduler uses this information to decide which node to place the Pod on. When you specify a resource limit for a container, the kubelet enforces those limits so that the running container is not allowed to use more of that resource than the limit you set.

Add limit ranges and resource quotas

We all know users cannot be trusted (be they end users, developers, system administrators, etc.). Therefore, there are mechanisms we can apply to make sure that resource requests and limits are always defined.

First, if we want to ensure no workload runs unbounded in the cluster, we can add a LimitRange to the namespace. This resource allows us to do the following:

  • enforce min/max resource usage per Pod or Container,
  • enforce min/max storage requests per PersistentVolumeClaim,
  • enforce a ratio between request and limit for a resource,
  • set default requests/limits for resources and automatically inject them into Containers at runtime.

The main thing to remember here is that a LimitRange works inside the namespace we define it in. It won't work at the cluster level, so we might apply this resource by default to every namespace that is created.
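
As a minimal sketch - the namespace name and all values are placeholders to adjust to your workloads - a LimitRange that injects defaults and caps per-container usage could look like this:

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: default-limits
      namespace: team-a          # applies only inside this namespace
    spec:
      limits:
        - type: Container
          defaultRequest:        # injected when a container defines no requests
            cpu: "250m"
            memory: "64Mi"
          default:               # injected when a container defines no limits
            cpu: "500m"
            memory: "128Mi"
          max:                   # hard ceiling per container
            cpu: "2"
            memory: "1Gi"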

To find out more about limit ranges, go to the link below.

Limit Ranges
By default, containers run with unbounded compute resources on a Kubernetes cluster. Using Kubernetes resource quotas, administrators (also termed cluster operators) can restrict consumption and creation of cluster resources (such as CPU time, memory, and persistent storage) within a specified namespace. Within a namespace, a Pod can consume as much CPU and memory as is allowed by the ResourceQuotas that apply to that namespace. As a cluster operator, or as a namespace-level administrator, you might also be concerned about making sure that a single object cannot monopolize all available resources within a namespace.

If the resources on our cluster(s) are limited, we might go one step further and enable a ResourceQuota in each namespace. This makes sure that no workload takes up too much of the available resources. If a workload is deployed requesting resources that would cross the quota, the deployment is blocked. This way you can safeguard the limited resources of your cluster.
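
Again as a sketch with placeholder values, a ResourceQuota capping the aggregate consumption of a namespace could look like this:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
      namespace: team-a
    spec:
      hard:
        requests.cpu: "4"        # sum of all CPU requests in the namespace
        requests.memory: 8Gi
        limits.cpu: "8"
        limits.memory: 16Gi
        pods: "20"               # also cap the number of Pods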

To find out more about resource quotas, check out the link below.

Resource Quotas
When several users or teams share a cluster with a fixed number of nodes, there is a concern that one team could use more than its fair share of resources. Resource quotas are a tool for administrators to address this concern. A resource quota, defined by a ResourceQuota object, provides constraints that limit aggregate resource consumption per namespace. It can limit the quantity of objects that can be created in a namespace by type, as well as the total amount of compute resources that may be consumed by resources in that namespace.

Application changes

Not all applications handle containerization the same way. For example, when calculating the default heap size, Java applications (before version 17) did not use the resource limits defined at the Container level, but those of the Node - the JVM did not properly handle container awareness. This is a problem because the application crashes with java.lang.OutOfMemoryError and you don't know why. To fix this, you would need to add -Xms and -Xmx arguments to your JAVA_OPTS.
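
Sticking with YAML, here is a sketch of that fix in a Deployment's container spec - the image is hypothetical, and the values assume the application actually picks up JAVA_OPTS on startup:

    containers:
      - name: java-app
        image: registry.example.com/java-app:1.0   # hypothetical image
        env:
          - name: JAVA_OPTS
            # Max heap kept below the container limit, leaving headroom
            # for metaspace, threads and off-heap buffers.
            value: "-Xms64m -Xmx96m"
        resources:
          limits:
            memory: "128Mi"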

Turn on/off workloads when not used

One of the most used fixes for an endless number of IT problems - turning it off and on again - can also be quite effective when it comes to reducing CO2 emissions.

Some data shows that simply turning off machines when they are not used can save a lot of energy and therefore reduce carbon emissions. This is what is known as LightSwitchOps.

To do this in Kubernetes cluster environments, there are two approaches:

  1. manually scale resources up/down when they are not used, or
  2. automatically (on a schedule) turn resources off/on with kube-green.

The first option is quite simple, but rather painful. If you have multiple resources in the cluster, as you probably do, it can be quite taxing and tedious to go from one resource to another and scale each down when not used.

The other option is to have a sort of schedule on which your resources automatically scale down and up. This can be done with the help of the kube-green tool, which is quite easy and simple to set up. And guess what - I already mentioned it in one of my previous articles; a small sketch of it also follows the link below.

Turning the lights on/off in Kubernetes clusters
Some time ago, cruising through the Sustainability-related parts of the Internet, I’ve arrived upon a term LightSwitchOps. Since I’m a fool for all the things Ops, I decided to have a look. In this article, we will dip our fingers in the concept of LightSwitchOps and how to apply it
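
For a quick taste, a minimal kube-green SleepInfo resource could look something like this - the namespace, times and time zone are placeholders:

    apiVersion: kube-green.com/v1alpha1
    kind: SleepInfo
    metadata:
      name: working-hours
      namespace: team-a
    spec:
      weekdays: "1-5"            # Monday to Friday
      sleepAt: "20:00"           # scale Deployments down in the evening
      wakeUpAt: "08:00"          # and back up in the morning
      timeZone: "Europe/Belgrade"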

Run batch jobs when energy is greener

Based on the metrics from the first part, you can easily determine when your energy is greener - coming from renewable sources - and when it's not. You can use this information to reschedule your CronJob resources, or any batch jobs, to run on electricity from renewable sources.
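
The static version of this is just a matter of picking the right schedule. As a sketch, assuming a solar-heavy grid that is typically cleanest around midday, a CronJob could be pinned there - the job name and image are hypothetical:

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: daily-report
    spec:
      schedule: "0 12 * * *"     # 12:00 UTC, when this grid is assumed greenest
      jobTemplate:
        spec:
          template:
            spec:
              restartPolicy: OnFailure
              containers:
                - name: report
                  image: registry.example.com/report-job:1.0   # hypothetical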

The simple solution would be to adjust the schedule of your cron jobs so they run when the energy is greener, as sketched above. The problem with this approach is the dynamic nature of power sources: we can't easily predict which sources the energy comes from at all times. Therefore, we need some automation to help us here. We discuss this in the following section.

Dynamically scale resources based on carbon intensity

Due to their dynamic nature, energy sources cannot be predicted 100% of the time. This is where what is called event-based scaling comes in. There is a tool that allows you to do just that: KEDA - the Kubernetes-based Event Driven Autoscaler. It works as a dynamic autoscaler of workloads based on certain events.

The following is the idea of a process you can apply to dynamically autoscale workloads (a sketch follows the list):

  1. Call one of the Green APIs (e.g. Electricity Maps) to check the carbon intensity of your location.
  2. If the carbon intensity is currently low, trigger a scale-up of certain workloads through KEDA.
  3. If the carbon intensity is getting high, trigger a scale-down of the workloads in the same way.
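
As a rough sketch of steps 2 and 3: KEDA's metrics-api scaler can poll an HTTP endpoint and scale a Deployment based on the returned value. The carbon-score service below is hypothetical - something you would run in-cluster to poll a Green API and expose a "greenness" score that grows as carbon intensity drops, so a higher value means more replicas:

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: batch-worker-scaler
    spec:
      scaleTargetRef:
        name: batch-worker       # the Deployment to scale (hypothetical)
      minReplicaCount: 0
      maxReplicaCount: 10
      triggers:
        - type: metrics-api
          metadata:
            # Hypothetical in-cluster service polling e.g. Electricity Maps.
            url: "http://carbon-score.default.svc/score"
            valueLocation: "greenness"
            targetValue: "10"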

This solution is, from my point of view, the most complicated one to configure. I am going to spend some time in the coming weeks trying it out and writing up a demo in one of my next articles.

Summary

When looking into reducing your Kubernetes electricity and carbon footprint, you need to start with a baseline: how much electricity you use and how much CO2 you emit. Then you can move on to the next steps of actually reducing the footprint. The ones I mentioned here are the following:

  • adding resource requests and limits, either through individual definitions or limit ranges,
  • rejecting workloads that request more than is available, with resource quotas,
  • making application changes (e.g. -Xmx and -Xms options for Java),
  • turning workloads off when not used,
  • running batch jobs when energy is greener,
  • dynamically scaling resources based on the location's carbon intensity.

All the above options should help you reduce the carbon and energy footprint of your cluster. Let me know if you have tried some of them and what you noticed. If you have something else I didn't consider here, even better! Write down your thoughts and feedback in the comment section below - I am eager to find out more about this topic!