Summary: OpenShift Day-2
The Hitchhiker's Guide to Observability Introduction - Part 1
With this article I would like to summarize and, especially, remember my setup. This is Part 1 of a series of articles that I split up so it is easier to read and understand and not too long. Initially, there will be 6 parts, but I will add more as needed.
The Hitchhiker's Guide to Observability - Grafana Tempo - Part 2
After covering the fundamentals and architecture in Part 1, it’s time to get our hands dirty! This article walks through the complete implementation of a distributed tracing infrastructure on OpenShift.
We’ll deploy and configure the Tempo Operator and a multi-tenant TempoStack instance. For S3 storage we will use the integrated OpenShift Data Foundation. However, you can use whatever S3-compatible storage you have available.
The Hitchhiker's Guide to Observability - Central Collector - Part 3
With the architecture defined in Part 1 and TempoStack deployed in Part 2, it’s time to tackle the heart of our distributed tracing system: the Central OpenTelemetry Collector. This is the critical component that sits between your application namespaces and TempoStack, orchestrating trace flow, metadata enrichment, and tenant routing.
The Hitchhiker's Guide to Observability - Example Applications - Part 4
With the architecture defined, TempoStack deployed, and the Central Collector configured, we’re now ready to complete the distributed tracing pipeline. It’s time to deploy real applications and see traces flowing through the entire system!
In this fourth installment, we’ll focus on the application layer - deploying Local OpenTelemetry Collectors in team namespaces and configuring example applications to generate traces. You’ll see how applications automatically get enriched with Kubernetes metadata, how namespace-based routing directs traces to the correct TempoStack tenants, and how the entire two-tier architecture comes together.
The Hitchhiker's Guide to Observability - Understanding Traces - Part 5
With the architecture established, TempoStack deployed, the Central Collector configured, and applications generating traces, it’s time to take a step back and understand what we’re actually building. Before you deploy more applications and start troubleshooting performance issues, you need to understand how to read and interpret distributed traces.
Let’s decode the matrix of distributed tracing!
The Hitchhiker's Guide to Observability - Adding A New Tenant - Part 6
While we have created our distributed tracing infrastructure, we created two tenants as an example. In this article, I will show you how to add a new tenant and which changes must be made in the TempoStack and the OpenTelemetry Collector.
This article was mainly created as a quick reference guide to see which changes must be made when adding new tenants.
Automated ETCD Backup
Securing ETCD is one of the major Day-2 tasks for a Kubernetes cluster. This article will explain how to create a backup using OpenShift Cronjob.
Working with Environments
Imagine you have one OpenShift cluster and you would like to create 2 or more environments inside this cluster, but also separate them and force the environments to specific nodes, or use specific inbound routers. All this can be achieved using labels, IngressControllers and so on. The following article will guide you to set up dedicated compute nodes for infrastructure, development and test environments as well as the creation of IngressController which are bound to the appropriate nodes.
Introduction
Pod scheduling is an internal process that determines placement of new pods onto nodes within the cluster. It is probably one of the most important tasks for a Day-2 scenario and should be considered at a very early stage for a new cluster. OpenShift/Kubernetes is already shipped with a default scheduler which schedules pods as they get created accross the cluster, without any manual steps.
However, there are scenarios where a more advanced approach is required, like for example using a specifc group of nodes for dedicated workload or make sure that certain applications do not run on the same nodes. Kubernetes provides different options:
Controlling placement with node selectors
Controlling placement with pod/node affinity/anti-affinity rules
Controlling placement with taints and tolerations
Controlling placement with topology spread constraints
This series will try to go into the detail of the different options and explains in simple examples how to work with pod placement rules. It is not a replacement for any official documentation, so always check out Kubernetes and or OpenShift documentations.
Node Affinity
Node Affinity allows to place a pod to a specific group of nodes. For example, it is possible to run a pod only on nodes with a specific CPU or disktype. The disktype was used as an example for the nodeSelector and yes … Node Affinity is conceptually similar to nodeSelector but allows a more granular configuration.
NodeSelector
One of the easiest ways to tell your Kubernetes cluster where to put certain pods is to use a nodeSelector specification. A nodeSelector defines a key-value pair and are defined inside the specification of the pods and as a label on one or multiple nodes (or machine set or machine config). Only if selector matches the node label, the pod is allowed to be scheduled on that node.
Pod Affinity/Anti-Affinity
While noteSelector provides a very easy way to control where a pod shall be scheduled, the affinity/anti-affinity feature, expands this configuration with more expressive rules like logical AND operators, constraints against labels on other pods or soft rules instead of hard requirements.
The feature comes with two types:
pod affinity/anti-affinity - allows constrains against other pod labels rather than node labels.
node affinity - allows pods to specify a group of nodes they can be placed on
Taints and Tolerations
While Node Affinity is a property of pods that attracts them to a set of nodes, taints are the exact opposite. Nodes can be configured with one or more taints, which mark the node in a way to only accept pods that do tolerate the taints. The tolerations themselves are applied to pods, telling the scheduler to accept a taints and start the workload on a tainted node.
A common use case would be to mark certain nodes as infrastructure nodes, where only specific pods are allowed to be executed or to taint nodes with a special hardware (i.e. GPU).
Topology Spread Constraints
Topology spread constraints is a new feature since Kubernetes 1.19 (OpenShift 4.6) and another way to control where pods shall be started. It allows to use failure-domains, like zones or regions or to define custom topology domains. It heavily relies on configured node labels, which are used to define topology domains. This feature is a more granular approach than affinity, allowing to achieve higher availability.
Using Descheduler
Descheduler is a new feature which is GA since OpenShift 4.7. It can be used to evict pods from nodes based on specific strategies. The evicted pod is then scheduled on another node (by the Scheduler) which is more suitable.
This feature can be used when:
nodes are under/over-utilized
pod or node affinity, taints or labels have changed and are no longer valid for a running pod
node failures
pods have been restarted too many times
Copyright © 2020 - 2025 Toni Schmidbauer & Thomas Jungbauer

Thomas Jungbauer




