<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Pod Placement on TechBlog about OpenShift/Ansible/Satellite and much more</title><link>https://blog.stderr.at/categories/pod-placement/</link><description>TechBlog about OpenShift/Ansible/Satellite and much more</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><copyright>Toni Schmidbauer &amp; Thomas Jungbauer</copyright><lastBuildDate>Wed, 12 Jan 2022 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.stderr.at/categories/pod-placement/index.xml" rel="self" type="application/rss+xml"/><item><title>Working with Environments</title><link>https://blog.stderr.at/openshift-platform/day-2/labels_environmets/2022-01-12-creatingenvironment/</link><pubDate>Wed, 12 Jan 2022 00:00:00 +0000</pubDate><guid>https://blog.stderr.at/openshift-platform/day-2/labels_environmets/2022-01-12-creatingenvironment/</guid><description>&lt;div class="paragraph"&gt;
&lt;p&gt;Imagine you have one OpenShift cluster and you would like to create 2 or more environments inside this cluster, but also separate them and force the environments to specific nodes, or use specific inbound routers. All this can be achieved using labels, IngressControllers and so on. The following article will guide you to set up dedicated compute nodes for infrastructure, development and test environments as well as the creation of IngressController which are bound to the appropriate nodes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_prerequisites"&gt;Prerequisites&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Before we start we need an OpenShift cluster of course. In this example we have a cluster with typical 3 control plane nodes (labelled as master) and 7 compute nodes (labelled as worker)&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-138-104.us-east-2.compute.internal Ready master 13h v1.21.1+6438632
ip-10-0-149-168.us-east-2.compute.internal Ready worker 13h v1.21.1+6438632 # &amp;lt;-- will become infra
ip-10-0-154-244.us-east-2.compute.internal Ready worker 16m v1.21.1+6438632 # &amp;lt;-- will become infra
ip-10-0-158-44.us-east-2.compute.internal Ready worker 15m v1.21.1+6438632 # &amp;lt;-- will become infra
ip-10-0-160-91.us-east-2.compute.internal Ready master 13h v1.21.1+6438632
ip-10-0-188-198.us-east-2.compute.internal Ready worker 13h v1.21.1+6438632 # &amp;lt;-- will become worker-test
ip-10-0-191-9.us-east-2.compute.internal Ready worker 16m v1.21.1+6438632 # &amp;lt;-- will become worker-test
ip-10-0-192-174.us-east-2.compute.internal Ready master 13h v1.21.1+6438632
ip-10-0-195-201.us-east-2.compute.internal Ready worker 16m v1.21.1+6438632 # &amp;lt;-- will become worker-dev
ip-10-0-199-235.us-east-2.compute.internal Ready worker 16m v1.21.1+6438632 # &amp;lt;-- will become worker-dev&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;We will use the 7 nodes to create the environments for:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Infrastructure Services (3 nodes) - will be labelled as &lt;code&gt;infra&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Development Environment (2 nodes) - will be labelled as &lt;code&gt;worker-dev&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Test Environment (2 nodes) - will be labelled as &lt;code&gt;worker-test&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;To do this, we will label the nodes and create dedicated roles for them.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_create_nodes_labels_and_machineconfigpools"&gt;Create Nodes Labels and MachineConfigPools&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Let’s create a maschine config pool for the different environments.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The pool inherits the configuration from &lt;code&gt;worker&lt;/code&gt; nodes by default, which means that any new update on the worker configuration will also update custom labeled nodes.
Or in other words: it is possible to remove the worker label from the custom pools.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Create the following objects:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;# Infrastructure
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
name: infra
spec:
machineConfigSelector:
matchExpressions:
- {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]} &lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;(1)&lt;/b&gt;
maxUnavailable: 1 &lt;i class="conum" data-value="2"&gt;&lt;/i&gt;&lt;b&gt;(2)&lt;/b&gt;
nodeSelector:
matchLabels:
node-role.kubernetes.io/infra: &amp;#34;&amp;#34; &lt;i class="conum" data-value="3"&gt;&lt;/i&gt;&lt;b&gt;(3)&lt;/b&gt;
paused: false
---
# Worker-DEV
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
name: worker-dev
spec:
machineConfigSelector:
matchExpressions:
- {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-dev]}
nodeSelector:
matchLabels:
node-role.kubernetes.io/worker-dev: &amp;#34;&amp;#34;
---
# Worker-TEST
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
name: worker-test
spec:
machineConfigSelector:
matchExpressions:
- {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-test]}
nodeSelector:
matchLabels:
node-role.kubernetes.io/worker-test: &amp;#34;&amp;#34;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="colist arabic"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;User worker and infra so that the default worker configuration gets applied during upgrades&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="2"&gt;&lt;/i&gt;&lt;b&gt;2&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Whenever an update happens, do only 1 at a time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="3"&gt;&lt;/i&gt;&lt;b&gt;3&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;This pool is valid for nodes which are labelled as &lt;code&gt;infra&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Now let’s label our nodes.
First add the new, additional label:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;# Add the label &amp;#34;infra&amp;#34;
oc label node ip-10-0-149-168.us-east-2.compute.internal ip-10-0-154-244.us-east-2.compute.internal ip-10-0-158-44.us-east-2.compute.internal node-role.kubernetes.io/infra=
# Add the label &amp;#34;worker-dev&amp;#34;
oc label node ip-10-0-195-201.us-east-2.compute.internal ip-10-0-199-235.us-east-2.compute.internal node-role.kubernetes.io/worker-dev=
# Add the label &amp;#34;worker-test&amp;#34;
oc label node ip-10-0-188-198.us-east-2.compute.internal ip-10-0-191-9.us-east-2.compute.internal node-role.kubernetes.io/worker-test=&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This will result in:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;NAME STATUS ROLES AGE VERSION
ip-10-0-138-104.us-east-2.compute.internal Ready master 13h v1.21.1+6438632
ip-10-0-149-168.us-east-2.compute.internal Ready infra,worker 13h v1.21.1+6438632
ip-10-0-154-244.us-east-2.compute.internal Ready infra,worker 20m v1.21.1+6438632
ip-10-0-158-44.us-east-2.compute.internal Ready infra,worker 19m v1.21.1+6438632
ip-10-0-160-91.us-east-2.compute.internal Ready master 13h v1.21.1+6438632
ip-10-0-188-198.us-east-2.compute.internal Ready worker,worker-test 13h v1.21.1+6438632
ip-10-0-191-9.us-east-2.compute.internal Ready worker,worker-test 20m v1.21.1+6438632
ip-10-0-192-174.us-east-2.compute.internal Ready master 13h v1.21.1+6438632
ip-10-0-195-201.us-east-2.compute.internal Ready worker,worker-dev 20m v1.21.1+6438632
ip-10-0-199-235.us-east-2.compute.internal Ready worker,worker-dev 20m v1.21.1+6438632&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;We can remove the worker label from the worker-dev and worker-test nodes now.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="admonitionblock warning"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-warning" title="Warning"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
Keep the label &amp;#34;worker&amp;#34; for the infra nodes as this is the default worker label which is used when no nodeselector is in use. You can use any other node, just keep in mind that per default new applications will be started on nodes with the labels &amp;#34;worker&amp;#34;. As an alternative, you can also define a cluster-wide default node selector.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc label node ip-10-0-195-201.us-east-2.compute.internal ip-10-0-199-235.us-east-2.compute.internal node-role.kubernetes.io/worker-
oc label node ip-10-0-188-198.us-east-2.compute.internal ip-10-0-191-9.us-east-2.compute.internal node-role.kubernetes.io/worker-&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The final node labels will look like the following:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;NAME STATUS ROLES AGE VERSION
ip-10-0-138-104.us-east-2.compute.internal Ready master 13h v1.21.1+6438632
ip-10-0-149-168.us-east-2.compute.internal Ready infra,worker 13h v1.21.1+6438632
ip-10-0-154-244.us-east-2.compute.internal Ready infra,worker 22m v1.21.1+6438632
ip-10-0-158-44.us-east-2.compute.internal Ready infra,worker 21m v1.21.1+6438632
ip-10-0-160-91.us-east-2.compute.internal Ready master 13h v1.21.1+6438632
ip-10-0-188-198.us-east-2.compute.internal Ready worker-test 13h v1.21.1+6438632
ip-10-0-191-9.us-east-2.compute.internal Ready worker-test 21m v1.21.1+6438632
ip-10-0-192-174.us-east-2.compute.internal Ready master 13h v1.21.1+6438632
ip-10-0-195-201.us-east-2.compute.internal Ready worker-dev 21m v1.21.1+6438632
ip-10-0-199-235.us-east-2.compute.internal Ready worker-dev 22m v1.21.1+6438632&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="admonitionblock note"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-note" title="Note"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
Since the custom pools (infra, worker-test and worker-dev) inherit their configuration from the default worker pool, no changes on the files on the nodes themselves are triggered at this point.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_create_custom_configuration"&gt;Create Custom Configuration&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Let’s test our setup by deploying a configuration on specific nodes. The following MaschineConfig objects will create a file at &lt;strong&gt;/etc/myfile&lt;/strong&gt; on the nodes labelled either &lt;em&gt;infra&lt;/em&gt;, &lt;em&gt;worker-dev&lt;/em&gt; or &lt;em&gt;worker_test&lt;/em&gt;.
Dependent on the node role the content of the file will vary.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: infra &lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;(1)&lt;/b&gt;
name: 55-infra
spec:
config:
ignition:
version: 2.2.0
storage:
files:
- contents:
source: data:,infra &lt;i class="conum" data-value="2"&gt;&lt;/i&gt;&lt;b&gt;(2)&lt;/b&gt;
filesystem: root
mode: 0644
path: /etc/myfile &lt;i class="conum" data-value="3"&gt;&lt;/i&gt;&lt;b&gt;(3)&lt;/b&gt;
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker-dev
name: 55-worker-dev
spec:
config:
ignition:
version: 2.2.0
storage:
files:
- contents:
source: data:,worker-dev
filesystem: root
mode: 0644
path: /etc/myfile
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker-test
name: 55-worker-test
spec:
config:
ignition:
version: 2.2.0
storage:
files:
- contents:
source: data:,worker-test
filesystem: root
mode: 0644
path: /etc/myfile&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="colist arabic"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Valid for node with the role xyz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="2"&gt;&lt;/i&gt;&lt;b&gt;2&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Content of the file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="3"&gt;&lt;/i&gt;&lt;b&gt;3&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;File to be created&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Since this is a new configuration, all nodes will get reconfigured.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;NAME STATUS ROLES AGE VERSION
ip-10-0-138-104.us-east-2.compute.internal Ready master 13h v1.21.1+6438632
ip-10-0-149-168.us-east-2.compute.internal Ready,SchedulingDisabled infra,worker 13h v1.21.1+6438632
ip-10-0-154-244.us-east-2.compute.internal Ready infra,worker 28m v1.21.1+6438632
ip-10-0-158-44.us-east-2.compute.internal Ready infra,worker 27m v1.21.1+6438632
ip-10-0-160-91.us-east-2.compute.internal Ready master 13h v1.21.1+6438632
ip-10-0-188-198.us-east-2.compute.internal Ready worker-test 13h v1.21.1+6438632
ip-10-0-191-9.us-east-2.compute.internal NotReady,SchedulingDisabled worker-test 27m v1.21.1+6438632
ip-10-0-192-174.us-east-2.compute.internal Ready master 13h v1.21.1+6438632
ip-10-0-195-201.us-east-2.compute.internal Ready worker-dev 27m v1.21.1+6438632
ip-10-0-199-235.us-east-2.compute.internal NotReady,SchedulingDisabled worker-dev 28m v1.21.1+6438632&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Wait until all nodes are ready and test the configuration by verifying the content of &lt;em&gt;/etc/myfile&lt;/em&gt;:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;###
# infra nodes:
###
oc get pods -n openshift-machine-config-operator -l k8s-app=machine-config-daemon --field-selector &amp;#34;spec.nodeName=ip-10-0-149-168.us-east-2.compute.internal&amp;#34;
NAME READY STATUS RESTARTS AGE
machine-config-daemon-f85kd 2/2 Running 6 16h
# Get file content
oc rsh -n openshift-machine-config-operator machine-config-daemon-f85kd chroot /rootfs cat /etc/myfile
Defaulted container &amp;#34;machine-config-daemon&amp;#34; out of: machine-config-daemon, oauth-proxy
infra
###
#worker-dev:
###
oc get pods -n openshift-machine-config-operator -l k8s-app=machine-config-daemon --field-selector &amp;#34;spec.nodeName=ip-10-0-195-201.us-east-2.compute.internal&amp;#34;
NAME READY STATUS RESTARTS AGE
machine-config-daemon-s6rr5 2/2 Running 4 3h5m
# Get file content
oc rsh -n openshift-machine-config-operator machine-config-daemon-s6rr5 chroot /rootfs cat /etc/myfile
Defaulted container &amp;#34;machine-config-daemon&amp;#34; out of: machine-config-daemon, oauth-proxy
worker-dev
###
# worker-test:
###
oc get pods -n openshift-machine-config-operator -l k8s-app=machine-config-daemon --field-selector &amp;#34;spec.nodeName=ip-10-0-188-198.us-east-2.compute.internal&amp;#34;
NAME READY STATUS RESTARTS AGE
machine-config-daemon-m22rf 2/2 Running 6 16h
# Get file content
oc rsh -n openshift-machine-config-operator machine-config-daemon-m22rf chroot /rootfs cat /etc/myfile
Defaulted container &amp;#34;machine-config-daemon&amp;#34; out of: machine-config-daemon, oauth-proxy
worker-test&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The file /etc/myfile exists on all nodes and depending on their role the files have a different content.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_bind_an_application_to_a_specific_environment"&gt;Bind an Application to a Specific Environment&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The following will label the nodes with a specific environment and will deploy an example application, which should only be executed on the appropriate nodes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="olist arabic"&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;
&lt;p&gt;Let’s label the nodes with &lt;strong&gt;environment=worker-dev&lt;/strong&gt; and &lt;strong&gt;environment=worker-test&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc label node ip-10-0-195-201.us-east-2.compute.internal ip-10-0-199-235.us-east-2.compute.internal environment=worker-dev
oc label node ip-10-0-188-198.us-east-2.compute.internal ip-10-0-191-9.us-east-2.compute.internal environment=worker-test&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a namespace for the example application&lt;/p&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc new-project bookinfo&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create an annotation and a label for the namespace. The annotation will make sure that the application will only be started on nodes with the same label. The label will be later used for the IngressController setup.&lt;/p&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc annotate namespace bookinfo environment=worker-dev
oc annotate namespace bookinfo openshift.io/node-selector: environment=worker-test
oc label namespace bookinfo environment=worker-dev&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deploy the example application. In this article the sample application of Istio was used:&lt;/p&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc apply -f https://raw.githubusercontent.com/istio/istio/release-1.11/samples/bookinfo/platform/kube/bookinfo.yaml&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This will start the application on &lt;code&gt;worker-dev`&lt;/code&gt; nodes only, because the annotation in the namespace was created accordingly.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -n bookinfo -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
details-v1-86dfdc4b95-v8zfv 1/1 Running 0 9m19s 10.130.2.17 ip-10-0-199-235.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
productpage-v1-658849bb5-8gcl7 1/1 Running 0 7m17s 10.128.4.21 ip-10-0-195-201.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
ratings-v1-76b8c9cbf9-cc4js 1/1 Running 0 9m19s 10.130.2.19 ip-10-0-199-235.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
reviews-v1-58b8568645-mbgth 1/1 Running 0 7m44s 10.128.4.20 ip-10-0-195-201.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
reviews-v2-5d8f8b6775-qkdmz 1/1 Running 0 9m19s 10.130.2.21 ip-10-0-199-235.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
reviews-v3-666b89cfdf-8zv8w 1/1 Running 0 9m18s 10.130.2.22 ip-10-0-199-235.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_create_dedicated_ingresscontroller"&gt;Create Dedicated IngressController&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;IngressController are responsible to bring the traffic into the cluster. OpenShift comes with one default controller, but it is possible to create more in order to use different domains and separate the incoming traffic to different nodes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Bind the default ingress controller to the infra labeled nodes, so we can be sure that the default router pods are executed only on these nodes:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc patch ingresscontroller default -n openshift-ingress-operator --type=merge --patch=&amp;#39;{&amp;#34;spec&amp;#34;:{&amp;#34;nodePlacement&amp;#34;:{&amp;#34;nodeSelector&amp;#34;: {&amp;#34;matchLabels&amp;#34;:{&amp;#34;node-role.kubernetes.io/infra&amp;#34;:&amp;#34;&amp;#34;}}}}}&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The pods will get restarted, to be sure they are running on infra:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre&gt;oc get pods -n openshift-ingress -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
router-default-78f8dd6f69-dbtbv 0/1 ContainerCreating 0 2s &amp;lt;none&amp;gt; ip-10-0-149-168.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
router-default-78f8dd6f69-wwpgb 0/1 ContainerCreating 0 2s &amp;lt;none&amp;gt; ip-10-0-158-44.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
router-default-7bbbc8f9bd-vfh84 1/1 Running 0 22m 10.129.4.6 ip-10-0-158-44.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
router-default-7bbbc8f9bd-wggrx 1/1 Terminating 0 19m 10.128.2.8 ip-10-0-149-168.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Create the following IngressController objects for &lt;strong&gt;worker-dev&lt;/strong&gt; and &lt;strong&gt;worker-test&lt;/strong&gt;. Replace with the domain of your choice&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
name: ingress-worker-dev
namespace: openshift-ingress-operator
spec:
domain: worker-dev.&amp;lt;yourdomain&amp;gt; &lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;(1)&lt;/b&gt;
endpointPublishingStrategy:
type: HostNetwork
httpErrorCodePages:
name: &amp;#39;&amp;#39;
namespaceSelector:
matchLabels:
environment: worker-dev &lt;i class="conum" data-value="2"&gt;&lt;/i&gt;&lt;b&gt;(2)&lt;/b&gt;
nodePlacement:
nodeSelector:
matchLabels:
node-role.kubernetes.io/worker-dev: &amp;#39;&amp;#39; &lt;i class="conum" data-value="3"&gt;&lt;/i&gt;&lt;b&gt;(3)&lt;/b&gt;
replicas: 3
tuningOptions: {}
unsupportedConfigOverrides: null
---
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
name: ingress-worker-test
namespace: openshift-ingress-operator
spec:
domain: worker-test.&amp;lt;yourdomain&amp;gt;
endpointPublishingStrategy:
type: HostNetwork
httpErrorCodePages:
name: &amp;#39;&amp;#39;
namespaceSelector:
matchLabels:
environment: worker-test
nodePlacement:
nodeSelector:
matchLabels:
node-role.kubernetes.io/worker-test: &amp;#39;&amp;#39;
replicas: 3
tuningOptions: {}
unsupportedConfigOverrides: null&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="colist arabic"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Domainname which is used by this Controller&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="2"&gt;&lt;/i&gt;&lt;b&gt;2&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Namespace selector …​ namespaces with such label will be handled by this IngressController&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="3"&gt;&lt;/i&gt;&lt;b&gt;3&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Node Placement …​ This Controller should run on nodes with this label/role&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This will spin up additional router pods on the collect labelled nodes:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -n openshift-ingress -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
router-default-78f8dd6f69-dbtbv 1/1 Running 0 8m7s 10.128.2.11 ip-10-0-149-168.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
router-default-78f8dd6f69-wwpgb 1/1 Running 0 8m7s 10.129.4.10 ip-10-0-158-44.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
router-ingress-worker-dev-76b65cf558-mspvb 1/1 Running 0 113s 10.130.2.13 ip-10-0-199-235.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
router-ingress-worker-dev-76b65cf558-p2jpg 1/1 Running 0 113s 10.128.4.12 ip-10-0-195-201.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
router-ingress-worker-test-6bbf9967f-4whfs 1/1 Running 0 113s 10.131.2.13 ip-10-0-191-9.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
router-ingress-worker-test-6bbf9967f-jht4w 1/1 Running 0 113s 10.131.0.8 ip-10-0-188-198.us-east-2.compute.internal &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_verify_ingress_configuration"&gt;Verify Ingress Configuration&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;To test our new ingress router lets create a route object for our example application:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;kind: Route
apiVersion: route.openshift.io/v1
metadata:
name: productpage
namespace: bookinfo
spec:
host: productpage-bookinfo.worker-dev.&amp;lt;yourdomain&amp;gt;
to:
kind: Service
name: productpage
weight: 100
port:
targetPort: http
wildcardPolicy: None
---
kind: Route
apiVersion: route.openshift.io/v1
metadata:
name: productpage-worker-test
namespace: bookinfo
spec:
host: productpage-bookinfo.worker-test.&amp;lt;yourdomain&amp;gt;
to:
kind: Service
name: productpage
weight: 100
port:
targetPort: http
wildcardPolicy: None&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="admonitionblock warning"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-warning" title="Warning"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
Be sure that the name is resolvable and a load balancer is configured accordingly
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Verify that the router pod has the correct configuration in the file &lt;strong&gt;haproxy.config&lt;/strong&gt; :&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -n openshift-ingress
NAME READY STATUS RESTARTS AGE
router-default-78f8dd6f69-dbtbv 1/1 Running 0 95m
router-default-78f8dd6f69-wwpgb 1/1 Running 0 95m
router-ingress-worker-dev-76b65cf558-mspvb 1/1 Running 0 88m
router-ingress-worker-dev-76b65cf558-p2jpg 1/1 Running 0 88m
router-ingress-worker-test-6bbf9967f-4whfs 1/1 Running 0 88m
router-ingress-worker-test-6bbf9967f-jht4w 1/1 Running 0 88m&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Verify the content of the haproxy configuration for one of the &lt;code&gt;worker-dev&lt;/code&gt; router&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc rsh -n openshift-ingress router-ingress-worker-dev-76b65cf558-mspvb cat haproxy.config | grep productpage
backend be_http:bookinfo:productpage
server pod:productpage-v1-658849bb5-8gcl7:productpage:http:10.128.4.21:9080 10.128.4.21:9080 cookie 3758caf21badd7e4f729209173eece08 weight 256&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Compare with &lt;code&gt;worker-test&lt;/code&gt; router&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc rsh -n openshift-ingress router-ingress-worker-test-6bbf9967f-jht4w cat haproxy.config | grep productpage
--&amp;gt; Empty result, this router is not configured with that route.&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Compare with &lt;code&gt;default&lt;/code&gt; router:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;backend be_http:bookinfo:productpage
server pod:productpage-v1-658849bb5-8gcl7:productpage:http:10.128.4.21:9080 10.128.4.21:9080 cookie 3758caf21badd7e4f729209173eece08 weight 256&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Why does this happen? Why are the default router and the router for worker-dev configured?
This happens because it is the default router and we must explicitly tell it to ignore certain labels.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Modify the default IngressController&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc edit ingresscontroller.operator default -n openshift-ingress-operator&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Add the following&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt; namespaceSelector:
matchExpressions:
- key: environment
operator: NotIn
values:
- worker-dev
- worker-test&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This will tell the default IngressController to ignore selectors on &lt;code&gt;worker-dev&lt;/code&gt; and &lt;code&gt;worker-test&lt;/code&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Wait a few seconds until the route pods have been restarted:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -n openshift-ingress
NAME READY STATUS RESTARTS AGE
router-default-744998df46-8lh4t 1/1 Running 0 2m32s
router-default-744998df46-hztgf 1/1 Running 0 2m31s
router-ingress-worker-dev-76b65cf558-mspvb 1/1 Running 0 96m
router-ingress-worker-dev-76b65cf558-p2jpg 1/1 Running 0 96m
router-ingress-worker-test-6bbf9967f-4whfs 1/1 Running 0 96m
router-ingress-worker-test-6bbf9967f-jht4w 1/1 Running 0 96m&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;And test again&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc rsh -n openshift-ingress router-default-744998df46-8lh4t cat haproxy.config | grep productpage
--&amp;gt; empty result&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="admonitionblock caution"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-caution" title="Caution"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
At this point the new router feels responsible. Be sure to have a load balancer configured correctly.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_appendix"&gt;Appendix&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Bind other infra-workload to infrastructure nodes:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="sect2"&gt;
&lt;h3 id="_internal_registry"&gt;Internal Registry&lt;/h3&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc patch configs.imageregistry.operator.openshift.io/cluster -n openshift-image-registry --type=merge --patch &amp;#39;{&amp;#34;spec&amp;#34;:{&amp;#34;nodeSelector&amp;#34;:{&amp;#34;node-role.kubernetes.io/infra&amp;#34;:&amp;#34;&amp;#34;}}}&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect2"&gt;
&lt;h3 id="_openshift_monitoring_workload"&gt;OpenShift Monitoring Workload&lt;/h3&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Create the following file and apply it.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;cat &amp;lt;&amp;lt;&amp;#39;EOF&amp;#39; &amp;gt; cluster-monitoring-config-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: cluster-monitoring-config
namespace: openshift-monitoring
data:
config.yaml: |+
alertmanagerMain:
nodeSelector:
node-role.kubernetes.io/infra: &amp;#34;&amp;#34;
prometheusK8s:
nodeSelector:
node-role.kubernetes.io/infra: &amp;#34;&amp;#34;
prometheusOperator:
nodeSelector:
node-role.kubernetes.io/infra: &amp;#34;&amp;#34;
grafana:
nodeSelector:
node-role.kubernetes.io/infra: &amp;#34;&amp;#34;
k8sPrometheusAdapter:
nodeSelector:
node-role.kubernetes.io/infra: &amp;#34;&amp;#34;
kubeStateMetrics:
nodeSelector:
node-role.kubernetes.io/infra: &amp;#34;&amp;#34;
telemeterClient:
nodeSelector:
node-role.kubernetes.io/infra: &amp;#34;&amp;#34;
EOF&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc create -f cluster-monitoring-config-cm.yaml&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description></item><item><title>Introduction</title><link>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-08-26-podplacement-intro/</link><pubDate>Thu, 26 Aug 2021 00:00:00 +0000</pubDate><guid>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-08-26-podplacement-intro/</guid><description>&lt;div class="paragraph"&gt;
&lt;p&gt;Pod scheduling is an internal process that determines placement of new pods onto nodes within the cluster. It is probably one of the most important tasks for a Day-2 scenario and should be considered at a very early stage for a new cluster. OpenShift/Kubernetes is already shipped with a &lt;strong&gt;default scheduler&lt;/strong&gt; which schedules pods as they get created accross the cluster, without any manual steps.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;However, there are scenarios where a more advanced approach is required, like for example using a specifc group of nodes for dedicated workload or make sure that certain applications do not run on the same nodes. Kubernetes provides different options:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Controlling placement with node selectors&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Controlling placement with pod/node affinity/anti-affinity rules&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Controlling placement with taints and tolerations&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Controlling placement with topology spread constraints&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This series will try to go into the detail of the different options and explains in simple examples how to work with pod placement rules.
It is not a replacement for any official documentation, so always check out Kubernetes and or OpenShift documentations.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_pod_placement_series"&gt;Pod Placement Series&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="olist arabic"&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-nodeselector/"&gt;NodeSelector&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-pod-affinity/"&gt;Pod Affinity and Anti Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-node-affinity/"&gt;Node Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-taints-and-tolerations"&gt;Taints and Tolerations&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-topology-spread-constraints/"&gt;Topology Spread Constraints&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/descheduler/"&gt;Descheduler&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_prerequisites"&gt;Prerequisites&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="admonitionblock note"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-note" title="Note"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
The following prerequisites are used for all examples.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Let’s image that our cluster (OpenShift 4) has 4 compute nodes&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get node --selector=&amp;#39;node-role.kubernetes.io/worker&amp;#39;
NAME STATUS ROLES AGE VERSION
compute-0 Ready worker 7h1m v1.19.0+d59ce34
compute-1 Ready worker 7h1m v1.19.0+d59ce34
compute-2 Ready worker 7h1m v1.19.0+d59ce34
compute-3 Ready worker 7h1m v1.19.0+d59ce34&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;An example application (from the catalog Django + Postgres) has been deployed in the namespace &lt;code&gt;podtesting&lt;/code&gt;. It contains by default 1 pod for a PostGresql database and one pod for a frontend web application.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -n podtesting -o wide | grep Running
django-psql-example-1-h6kst 1/1 Running 0 20m 10.130.2.97 compute-2 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
postgresql-1-4pcm4 1/1 Running 0 21m 10.131.0.51 compute-3 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Without any configuration the OpenShift scheduler will try to spread the pods evenly accross the cluster.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Let’s increase the replica of the web frontend:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc scale --replicas=4 dc/django-psql-example -n podtesting&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Eventually 4 additional pods will be started accross the compute nodes of the cluster:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -n podtesting -o wide | grep Running
django-psql-example-1-842fl 1/1 Running 0 2m7s 10.131.0.65 compute-3 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-1-h6kst 1/1 Running 0 24m 10.130.2.97 compute-2 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-1-pxhlv 1/1 Running 0 2m7s 10.128.2.13 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-1-xms7x 1/1 Running 0 2m7s 10.129.2.10 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
postgresql-1-4pcm4 1/1 Running 0 26m 10.131.0.51 compute-3 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;As you can see, the scheduler already tries to spread the pods evenly, so that every worker node will host one frontend pod.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;However, let’s try to apply a more advanced configuration for the pod placement, starting with the
&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-nodeselector/"&gt;Pod Placement - NodeSelector&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description></item><item><title>Node Affinity</title><link>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-08-29-node-affinity/</link><pubDate>Thu, 26 Aug 2021 00:00:00 +0000</pubDate><guid>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-08-29-node-affinity/</guid><description>&lt;div class="paragraph"&gt;
&lt;p&gt;Node Affinity allows to place a pod to a specific group of nodes. For example, it is possible to run a pod only on nodes with a specific CPU or disktype. The disktype was used as an example for the &lt;code&gt;nodeSelector&lt;/code&gt; and yes …​ Node Affinity is conceptually similar to nodeSelector but allows a more granular configuration.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_pod_placement_series"&gt;Pod Placement Series&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Please check out other ways of pod placements:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;div class="expand"&gt;
&lt;div class="expand-label" style="cursor: pointer;" onclick="$h = $(this);$h.next('div').slideToggle(100,function () {$h.children('i').attr('class',function () {return $h.next('div').is(':visible') ? 'fas fa-chevron-down' : 'fas fa-chevron-right';});});"&gt;
&lt;i style="font-size:x-small;" class="fas fa-chevron-right"&gt;&lt;/i&gt;
&lt;span&gt;
&lt;a&gt;Expand me...&lt;/a&gt;
&lt;/span&gt;
&lt;/div&gt;
&lt;div class="expand-content" style="display: none;"&gt;
&lt;div class="olist arabic"&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-nodeselector/"&gt;NodeSelector&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-pod-affinity/"&gt;Pod Affinity and Anti Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-node-affinity/"&gt;Node Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-taints-and-tolerations"&gt;Taints and Tolerations&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-topology-spread-constraints/"&gt;Topology Spread Constraints&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/descheduler/"&gt;Descheduler&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_using_node_affinity"&gt;Using Node Affinity&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Currently two types of affinity settings are known:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;requiredDuringSchedulingIgnoreDuringExecuption (short &lt;em&gt;required&lt;/em&gt;) - a hard requirement which must be met before a pod can be scheduled&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;preferredDuringSchedulingIgnoredDuringExecution (short &lt;em&gt;preferred&lt;/em&gt;) - a soft requirement the scheduler &lt;strong&gt;tries&lt;/strong&gt; to meet, but does not guarantee it&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="admonitionblock note"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-note" title="Note"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
Both types can be specified. In such case the node must first meet the required rule and then attempt to meet the preferred rule.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="sect2"&gt;
&lt;h3 id="_preparing_node_labels"&gt;Preparing node labels&lt;/h3&gt;
&lt;div class="admonitionblock note"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-note" title="Note"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
Remember the prerequisites explained in the . &lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-pod-affinity/"&gt;Pod Placement - Introduction&lt;/a&gt;. We have 4 compute nodes and an example web application up and running.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Before we start with affinity rules we need to label all nodes. Let’s create 2 zones (east and west) for our compute nodes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="admonitionblock note"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-note" title="Note"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
You can skip this, if these labels are still set.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="imageblock"&gt;
&lt;div class="content"&gt;
&lt;img src="https://blog.stderr.at/openshift-platform/day-2/images/affinity-kubernetes.zones.png" alt="Node Zones"/&gt;
&lt;/div&gt;
&lt;div class="title"&gt;Figure 1. Node Zones&lt;/div&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc label nodes compute-0 compute-1 topology.kubernetes.io/zone=east
oc label nodes compute-2 compute-3 topology.kubernetes.io/zone=west&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect2"&gt;
&lt;h3 id="_configure_node_affinity_rule"&gt;Configure node affinity rule&lt;/h3&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Like pod affinity the node affinity is defined on the pod specification:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;kind: DeploymentConfig
apiVersion: apps.openshift.io/v1
metadata:
name: django-psql-example
namespace: podtesting
[...]
spec:
[...]
template:
[...]
spec:
[...]
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values:
- west&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;In this example the pods are started only on nodes of the zone &amp;#34;West&amp;#34;. Since the value is an array, multiple zones can be defined letting the web application be executed on West and East for example.
With this setup you can control on which node a specific application shall be executed. For example: you have a group of nodes which provide a GPU and your GPU application must be started only on this group of nodes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Like with pod affinity you can combine required and preferred settings.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_what_happened_to_node_anti_affinity"&gt;What happened to Node Anti-Affinity?&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Unlike Pod Anti-Affinity, there is no concept to define a node Anti-Affinity. Instead you can use the &lt;code&gt;NotIn&lt;/code&gt; and &lt;code&gt;DoesNotExist&lt;/code&gt; operators to achieve this bahaviour.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_cleanup"&gt;Cleanup&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;As cleanup simply remove the affinity specification from the DeploymentConf. The node labels can stay as they are since they do not hurt.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_summary"&gt;Summary&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This concludes the quick overview of the node affinity. Further information can be found at &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity" target="_blank" rel="noopener"&gt;Node Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description></item><item><title>NodeSelector</title><link>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-08-27-podplacement/</link><pubDate>Thu, 26 Aug 2021 00:00:00 +0000</pubDate><guid>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-08-27-podplacement/</guid><description>&lt;div class="paragraph"&gt;
&lt;p&gt;One of the easiest ways to tell your Kubernetes cluster where to put certain pods is to use a &lt;code&gt;nodeSelector&lt;/code&gt; specification. A nodeSelector defines a key-value pair and are defined inside the specification of the pods and as a label on one or multiple nodes (or machine set or machine config). Only if selector matches the node label, the pod is allowed to be scheduled on that node.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Kubernetes distingushes between 2 types of selectors:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="olist arabic"&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;cluster-wide node selectors&lt;/em&gt;: defined by the cluster administrators and valid for the whole cluster&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;project node selectors&lt;/em&gt;: to place new pods inside projects into specific nodes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_pod_placement_series"&gt;Pod Placement Series&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Please check out other ways of pod placements:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;div class="expand"&gt;
&lt;div class="expand-label" style="cursor: pointer;" onclick="$h = $(this);$h.next('div').slideToggle(100,function () {$h.children('i').attr('class',function () {return $h.next('div').is(':visible') ? 'fas fa-chevron-down' : 'fas fa-chevron-right';});});"&gt;
&lt;i style="font-size:x-small;" class="fas fa-chevron-right"&gt;&lt;/i&gt;
&lt;span&gt;
&lt;a&gt;Expand me...&lt;/a&gt;
&lt;/span&gt;
&lt;/div&gt;
&lt;div class="expand-content" style="display: none;"&gt;
&lt;div class="olist arabic"&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-nodeselector/"&gt;NodeSelector&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-pod-affinity/"&gt;Pod Affinity and Anti Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-node-affinity/"&gt;Node Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-taints-and-tolerations"&gt;Taints and Tolerations&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-topology-spread-constraints/"&gt;Topology Spread Constraints&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/descheduler/"&gt;Descheduler&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_using_nodeselector"&gt;Using nodeSelector&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;As previously described, we have a cluster with an example application scheduled accross the worker nodes evenly by the scheduler.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -n podtesting -o wide | grep Running
django-psql-example-1-842fl 1/1 Running 0 2m7s 10.131.0.65 compute-3 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-1-h6kst 1/1 Running 0 24m 10.130.2.97 compute-2 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-1-pxhlv 1/1 Running 0 2m7s 10.128.2.13 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-1-xms7x 1/1 Running 0 2m7s 10.129.2.10 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
postgresql-1-4pcm4 1/1 Running 0 26m 10.131.0.51 compute-3 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;However, our 4 compute nodes are assembled with different hardware specification and are using different harddisks (sdd vs hdd).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="imageblock"&gt;
&lt;div class="content"&gt;
&lt;img src="https://blog.stderr.at/openshift-platform/day-2/images/nodeselector-disktypes.png" alt="Node with different disktypes"/&gt;
&lt;/div&gt;
&lt;div class="title"&gt;Figure 1. Nodes with Different Specifications&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Since our web application must run on fast disks must configure the cluster to schedule the pods on nodes with SSD only.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;To start using nodeSelectors we first &lt;strong&gt;label our nodes&lt;/strong&gt; accordingly:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;compute-0 and compute-1 are faster nodes with an SSD attached.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;compute-2 and compute-2 have a HDD attached.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc label nodes compute-0 compute-1 disktype=ssd &lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;(1)&lt;/b&gt;
oc label nodes compute-2 compute-3 disktype=hdd&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="colist arabic"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;as key we are using &lt;strong&gt;disktype&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;As crosscheck we can list nodes with a specific label:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get nodes -l disktype=ssd
NAME STATUS ROLES AGE VERSION
compute-0 Ready worker 7h32m v1.19.0+d59ce34
compute-1 Ready worker 7h31m v1.19.0+d59ce34
oc get nodes -l disktype=hdd
NAME STATUS ROLES AGE VERSION
compute-2 Ready worker 7h32m v1.19.0+d59ce34
compute-3 Ready worker 7h32m v1.19.0+d59ce34&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="admonitionblock warning"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-warning" title="Warning"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
If no matching label is found, the pod cannot be scheduled. Therefore, &lt;strong&gt;always&lt;/strong&gt; label the nodes first.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The 2nd step is to add the node selector to the specification of the pod. In our example we are using a DeploymentConfig, so let’s add it there:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc patch dc django-psql-example -n podtesting --patch &amp;#39;{&amp;#34;spec&amp;#34;:{&amp;#34;template&amp;#34;:{&amp;#34;spec&amp;#34;:{&amp;#34;nodeSelector&amp;#34;:{&amp;#34;disktype&amp;#34;:&amp;#34;ssd&amp;#34;}}}}}&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This adds the nodeSelector into: spec/template/spec&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt; nodeSelector:
disktype: ssd&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Kubernetes will now trigger a restart of the pods on the supposed nodes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -n podtesting -o wide | grep Running
django-psql-example-3-4j92k 1/1 Running 0 42s 10.129.2.7 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-3-d7hsd 1/1 Running 0 42s 10.129.2.8 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-3-fkbfm 1/1 Running 0 14m 10.128.2.18 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-3-psskb 1/1 Running 0 14m 10.128.2.17 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;As you can see, only nodes with a SSD (compute-0 and compute-1) are being used.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_controlling_pod_placement_with_project_wide_selector"&gt;Controlling pod placement with project-wide selector&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Adding a nodeSelector to a deployment seems fine…​ until somebody forgets to add it. Then the pods would be started anywhere the scheduler finds suitable. Therefore, it might make sense to use a project-wide node selector, which will automatically be applied on all pods on that project. The project selector is added by the cluster administrator to the &lt;strong&gt;Namespace&lt;/strong&gt; object (no matter what the OpenShift documentation says in it’s example) as &lt;code&gt;openshift.io/node-selector&lt;/code&gt; parameter.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Let’s remove our previous configuration and add the setting to our namespace &lt;em&gt;podtesting&lt;/em&gt;:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="olist arabic"&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;
&lt;p&gt;Cleanup&lt;/p&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Remove the nodeSelector from the deployment configuration and wait until all pods have been reshuffeld&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc patch dc django-psql-example -n podtesting --type=&amp;#39;json&amp;#39; -p=&amp;#39;[{&amp;#34;op&amp;#34;: &amp;#34;remove&amp;#34;, &amp;#34;path&amp;#34;: &amp;#34;/spec/template/spec/nodeSelector&amp;#34;, &amp;#34;value&amp;#34;: &amp;#34;disktype=ssd&amp;#34; }]&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add the label to the project&lt;/p&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc annotate ns/podtesting openshift.io/node-selector=&amp;#34;disktype=ssd&amp;#34;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The OpenShift scheduler will now spread the Pods accross compute-0 or compute-1 again, but not on compute-2 or 3.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;We can prove that by stressing our cluster (and nodes) a little bit and scale our frontend application to 10:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -n podtesting -o wide | grep Running
django-psql-example-4-2jn2l 1/1 Running 0 27s 10.128.2.8 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-4-6g7ks 1/1 Running 0 7m47s 10.129.2.23 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-4-752nm 1/1 Running 0 7m47s 10.128.2.7 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-4-c5jvm 1/1 Running 0 27s 10.129.2.4 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-4-f5kwg 1/1 Running 0 27s 10.129.2.5 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-4-g7bcs 1/1 Running 0 7m47s 10.129.2.24 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-4-h5tgb 1/1 Running 0 27s 10.129.2.6 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-4-spvpp 1/1 Running 0 28s 10.128.2.5 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-4-v9qwj 1/1 Running 0 7m48s 10.129.2.22 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-4-zgwcv 1/1 Running 0 27s 10.128.2.6 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;As you can see compute-0 and compute-1 are the only nodes which are used.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_well_known_labels"&gt;Well-Known Labels&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;nodeSelector is one of the easiest ways to control where an application shall be started. Working with labels is therefore very important as soon as workload shall be added to the cluster.
Kubernetes reserves some labels which can be leveraged and some are already predefined on the nodes, for example:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;beta.kubernetes.io/arch=amd64&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;kubernetes.io/hostname=compute-0&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;kubernetes.io/os=linux&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;node-role.kubernetes.io/worker=&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;node.openshift.io/os_id=rhcos&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;A list of all known can be found at: [&lt;a href="#source_1"&gt;1&lt;/a&gt;]&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Two of them I would like to mention here, since they might become very important when designing the placement of pods:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;topology.kubernetes.io/zone&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;topology.kubernetes.io/region&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;With these two labels you can create availability zones for your cluster. A &lt;strong&gt;zone&lt;/strong&gt; can be seen a logical failure domain and a cluster is typically spanned across multiple zones. This could be a rack in a data center for example, hardware which is sharing the same switch or simply different data centers. Zones are seen as independent to each other.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;A &lt;strong&gt;region&lt;/strong&gt; is made up of one or more zones. A cluster is usually not spanned across multiple region.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Kubernetes makes a few assumptions about the structure of zones and regions:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;regions and zones are hierarchical: zones are strict subsets of regions and no zone can be in 2 regions&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;zone names are unique across regions; for example region &amp;#34;africa-east-1&amp;#34; might be comprised of zones &amp;#34;africa-east-1a&amp;#34; and &amp;#34;africa-east-1b&amp;#34;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_cleanup"&gt;Cleanup&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This concludes the chapter about nodeSelectors. For the next chapter of the Pod Placement Series (&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-pod-affinity/"&gt;Pod Affinity and Anti Affinity&lt;/a&gt;) we need to cleanup our configuration.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="olist arabic"&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;
&lt;p&gt;Scale the frontend down to 2&lt;/p&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc scale --replicas=2 dc/django-psql-example -n podtesting&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Remove the label from the namespace&lt;/p&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc annotate ns/podtesting openshift.io/node-selector- &lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;(1)&lt;/b&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="colist arabic"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;The minus at the end defines that this annotation shall be removed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;And, just to be sure if you have not done this before, remove the nodeSelector from the DeploymentConfig&lt;/p&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc patch dc django-psql-example -n podtesting --type=&amp;#39;json&amp;#39; -p=&amp;#39;[{&amp;#34;op&amp;#34;: &amp;#34;remove&amp;#34;, &amp;#34;path&amp;#34;: &amp;#34;/spec/template/spec/nodeSelector&amp;#34;, &amp;#34;value&amp;#34;: &amp;#34;disktype=ssd&amp;#34; }]&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_sources"&gt;Sources&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a id="source_1"&gt;&lt;/a&gt;[1]: &lt;a href="https://kubernetes.io/docs/reference/labels-annotations-taints/" target="_blank" rel="noopener"&gt;Well-Known Labels, Annotations and Taints&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description></item><item><title>Pod Affinity/Anti-Affinity</title><link>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-08-28-affinity/</link><pubDate>Thu, 26 Aug 2021 00:00:00 +0000</pubDate><guid>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-08-28-affinity/</guid><description>&lt;div class="paragraph"&gt;
&lt;p&gt;While noteSelector provides a very easy way to control where a pod shall be scheduled, the affinity/anti-affinity feature, expands this configuration with more expressive rules like logical AND operators, constraints against labels on other pods or soft rules instead of hard requirements.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The feature comes with two types:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;pod affinity/anti-affinity - allows constrains against other pod labels rather than node labels.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;node affinity - allows pods to specify a group of nodes they can be placed on&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_pod_placement_series"&gt;Pod Placement Series&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Please check out other ways of pod placements:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;div class="expand"&gt;
&lt;div class="expand-label" style="cursor: pointer;" onclick="$h = $(this);$h.next('div').slideToggle(100,function () {$h.children('i').attr('class',function () {return $h.next('div').is(':visible') ? 'fas fa-chevron-down' : 'fas fa-chevron-right';});});"&gt;
&lt;i style="font-size:x-small;" class="fas fa-chevron-right"&gt;&lt;/i&gt;
&lt;span&gt;
&lt;a&gt;Expand me...&lt;/a&gt;
&lt;/span&gt;
&lt;/div&gt;
&lt;div class="expand-content" style="display: none;"&gt;
&lt;div class="olist arabic"&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-nodeselector/"&gt;NodeSelector&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-pod-affinity/"&gt;Pod Affinity and Anti Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-node-affinity/"&gt;Node Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-taints-and-tolerations"&gt;Taints and Tolerations&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-topology-spread-constraints/"&gt;Topology Spread Constraints&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/descheduler/"&gt;Descheduler&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_pod_affinity_and_anti_affinity"&gt;Pod Affinity and Anti-Affinity&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;&lt;strong&gt;Affinity&lt;/strong&gt; and &lt;strong&gt;Anti-Affinity&lt;/strong&gt; controls the nodes on which a pod should (or should not) be scheduled &lt;em&gt;based on labels on Pods that are already scheduled on the node&lt;/em&gt;. This is a different approach than nodeSelector, since it does not directly take the node labels into account. That said, one example for such setup would be: You have dedicated nodes for developement and production workload and you have to be sure that pods of dev or prod applications do not run on the same node.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Affinity and Anti-Affinity are shortly defined as:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Pod affinity - tells scheduler to put a new pod onto the same node as other pods (selection is done using label selectors)&lt;/p&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I want to run where this other labelled pod is already running (Pods from same service shall be running on same node.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pod Anti-Affinity - prevents the scheduler to place a new pod onto the same nodes with pods with the same labels&lt;/p&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I definitely do not want to start a pod where this other pod with the defined label is running (to prevent that all Pods of same service are running in the same availability zone.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="admonitionblock warning"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-warning" title="Warning"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;As described in the official &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity" target="_blank" rel="noopener"&gt;Kubernetes documention&lt;/a&gt; two things should be considered:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="olist arabic"&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;
&lt;p&gt;The affinity feature requires processing which can slow down the cluster. Therefore, it is not recommended for cluster with more than 100 nodes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The feature requires that all nodes are consistently labelled (&lt;code&gt;topologyKey&lt;/code&gt;). Unintended behaviour might happen if labels are missing.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_using_affinityanti_affinity"&gt;Using Affinity/Anti-Affinity&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Currently two types of pod affinity/anti-affinity are known:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;requiredDuringSchedulingIgnoreDuringExecuption&lt;/strong&gt; (short &lt;em&gt;required&lt;/em&gt;) - a hard requirement which must be met before a pod can be scheduled&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;preferredDuringSchedulingIgnoredDuringExecution&lt;/strong&gt; (short &lt;em&gt;preferred&lt;/em&gt;) - a soft requirement the scheduler &lt;strong&gt;tries&lt;/strong&gt; to meet, but does not guarantee it&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="admonitionblock note"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-note" title="Note"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
Both types can be defined in the same specification. In such case the node must first meet the required rule and then attempt based on best effort to meet the preferred rule.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="sect2"&gt;
&lt;h3 id="_preparing_node_labels"&gt;Preparing node labels&lt;/h3&gt;
&lt;div class="admonitionblock note"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-note" title="Note"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
Remember the prerequisites explained in the . &lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-pod-affinity/"&gt;Pod Placement - Introduction&lt;/a&gt;. We have 4 compute nodes and an example web application up and running.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Before we start with affinity rules we need to label all nodes. Let’s create 2 zones (east and west) for our compute nodes using the well-known label &lt;code&gt;topology.kubernetes.io/zone&lt;/code&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="imageblock"&gt;
&lt;div class="content"&gt;
&lt;img src="https://blog.stderr.at/openshift-platform/day-2/images/affinity-kubernetes.zones.png" alt="Node Zones"/&gt;
&lt;/div&gt;
&lt;div class="title"&gt;Figure 1. Node Zones&lt;/div&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc label nodes compute-0 compute-1 topology.kubernetes.io/zone=east
oc label nodes compute-2 compute-3 topology.kubernetes.io/zone=west&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect2"&gt;
&lt;h3 id="_configure_pod_affinity_rule"&gt;Configure pod affinity rule&lt;/h3&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;In our example we have one database pod and multiple web application pods. Let’s image we would like to always run these pods in the same zone.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;&lt;strong&gt;The pod affinity defines that a pod can be scheduled onto a node ONLY if that node is in the same zone as at least one already-running pod with a certain label.&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This means we must first label the postgres pod accordingly. Let’s labels the pod with &lt;code&gt;security=zone1&lt;/code&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc patch dc postgresql -n podtesting --type=&amp;#39;json&amp;#39; -p=&amp;#39;[{&amp;#34;op&amp;#34;: &amp;#34;add&amp;#34;, &amp;#34;path&amp;#34;: &amp;#34;/spec/template/metadata/labels/security&amp;#34;, &amp;#34;value&amp;#34;: &amp;#34;zone1&amp;#34; }]&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;After a while the postgres pod is restarted on one of the 4 compute nodes (since we did not specify in which zone this single pod shall be started) - here compute-1 (zone == east):&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -n podtesting -o wide | grep Running
postgresql-5-6v5h6 1/1 Running 0 119s 10.129.2.24 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;As a second step, the deployment configuration of the web application must be modified. Remember, we want to run web application pods only on nodes located in the same zone as the postgres pods. In our example this would be either compute-0 or compute-1.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Modify the config accordingly:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;kind: DeploymentConfig
apiVersion: apps.openshift.io/v1
[...]
namespace: podtesting
spec:
[...]
template:
[...]
spec:
[...]
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: security &lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;(1)&lt;/b&gt;
operator: In &lt;i class="conum" data-value="2"&gt;&lt;/i&gt;&lt;b&gt;(2)&lt;/b&gt;
values:
- zone1 &lt;i class="conum" data-value="3"&gt;&lt;/i&gt;&lt;b&gt;(3)&lt;/b&gt;
topologyKey: topology.kubernetes.io/zone &lt;i class="conum" data-value="4"&gt;&lt;/i&gt;&lt;b&gt;(4)&lt;/b&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="colist arabic"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;The key of the label of a pod which is already running on that node is &amp;#34;security&amp;#34;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="2"&gt;&lt;/i&gt;&lt;b&gt;2&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;As operator &amp;#34;In&amp;#34; is used the postgres pod must have a matching key (security) containing the value (zone1). Other options like &amp;#34;NotIn&amp;#34;, &amp;#34;DoesNotExist&amp;#34; or &amp;#34;Exact&amp;#34; are available as well&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="3"&gt;&lt;/i&gt;&lt;b&gt;3&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;The value must be &amp;#34;zone1&amp;#34;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="4"&gt;&lt;/i&gt;&lt;b&gt;4&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;As topology the topology.kubernetes.io/zone is used. The application can be deployed on nodes with the same label&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Setting this (and maybe scaling the replicas up a little bit) will start all frontend pods either on compute-0 or on compute-1.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;&lt;strong&gt;In other words: On nodes of the same zone, where the postgres pod with the label security=zone1 is running.&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -n podtesting -o wide | grep Running
django-psql-example-13-4w6qd 1/1 Running 0 67s 10.128.2.58 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-13-655dj 1/1 Running 0 67s 10.129.2.28 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-13-9d4pj 1/1 Running 0 67s 10.129.2.27 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-13-bdwhb 1/1 Running 0 67s 10.128.2.61 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-13-d4jrw 1/1 Running 0 67s 10.128.2.57 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-13-dm9qk 1/1 Running 0 67s 10.128.2.60 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-13-ktmfm 1/1 Running 0 67s 10.129.2.25 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-13-ldm56 1/1 Running 0 77s 10.128.2.55 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-13-mh2f5 1/1 Running 0 67s 10.129.2.29 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-13-qfkhq 1/1 Running 0 67s 10.129.2.26 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-13-v88qv 1/1 Running 0 67s 10.128.2.56 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-13-vfgf4 1/1 Running 0 67s 10.128.2.59 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
postgresql-5-6v5h6 1/1 Running 0 3m18s 10.129.2.24 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect2"&gt;
&lt;h3 id="_configure_pod_anti_affinity_rule"&gt;Configure pod anti-affinity rule&lt;/h3&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;For now the database pod and the web application pod are running on nodes of the same zone. However, somebody is asking us to configure it vice versa: the web application should not run in the same zone as postgresql.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Here we can use the Anti-Affinity feature.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="admonitionblock note"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-note" title="Note"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
As an alternative, it would also be possible to change the operator in the affinity rule from &amp;#34;In&amp;#34; to &amp;#34;NotIn&amp;#34;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;kind: DeploymentConfig
apiVersion: apps.openshift.io/v1
[...]
namespace: podtesting
spec:
[...]
template:
[...]
spec:
[...]
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: security &lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;(1)&lt;/b&gt;
operator: In &lt;i class="conum" data-value="2"&gt;&lt;/i&gt;&lt;b&gt;(2)&lt;/b&gt;
values:
- zone1 &lt;i class="conum" data-value="3"&gt;&lt;/i&gt;&lt;b&gt;(3)&lt;/b&gt;
topologyKey: topology.kubernetes.io/zone &lt;i class="conum" data-value="4"&gt;&lt;/i&gt;&lt;b&gt;(4)&lt;/b&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This will force the web application pods to run only on &amp;#34;west&amp;#34; zone nodes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;django-psql-example-16-4n9h5 1/1 Running 0 40s 10.131.1.53 compute-3 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-16-blf8b 1/1 Running 0 29s 10.130.2.63 compute-2 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-16-f9plb 1/1 Running 0 29s 10.130.2.64 compute-2 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-16-tm5rm 1/1 Running 0 28s 10.131.1.55 compute-3 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-16-x8lbh 1/1 Running 0 29s 10.131.1.54 compute-3 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-16-zb5fg 1/1 Running 0 28s 10.130.2.65 compute-2 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
postgresql-5-6v5h6 1/1 Running 0 18m 10.129.2.24 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_combining_required_and_preferred_affinities"&gt;Combining required and preferred affinities&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;It is possible to combine requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. In such case the required affinity MUST be met, while the preferred affinity is tried to be met. The following examples combines these two types in an affinity and anti-affinity specification.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The &lt;code&gt;podAffinity&lt;/code&gt; block defines the same as above: schedule the pod on a node of the same zone, where a pod with the label &lt;code&gt;security=zone1&lt;/code&gt; is running.
The &lt;code&gt;podAntiAffinity&lt;/code&gt; defines that the pod should not be started on a node if that node has a pod running with the label &lt;code&gt;security=zone2&lt;/code&gt;. However, the scheduler might decide to do so as long the &lt;code&gt;podAffinity&lt;/code&gt; rule is met.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: security
operator: In
values:
- zone1
topologyKey: topology.kubernetes.io/zone
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100 &lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;(1)&lt;/b&gt;
podAffinityTerm:
labelSelector:
matchExpressions:
- key: security
operator: In
values:
- zone2
topologyKey: topology.kubernetes.io/zone&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="colist arabic"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;The &lt;code&gt;weight&lt;/code&gt; field is used by the scheduler to create a scoring. The higher the scoring the more preferred is that node.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="sect2"&gt;
&lt;h3 id="_topologykey"&gt;topologyKey&lt;/h3&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;It is important to understand the &lt;code&gt;topologyKey&lt;/code&gt; setting. This is the key for the node label. If an affinity rule is met, Kubernetes will try to find suitable nodes which are labelled with the topologyKey. All nodes must be labelled consistently, otherwise unintended behaviour might occur.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;As described in the Kubernetes documentation at &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity" target="_blank" rel="noopener"&gt;Pod Affinity and Anti-Affinity&lt;/a&gt;, the topologyKey has some constraints:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;&lt;em&gt;Quote Kubernetes:&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="olist arabic"&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;For pod affinity, empty &lt;code&gt;topologyKey&lt;/code&gt; is not allowed in both &lt;code&gt;requiredDuringSchedulingIgnoredDuringExecution&lt;/code&gt; and &lt;code&gt;preferredDuringSchedulingIgnoredDuringExecution&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;For pod anti-affinity, empty &lt;code&gt;topologyKey&lt;/code&gt; is also not allowed in both &lt;code&gt;requiredDuringSchedulingIgnoredDuringExecution&lt;/code&gt; and &lt;code&gt;preferredDuringSchedulingIgnoredDuringExecution&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;For &lt;code&gt;requiredDuringSchedulingIgnoredDuringExecution&lt;/code&gt; pod anti-affinity, the admission controller &lt;code&gt;LimitPodHardAntiAffinityTopology&lt;/code&gt; was introduced to limit &lt;code&gt;topologyKey&lt;/code&gt; to &lt;code&gt;kubernetes.io/hostname&lt;/code&gt;. If you want to make it available for custom topologies, you may modify the admission controller, or disable it.&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Except for the above cases, the &lt;code&gt;topologyKey&lt;/code&gt; can be any legally label-key.&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;&lt;em&gt;End of quote&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_cleanup"&gt;Cleanup&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;As cleanup simply remove the affinity specification from the DeploymentConf. The node labels can stay as they are since they do not hurt.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_summary"&gt;Summary&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This concludes the quick overview of the pod affinity. The next chapter will discuss &lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-node-affinity/"&gt;Node Affinity&lt;/a&gt; rules, which allows affinity based on node specifications.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description></item><item><title>Taints and Tolerations</title><link>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-08-30-taints/</link><pubDate>Thu, 26 Aug 2021 00:00:00 +0000</pubDate><guid>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-08-30-taints/</guid><description>&lt;div class="paragraph"&gt;
&lt;p&gt;While Node Affinity is a property of pods that attracts them to a set of nodes, taints are the exact opposite. Nodes can be configured with one or more taints, which mark the node in a way to only accept pods that do tolerate the taints. The tolerations themselves are applied to pods, telling the scheduler to accept a taints and start the workload on a tainted node.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;A common use case would be to mark certain nodes as infrastructure nodes, where only specific pods are allowed to be executed or to taint nodes with a special hardware (i.e. GPU).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_pod_placement_series"&gt;Pod Placement Series&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Please check out other ways of pod placements:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;div class="expand"&gt;
&lt;div class="expand-label" style="cursor: pointer;" onclick="$h = $(this);$h.next('div').slideToggle(100,function () {$h.children('i').attr('class',function () {return $h.next('div').is(':visible') ? 'fas fa-chevron-down' : 'fas fa-chevron-right';});});"&gt;
&lt;i style="font-size:x-small;" class="fas fa-chevron-right"&gt;&lt;/i&gt;
&lt;span&gt;
&lt;a&gt;Expand me...&lt;/a&gt;
&lt;/span&gt;
&lt;/div&gt;
&lt;div class="expand-content" style="display: none;"&gt;
&lt;div class="olist arabic"&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-nodeselector/"&gt;NodeSelector&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-pod-affinity/"&gt;Pod Affinity and Anti Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-node-affinity/"&gt;Node Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-taints-and-tolerations"&gt;Taints and Tolerations&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-topology-spread-constraints/"&gt;Topology Spread Constraints&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/descheduler/"&gt;Descheduler&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_understanding_taints_and_tolerations"&gt;Understanding Taints and Tolerations&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Matching taints and tolerations is defined by a key/value pair and a taint effect.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;For example, a node can be tainted as:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;spec:
[...]
template:
[...]
spec:
taints:
- effect: NoExecute
key: key1
value: value1
[...]&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;And a pod can have the matching toleration:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;spec:
[...]
template:
[...]
spec:
tolerations:
- key: &amp;#34;key1&amp;#34;
operator: &amp;#34;Equal&amp;#34;
value: &amp;#34;value1&amp;#34;
effect: &amp;#34;NoExecute&amp;#34;
tolerationSeconds: 3600
[...]&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This means that the pod is tolerating the taint of the node with the pair &lt;strong&gt;key1=value1&lt;/strong&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="sect2"&gt;
&lt;h3 id="_parameter_effect"&gt;Parameter: effect&lt;/h3&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Above example is using as effect &lt;code&gt;NoExecute&lt;/code&gt;. The following effects are possible:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;NoSchedule&lt;/code&gt; - new pods are not scheduled, existing pods remain&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;PreferNoSchedule&lt;/code&gt; - new pods are not preferred but can still be scheduled but the scheduler tries to avoid that, existing pods remain&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;NoExecute&lt;/code&gt; - new pods are not scheduled, existing pods (without matching toleration) are &lt;strong&gt;removed&lt;/strong&gt;! The setting &lt;code&gt;tolerationSeconds&lt;/code&gt; is used to define a maximum time until a pod is allowed to stay.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect2"&gt;
&lt;h3 id="_parameter_operator"&gt;Parameter: operator&lt;/h3&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;For the operator two options are possible:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Exists&lt;/code&gt; - simply checks if a key exists and key and effect matches. No value should be configured in this case.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Equal&lt;/code&gt; (default) - with this option key, value and effect must match exactly.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_configuring_taints_and_tolerations"&gt;Configuring Taints and Tolerations&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Our compute-0 node is a special node with a GPU installed. We would like that our web application is running on this node and ONLY our web application is running on that node. To achieve this, we will taint the node accordingly with the key/value &lt;code&gt;gpu=enabled&lt;/code&gt; and the effect &lt;code&gt;NoExecute&lt;/code&gt; and configure a toleration to the DeploymentConfig of our web fronted.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="imageblock"&gt;
&lt;div class="content"&gt;
&lt;img src="https://blog.stderr.at/openshift-platform/day-2/images/tainting-node.png" alt="Tainting compute-0"/&gt;
&lt;/div&gt;
&lt;div class="title"&gt;Figure 1. Tainting Node&lt;/div&gt;
&lt;/div&gt;
&lt;div class="admonitionblock warning"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-warning" title="Warning"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
Always configure tolerations before you taint a node. Otherwise the scheduler might not be able to start a pod until it has been configured to tolerate the taints.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;First we will set the toleration to the DeploymentConfig:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc edit dc/django-psql-example -n podtesting&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;And add a toleration for the key/value pair, the effect NoExecute and a tolerationSeconds of X seconds.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;spec:
[...]
template:
[...]
spec:
tolerations:
- key: &amp;#34;gpu&amp;#34;
value: &amp;#34;enabled&amp;#34;
operator: &amp;#34;Equal&amp;#34;
effect: &amp;#34;NoExecute&amp;#34;
tolerationSeconds: 30 &lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;(1)&lt;/b&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="colist arabic"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;I am setting the tolerationSeconds to 30 seconds to get it done quicker.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Before we taint our node, let’s check which pods are currently running there:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -A -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATE:.status.phase | grep compute-0 | grep Running
tuned-hrvl4 compute-0 Running
dns-default-ln9s4 compute-0 Running
node-resolver-qk24n compute-0 Running
node-ca-lxrvf compute-0 Running
ingress-canary-fv5w6 compute-0 Running
machine-config-daemon-558v4 compute-0 Running
grafana-78ccdb8c9d-8rqsp compute-0 Running
node-exporter-rn8h4 compute-0 Running
prometheus-adapter-66976bf759-fxdcd compute-0 Running
multus-additional-cni-plugins-29qgq compute-0 Running
multus-mkd87 compute-0 Running
network-metrics-daemon-64hrw compute-0 Running
network-check-target-l6l7n compute-0 Running
ovnkube-node-hrtln compute-0 Running
django-psql-example-24-c599b compute-0 Running
django-psql-example-24-l7znv compute-0 Running
django-psql-example-24-zpg77 compute-0 Running&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Multiple different pods are running here, as well as two of our web frontend (django-psql-example)&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The other django-psql-example pods are started accross the cluster:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -n podtesting -o wide
oc get pods -n podtesting -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
django-psql-example-25-8ppxc 1/1 Running 0 21s 10.128.0.61 master-2 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-25-8rv76 1/1 Running 0 21s 10.130.2.92 compute-2 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-25-9m8k2 1/1 Running 0 21s 10.129.0.67 master-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-25-fhvxg 1/1 Running 0 31s 10.128.2.126 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-25-kqgwz 0/1 Running 0 21s 10.130.0.71 master-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-25-m66nn 1/1 Running 0 21s 10.130.0.70 master-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-25-ntjqb 1/1 Running 0 21s 10.128.0.60 master-2 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-25-nxxqh 1/1 Running 0 21s 10.130.2.93 compute-2 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-25-p8nbz 1/1 Running 0 21s 10.131.1.183 compute-3 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-25-ttr9g 1/1 Running 0 21s 10.129.2.57 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-25-xn4fp 1/1 Running 0 21s 10.129.2.56 compute-1 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-25-xpqf4 1/1 Running 0 21s 10.128.2.127 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-25-xwmwv 1/1 Running 0 21s 10.131.1.184 compute-3 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;To taint the node we can simply execute the following command. Be sure to use the same values as in the toleration:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc adm taint nodes compute-0 gpu=enabled:NoExecute&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This will create the following specification in the node object:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;spec:
[...]
taints:
- effect: NoExecute
key: gpu
value: enabled&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;OpenShift will allow pods, which are not tolerating the taints, to keep on running for 30 seconds. After that, these pods will be evicted and started elsewhere.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;When we check after the tolerationSeconds time has passed which pods are running on the node compute-0, we will see that most pods have disappeared, except the pods for the webapplication and pods which are part of DaemonSets. (DaemonSets are defined to run on all or specific nodes and are not evicted. The cluster-DaemonSets are tolerating everything)&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -A -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATE:.status.phase | grep compute-0 | grep Running
tuned-hrvl4 compute-0 Running
node-resolver-qk24n compute-0 Running
node-ca-lxrvf compute-0 Running
machine-config-daemon-558v4 compute-0 Running
node-exporter-rn8h4 compute-0 Running
multus-additional-cni-plugins-29qgq compute-0 Running
multus-mkd87 compute-0 Running
network-metrics-daemon-64hrw compute-0 Running
network-check-target-l6l7n compute-0 Running
ovnkube-node-hrtln compute-0 Running
django-psql-example-25-8dzqh compute-0 Running
django-psql-example-25-9p4sb compute-0 Running
django-psql-example-25-pqbvn compute-0 Running
django-psql-example-25-sb6hr compute-0 Running&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_removing_taints"&gt;Removing taints&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Taints can be simply removed with the oc command added a trailing &lt;code&gt;-&lt;/code&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc adm taint nodes compute-0 gpu=enabled:NoExecute-&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_built_in_taints_taint_nodes_by_condition"&gt;Built-in Taints - Taint Nodes by Condition&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Several taints are built into OpenShift and are set during certain events (aka Taint Nodes by Condition) and cleared when the condition is resolved.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The following list is quoted from the OpenShift documentation and provides a list of taints which are automatically set. For example, when a node becomes unavailable:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;node.kubernetes.io/not-ready&lt;/code&gt;: The node is not ready. This corresponds to the node condition Ready=False.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;node.kubernetes.io/unreachable&lt;/code&gt;: The node is unreachable from the node controller. This corresponds to the node condition Ready=Unknown.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;node.kubernetes.io/out-of-disk&lt;/code&gt;: The node has insufficient free space on the node for adding new pods. This corresponds to the node condition OutOfDisk=True.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;node.kubernetes.io/memory-pressure&lt;/code&gt;: The node has memory pressure issues. This corresponds to the node condition MemoryPressure=True.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;node.kubernetes.io/disk-pressure&lt;/code&gt;: The node has disk pressure issues. This corresponds to the node condition DiskPressure=True.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;node.kubernetes.io/network-unavailable&lt;/code&gt;: The node network is unavailable.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;node.kubernetes.io/unschedulable&lt;/code&gt;: The node is unschedulable.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;node.cloudprovider.kubernetes.io/uninitialized&lt;/code&gt;: When the node controller is started with an external cloud provider, this taint is set on a node to mark it as unusable. After a controller from the cloud-controller-manager initializes this node, the kubelet removes this taint.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Depending on the condition, the node will either have the effect &lt;code&gt;NoSchedule&lt;/code&gt;, which means no new pods will be started there (unless a toleration is configured) or the effect &lt;code&gt;NoExecute&lt;/code&gt;, which will evict pods with no tolerations from the node. Typical examples for NoSchedule condition would be: memory-pressure or disk-pressure. If a node is unreachable or not-ready, then it will be automatically tainted with the effect NoExecute. &lt;code&gt;tolerationSeconds&lt;/code&gt; (default 300) will be respected.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_tolerating_all_taints"&gt;Tolerating all taints&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Tolerations can be configured to tolerate all possible taints. In such case no &lt;code&gt;value&lt;/code&gt; or &lt;code&gt;key&lt;/code&gt; is configured and &lt;code&gt;operator: &amp;#34;Exists&amp;#34;&lt;/code&gt; is used:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;spec:
[...]
template:
[...]
spec:
tolerations:
- operator: “Exists”&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="admonitionblock note"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class="icon"&gt;
&lt;i class="fa icon-note" title="Note"&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class="content"&gt;
Some cluster daemonsets (i.e. tuned) are configured this way.
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_cleanup"&gt;Cleanup&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Remove the taint from the node:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc adm taint nodes compute-0 gpu=enabled:NoExecute-&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;And remove the toleration specification from the DeploymentConfig&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;spec:
[...]
template:
[...]
spec:
tolerations:
- key: &amp;#34;gpu&amp;#34;
value: &amp;#34;enabled&amp;#34;
operator: &amp;#34;Equal&amp;#34;
effect: &amp;#34;NoExecute&amp;#34;
tolerationSeconds: 30 &lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;(1)&lt;/b&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description></item><item><title>Topology Spread Constraints</title><link>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-08-31-topologyspreadcontraints/</link><pubDate>Thu, 26 Aug 2021 00:00:00 +0000</pubDate><guid>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-08-31-topologyspreadcontraints/</guid><description>&lt;div class="paragraph"&gt;
&lt;p&gt;&lt;strong&gt;Topology spread constraints&lt;/strong&gt; is a new feature since Kubernetes 1.19 (OpenShift 4.6) and another way to control where pods shall be started. It allows to use failure-domains, like zones or regions or to define custom topology domains. It heavily relies on configured node labels, which are used to define topology domains. This feature is a more granular approach than affinity, allowing to achieve higher availability.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_pod_placement_series"&gt;Pod Placement Series&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Please check out other ways of pod placements:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;div class="expand"&gt;
&lt;div class="expand-label" style="cursor: pointer;" onclick="$h = $(this);$h.next('div').slideToggle(100,function () {$h.children('i').attr('class',function () {return $h.next('div').is(':visible') ? 'fas fa-chevron-down' : 'fas fa-chevron-right';});});"&gt;
&lt;i style="font-size:x-small;" class="fas fa-chevron-right"&gt;&lt;/i&gt;
&lt;span&gt;
&lt;a&gt;Expand me...&lt;/a&gt;
&lt;/span&gt;
&lt;/div&gt;
&lt;div class="expand-content" style="display: none;"&gt;
&lt;div class="olist arabic"&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-nodeselector/"&gt;NodeSelector&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-pod-affinity/"&gt;Pod Affinity and Anti Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-node-affinity/"&gt;Node Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-taints-and-tolerations"&gt;Taints and Tolerations&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-topology-spread-constraints/"&gt;Topology Spread Constraints&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/descheduler/"&gt;Descheduler&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_understanding_topology_spread_constraints"&gt;Understanding Topology Spread Constraints&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Imagine, like for pod affinity, we have two zones (east and west) in our 4 compute node cluster.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="imageblock"&gt;
&lt;div class="content"&gt;
&lt;img src="https://blog.stderr.at/openshift-platform/day-2/images/affinity-kubernetes.zones.png" alt="Node Zones"/&gt;
&lt;/div&gt;
&lt;div class="title"&gt;Figure 1. Node Zones&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;These zones are defined by node labels, which can be created with the following commands:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc label nodes compute-0 compute-1 topology.kubernetes.io/zone=east
oc label nodes compute-2 compute-3 topology.kubernetes.io/zone=west&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Now let’s scale our web application to 3 pods. The OpenShift scheduler will try to evenly spread the pods accross the cluster:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc get pods -n podtesting -o wide
django-psql-example-31-jrhsr 0/1 Running 0 8s 10.130.2.112 compute-2 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-31-q66hk 0/1 Running 0 8s 10.131.1.219 compute-3 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;
django-psql-example-31-xv7jc 0/1 Running 0 8s 10.128.3.115 compute-0 &amp;lt;none&amp;gt; &amp;lt;none&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="imageblock"&gt;
&lt;div class="content"&gt;
&lt;img src="https://blog.stderr.at/openshift-platform/day-2/images/topologyspreadconstraints1.png" alt="Running Pods"/&gt;
&lt;/div&gt;
&lt;div class="title"&gt;Figure 2. Running Pods&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;If a new pod is started the scheduler may try to start it on compute-1. However, this is done based on best effort. The scheduler does not guarantee that and may try to start it on one of the nodes in zone &amp;#34;West&amp;#34;. With the configuration &lt;code&gt;topologySpreadConstraints&lt;/code&gt; this can be controlled and incoming pods can only be scheduled on a node of zone &amp;#34;East&amp;#34; (either on compute-0 or compute-1).&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_configure_topologyspreadcontraint"&gt;Configure TopologySpreadContraint&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Let’s configure our topology by adding the following into the DeploymentConfig:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;spec:
[...]
template:
[...]
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
name: django-psql-example&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This defines the following parameter:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;maxSkew&lt;/code&gt; - defines the degree to which pods may be unevenly distributed (must be greater than zero). Depending on the &lt;code&gt;whenUnsatisfiable&lt;/code&gt; parameter the bahaviour differs:&lt;/p&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;whenUnsatisfiable == DoNotSchedule: would not schedule if the maxSkew is is not met.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;whenUnsatisfiable == ScheduleAnyway: the scheduler will still schedule and gives the topology which would decrease the skew a better score&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;topologyKey&lt;/code&gt; - key of node lables&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;whenUnsatisfiable&lt;/code&gt; - defines what to do with a pod if the spread constraint is not met. Can be either &lt;code&gt;DoNotSchedule&lt;/code&gt; (default) or &lt;code&gt;ScheduleAnyway&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;lableSelector&lt;/code&gt; - used to find matching pods. Found pods are considered to be part of the topology domain. In our example we simply use a default label of the application: &lt;code&gt;name: django-psql-example&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;If now a 4th pod shall be started the topologySpreadConstraints will allow this pod on one of the nodes of zone=east (either compute-0 or compute-1). It cannot be scheduled on nodes in zone=west since it would violate the &lt;code&gt;maxSkew: 1&lt;/code&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre&gt;Pod distribution would be 1 in zone east and 3 on zone west.
--&amp;gt; the skew would then be 3 - 1 = 2 which is not equal to 1&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_configure_multiple_topologyspreadcontraint"&gt;Configure multiple TopologySpreadContraint&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;It is possible to configure multiple TopologySpreadConstraints. In such a case all contrains must meet the requirement (logical AND). For example:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;spec:
[...]
template:
[...]
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
name: django-psql-example
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
name: django-psql-example&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The first part is identical to our first example and would allow the schedule to start a pod in &lt;code&gt;topology.kubernetes.io/zone: east&lt;/code&gt; (compute-0 or compute-1). The second configuration defines not a zone but a node hostname. Now the scheduler can only deploy into zone=east AND onto node=compute-1&lt;/p&gt;
&lt;/div&gt;
&lt;div class="sect2"&gt;
&lt;h3 id="_conflicts_with_multiple_topologyspreadconstraints"&gt;Conflicts with multiple TopologySpreadConstraints&lt;/h3&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;If you use multiple constraints conflicts are possible. For example, you have a 3 node cluster and the 2 zones east and west. In such cases the maxSkew might be increased or the &lt;code&gt;whenUnsatisfiable&lt;/code&gt; might be set to ScheduleAnyway&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;See &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/#example-multiple-topologyspreadconstraints" class="bare"&gt;https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/#example-multiple-topologyspreadconstraints&lt;/a&gt; for further information.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_cleanup"&gt;Cleanup&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Remove the topologySpreadConstraint&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;spec:
[...]
template:
[...]
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
name: django-psql-example&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description></item><item><title>Using Descheduler</title><link>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-09-01-descheduler/</link><pubDate>Thu, 26 Aug 2021 00:00:00 +0000</pubDate><guid>https://blog.stderr.at/openshift-platform/day-2/pod-placement/2021-09-01-descheduler/</guid><description>&lt;div class="paragraph"&gt;
&lt;p&gt;&lt;strong&gt;Descheduler&lt;/strong&gt; is a new feature which is GA since OpenShift 4.7. It can be used to evict pods from nodes based on specific strategies. The evicted pod is then scheduled on another node (by the Scheduler) which is more suitable.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;This feature can be used when:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;nodes are under/over-utilized&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;pod or node affinity, taints or labels have changed and are no longer valid for a running pod&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;node failures&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;pods have been restarted too many times&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_pod_placement_series"&gt;Pod Placement Series&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Please check out other ways of pod placements:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;div class="expand"&gt;
&lt;div class="expand-label" style="cursor: pointer;" onclick="$h = $(this);$h.next('div').slideToggle(100,function () {$h.children('i').attr('class',function () {return $h.next('div').is(':visible') ? 'fas fa-chevron-down' : 'fas fa-chevron-right';});});"&gt;
&lt;i style="font-size:x-small;" class="fas fa-chevron-right"&gt;&lt;/i&gt;
&lt;span&gt;
&lt;a&gt;Expand me...&lt;/a&gt;
&lt;/span&gt;
&lt;/div&gt;
&lt;div class="expand-content" style="display: none;"&gt;
&lt;div class="olist arabic"&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-nodeselector/"&gt;NodeSelector&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-pod-affinity/"&gt;Pod Affinity and Anti Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-node-affinity/"&gt;Node Affinity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-taints-and-tolerations"&gt;Taints and Tolerations&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/pod-placement-topology-spread-constraints/"&gt;Topology Spread Constraints&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.stderr.at/openshift/day-2/descheduler/"&gt;Descheduler&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_descheduler_profiles"&gt;Descheduler Profiles&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The following descheduler profiles are known and one or multiple can be configured for the Descheduler. Each profiles enables certain strategies which the Descheduler is leveraging.
Since the strategies names are more or less self explaining, I did not add their full description here. Instead detailed information can be found at: &lt;a href="https://docs.openshift.com/container-platform/4.8/nodes/scheduling/nodes-descheduler.html#nodes-descheduler-profiles_nodes-descheduler" target="_blank" rel="noopener"&gt;Descheduler Profiles&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;AffinityAndTaints&lt;/code&gt; - removes pods that violates affinity and anti-affinity rules or taints&lt;/p&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;RemovePodsViolatingInterPodAntiAffinity&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;RemovePodsViolatingNodeAffinity&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;RemovePodsViolatingNodeTaints&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;TopologyAndDuplicates&lt;/code&gt; - evicts pods which are not evenly spreaded or which are violating the topology domain&lt;/p&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;RemovePodsViolatingTopologySpreadConstraint&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;RemoveDuplicates&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;LifecycleAndUtilization&lt;/code&gt; - evicts long-running pods to balance resource usage of nodes&lt;/p&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;RemovePodsHavingTooManyRestarts - Pods that are restarted more than 100 times&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;LowNodeUtilization - removes pods from overutilized nodes.&lt;/p&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A node is considered underutilized if its usage is below 20% for all thresholds (CPU, memory, and number of pods).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A node is considered overutilized if its usage is above 50% for any of the thresholds (CPU, memory, and number of pods).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;PodLifeTime - evicts pods that are too old&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_descheduler_mechanism"&gt;Descheduler mechanism&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The following rules are followed by the Descheduler to ensure that eviction of pods does not go wild. Therefore the following pods will never be evicted:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="ulist"&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;pods in openshift-* or kube-system namespaces&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;pods with priorityClassName equal to &lt;code&gt;system-cluster-critical&lt;/code&gt; or &lt;code&gt;system-node-critical&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;pods which cannot be recreated, for example: static or stand-alone pods/jobs or pods without a replication controller or replica set&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;pods of a daemon set&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;pods with local storage&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;pods which are violating the pod disruption budget&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="sect1"&gt;
&lt;h2 id="_installing_the_descheduler"&gt;Installing the Descheduler&lt;/h2&gt;
&lt;div class="sectionbody"&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;The Descheduler is not installed by default and must be installed after the cluster has been initiated. This is done by installed the &lt;strong&gt;Kube Descheduler&lt;/strong&gt; Operator.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;First we create a separate namespace for our operator, including a label:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-bash hljs" data-lang="bash"&gt;oc adm new-project openshift-kube-descheduler-operator
oc label ns/openshift-kube-descheduler-operator openshift.io/cluster-monitoring=true&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;Then we search for the &lt;strong&gt;Kube Descheduler&lt;/strong&gt; operator and install it, using the newly created namespace:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="imageblock"&gt;
&lt;div class="content"&gt;
&lt;img src="https://blog.stderr.at/openshift-platform/day-2/images/descheduler-install.png?height=400px" alt="Install Descheduler Operator"/&gt;
&lt;/div&gt;
&lt;div class="title"&gt;Figure 1. Install Descheduler Operator&lt;/div&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;After a few moments the operator will be installed.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="paragraph"&gt;
&lt;p&gt;You can now create a Descheduler instance either via UI (wizard) or by using the following specification:&lt;/p&gt;
&lt;/div&gt;
&lt;div class="listingblock"&gt;
&lt;div class="content"&gt;
&lt;pre class="highlightjs highlight"&gt;&lt;code class="language-yaml hljs" data-lang="yaml"&gt;apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
name: cluster
namespace: openshift-kube-descheduler-operator
spec:
deschedulingIntervalSeconds: 3600 &lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;(1)&lt;/b&gt;
logLevel: Normal &lt;i class="conum" data-value="2"&gt;&lt;/i&gt;&lt;b&gt;(2)&lt;/b&gt;
managementState: Managed
operatorLogLevel: Normal &lt;i class="conum" data-value="3"&gt;&lt;/i&gt;&lt;b&gt;(3)&lt;/b&gt;
profiles: &lt;i class="conum" data-value="4"&gt;&lt;/i&gt;&lt;b&gt;(4)&lt;/b&gt;
- AffinityAndTaints
- TopologyAndDuplicates
- LifecycleAndUtilization&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="colist arabic"&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="1"&gt;&lt;/i&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Defines the time interval the descheduler is running. Default is 3600 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="2"&gt;&lt;/i&gt;&lt;b&gt;2&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Defines logging for overall component. Can be Normal, Debug, Trace or TraceAll&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="3"&gt;&lt;/i&gt;&lt;b&gt;3&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Defines logging for the operator itself. Can be Normal, Debug, Trace or TraceAll&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i class="conum" data-value="4"&gt;&lt;/i&gt;&lt;b&gt;4&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Enables on or multiple profiles the Descheduler should consider&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description></item></channel></rss>