[Ep.16] Tuning OpenShift GitOps for performance at scale
Argo CD is a powerhouse for GitOps. It is easy to get started with and helps you manage clusters and applications. As the number of clusters, applications, and repositories grows, performance can suffer: refreshes take longer, the UI may feel sluggish, and the experience turns frustrating.
This article is about performance tuning: which options exist, which scenarios matter, and which settings might help in your environment.
Introduction
GitOps has become a common way to manage configuration drift, and Argo CD is one of the most widely used tools for it. It keeps the desired state defined in Git aligned with what actually runs in the cluster. On Kubernetes you can use Argo CD to manage multiple clusters from a single management cluster (other topologies exist) and to synchronise multiple repositories. A typical pattern is two Argo CD instances: one for cluster configuration and one for application rollout (though many variants work).
What holds across setups is that Argo CD tends to slow down as its configuration grows and as the repositories it syncs from get larger and busier.
For example, imagine you have a single repository with thousands of commits. By default, Argo CD synchronises the whole repository, including the history. This is one screw that can be tightened.
Argo CD Components
Argo CD is made up of several cooperating components. In rough data-flow order, they are:
Repo Server: Connects to Git repositories, clones and syncs them, and generates manifests (for example with Helm or Kustomize). It uses Redis for caching.
Redis: Caches generated manifests and other data the other components reuse (default 24h).
Server: Exposes the API and UI; where possible it serves cached data from Redis instead of regenerating it.
ApplicationSet Controller: Reconciles ApplicationSet resources and creates Application objects that the application controller picks up.
Application Controller: Reconciles desired state with live state in the cluster—creating, updating, and deleting resources as needed.
| You may also see extra pods (for example Dex or a GitOps plugin). They are not central to the core sync path covered here; tuning them is mostly a matter of resources and replicas. |
TL;DR: an overview
When I started this article, I did not expect to fall down the rabbit hole. It grew bigger and bigger, and I am pretty sure there are settings I missed, even though I searched the actual source code.
| If you find anything missing, please let me know. I am happy to add it. |
The following overview table summarises the settings discussed in this article. Default and example recommended values are indicative—always validate in your environment. If you want a quick overview, start here.
| Setting | Default | Recommended | Standalone Argo CD | OpenShift GitOps (ArgoCD CR) |
|---|---|---|---|---|
| Git clone depth | Full history | 1 | Repository Secret with stringData.depth / data.depth (base64) | Same repository Secret (not a field on ArgoCD) |
| ARGOCD_REPO_SERVER_PARALLELISM_LIMIT | 0 (unlimited) | 20, then tune (avoid OOM) | Env on repo-server Deployment | spec.repo.env |
| ARGOCD_GIT_LS_REMOTE_PARALLELISM_LIMIT | 0 (unlimited) | 20 when Git rate limits bite | Env on repo-server Deployment | spec.repo.env |
| ARGOCD_EXEC_TIMEOUT | 90s | 180s or higher for heavy Helm/Kustomize | Env on repo-server Deployment | spec.repo.env |
| ARGOCD_EXEC_FATAL_TIMEOUT | 10s | Leave default unless you change the SIGTERM→SIGKILL grace period | Env on repo-server Deployment | spec.repo.env |
| ARGOCD_GIT_ATTEMPTS_COUNT | 1 | 3–5 on flaky networks | Env on repo-server Deployment | spec.repo.env |
| ARGOCD_REPO_CACHE_EXPIRATION | 24h | Often keep 24h; shorten only with care | Env on repo-server Deployment | spec.repo.env |
| ARGOCD_REVISION_CACHE_LOCK_TIMEOUT | 10s | Raise if large charts hit lock timeouts (e.g. 30s) | Env on repo-server Deployment | spec.repo.env |
| ARGOCD_SERVER_REPO_SERVER_TIMEOUT_SECONDS | 60 | Align with repo render time (e.g. 180) | Env on server Deployment | spec.server.env |
| Controller sharding | Disabled / single replica | Enable for multi-cluster; algorithm + replicas | HA StatefulSet, argocd-cmd-params-cm controller.sharding.algorithm, ARGOCD_CONTROLLER_REPLICAS / ARGOCD_CONTROLLER_SHARD on pods | spec.controller.sharding + spec.controller.env |
| timeout.reconciliation | 3m | 600s–3600s with webhooks | argocd-cm | spec.extraConfig |
| timeout.reconciliation.jitter | 60s (if unset, controller default applies) | Keep a fraction of timeout.reconciliation (often tens of seconds); 0s = no random spread | argocd-cm | spec.extraConfig |
| controller.status.processors | 20 | 50–100 at large scale | argocd-cmd-params-cm | spec.controller.processors.status |
| controller.operation.processors | 10 | Raise when the sync queue backs up | argocd-cmd-params-cm | spec.controller.processors.operation |
| ARGOCD_APPLICATION_CONTROLLER_REPO_SERVER_TIMEOUT_SECONDS + controller.repo.server.timeout.seconds | 60 | 120–300 for slow manifest RPCs | argocd-cmd-params-cm or env on controller | spec.controller.env |
| ARGOCD_APPLICATION_CONTROLLER_KUBECTL_PARALLELISM_LIMIT + controller.kubectl.parallelism.limit | 20 | Match or slightly below operation processors (e.g. 50) | argocd-cmd-params-cm or env on controller | spec.controller.env |
| Standalone Argo CD uses argocd-cm, argocd-cmd-params-cm, and environment variables on the upstream Deployments / StatefulSets. The OpenShift GitOps column shows spec fragments merged into the ArgoCD CR (metadata.name / metadata.namespace openshift-gitops); use the same spec under metadata.name / namespace argocd for a community Argo CD Operator install. Repository depth is always a Secret, not a field on ArgoCD. |
Let’s start tuning
Before we discuss the obvious settings, like increasing replicas, resources, or configuring sharding, let’s take a look at the often-overlooked settings.
Repo-Server: Git shallow cloning
By default Argo CD will clone the whole repository including the history. Depending on the repository size this can take a long time. By limiting the commit history downloaded by the argocd-repo-server, you can drastically reduce fetch times, bandwidth, and resource consumption, especially for large monorepos.
To configure Git shallow cloning, set the depth parameter to the number of commits you want to clone. For example, to clone only the latest commit, set the depth parameter to 1. This setting applies at the repository level, so you must configure it in the Secret object for each repository.
| When you work with public repositories you might not have this Secret object. In that case you need to create that Secret object for the public repository. |
Here is an example of a Secret object for a public repository:
kind: Secret
apiVersion: v1
metadata:
  name: repo-2960947011
  namespace: openshift-gitops
  labels:
    argocd.argoproj.io/secret-type: repository (1)
  annotations:
    managed-by: argocd.argoproj.io
stringData:
  depth: "1" (2)
  type: Git
  url: https://github.com/tjungbauer/openshift-clusterconfig-gitops (3)
type: Opaque
| 1 | The label identifies the Secret object as a repository secret. |
| 2 | The depth parameter is the number of commits you want to clone. |
| 3 | The url field is the repository URL. |
In the logs of the repo server we can see the different calls to the Git repository:
# Without depth parameter
time="2026-04-12T02:42:44Z" level=info msg="Initializing https://github.com/tjungbauer/openshift-clusterconfig-gitops to /tmp/_argocd-repo/991f2e97-c2ca-4fd0-8988-b9af2f1b228e"
[...truncated...]
time="2026-04-12T02:42:45Z" level=info msg=Trace args="[git fetch origin --tags --force --prune]" dir=/tmp/_argocd-repo/991f2e97-c2ca-4fd0-8988-b9af2f1b228e operation_name="exec git" time_ms=1011.528864 (1)
# With depth parameter
time="2026-04-12T02:45:23Z" level=info msg="Initializing https://github.com/tjungbauer/openshift-clusterconfig-gitops to /tmp/_argocd-repo/fcd62851-9f0f-4c0b-aa61-039327adc256"
[...truncated...]
time="2026-04-12T02:45:23Z" level=info msg=Trace args="[git fetch origin 0a2f72afe511023aa2a86560de24f6046644c286 --depth 1 --force --prune]" dir=/tmp/_argocd-repo/fcd62851-9f0f-4c0b-aa61-039327adc256 operation_name="exec git" time_ms=559.205253 (2)
| 1 | Without a depth value, the whole history of the repository is fetched, which takes almost twice as long. |
| 2 | With depth set, Git fetches only the requested shallow history. |
| Do not forget that the depth parameter must be added to the Secret. If you use a public repository you might need to create the Secret object manually. |
References: Argo CD: Shallow clone
Repo-Server: Environment variables to boost performance
The repo server container can be configured with environment variables to boost performance.
When you use the OpenShift GitOps Operator, set these variables in the spec.repo.env section of the ArgoCD custom resource, for example:
apiVersion: argoproj.io/v1alpha1
kind: ArgoCD
metadata:
  name: openshift-gitops
  namespace: openshift-gitops
spec:
  [...truncated...]
  repo:
    env:
      - name: ENV_NAME
        value: "VALUE"
If you prefer to use a ConfigMap (or Secret) instead, you can do so by setting a reference:
env:
  - name: ENV_NAME
    valueFrom:
      configMapKeyRef:
        name: my-configmap
        key: my-key
        optional: true
The variables are passed as a simple list to the container.
| If you do not use the operator you can set these values in the repo server deployment or the Argo CD ConfigMap. |
The following environment variables might be considered:
ARGOCD_REPO_SERVER_PARALLELISM_LIMIT (Default: 0)
This variable defines the maximum number of parallel fetches of the Git repository. The most common issue arises when the repo server crashes with OOM errors. By default this variable is set to 0, which means no limit.
Problem
Imagine you have a single repository with 500 applications. If you push new code to the repo, Argo CD detects the change and tries to generate the manifests for all applications at once. That can mean a large number of helm template or kustomize build commands running concurrently, which results in:
High CPU spikes
Very high memory use, leading to OOMKilled containers and possibly a crash loop for the repo server
Solution
When you set a value greater than 0 (for example, 20), you configure a strict worker pool (a semaphore).
If 500 requests arrive, the repo server only processes 20 at a time. The remaining 480 requests wait in a queue until a worker is free.
Disadvantage: Applications might take a few extra seconds to sync during large bursts of activity.
Advantage: Your argocd-repo-server stays stable, remains online, and avoids repeated OOM kills.
| There is no one-size-fits-all value here. The best approach is to start with a modest number such as 20 and monitor the behaviour of the repo server. |
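For illustration, a minimal ArgoCD CR fragment for the OpenShift GitOps Operator might look like this (the value 20 is an example starting point, not a universal recommendation):

```yaml
spec:
  repo:
    env:
      # Limit concurrent manifest generation to a worker pool of 20
      - name: ARGOCD_REPO_SERVER_PARALLELISM_LIMIT
        value: "20"
```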
References: Argo CD: Repo server deployment
ARGOCD_GIT_LS_REMOTE_PARALLELISM_LIMIT (Default: 0)
While the previous setting protects your Kubernetes cluster, this setting protects your Git service from being overwhelmed by too many concurrent requests. Argo CD periodically runs git ls-remote to check for new commits.
Problem
Imagine that we still have 500 applications pointing to a single repository and branch (for example, main). When reconciliation runs, Argo CD runs git ls-remote for each of the 500 applications at the same time to check for new commits. If your Git provider enforces a strict rate limit, this might lead to:
Request throttling by the Git provider
Your repository or organisation being completely blocked by the Git provider
Applications left in an Unknown state in Argo CD
Solution
Setting this value adds a second limit. When you set a value greater than 0 (for example, 20), the repo server opens at most 20 concurrent connections to the Git provider. The remaining requests wait in a queue until a connection is free.
Disadvantage: Slower reconciliation for applications that are not yet synced.
Advantage: You are less likely to hit rate limits on your Git provider.
| Do not set the value too low. For example, a value of 2 might cause timeouts in the application controller if requests wait too long in the queue. |
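A sketch of the corresponding ArgoCD CR fragment when using the OpenShift GitOps Operator (again, 20 is only an example value):

```yaml
spec:
  repo:
    env:
      # Allow at most 20 concurrent git ls-remote calls to the Git provider
      - name: ARGOCD_GIT_LS_REMOTE_PARALLELISM_LIMIT
        value: "20"
```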
References: Argo CD: Repo server deployment
ARGOCD_EXEC_TIMEOUT (Default: 90s)
Sets the maximum time for child processes (Helm, Kustomize, additional plugins, etc.). The repo server uses these tools to generate the final Kubernetes manifests for your applications. When the value is too low, it might lead to timeouts for slow charts.
Problem
Imagine you are deploying applications that rely on very large or complex Helm charts and Kustomize overlays. Because these charts take longer to render, the child process might need more than the default 90 seconds to complete. If the timeout expires before rendering finishes, you may see:
Timeouts in the application controller for slow charts.
Applications failing to sync and throwing rendering errors.
Frustration when heavy but valid configurations break simply because they need more time to render.
Solution
By adjusting this value (for example, configuring it to 180s) you give the repo server more time to finish rendering complex manifests. It tells Argo CD to wait longer before killing the child process.
Disadvantage: A higher timeout means a genuinely hung process holds resources longer before Argo CD finally terminates it.
Advantage: Complex, large, or slow-rendering charts have enough time to generate manifests successfully.
| Do not set the value too high. An excessively high timeout stops Argo CD from failing fast under severe load or when a rendering process hangs. |
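Assuming the OpenShift GitOps Operator, the 180s example from above would be set like this:

```yaml
spec:
  repo:
    env:
      # Give slow Helm/Kustomize renderings up to three minutes
      - name: ARGOCD_EXEC_TIMEOUT
        value: "180s"
```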
References: Argo CD: High Availability Guide
ARGOCD_EXEC_FATAL_TIMEOUT (Default: 10s)
This setting complements ARGOCD_EXEC_TIMEOUT. It is used to forcibly terminate a child process (such as Helm, Kustomize, etc.) if it cannot exit gracefully.
When execution exceeds ARGOCD_EXEC_TIMEOUT, Argo CD sends SIGTERM. If the process still has not exited when the fatal timeout elapses, Argo CD sends SIGKILL.
Problem
Imagine a scenario where a manifest generation tool or a custom plugin gets completely stuck in an unresponsive state.
When the standard ARGOCD_EXEC_TIMEOUT is reached, Argo CD asks the process to stop gracefully. If the child process ignores that request or is fully deadlocked, you may see:
Zombie processes consuming memory and CPU on the repo server.
The repo server eventually running out of resources.
Subsequent application syncs failing or hanging indefinitely because the underlying tools are locked up.
Solution
Setting this value adds a strict fallback limit. When configured (for example, setting it to 120s if your regular timeout is 90s), it tells Argo CD to intervene and mercilessly terminate the process if it hasn’t exited by this hard deadline.
Disadvantage: Force-killing a process prevents it from cleaning up after itself.
Advantage: It guarantees that a hung process will not permanently lock up the repo server’s compute resources.
| I personally would not set this value if it is not absolutely necessary. The default 10 seconds should be enough for most cases. |
References: Argo CD: High Availability Guide
ARGOCD_GIT_ATTEMPTS_COUNT (Default: 1)
This variable sets the maximum number of retry attempts for failed Git repository requests. When Argo CD interacts with your Git provider, network hiccups, temporary outages, or rate limiting or throttling by the provider can cause a Git request to fail unexpectedly.
Problem
For example, if multiple concurrent requests hit the provider, the provider might randomly reset the connection (often seen as a connection reset by peer or ssh: handshake failed error in the logs). With the default setting of one attempt, Argo CD fails the sync or reconciliation immediately when this happens, leaving applications in an Unknown state or surfacing a sync error.
Solution
Increasing this value (for example, to 3, 5, or even 10 in heavily throttled environments) configures Argo CD to retry the failed Git operation automatically before surfacing a hard error.
Disadvantage: If your Git provider is genuinely down (for example, a major GitHub outage) or your credentials have expired, the repo server wastes time retrying an operation that cannot succeed.
Advantage: It improves resilience against flaky network paths, temporary DNS issues, and provider rate limiting.
| Avoid setting this to an unrealistically high value (such as 50) unless you are addressing a specific, known throttling problem. A very high retry count can leave the repo server tied up on a bad repository for minutes, and you will still hit ARGOCD_EXEC_TIMEOUT in the end. In practice, values between 3 and 5 are a good balance for stability. |
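As an illustrative ArgoCD CR fragment for the OpenShift GitOps Operator (3 retries, per the balance suggested above):

```yaml
spec:
  repo:
    env:
      # Retry failed Git requests up to three times before surfacing an error
      - name: ARGOCD_GIT_ATTEMPTS_COUNT
        value: "3"
```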
References: Argo CD: High Availability Guide
Repo-Server: Caching optimizations
The following additional environment variables are focused on caching optimizations.
ARGOCD_REPO_CACHE_EXPIRATION (Default: 24h)
This setting controls the time-to-live (TTL) for cached repository states and generated manifests in the repo server (stored in Redis). When that TTL expires, the cache is invalidated and the repo server downloads the repository and generates the manifests again.
| For most environments, the default of 24h is a good balance between performance and freshness. You can still trigger a manual refresh when you need newer content. |
Problem
Downloading a Git repository and generating Kubernetes manifests is resource-intensive for the repo server.
If the repo server has already downloaded a repository and the remote Git branch has not received any new commits, there is no need to re-download and re-render the same YAML. However, if you use floating tags (such as latest or main) or rely on remote Helm repositories where a chart can change without a version bump, an overly long cache TTL would stop Argo CD from seeing updates promptly.
Solution
This value sets how long the repo server trusts its cached data before it forces a fresh download and re-renders manifests to confirm nothing changed upstream.
Disadvantage: If set too high (for example, 48h), developers who use mutable image tags or update remote Helm charts without bumping versions will find that Argo CD does not pick up their changes quickly enough.
Advantage: Keeps the repo server fast and efficient by serving pre-rendered manifests from Redis for most sync operations.
| Do not lower this value (for example, to 5m) only to pick up floating tags sooner. Doing so forces the repo server to discard its cache and re-render manifests from scratch far more often, which severely degrades performance. |
References: Argo CD: Repo server deployment
ARGOCD_REVISION_CACHE_LOCK_TIMEOUT (Default: 10s)
This variable sets the TTL used to deduplicate concurrent work on the same revision (0 disables deduplication). It reduces duplicate manifest generation when many applications refresh at once.
Problem
Imagine you have 100 applications pointing to the same Git repository and commit. When a webhook fires, all 100 applications try to sync at once. To save CPU, Argo CD generates the manifests for that commit once, caches the result in Redis, and serves it to every application. The first worker locks the cache key while it renders; everyone else waits.
If rendering takes a long time (for example, a very large Helm chart) and exceeds the default lock timeout, waiting workers assume the first attempt failed and all start rendering the same commit themselves, causing a sharp CPU spike. That pattern is often called a cache stampede.
Solution
Increasing this lock timeout gives the first worker more time to finish rendering and populate the cache. Other workers stay in the queue instead of duplicating the work.
Disadvantage: If the first worker hangs or exits without releasing the lock, the others remain blocked until this longer timeout elapses.
Advantage: Reduces cache stampedes where many workers render the same Git commit unnecessarily, which protects the repo server from sudden CPU spikes during mass reconciliations.
| As with ARGOCD_EXEC_TIMEOUT, tune this value based on how long your slowest Helm chart or Kustomize overlay actually takes to render. The default of 10s is adequate for many environments; large charts or complex overlays may need a higher value. |
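For example, if your heaviest chart needs roughly 30 seconds to render, a hedged ArgoCD CR fragment could look like this:

```yaml
spec:
  repo:
    env:
      # Give the first worker 30s to render and populate the cache
      # before other workers assume the attempt failed
      - name: ARGOCD_REVISION_CACHE_LOCK_TIMEOUT
        value: "30s"
```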
References: Argo CD: Repo server deployment
ARGOCD_SERVER_REPO_SERVER_TIMEOUT_SECONDS (Default: 60)
This variable sets the maximum time the Argo CD API server waits for a response from the repo server.
Problem
When you open the Argo CD Web UI or click on an application, the API Server has to make an internal gRPC call to the Repo Server and say, "Hey, please render these manifests and send them back to me so I can show them to the user."
If you have very large Helm charts or complex Kustomize builds, the repo server might take a long time to render them. However, this API server timeout defaults to 60 seconds, so the UI can give up, drop the connection, and show a context deadline exceeded error even though the repo server is still rendering in the background.
Solution
By increasing this value you tell the API Server to be more patient. It will keep the loading spinner going until the repo server finally finishes rendering and returns the data.
Disadvantage: If the repo server genuinely hangs or crashes, users will sit staring at a spinning wheel in the UI for a much longer time before finally receiving an error message.
Advantage: Eliminates false-positive UI and CLI errors when viewing complex applications, ensuring users can actually see the manifests and diffs for heavy deployments.
| Align this timeout with ARGOCD_EXEC_TIMEOUT on the repo server. If the repo server may spend 180 seconds rendering a chart but the API server gives up after 60 seconds, the UI will keep failing for slow applications. |
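Note that this variable belongs to the server, not the repo server. Assuming your operator version exposes spec.server.env on the ArgoCD CR, a fragment aligned with a 180s ARGOCD_EXEC_TIMEOUT might look like:

```yaml
spec:
  server:
    env:
      # Let the API server wait up to 180s for the repo server,
      # matching ARGOCD_EXEC_TIMEOUT on the repo server
      - name: ARGOCD_SERVER_REPO_SERVER_TIMEOUT_SECONDS
        value: "180"
```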
References: Argo CD: Server deployment
Application controller: Sharding (horizontal scaling)
When you investigate performance issues, sharding is often one of the first topics you meet. Sharding scales the application controller horizontally by distributing work across multiple controller replicas automatically.
When a single application controller reconciles too many applications across too many clusters, it becomes the bottleneck: high CPU and memory use, long reconcile queues, Kubernetes API throttling, and pressure on the repo server. Sharding splits that work across multiple controller replicas. Each replica has a shard index and reconciles only applications whose destination clusters map to that shard.
| Implicitly, this means that sharding only makes sense if you have more than one cluster in your environment. |
How it works (conceptually)
You run N application-controller replicas (typically a StatefulSet so each pod has a stable identity).
Argo CD assigns every registered cluster (including the in-cluster deployment) to exactly one shard using a distribution function. The controller for shard K ignores clusters that are not mapped to K.
Applications are processed by the shard that owns their destination cluster.
Distribution algorithms
The following algorithms are supported. Configure them with the ARGOCD_CONTROLLER_SHARDING_ALGORITHM environment variable (under spec.controller.env) when you use the OpenShift GitOps Operator, or with controller.sharding.algorithm in the argocd-cmd-params-cm ConfigMap:
legacy — stable hash from cluster ID; simple, but shards can be unbalanced (some busier than others). Kept for compatibility.
round-robin — tends to spread clusters more evenly across shards; often a better default when you scale out.
consistent-hashing — can improve balance further in some large, multi-cluster setups; cluster placement can reshuffle when the number of shards changes.
| The OpenShift GitOps documentation does not explicitly mention consistent-hashing, which often indicates limited or unsupported use on that distribution. If you set ARGOCD_CONTROLLER_SHARDING_ALGORITHM under spec.controller.env, Argo CD still reads whatever value you supply—verify against your operator version before relying on it in production. |
Why it improves performance
Parallelism across nodes: More controller pods mean more CPU and memory headroom in aggregate.
Smaller per-shard work: Each replica handles fewer clusters and fewer applications.
Better fit for hub-and-spoke: On a management cluster with many spoke clusters, sharding is the usual way to scale past what one controller can sustain.
Trade-offs
Operational complexity: More pods to monitor; you must ensure every shard stays healthy.
Not a substitute for repo-server tuning: If manifest generation or Git is the bottleneck, sharding the controller alone will not fix it.
Only useful if you have more than one cluster in your environment.
Configure Sharding
OpenShift GitOps (Argo CD custom resource)
The OpenShift GitOps Operator lets you configure sharding in the Argo CD custom resource.
The following snippet shows a typical sharding configuration, including the algorithm.
apiVersion: argoproj.io/v1alpha1
kind: ArgoCD
metadata:
  name: openshift-gitops
  namespace: openshift-gitops
spec:
  controller:
    env:
      - name: ARGOCD_CONTROLLER_SHARDING_ALGORITHM
        value: round-robin (1)
    sharding:
      enabled: true (2)
      replicas: 3 (3)
      minShards: 2 (4)
      maxShards: 5 (5)
| 1 | The algorithm to use for sharding |
| 2 | Sharding must be enabled explicitly. |
| 3 | Number of application-controller replicas (shards) |
| 4 | Minimum number of shards |
| 5 | Maximum number of shards |
| Sharding applies to the application controller only. The repo server, Redis, and Argo CD server scale on their own and are sized separately. |
Standalone Argo CD
Install the high-availability layout so the application controller is a StatefulSet (not a single-replica Deployment).
Set replicas on that StatefulSet to the number of shards you want; each replica is one shard.
Configure controller.sharding.algorithm in argocd-cmd-params-cm (for example round-robin).
Ensure each pod receives ARGOCD_CONTROLLER_REPLICAS (total shard count) and ARGOCD_CONTROLLER_SHARD (this pod's index).
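As a sketch for a standalone HA install (resource names follow the upstream manifests; verify against your Argo CD version):

```yaml
# argocd-cmd-params-cm: choose the distribution algorithm
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  controller.sharding.algorithm: round-robin
---
# Fragment of the argocd-application-controller StatefulSet:
# three replicas = three shards; each pod needs the total shard count
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "3"
```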
Controller: Performance tuning (ConfigMap)
The controller container can also be configured with environment variables. However, you configure it slightly differently from the repo server. The configuration comes from a ConfigMap: either you edit that ConfigMap directly, or, if you use the OpenShift GitOps Operator, you set spec.extraConfig or the appropriate setting under spec.controller. Those values eventually become environment variables in the controller container.
When you use the OpenShift GitOps Operator with the Argo CD custom resource, set spec.extraConfig. For example:
spec:
  extraConfig: (1)
    timeout.reconciliation: 180s
| 1 | Special extraConfig key with an example setting |
The operator will automatically merge the spec.extraConfig with the argocd-cm ConfigMap.
Confusingly, however, some parameters are set under spec.controller instead, for example spec.controller.processors.status and spec.controller.processors.operation.
spec:
  controller:
    processors:
      status: 50
If you do not use the Operator, set the parameter directly in the ConfigMap.
kind: ConfigMap
apiVersion: v1
metadata:
  name: argocd-cm (1)
  namespace: openshift-gitops
data:
  [...truncated...]
  timeout.reconciliation: 180s (2)
| 1 | Special ConfigMap name for the Argo CD configuration |
| 2 | Special timeout.reconciliation key with an example setting |
| Several ConfigMaps are involved in controller configuration. The argocd-cm ConfigMap is the main ConfigMap for the Argo CD instance. The argocd-cmd-params-cm ConfigMap supplies controller command-line parameters. Each subsection below states which ConfigMap applies. When you use the Operator, map these keys through the appropriate section where noted. |
timeout.reconciliation (Default: 3m)
ConfigMap: argocd-cm
ArgoCD: spec.extraConfig (if using the Operator)
With this setting you control the default polling interval for Argo CD. It defines how often Argo CD wakes up to compare the desired state in Git with the live state in your Kubernetes cluster.
Problem
If you do not have Git webhooks configured, Argo CD relies on a pull-based mechanism. By default, every three minutes the application controller connects to the repo server to check Git for new commits. In a cluster with thousands of applications, polling every three minutes creates a sustained, heavy baseline load on both the application controller (CPU and memory) and the repo server (network and Git API traffic), even when nothing has changed in your repositories.
Solution
By increasing this value (for example, 600s for 10 minutes, or even 3600s for one hour), you reduce that background load and CPU overhead on the controller and repo server.
| In combination with Git webhooks, this setting is one of the most important performance tweaks you can apply. |
Disadvantage: If you do not use Git webhooks, new commits are detected and applied more slowly (by up to the reconciliation interval).
Advantage: A large reduction in baseline CPU, memory, and network usage—a crucial performance tweak for large clusters.
| Only increase this value if you have Git webhooks configured so Argo CD is notified as soon as changes are pushed. With webhooks in place, the controller does not need to poll as aggressively, so longer reconciliation intervals are usually safe. |
References: Argo CD: FAQ
timeout.reconciliation.jitter (Default: 60s)
ConfigMap: argocd-cm
ArgoCD: spec.extraConfig (if using the Operator)
This setting works together with timeout.reconciliation. It defines the maximum random delay (a jitter) that Argo CD may add when scheduling each application’s periodic Git poll. The controller picks a value between zero and this maximum so that not every Application wakes up at the same instant.
Problem
If thousands of applications share the same base reconciliation interval, their refresh work can align in time. That creates periodic spikes of load on the application controller, the repo server, and your Git provider — even when nothing has changed.
Solution
Keep a non-zero jitter that is a reasonable fraction of timeout.reconciliation (upstream examples often use 60s jitter alongside a 120s base interval; see the FAQ). When you increase the poll interval for performance, adjust jitter so you still spread work rather than synchronizing every long interval.
Disadvantage: Each app’s next poll is a bit less predictable.
Advantage: Smoother average load and fewer peak loads than setting jitter to 0s.
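Put together, a webhook-backed setup might stretch the poll interval while keeping some jitter. The values below are illustrative, not a recommendation for every environment:

```yaml
spec:
  extraConfig:
    # Poll every 10 minutes; webhooks deliver changes immediately anyway
    timeout.reconciliation: 600s
    # Spread per-application polls by up to 60s to avoid load spikes
    timeout.reconciliation.jitter: 60s
```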
References: Argo CD: FAQ
Controller: Performance tuning (Command line parameters)
For a while I was unsure where to place the following settings, because the Operator configures them differently from a plain install. I ended up giving command-line parameters their own subsection.
If you are not using the Operator, configure these values in the argocd-cmd-params-cm ConfigMap. That part is straightforward.
If you use the Operator, set them under spec.controller.processors.
spec:
  controller:
    processors: (1)
      status: 20
      operation: 10
| 1 | Special processors key with the default settings |
This differs from the section above, where such parameters were set at spec.extraConfig.
The reason is that the Operator sets these parameters directly as command-line switches on the controller container instead of reading them from a ConfigMap.
I am sure there is a design decision behind this …
controller.status.processors (Default: 20)
ConfigMap: argocd-cmd-params-cm (if not using the Operator)
ArgoCD: spec.controller.processors.status (if using the Operator)
This setting controls the number of concurrent status processors. That is, how many applications the controller can compare at once.
Problem
If you have 500 applications, a default of 20 processors means the controller evaluates only 20 at a time. During a mass reconciliation (for example, a webhook firing for a monorepo), the remaining 480 applications wait in the reconciliation queue. You will notice drift detection becoming very slow.
Solution
Increase this value (for example, to 50 or 100 when you run more than 1,000 applications) so the controller can drain the queue faster, especially if refreshes in the UI feel slow.
Disadvantage: Higher CPU and memory use on the application controller Pod.
Advantage: Argo CD detects drift and refreshes status in the UI much faster in large environments.
References: Argo CD Operator: Reference
controller.operation.processors (Default: 10)
ConfigMap: argocd-cmd-params-cm (if not using the Operator)
ArgoCD: spec.controller.processors.operation (if using the Operator)
While status processors compare desired and live state, this setting controls the number of concurrent operation processors for actual sync runs.
Problem
If you trigger a mass sync of 100 applications at once but only 10 operation processors are configured, Argo CD runs 10 syncs, waits for them to finish, starts the next 10, and so on.
Applications stay in Waiting or Pending in the UI until their turn starts.
Solution
Increase this value when many applications sync or run operations at the same time and the operation queue backs up (pending syncs, slow operation phase). Monitor controller CPU and memory — you need headroom for the extra parallelism.
Disadvantage: Higher CPU and memory use on the application controller Pod.
Advantage: Argo CD can apply manifests to the Kubernetes API for many applications at the same time.
Note: This is independent of status processors: you might have heavy refresh traffic (status) and lighter sync traffic (operations), or the opposite. Tune each side to match your workload.
References: Argo CD Operator: Reference
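A hedged example of raising the operation processors in the Argo CD custom resource (the value 25 is an illustrative assumption; pick a number based on your observed sync queue depth):

```yaml
apiVersion: argoproj.io/v1beta1
kind: ArgoCD
metadata:
  name: openshift-gitops
  namespace: openshift-gitops
spec:
  controller:
    processors:
      operation: 25   # default is 10; more concurrent sync operations per controller
```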
ARGOCD_APPLICATION_CONTROLLER_REPO_SERVER_TIMEOUT_SECONDS - controller.repo.server.timeout.seconds (Default: 60)
ConfigMap: argocd-cmd-params-cm (if not using the Operator)
ArgoCD: spec.controller.env (if using the Operator)
This sets how long the application controller waits for the repo server to finish generating manifests before it stops and returns a context deadline exceeded error.
Note: Do not put this key in spec.extraConfig. On OpenShift GitOps, set it under spec.controller.env in the Argo CD custom resource.
Problem
Heavy charts, large repositories, slow Git, or a busy repo server may need more than 60 seconds for a single GenerateManifests (or similar) RPC. If the deadline is too short, applications show comparison failures even when the repo server would eventually succeed.
If you raised ARGOCD_EXEC_TIMEOUT on the repo server so a chart can render for 180 seconds, but leave this controller deadline at 60 seconds, the controller cancels the request while the repo server is still working, and the sync fails.
Solution
Raise the value (for example to 180) so legitimately slow manifest generation can finish. Size it so the controller allows at least as long as the worst-case repo-server work you expect for one RPC.
Disadvantage: A hung repo server holds work longer before the client fails.
Advantage: Fewer false failures from an overly aggressive deadline; large environments often use 120–300 seconds or more.
Note: This controls only the controller → repo-server gRPC client. It does not replace ARGOCD_EXEC_TIMEOUT (child processes on the repo server) or ARGOCD_REPO_SERVER_PARALLELISM_LIMIT (concurrent manifest work).
References: Argo CD: Command line parameters
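Since this parameter must go into the controller's environment rather than extraConfig, a sketch of the custom resource fragment could look like this (180 seconds is an illustrative value; size it to your worst-case manifest generation time):

```yaml
apiVersion: argoproj.io/v1beta1
kind: ArgoCD
metadata:
  name: openshift-gitops
  namespace: openshift-gitops
spec:
  controller:
    env:
      - name: ARGOCD_APPLICATION_CONTROLLER_REPO_SERVER_TIMEOUT_SECONDS
        value: "180"   # default is 60; keep >= the repo server's ARGOCD_EXEC_TIMEOUT
```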
ARGOCD_APPLICATION_CONTROLLER_KUBECTL_PARALLELISM_LIMIT - controller.kubectl.parallelism.limit (Default: 20)
ConfigMap: argocd-cmd-params-cm (if not using the Operator)
ArgoCD: spec.controller.env (if using the Operator)
This setting controls the maximum number of concurrent kubectl child processes the application controller may spawn at once.
Problem
If you raise controller.operation.processors (for example to 50 to run 50 syncs at once), you can hit a hidden bottleneck. This kubectl limit defaults to 20, so the extra operation processors wait for a free slot.
Without any limit, running dozens of concurrent kubectl processes could spike memory and CPU and risk OOM-killing the controller Pod.
If you increase controller.operation.processors to speed up mass syncs, raise this limit to match or sit slightly below that value (for example set it to 50). That lets the controller run sync commands in parallel without stalling on the process limit.
Raise the limit in steps (for example 50, or 100 at very large scale). After each change, watch the Kubernetes API server (request rate and latency), etcd, and application-controller CPU and memory.
Disadvantage: More overlapping kubectl work increases API traffic and process pressure. On a stressed control plane that can make things worse.
Advantage: Less time waiting for a free kubectl slot when the cluster can absorb the extra concurrency.
Note: Do not set this far above controller.operation.processors; extra headroom does not increase throughput and only wastes memory. The two settings are meant to move together.
References: Argo CD: Command line parameters
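Since the two settings are meant to move together, a combined sketch could raise both in one custom resource (the value 50 is an illustrative assumption; raise in steps and watch API server and controller metrics):

```yaml
apiVersion: argoproj.io/v1beta1
kind: ArgoCD
metadata:
  name: openshift-gitops
  namespace: openshift-gitops
spec:
  controller:
    processors:
      operation: 50   # allow 50 concurrent sync operations
    env:
      - name: ARGOCD_APPLICATION_CONTROLLER_KUBECTL_PARALLELISM_LIMIT
        value: "50"   # default is 20; match the operation processors so syncs do not stall
```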
Summary
OpenShift GitOps performance is rarely fixed by a single knob. The repo server, application controller, API server, Redis, and your Git provider all interact: a slow manifest render stresses the repo server; an aggressive client timeout fails the UI even when rendering would succeed; unbounded parallelism can OOM the repo server; tight kubectl concurrency can leave sync operations queued on an otherwise healthy controller.
A practical approach is to work in order of leverage: reduce unnecessary Git and render work (for example shallow clones, cache TTLs, and webhooks so you are not polling blindly), then align related timeouts (ARGOCD_EXEC_TIMEOUT, controller and API server deadlines to the repo server), then bound concurrency (repo-server parallelism, Git ls-remote limits, status and operation processors, kubectl parallelism) before you scale out with sharding and replicas.
On OpenShift GitOps, the same upstream settings appear in different places (spec.repo.env, spec.controller.env, spec.extraConfig, spec.controller.processors, and spec.controller.sharding), so always map each variable to the right field in the Argo CD custom resource (or the equivalent ConfigMaps in a plain install).
Finally, treat tuning as evidence-led: change one major setting at a time, watch controller and repo-server memory, reconcile duration, API rates, and Git rate limits. What works for a small instance will not automatically transfer to thousands of applications or many managed clusters.
Copyright © 2020 - 2026 Toni Schmidbauer & Thomas Jungbauer