Running multi-tenant Kubernetes clusters requires robust governance and policy enforcement to ensure security, compliance, and consistent resource usage for all tenants.
For a long time, Open Policy Agent (OPA) was the backbone of policy enforcement on SCHIP, our Kubernetes platform, even as we began migrating to Kyverno.
As we deployed more and more clusters and tenants, it became nearly impossible to write policies without factoring in the state of other objects in the cluster. OPA offers a powerful feature for syncing external data, allowing policies to use additional context from other Kubernetes resources. However, turning on this feature also introduces additional resource overhead, particularly in terms of memory consumption.
In this article, I’d like to share key considerations for enabling OPA’s data sync capability: how it can impact memory usage, and why you need to balance the benefits against the resource costs. The goal here is not to discourage you from using advanced OPA features, but it is crucial to be aware of their implications.
Finally, I’ll also share lessons learned from our transition to Kyverno, including how we prioritised the migration of rules based on their resource impact. This article should help you make more informed decisions about your policy management and potential migration paths.
A simple OPA policy
A simple OPA policy can work with the context provided by the object being validated itself. The example policy below checks the host field of the Ingress object being validated:
package policy.ingress_without_host

violation[{"msg": msg}] {
  ingress := input.review.object.spec.rules[_]
  not ingress.host
  msg := "Invalid ingress. Please add a host to the ingress"
}
However, real-world policies often require access to other objects in the cluster. For example, verifying that a label is unique across all pods and namespaces is impossible unless the policy has visibility into all these resources.
Using OPA Sync for complex policies
To address this challenge, Open Policy Agent (OPA) allows Kubernetes objects to be replicated into OPA so that ConstraintTemplates can access them. Gatekeeper provides a way to enable this via Sync Configuration, as shown below:
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
spec:
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Namespace"
      - group: ""
        version: "v1"
        kind: "Pod"
Alternatively, a SyncSet resource can be used to achieve the same result.
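A minimal SyncSet might look roughly like the sketch below; treat the exact apiVersion and fields as an assumption to verify against the Gatekeeper documentation for the release you run:

apiVersion: syncset.gatekeeper.sh/v1alpha1
kind: SyncSet
metadata:
  name: example-syncset   # placeholder name
spec:
  gvks:
    - group: ""
      version: "v1"
      kind: "Namespace"
    - group: ""
      version: "v1"
      kind: "Pod"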
Once syncing is enabled, policies can reference the synced objects via data.inventory. For example:
data.inventory.namespace[ns][_]["RoleBinding"]
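As an illustration, the label-uniqueness check mentioned earlier could be sketched along the following lines, assuming Pods are synced into the inventory; the package name and the "team" label key are placeholders, not a policy we ship:

package policy.unique_pod_label

# Illustrative sketch: reject a Pod whose "team" label value is already
# used by a Pod in a different namespace (requires Pods to be synced).
violation[{"msg": msg}] {
  value := input.review.object.metadata.labels["team"]
  other := data.inventory.namespace[other_ns][_]["Pod"][other_name]
  other_ns != input.review.object.metadata.namespace
  other.metadata.labels["team"] == value
  msg := sprintf("label team=%v is already used by pod %v/%v", [value, other_ns, other_name])
}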
This feature is highly useful, and it worked well for us until we encountered performance issues.
Performance Challenges with High Pod Fluctuations
Recently, we noticed that OPA started experiencing performance degradation and sporadic OOMKills across multiple clusters. Although our VerticalPodAutoscaler (VPA) helped adjust resource allocation dynamically, the alerts became noisy and disruptive.
Investigating the Memory Usage Spikes
Upon further analysis, we found that clusters with high pod fluctuation — where the number of pods rapidly increases and decreases — were the primary culprits. This is common in:
- Development clusters with frequent deployments, especially full-cluster deployments
- Clusters running a large number of CronJobs
Memory Usage Correlation
We observed significant memory usage fluctuations in Gatekeeper-controller pods, often by several gigabytes. The root cause? We were syncing pod data into Gatekeeper’s inventory, causing excessive memory consumption when pod counts surged.

As the graph shows, memory usage fluctuates by several gigabytes as the number of pods changes.

In our case, this was because Pod information was included in the inventory sync configuration:
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
spec:
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Namespace"
      - group: "networking.k8s.io"
        version: "v1"
        kind: "Ingress"
      - group: ""
        version: "v1"
        kind: "Pod"
      - group: "rbac.authorization.k8s.io"
        version: "v1"
        kind: "RoleBinding"
Since we were already migrating to Kyverno, as mentioned in our previous article Why Did We Transition from Gatekeeper to Kyverno for Kubernetes Policy Management?, we decided to prioritise migration based on impact.
Optimising Migration for Resource Efficiency
Given our constraints, we decided to prioritise migrating policies that use data.inventory, particularly those dealing with high-volume objects like pods.
Migration Results
After migrating the policies that rely on Pod data in data.inventory to Kyverno, we were able to remove Pods from Gatekeeper’s sync configuration. That single configuration change reduced memory usage in one cluster from 8GB to 2.7GB.
This is significant considering that we operate 30+ clusters, each running 3+ Gatekeeper controller pods, so the savings multiply quickly.
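For reference, the resulting sync configuration is simply the earlier one with the Pod entry removed:

apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
spec:
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Namespace"
      - group: "networking.k8s.io"
        version: "v1"
        kind: "Ingress"
      - group: "rbac.authorization.k8s.io"
        version: "v1"
        kind: "RoleBinding"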


Impact on Kyverno
Kyverno does not rely on a pre-synced inventory; instead, it fetches data dynamically using API calls. Here’s an example of how Kyverno retrieves pod information:
context:
  - name: pods-access
    apiCall:
      urlPath: "/api/v1/namespaces/{{request.object.metadata.name}}/pods"
      jmesPath: "items[? !starts_with(metadata.name, 'system')].metadata.name"
Memory vs. API Load Trade-Off
This approach reduces memory consumption but increases the load on the Kubernetes API server. To mitigate this, consider:
1. Monitoring API Server Load — Ensure that the additional requests don’t overload the API server.
2. Using specific pre-conditions — Reduce unnecessary evaluations:
preconditions:
  - key: "{{ request.operation }}"
    operator: Equals
    value: "DELETE"
Final Thoughts
This article does not aim to criticise Gatekeeper’s inventory sync feature, which remains extremely useful. However, it is important to exercise caution when enabling it: excessive memory usage can affect Gatekeeper’s stability, potentially blocking new deployments and pod creations.
This blog describes a strategy for transitioning from OPA to Kyverno by prioritising policies that rely on high-impact inventory rules — such as those referencing high-volume objects like pods.
By applying this approach, we achieved a substantial reduction in memory usage, improved stability, and reduced operational noise, without compromising policy enforcement.