
Conversation

@zanderfriz

What this PR does: Introduces a proposal for a Crossplane provider for the cortex project to declaratively manage Cortex Alertmanager and Ruler configurations through Kubernetes Custom Resources.

Which issue(s) this PR fixes: N/A
Checklist

  • [N/A] Tests updated
  • Documentation added
  • [N/A] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@dosubot dosubot bot added component/alertmanager component/rules Bits & bobs todo with rules and alerts: the ruler, config service etc. labels Nov 3, 2025
@friedrichg
Member

Thanks! Please follow https://github.com/cortexproject/cortex/pull/7085/checks?check_run_id=54406852290 to fix the DCO

@zanderfriz zanderfriz force-pushed the proposal-crossplane-provider branch from 3f0d67e to 2f35332 on November 4, 2025 19:12
@friedrichg
Member

@zanderfriz please rebase so that CI passes on the PR. We made some changes in GitHub Actions.

@friedrichg
Member

I am in support of this proposal.

I have two requests before merging this as accepted:

  • Let's put this in a separate repo inside cortexproject, where the selected maintainers will be able to keep this component updated.
  • We need two maintainers for this (I can't be a maintainer, sorry). I am expecting you will be one of the maintainers. Can you find one person to help you with this?

@alolita

alolita commented Nov 18, 2025

+1 on making sure there are at least two maintainers for this provider component.

I support a separate repo within the project.

@SungJin1212
Member

+1

Signed-off-by: afrisvold <afrisvold@apple.com>
@zanderfriz zanderfriz force-pushed the proposal-crossplane-provider branch from 2f35332 to 40fdd1f on November 20, 2025 19:26
@zanderfriz
Author

After discussing with @devopsjedi, he said he would be happy to be a maintainer on this project.

@devopsjedi

> After discussing with @devopsjedi, he said he would be happy to be a maintainer on this project.

Agreed, excited to support this effort!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Nov 21, 2025

#### TenantConfig

The TenantConfig CRD manages connection details and authentication for a specific Cortex tenant:
Contributor


Why connection and auth only? And how will the tenant config be consumed by Cortex?

Author


TenantConfig is the configuration the Crossplane provider uses to connect to the Cortex instance as a tenant. It is not for configuring a tenant on Cortex as the Cortex administrator.
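For illustration, here is a rough sketch of what a TenantConfig could look like; the field names under `forProvider` (`address`, `tenantID`, `credentialsSecretRef`) are assumptions, not the final schema:

```yaml
# Hypothetical sketch only; the forProvider field names are assumptions.
apiVersion: config.cortexmetrics.io/v1alpha1
kind: TenantConfig
metadata:
  name: production-tenant
spec:
  forProvider:
    # Cortex endpoint the provider connects to for this tenant (assumed field).
    address: http://cortex-gateway.cortex.svc:8080
    # Tenant ID sent as the X-Scope-OrgID header (assumed field).
    tenantID: team-a
    # Optional credentials pulled from a Kubernetes Secret (assumed field).
    credentialsSecretRef:
      name: cortex-tenant-token
      namespace: observability
      key: token
  providerConfigRef:
    name: cortex-config
```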


@forestsword forestsword left a comment


I had written a long-winded version of this comment but realized it was all just a matter of my internal organization's organization. In short, we can't use this version of the operator because we can't run Crossplane. We're an observability team and do not have the responsibility to run something that can also provision S3 buckets.

Also, neither the Prometheus nor the OpenTelemetry operator, two workhorses of our observability infrastructure, requires that we run Crossplane; why should Cortex?

Don't get me wrong, I don't want to trash the idea of Crossplane. It's better for the Cortex community to have a Crossplane provider than nothing. But we won't be able to use it where I work, and that makes me sad.

Comment on lines +134 to +135
providerConfigRef:
name: cortex-config


What is this referring to? Is it Crossplane-specific?

Author


Yes, this references the Crossplane ProviderConfig for the Cortex provider.
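For reference, this is the typical shape of a Crossplane ProviderConfig; the API group `cortex.crossplane.io` is an assumption for this provider:

```yaml
# Sketch of a typical Crossplane ProviderConfig; the API group is assumed.
apiVersion: cortex.crossplane.io/v1beta1
kind: ProviderConfig
metadata:
  name: cortex-config        # matched by providerConfigRef.name above
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: cortex-credentials
      key: credentials
```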

Comment on lines +167 to +168
tenantConfigRef:
name: production-tenant


We run multiple clusters, and it would be helpful to be able to specify multiple clusters that rules should be deployed to. Otherwise we'd need a RuleGroup per Cortex cluster.

Author


This is a use case where Crossplane can shine. The provider's job is to provide the primitive objects you need. As the platform owner, you can create an XRD/Composition that fits your use case. For example:

Create the XRD:

apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xsharedrulegroups.cortex.platform.example.com
spec:
  group: cortex.platform.example.com
  names:
    kind: XSharedRuleGroup
    plural: xsharedrulegroups
  claimNames:
    kind: SharedRuleGroup  # What users create
    plural: sharedrulegroups
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                tenantRefs:
                  type: array
                  description: "List of TenantConfigs to apply rules to"
                  items:
                    type: object
                    properties:
                      name:
                        type: string
                      namespace:
                        type: string
                    required: [name]
                namespace:
                  type: string
                  description: "Cortex rules namespace"
                groupName:
                  type: string
                  description: "Rule group name"
                interval:
                  type: string
                  default: "1m"
                rules:
                  type: array
                  description: "Alert/recording rules"
                  # ... same schema as RuleGroup.spec.forProvider.rules
              required:
                - tenantRefs
                - namespace
                - groupName
                - rules

Create a Composition with a function:

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: sharedrulegroup-fanout
spec:
  compositeTypeRef:
    apiVersion: cortex.platform.example.com/v1alpha1
    kind: XSharedRuleGroup
  mode: Pipeline
  pipeline:
    - step: fan-out-to-tenants
      functionRef:
        name: function-go-templating
      input:
        apiVersion: gotemplating.fn.crossplane.io/v1beta1
        kind: GoTemplate
        source: Inline
        inline:
          template: |
            {{- range $i, $tenant := .observed.composite.resource.spec.tenantRefs }}
            ---
            apiVersion: config.cortexmetrics.io/v1alpha1
            kind: RuleGroup
            metadata:
              name: {{ $.observed.composite.resource.metadata.name }}-{{ $tenant.name }}
              annotations:
                crossplane.io/composition-resource-name: rulegroup-{{ $tenant.name }}
            spec:
              forProvider:
                tenantConfigRef:
                  name: {{ $tenant.name }}
                  {{- if $tenant.namespace }}
                  namespace: {{ $tenant.namespace }}
                  {{- end }}
                namespace: {{ $.observed.composite.resource.spec.namespace }}
                groupName: {{ $.observed.composite.resource.spec.groupName }}
                {{- if $.observed.composite.resource.spec.interval }}
                interval: {{ $.observed.composite.resource.spec.interval }}
                {{- end }}
                rules: {{ toJson $.observed.composite.resource.spec.rules }}
            {{- end }}

Then create your SharedRuleGroup:

apiVersion: cortex.platform.example.com/v1alpha1
kind: SharedRuleGroup
metadata:
  name: platform-cpu-alerts
  namespace: platform-team
spec:
  tenantRefs:
    - name: team-a-tenant
    - name: team-b-tenant
    - name: team-c-tenant

  namespace: "monitoring"
  groupName: "cpu-alerts"
  interval: "30s"

  rules:
    - alert: HighCPUUsage
      expr: 'rate(cpu_usage[5m]) > 0.8'
      for: "5m"
      labels:
        severity: warning
      annotations:
        summary: "CPU usage above 80% for 5 minutes"

This is the way most providers are written. For example, provider-aws ships with primitives like VPC, Subnet, and Instance, which map 1:1 to the AWS API, but it does not ship with an XCluster that composes multiple primitives to define a Kubernetes cluster in Crossplane.

The RuleGroup CRD manages Prometheus alerting and recording rules within a Cortex namespace:

```yaml
apiVersion: config.cortexmetrics.io/v1alpha1


I've heard some talk of people wanting a Prometheus-operator-compatible API for Cortex CRDs. Would that be a goal here?

Author


That is not a goal and IMHO would bring a lot of unnecessary complexity. If you have a proposal for how this could be done, I'd be open to hearing more.

- Applies necessary changes via HTTP API calls
- Updates resource status with current state and any errors

2. **External Resource Identification**: Resources are identified using:


I think it's possible (obviously not ideal, but I've seen a lot of mistakes in my life) that you could have the same alerts defined on two clusters in the exact same namespace, and without further identifying attributes they would conflict with each other. Each operator would try to take control. I think it might be necessary to provide additional identifying attributes to prevent conflicts like this. For instance, each operator would be passed k8s.cluster.name at start as an identifying attribute, and resources would be saved in Cortex like k8s.cluster.name/k8s.namespace.name/resource. Wdyt?


Each object in the Cortex API needs a unique ID, and this scheme is a good example. The encoding scheme used to map clusters/tenants/namespaces to objects in Cortex is something we could document as a best practice. It would not be explicitly enforced within the provider.
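As a hedged illustration of that convention (documented best practice only, nothing the provider would enforce), a RuleGroup could embed the source cluster in the Cortex rules namespace:

```yaml
apiVersion: config.cortexmetrics.io/v1alpha1
kind: RuleGroup
metadata:
  name: cpu-alerts
spec:
  forProvider:
    tenantConfigRef:
      name: production-tenant
    # Encode the source cluster into the Cortex namespace so two clusters
    # defining the same group never collide (convention only, not enforced).
    namespace: "prod-us-east-1/monitoring"
    groupName: "cpu-alerts"
```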

Member

@friedrichg friedrichg Dec 1, 2025


I am not very familiar with Crossplane, but I think you can hit this problem with any Crossplane provider. For example, if you use s3.aws.crossplane.io to define an S3 bucket with the same name in the same region from two different Kubernetes clusters, conflicts will appear.

I think one way to solve this problem is to use Crossplane Compositions so that the tenant config is constructed from the namespace name and the Kubernetes cluster name.
https://docs.crossplane.io/latest/composition/compositions/
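A rough sketch of that idea using a classic resources-mode Composition patch; the composite kind and its `spec.clusterName`/`spec.namespace` fields are assumptions, not part of the proposal:

```yaml
# Illustrative only: assumes an XR exposing spec.clusterName and spec.namespace.
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: rulegroup-per-cluster
spec:
  compositeTypeRef:
    apiVersion: cortex.platform.example.com/v1alpha1
    kind: XClusterRuleGroup
  resources:
    - name: rulegroup
      base:
        apiVersion: config.cortexmetrics.io/v1alpha1
        kind: RuleGroup
        spec:
          forProvider:
            tenantConfigRef:
              name: production-tenant
      patches:
        # Build the Cortex namespace as "<cluster>/<k8s namespace>" so the same
        # group name defined in two clusters maps to two distinct objects.
        - type: CombineFromComposite
          combine:
            variables:
              - fromFieldPath: spec.clusterName
              - fromFieldPath: spec.namespace
            strategy: string
            string:
              fmt: "%s/%s"
          toFieldPath: spec.forProvider.namespace
```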

Author


This is a good callout if you are running multiple operators. In general, Crossplane operators run in an admin/central cluster, not the managed/edge clusters. That said, I will create an issue to address it.

**Comparison**:
- **Pros**: Direct control over implementation, no external dependencies
- **Cons**:
- Requires building and maintaining complex controller infrastructure


I'm not sure what complex infrastructure would be required for a classic k8s operator other than running the operator and setting it up with the API server. Running Crossplane is more complex from my perspective, especially because its feature set extends way beyond just Cortex. Could you provide an example?

Author


While this is a Crossplane provider, it can work as a standalone Kubernetes operator. Crossplane providers are essentially highly opinionated operators that allow interaction with the Crossplane ecosystem (specifically XRDs). If we made our own operator, we'd have to define our own opinions. Here we get to reuse a lot of libraries, best practices, etc. that the Crossplane community has already put a lot of thought into.

- **Pros**: Direct control over implementation, no external dependencies
- **Cons**:
- Requires building and maintaining complex controller infrastructure
- No composition or configuration management capabilities


I don't understand this. I don't see this as the responsibility of an operator; it's the 'deployment delivery' tech, like Helm or Tanka, that does this. Could you provide an example of how the provider would do this?

Author


This references the Crossplane concept of Compositions, where a Crossplane admin team can create high-level Compositions so that users need only minimal configuration. I gave an example of how this could be useful above, when you asked how to apply the same rules across multiple tenants.

- **Cons**:
- Requires building and maintaining complex controller infrastructure
- No composition or configuration management capabilities
- Limited reusability across different Kubernetes clusters


I disagree: not everyone can or will use Crossplane, but everyone can run a classic operator IMO.

Author


Feel free to run this as an operator without running the full Crossplane system. It works as a standalone operator.

- Requires building and maintaining complex controller infrastructure
- No composition or configuration management capabilities
- Limited reusability across different Kubernetes clusters
- Missing advanced features like external secret management


Could you provide an example? We'd be delivering secrets via the External Secrets Operator from Vault. We would only need to reference the secret as described in the CRDs above.

Author


Technically, anything Crossplane does you could implement yourself in an operator. Some things, like cross-namespace secretRefs, come built in with the Crossplane runtime library. Good comment.
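For example, the crossplane-runtime secret selector carries its own namespace field, so a managed resource can reference a Secret outside its own namespace; the `credentialsSecretRef` field name below is an assumption for this provider:

```yaml
# Fragment of a hypothetical TenantConfig spec; the field name is illustrative.
spec:
  forProvider:
    credentialsSecretRef:              # crossplane-runtime style secret selector
      namespace: vault-synced-secrets  # may differ from the resource's namespace
      name: cortex-tenant-token
      key: token
```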

- No composition or configuration management capabilities
- Limited reusability across different Kubernetes clusters
- Missing advanced features like external secret management
- Significant development and maintenance overhead


This is subjective. There are years of experience out there running and writing k8s operators, from OpenTelemetry to Prometheus as examples. Crossplane is much younger and not a given. Kubebuilder, for all its limitations, does provide relief from much of the plumbing.

Author

@zanderfriz zanderfriz Dec 1, 2025


This goes back to us being able to use the development best practices and tools already written by the Crossplane project. I wound up using the xp-provider-gen repository to stub out most of my provider. It uses the build/test best practices from the Crossplane project, and I got to focus on the business logic of interacting with Cortex.

@friedrichg friedrichg requested a review from CharlieTLe December 1, 2025 20:18
@zanderfriz
Author

> I had written a long-winded version of this comment but realized it was all just a matter of my internal organization's organization. In short, we can't use this version of the operator because we can't run Crossplane. We're an observability team and do not have the responsibility to run something that can also provision S3 buckets.
>
> Also, neither the Prometheus nor the OpenTelemetry operator, two workhorses of our observability infrastructure, requires that we run Crossplane; why should Cortex?
>
> Don't get me wrong, I don't want to trash the idea of Crossplane. It's better for the Cortex community to have a Crossplane provider than nothing. But we won't be able to use it where I work, and that makes me sad.

@forestsword I really appreciate the feedback. Technically you don't need to run providers like provider-aws that enable the deployment of S3 buckets and other resources; you could run Crossplane with the Cortex provider as the only one installed. That being said, internal organization policies are just that. I'd encourage you to try running the provider as a standalone operator. You can also save yourself some copy-paste by using kustomize to manage your TenantConfig, RuleGroup, and AlertmanagerConfig objects, which would let you easily share a base RuleGroup between clusters.
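Something like this rough kustomize layout (names are illustrative) would let a base RuleGroup be shared while each cluster overlay swaps in its own tenant:

```yaml
# base/kustomization.yaml - shared RuleGroup definition
resources:
  - rulegroup.yaml
```

```yaml
# overlays/cluster-a/kustomization.yaml - per-cluster tenant wiring
resources:
  - ../../base
patches:
  - target:
      kind: RuleGroup
    patch: |-
      - op: replace
        path: /spec/forProvider/tenantConfigRef/name
        value: cluster-a-tenant
```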
