Crossplane provider proposal #7085
Conversation
Thanks! Please follow https://github.com/cortexproject/cortex/pull/7085/checks?check_run_id=54406852290 to fix the DCO check.
Force-pushed from 3f0d67e to 2f35332.
@zanderfriz please rebase so that CI passes on the PR. We made some changes in GitHub Actions.
I am in support of this proposal. I have 2 requests to merge this as accepted:
+1 on making sure there are at least 2 maintainers for this provider component. I support a separate repo within the project.
+1
Signed-off-by: afrisvold <afrisvold@apple.com>
Force-pushed from 2f35332 to 40fdd1f.
After discussing with @devopsjedi, he said he would be happy to be a maintainer on this project.
Agreed, excited to support this effort!
> #### TenantConfig
>
> The TenantConfig CRD manages connection details and authentication for a specific Cortex tenant:
Why connection and auth only? And how will the tenant config be consumed by Cortex?
TenantConfig is the configuration for the Crossplane provider to connect to the Cortex instance as a tenant. It is not for configuring a tenant on Cortex as the Cortex administrator.
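For illustration, a minimal sketch of what such a TenantConfig might look like. The apiVersion matches the RuleGroup examples in this thread, but the spec fields (endpoint, tenantID, the secret reference) are my assumptions, not the proposal's actual schema:

```yaml
apiVersion: config.cortexmetrics.io/v1alpha1
kind: TenantConfig
metadata:
  name: production-tenant
spec:
  # Hypothetical fields illustrating "connection details and authentication";
  # the real schema is defined in the proposal document.
  endpoint: https://cortex.example.com   # Cortex API base URL
  tenantID: team-a                       # sent as the X-Scope-OrgID header
  authSecretRef:                         # credentials sourced from a Secret
    namespace: observability
    name: team-a-cortex-token
    key: token
```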
I had written a long-winded version of this comment but realized it was all just a matter of my organization's internal organization. In short, we can't use this version of the operator because we can't run Crossplane. We're an observability team and do not have the responsibility to run something that can, at the same time, provision S3 buckets.
Also, neither the Prometheus nor the OpenTelemetry operator, two workhorses of our observability infrastructure, requires that we run Crossplane, so why should Cortex?
Don't get me wrong, I don't want to trash the idea of Crossplane. It's better for the Cortex community to have a Crossplane provider than nothing. But we won't be able to use it where I work, and that makes me sad.
> providerConfigRef:
>   name: cortex-config
What is this referring to? Is it Crossplane-specific?
Yes, this is referencing the Crossplane ProviderConfig for the Cortex provider.
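For readers unfamiliar with Crossplane: each provider defines its own ProviderConfig type holding the credentials its managed resources use. A minimal sketch following the conventional Crossplane shape; the API group cortex.crossplane.io and the secret names are hypothetical:

```yaml
apiVersion: cortex.crossplane.io/v1alpha1  # hypothetical group for this provider
kind: ProviderConfig
metadata:
  name: cortex-config                      # the name referenced by providerConfigRef
spec:
  credentials:
    source: Secret                         # the usual Crossplane credential source
    secretRef:
      namespace: crossplane-system
      name: cortex-credentials
      key: credentials
```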
> tenantConfigRef:
>   name: production-tenant
We run multiple clusters and it would be helpful to be able to specify multiple clusters where rules should be deployed. Otherwise we'd need a RuleGroup per Cortex cluster.
This is a use case where Crossplane can shine. The provider's job is to provide the primitive objects you need. As the platform owner, you can create an XRD/Composition which fits your use case. For example:
Create the XRD:

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xsharedrulegroups.cortex.platform.example.com
spec:
  group: cortex.platform.example.com
  names:
    kind: XSharedRuleGroup
    plural: xsharedrulegroups
  claimNames:
    kind: SharedRuleGroup # What users create
    plural: sharedrulegroups
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                tenantRefs:
                  type: array
                  description: "List of TenantConfigs to apply rules to"
                  items:
                    type: object
                    properties:
                      name:
                        type: string
                      namespace:
                        type: string
                    required: [name]
                namespace:
                  type: string
                  description: "Cortex rules namespace"
                groupName:
                  type: string
                  description: "Rule group name"
                interval:
                  type: string
                  default: "1m"
                rules:
                  type: array
                  description: "Alert/recording rules"
                  # ... same schema as RuleGroup.spec.forProvider.rules
              required:
                - tenantRefs
                - namespace
                - groupName
                - rules
```

Create the Composition with a function:

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: sharedrulegroup-fanout
spec:
  compositeTypeRef:
    apiVersion: cortex.platform.example.com/v1alpha1
    kind: XSharedRuleGroup
  mode: Pipeline
  pipeline:
    - step: fan-out-to-tenants
      functionRef:
        name: function-go-templating
      input:
        apiVersion: gotemplating.fn.crossplane.io/v1beta1
        kind: GoTemplate
        source: Inline
        inline:
          template: |
            {{- range $i, $tenant := .observed.composite.resource.spec.tenantRefs }}
            ---
            apiVersion: config.cortexmetrics.io/v1alpha1
            kind: RuleGroup
            metadata:
              name: {{ $.observed.composite.resource.metadata.name }}-{{ $tenant.name }}
              annotations:
                crossplane.io/composition-resource-name: rulegroup-{{ $tenant.name }}
            spec:
              forProvider:
                tenantConfigRef:
                  name: {{ $tenant.name }}
                  {{- if $tenant.namespace }}
                  namespace: {{ $tenant.namespace }}
                  {{- end }}
                namespace: {{ $.observed.composite.resource.spec.namespace }}
                groupName: {{ $.observed.composite.resource.spec.groupName }}
                {{- if $.observed.composite.resource.spec.interval }}
                interval: {{ $.observed.composite.resource.spec.interval }}
                {{- end }}
                rules: {{ toJson $.observed.composite.resource.spec.rules }}
            {{- end }}
```

Then create your SharedRuleGroup:

```yaml
apiVersion: cortex.platform.example.com/v1alpha1
kind: SharedRuleGroup
metadata:
  name: platform-cpu-alerts
  namespace: platform-team
spec:
  tenantRefs:
    - name: team-a-tenant
    - name: team-b-tenant
    - name: team-c-tenant
  namespace: "monitoring"
  groupName: "cpu-alerts"
  interval: "30s"
  rules:
    - alert: HighCPUUsage
      expr: 'rate(cpu_usage[5m]) > 0.8'
      for: "5m"
      labels:
        severity: warning
      annotations:
        summary: "CPU usage above 80% for 5 minutes"
```

This is the way most providers are written. For example, provider-aws ships with the primitives VPC, Subnet, and Instance, which map 1:1 to the AWS API, but does not ship with an XCluster, which needs multiple primitives to define a Kubernetes cluster in Crossplane.
> The RuleGroup CRD manages Prometheus alerting and recording rules within a Cortex namespace:
>
> ```yaml
> apiVersion: config.cortexmetrics.io/v1alpha1
> ```
I've heard some talk of people wanting a Prometheus-operator-compatible API for Cortex CRDs. Would that be a goal here?
That is not a goal, and IMHO it would bring a lot of unnecessary complexity. If you have a proposal for how this could be done, I'd be open to hearing more.
> - Applies necessary changes via HTTP API calls
> - Updates resource status with current state and any errors
>
> 2. **External Resource Identification**: Resources are identified using:
I think it's possible (obviously not ideal, but I've seen a lot of mistakes in my life) that you could have the same alerts defined on two clusters in the exact same namespace name, and without further identifying attributes they would conflict with each other. Each operator would try to take control. I think it might be necessary to provide additional identifying attributes to prevent conflicts like this. For instance, each operator would be passed k8s.cluster.name at start as an identifying attribute, and resources would be saved in Cortex like k8s.cluster.name/k8s.namespace.name/resource. Wdyt?
Each object in the Cortex API needs to have a unique ID, and this scheme is a good example. The encoding scheme used to map clusters/tenants/namespaces to the objects in Cortex is something we could document as a best practice. It would not be something explicitly enforced within the provider.
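A sketch of what that documented convention could look like, using the standard crossplane.io/external-name annotation to carry the cluster-qualified ID; the name layout is just the scheme suggested above, not something the provider would enforce:

```yaml
apiVersion: config.cortexmetrics.io/v1alpha1
kind: RuleGroup
metadata:
  name: cpu-alerts
  annotations:
    # cluster/namespace/resource, per the suggested best practice
    crossplane.io/external-name: prod-us-east-1/monitoring/cpu-alerts
```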
I am not very familiar with Crossplane, but I think you can hit this problem with any Crossplane provider. For example, if you use s3.aws.crossplane.io to define an S3 bucket with the same name in the same region in 2 different Kubernetes clusters, conflicts will appear.
I think one way to solve this problem is to use Crossplane Compositions, so that the tenant config is constructed from the namespace name and the Kubernetes cluster name.
https://docs.crossplane.io/latest/composition/compositions/
This is a good callout if you are running multiple operators. In general, Crossplane operators are run in an admin/central cluster, not the managed/edge clusters. That being said, it is a fair concern and I will create an issue to address it.
> **Comparison**:
> - **Pros**: Direct control over implementation, no external dependencies
> - **Cons**:
>   - Requires building and maintaining complex controller infrastructure
I'm not sure what complex infrastructure would be required for a classic k8s operator other than running the operator and setting it up with the API server. Running Crossplane is more complex from my perspective, especially because its feature set extends way beyond just Cortex. Could you provide an example?
While this is a Crossplane provider, it can work as a standalone Kubernetes operator. Crossplane providers are essentially highly opinionated operators that allow interaction with the Crossplane ecosystem (specifically XRDs). If we made our own operator, we'd have to define our own opinions. We get to reuse a lot of libraries, best practices, etc. that the Crossplane community has already put a lot of thought into.
> - **Pros**: Direct control over implementation, no external dependencies
> - **Cons**:
>   - Requires building and maintaining complex controller infrastructure
>   - No composition or configuration management capabilities
I don't understand this. I don't see this as the responsibility of an operator. It's the 'deployment delivery' tech that does this, like Helm or Tanka etc. Could you provide an example of how the provider would do this?
This is referencing the Crossplane concept of Compositions, where a Crossplane admin team can create high-level Compositions, resulting in minimal configuration for end users. I gave an example above of how this could be useful when you asked how to implement the same rules across multiple tenants.
> - **Cons**:
>   - Requires building and maintaining complex controller infrastructure
>   - No composition or configuration management capabilities
>   - Limited reusability across different Kubernetes clusters
I disagree: not everyone can or will use Crossplane, but everyone can run a classic operator, IMO.
Feel free to run this as an operator and not run the full Crossplane system. It works as a standalone operator.
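A sketch of what standalone operation might look like, assuming the provider publishes its CRDs and a controller image; every name here (namespace, service account, image) is hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: provider-cortex
  namespace: cortex-system              # hypothetical namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: provider-cortex
  template:
    metadata:
      labels:
        app: provider-cortex
    spec:
      serviceAccountName: provider-cortex           # needs RBAC for the provider's CRDs
      containers:
        - name: provider
          image: example.com/provider-cortex:v0.1.0 # hypothetical image
```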
> - Requires building and maintaining complex controller infrastructure
> - No composition or configuration management capabilities
> - Limited reusability across different Kubernetes clusters
> - Missing advanced features like external secret management
Could you provide an example? We'd be delivering secrets via the External Secrets Operator from Vault. We would only need to reference the secret as described in the CRDs above.
Technically, anything Crossplane does you could implement yourself in an operator. Some things, like cross-namespace secretRefs, come built in with the Crossplane library. Good comment.
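To make the built-in piece concrete: the secret reference type in crossplane-runtime always carries its own namespace, so a managed resource can read credentials from a Secret anywhere in the cluster. A sketch, with the TenantConfig field name itself being an assumption:

```yaml
apiVersion: config.cortexmetrics.io/v1alpha1
kind: TenantConfig
metadata:
  name: production-tenant
spec:
  forProvider:
    authSecretRef:              # hypothetical field name
      namespace: vault-synced   # e.g. where external-secrets materializes the Secret
      name: cortex-tenant-token
      key: token
```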
> - No composition or configuration management capabilities
> - Limited reusability across different Kubernetes clusters
> - Missing advanced features like external secret management
> - Significant development and maintenance overhead
This is subjective. There are years of experience out there running and writing k8s operators, from OpenTelemetry to Prometheus as examples. Crossplane is much younger and not a given. Kubebuilder, for all its limitations, does provide relief from much of the plumbing.
This goes back to us being able to use the development best practices and tools already written by the Crossplane project. I wound up using the xp-provider-gen repository to stub out most of my provider. It uses the build/test best practices from the Crossplane project, and I got to focus on the business logic of interacting with Cortex.
@forestsword I really appreciate the feedback. Technically you don't need to run the operators like provider-aws that enable the deployment of S3 buckets and other resources. You could just run Crossplane and have the Cortex operator be the only one installed. That being said, internal organization policies are just that. I'd encourage you to try running the provider as a standalone operator. You can also save yourself some copy-pasta by using kustomize to manage your TenantConfig, RuleGroup, and AlertmanagerConfig objects, which would let you easily share a base RuleGroup between clusters.
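For anyone wanting to try the kustomize route mentioned above, a minimal sketch; the file layout and patch path are illustrative, and the RuleGroup field paths are assumed from the examples earlier in the thread:

```yaml
# base/kustomization.yaml
resources:
  - rulegroup.yaml          # the shared RuleGroup definition

# overlays/cluster-a/kustomization.yaml
resources:
  - ../../base
patches:
  - target:
      kind: RuleGroup
      name: cpu-alerts
    patch: |-
      - op: replace
        path: /spec/forProvider/tenantConfigRef/name
        value: cluster-a-tenant
```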
What this PR does: Introduces a proposal for a Crossplane provider for the Cortex project to declaratively manage Cortex Alertmanager and Ruler configurations through Kubernetes Custom Resources.
Which issue(s) this PR fixes: N/A
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]