-
Notifications
You must be signed in to change notification settings - Fork 16
DNS Groups. #669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
DNS Groups. #669
Conversation
5f9673d to
29c24eb
Compare
...etup/dnsrecords/delegating/coredns/loadbalanced/dnsrecord-loadbalanced-coredns-cluster1.yaml
Outdated
Show resolved
Hide resolved
eaf200f to
6795ee1
Compare
| log | ||
|
|
||
| rewrite name regex kuadrant-active-groups\.(.*)k.example\.com kuadrant-active-groups-coredns.pb.hcpapps.net | ||
| forward kuadrant-active-groups-coredns.pb.hcpapps.net /etc/resolv.conf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will need to be updated to allow this custom host to be passed in at set up time, somehow, that's for another ticket: #670
098c135 to
e9364ce
Compare
Boomatang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a number of questions. I know it is still a draft and subject to change. For the reason I didn't look at any test changes.
b43bd9a to
9e54ed6
Compare
Boomatang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still haven't looked at the test. Seen a few very small things. But I started to question the use of group field on the remote reconciles.
I am going to start setting this up locally and play around with it.
| activeGroups := r.getActiveGroups(ctx, c, dnsRecord) | ||
|
|
||
| // only process unpublish when there are active groups and we are reconciling a record from an active group | ||
| if len(activeGroups) == 0 || !dnsRecord.IsActive() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this now can be simplified.
| if len(activeGroups) == 0 || !dnsRecord.IsActive() { | |
| if !dnsRecord.IsActive() { |
5caf25d to
a26babc
Compare
Signed-off-by: Phil Brookes <pbrookes@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
| ) | ||
|
|
||
| const ( | ||
| activeGroupsTXTRecordName = "kuadrant-active-groups" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you replace this with this constant as well, please?
| SetStatusConditions(hadChanges bool) | ||
| SetStatusCondition(conditionType string, status metav1.ConditionStatus, reason, message string) | ||
| ClearStatusCondition(conditionType string) | ||
| GetStatusCondition(conditionType string) *metav1.Condition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to never use the GetStatusCondition. We have meta.FindStatusCondition() instead. Is this an issue?
| } | ||
|
|
||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will work with all TXT records in the zone. If it happens to hold a random TXT record with groups=cat in it, this might cause a problem. I don't see that happening, to be honest, but it is a theoretical possibility. If we can't rely on the name of the TXT record here, why don't we take the route ExternalDNS took with the heritage prefix?
I would not expect it to be done in this PR tho - as it is a "theoretical edge case that will, probably, never happen"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an alternative, we could poke the provider (like we do below) instead of doing a lookup. But I'd imagine there was a good reason we went an extra mile to do a proper lookup.
| // | ||
| // Returns a list of nameserver addresses (e.g., ["10.96.0.10:53", "10.96.0.11"]) | ||
| // Returns an empty list if no NAMESERVERS field is found or the secret doesn't exist. | ||
| func (r *BaseDNSRecordReconciler) getNameserversFromProvider(ctx context.Context, c client.Client, dnsRecord DNSRecordAccessor) ([]string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the baseReconciler methods. Why aren't they in the base_dnsrecord_reconciler.go?
| dnsRecord.SetStatusCondition(string(v1alpha1.ConditionTypeReady), metav1.ConditionTrue, string(v1alpha1.ConditionReasonProviderSuccess), "Provider ensured the dns record") | ||
| dnsRecord.SetStatusObservedGeneration(dnsRecord.GetDNSRecord().GetGeneration()) | ||
|
|
||
| logger.Info("unpublish section start") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a debug call that was left behind
| Scheme: mgr.GetScheme(), | ||
| ProviderFactory: providerFactory, | ||
| DelegationRole: delegationRole, | ||
| TXTResolver: &MockTXTResolver{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only works because it is a pointer, and we manipulate this resolver as the test case goes? Manipulate == change the LookupTXT result by adding/removing DNSRecords from the resolver.
This is a curiosity question
| Host string | ||
| Groups map[types.Group]*RegistryGroup | ||
| UngroupedOwners map[string]*RegistryOwner | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found it strange earlier: the OwnerID of RegistryOwner, GroupID of RegistryGroup and Host of RegistryHost seem to never be read (except for one of them once in a test case) as they are repeating information we get from the keys of the respective maps.
I remember the reasoning behind this - convenience and flexibility. But can I ask for a mention somewhere around that the key in the Owners is the same as the OwnerID of the RegistryOwner (for all of the "duplicates")
| txtFormatVersion = "1" | ||
| externalDNSMapperVersion = "" | ||
|
|
||
| GroupLabelKey = "group" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this key in the types/group.go
| return activeGroups | ||
| } | ||
| // Parse the TXT record to extract active groups | ||
| // Expected format: "groups=group1&&group2&&group3;version=1;other=value" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expected most parts of this format to reside in constants and be reused in places we want to reuse them.
This PR introduces DNS Groups functionality to enable active-passive failover for DNS records across multiple clusters. Groups allow operators to control which cluster's DNS records are published based on a configurable "active groups" list, enabling scenarios like:
Changes
API Changes:
Core Implementation (internal/controller/dnsrecord_groups.go - new file):
Controller Changes (internal/controller/dnsrecord_controller.go):
Test Coverage:
How It Works
Each DNS operator instance is started with a group identifier:
--group=us-east
or
GROUP=us-east
Records managed by that operator inherit the group assignment in their status.
The active groups list is stored as a TXT record in DNS:
kuadrant-active-groups.example.com TXT "groups=us-east&&us-west;version=1"
Before publishing DNS records, each controller:
Queries the active groups TXT record
Compares its group against the active groups list
If inactive: Updates status condition and requeues (15s)
If active: Publishes its records AND cleans up records from inactive groups
Ungrouped Records
Records without a group assignment (group="") are always active and published alongside whichever groups are currently active. They will never process unpublishing of records.
Example Scenario:
Setup:
Active groups = ["us-east"]:
Published: 1.2.3.4, 9.9.9.9
Switch active groups to ["us-west"]:
Published: 5.6.7.8, 9.9.9.9
(Cluster B unpublishes stale 1.2.3.4)
Manual Verification Instructions
Prerequisites
Option 1: Verification with AWS Route53
Setup:
make local-setup with 2 clusters, and deploy true. Then edit the deployments to set the group runtime argument (e.g. us-east and us-west).
Create test DNSRecords:
In cluster-1
In cluster-2
Set initial active groups (us-east):
Using kuadrant-dns-cli, set the active group to us-east
Verify initial state:
Query DNS (should return 1.2.3.4)
dig test-groups.example.com +short
Check Route53 for published records
aws route53 list-resource-record-sets --hosted-zone-id
--query "ResourceRecordSets[?Name=='test-groups.example.com.']"
Use dns cli to update the TXT record in Route53 to us-west
Wait 15-30 seconds for reconciliation, then verify:
Query DNS (should now return 5.6.7.8)
dig test-groups.example.com +short
Verification Checklist
Related Issues: #620