Add monitoring.json snapshotting#3104
Add monitoring.json snapshotting#3104rob05c merged 1 commit intoapache:masterfrom ezelkow1:tmjson_snap
Conversation
|
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Looks good! Tested, works as expected.
The only big things are the missing error checks, I'll merge as soon as those are fixed.
A CHANGELOG.md entry would be good, too, especially noting that a Snapshot is required (people should always snapshot after upgrading, but it's strictly required for this one).
There was a problem hiding this comment.
Looks like we are just changing the behavior of the end point that TM uses, do you feel that there is value in knowing what the non-snapped Monitor config would look like? Seems to me like we should have two seperate end points, one for the monitoring snapshot and one for the raw monitoring.config file.
There was a problem hiding this comment.
It does change the behavior, because the existing behavior is fundamentally a bug, and it's never correct to use non-snapshotted data for monitoring.
do you feel that there is value in knowing what the non-snapped Monitor config would look like? Seems to me like we should have two seperate end points, one for the monitoring snapshot and one for the raw monitoring.config file.
I think that sounds good in theory, but in practice, I'm very concerned that someone might use the data, and it's always wrong to use it for anything practical. The monitoring data must always be synchronized with the CRConfig snapshot, or the race conditions in #1738 will happen.
But I can see the use case, e.g. for GUIs like the Portal to be able to present a diff. The CRConfig has a "raw" endpoint at /cdns/{cdn}/snapshot/new. Following that convention, what if we add a new un-snapshotted endpoint at cdns/{cdn}/configs/monitoring/new, and extensively document in the API Documentation, above the route in the routes.go file, and a GoDoc comment over the function itself, that it MUST NOT ever be used for any logic decisions, but only for presenting users with a comparison. How does that sound?
I understand the use case, but it really worries me if anyone were ever to use it for anything, they'd reintroduce the subtle and dangerous bug.
There was a problem hiding this comment.
Yes, I think the idea is that at some point we will want to be able to diff what is snapshotted vs what is raw. There could also be some future use cases where the raw data really is desired, I just want to make sure that we keep it available like it is today.
There was a problem hiding this comment.
@dneuman64 would you be good with us opening up a separate issue for that? We were throwing around some ideas like a diff end point instead of a full raw one and things like that, but its a bit out of the scope of this PR
There was a problem hiding this comment.
Yeah, I think a new issue is fine.
rob05c
left a comment
There was a problem hiding this comment.
Tested with the latest changes, still works as expected.
These changes are to fix the issue of the monitoring.json and crconfig's being out of sync. Currently it is possible to make a change to a status (or other field that the monitoring.json presents) and have it show up immediately while the changes are not reflected in a crconfig until it has been snapshotted, causing an inconcsistency across the files. This will change the column name in the snapshot table from 'content' to 'crconfig' and then add a new column 'monitoring'. When a snapshot is done we then store both the crconfig and monitoring.json in the table at the same time, maintaining consistency. Monitoring.json is then pulled from the table instead of being generated on the fly like was done previously.
These changes are to fix the issue of the monitoring.json and crconfig's being out of sync. Currently it is possible to make a change to a status (or other field that the monitoring.json presents) and have it show up
immediately while the changes are not reflected in a crconfig until it has been snapshotted, causing an inconcsistency across the files.
This will change the column name in the snapshot table from 'content' to 'crconfig' and then add a new column 'monitoring'. When a snapshot is done we then store both the crconfig and monitoring.json in the table at
the same time, maintaining consistency. Monitoring.json is then pulled from the table instead of being generated on the fly like was done previously.
What does this PR do?
Fixes #1738
Which TC components are affected by this PR?
What is the best way to verify this PR?
Check all that apply