Add Alertmanager Integration Tests and Static File Backend#1686
Add Alertmanager Integration Tests and Static File Backend#1686jtlisi wants to merge 34 commits intocortexproject:masterfrom
Conversation
|
Where does the new protobuf come in? |
|
I chose to use a protobuf for the polling interface boundary to provide a easy to use storage format that wasn't yaml based and easy to version. This would make any additional backend based on cloud provider object stores much easier to add, while not reducing any of the current functionality. |
eaa9211 to
b32dbb3
Compare
d079fa6 to
e574088
Compare
pracucci
left a comment
There was a problem hiding this comment.
First time I'm seeing the multi-tenant alertmanager and config DB, so I have have miss something. I left some comments. All in all, looks doing it job. I left several comments to polish it up a bit (especially from the user perspective), plus I think we're not correctly handling the case an user config is removed (see comment in multitenant.go).
pkg/alertmanager/storage.go
Outdated
There was a problem hiding this comment.
Some thoughts:
- I would suggest to make the YAML field name explict. Given the
typecould befileorconfigdb, we may call these fields with the same name (to keep it specular), soyaml:"configdb"andyaml:"file". However,fileis a very bad name because very generic. - I've seen you've also used "local" sometimes in the code: maybe it's a better name for
file? - Is really a value naming the DB type
configdb? Isn't just adbthe backend storage?
What's your take?
There was a problem hiding this comment.
I think local is a better fit. It's what we use to designate the boltdb backend for chunks
pkg/alertmanager/storage.go
Outdated
There was a problem hiding this comment.
The CLI flags should have all the same prefix, then:
.type.<type>.<option>(ie..file.path)
Right now there's a bit of inconsistency:
-alertmanager.storage.type string
-alertmanager.storage.local.path string
-alertmanager.configs.auto-webhook-root string
-alertmanager.configs.client-timeout duration
-alertmanager.configs.fallback string
-alertmanager.configs.poll-interval duration
-alertmanager.configs.url value
There was a problem hiding this comment.
I think everything related to backend config storage should have a alertmanager.storage.. However, I wanted to avoid making a breaking change in this PR and updating the prefix for alertmanager.configs to alertmanager.storage.configs would force users to change their flags. I can add a TODO to make that change in the future and go throough a change/deprecation process in a future PR?
pkg/alertmanager/storage.go
Outdated
There was a problem hiding this comment.
mode>typegiven we call like this in the configrule> replacing it withalertmanageris more clear to the final user
There was a problem hiding this comment.
Good catch, a lot of this code is based on the Ruler.
There was a problem hiding this comment.
I would suggest to name the package and file based on the content. For example, local/alert_client.go makes me think about "local" storage (while it's named file) and "client" is actually code "store" in the code.
There was a problem hiding this comment.
Is /etc/cortex/alertmanager_configs/ a good default? We don't have any default filepath in that location. Looks more inherited by our cluster setup, than picked to be a good default to me.
pkg/alertmanager/alerts/compat.go
Outdated
There was a problem hiding this comment.
Not needed in this PR, I'll remove this file
pkg/alertmanager/alerts/compat.go
Outdated
There was a problem hiding this comment.
Not needed in this PR, I'll remove this file
4f81964 to
65b264b
Compare
|
@pracucci this should be good for a second look |
There was a problem hiding this comment.
Please add an entry to CHANGELOG.md explaning all changes (this is a config breaking change).
pkg/cortex/modules.go
Outdated
There was a problem hiding this comment.
We specify the err variable in the function signature and utilize a naked return pattern throughout the modules.go file. I personally like to explicitly return the error so I'll update it.
pkg/alertmanager/multitenant.go
Outdated
There was a problem hiding this comment.
This MultitenantAlertmanager: prefix repeated across all logs looks a bit annoying. If you want to clean it up, you may pick the logger in input in NewMultitenantAlertmanager(), wrap it with log.With(util.Logger, "component", "multiteant-alertmanager") and then always use that internally.
Up to you.
There was a problem hiding this comment.
I'll use an embedded logger with a component field.
pkg/alertmanager/multitenant.go
Outdated
There was a problem hiding this comment.
What's this TODO about? Doable in this PR too?
There was a problem hiding this comment.
I broke the totalConfigs metric up to have a status label
pkg/alertmanager/multitenant.go
Outdated
pkg/alertmanager/alerts/store.go
Outdated
There was a problem hiding this comment.
Being the configdb store, may make sense to move it to pkg/alertmanager/alerts/configdb/store.go and just move the AlertStore interface back into pkg/alertmanager/alerts?
There was a problem hiding this comment.
Makes sense, I will also move the pkg/alertmananger/local package to be pkg/alertmanager/alerts/local
pkg/alertmanager/alerts/store.go
Outdated
There was a problem hiding this comment.
ConfigDBAlertStore would be more appropriate? Just asking, up to you.
pkg/alertmanager/alerts/store.go
Outdated
There was a problem hiding this comment.
Why do we need to retain it? Looking at ListAlertConfigs() implementation looks not useful to me.
There was a problem hiding this comment.
configs, err := c.configClient.GetAlerts(ctx, c.since)
Will only return configs that have IDs greater than c.since. This avoids the configdb having to do a full pull of it's table.
pkg/alertmanager/local/store.go
Outdated
There was a problem hiding this comment.
As previously mentioned, I would add a comment to explain that the filename must match the user ID.
There was a problem hiding this comment.
Sorry I didn't rectify this previously
pkg/alertmanager/local/store.go
Outdated
There was a problem hiding this comment.
As previously mentioned, I would reverse this into a guard. Looks easier to read the code, if we return nil in case it's not a file or not a YAML extension.
I would also suggest to log with a Warn any skipped entry, because we wouldn't expect any spurious file/directory within f.cfg.Path, right?
There was a problem hiding this comment.
Sorry I didn't rectify this previously
712b5e2 to
545835c
Compare
pracucci
left a comment
There was a problem hiding this comment.
Bash-based integration tests are no more a thing. I see two options:
- Strip away integration tests from this PR and eventually open a new PR for them
- Move these tests to the new Go integration tests
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
545835c to
cbd8c18
Compare
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
|
Closing in favor of #2125 |
Overview
Important Detail
pkg/alertmanager/multitenant_test.gois run with!racetag because of an underlying mild race condition in the Alertmanager.https://github.com/prometheus/alertmanager/issues/2182Related
#1647
#1244
#1513
#619