Bump to v2.10.0 #31

Merged
openshift-merge-robot merged 260 commits into openshift:master from simonpasquier:aos-bump-v2.10.0 on May 29, 2019


@simonpasquier

cc @brancz
Not to be merged immediately: 2.10.0 is brand new and I need to test that prometheus#5582 didn't break the alerting console. Other than that, there should be some nice performance improvements compared to v2.7.2, which is the previously shipped version.

vn-ki and others added 30 commits January 31, 2019 17:03
Signed-off-by: Vishnunarayan K I <appukuttancr@gmail.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
…heus#5158)

This makes things generally more resilient, and will
help with OpenMetrics transitions (and inconsistencies).

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
Update error message in the extractTimeRange function
to match the function's logic

Signed-off-by: zhulongcheng <zhulongcheng.me@gmail.com>
Predeclare and reuse errors to reduce duplicate code

Signed-off-by: zhulongcheng <zhulongcheng.me@gmail.com>
It should be a unix timestamp, not the seconds in the minute.

Signed-off-by: beorn7 <beorn@soundcloud.com>
Fix prometheus_rule_group_last_evaluation_timestamp_seconds
Signed-off-by: beorn7 <beorn@soundcloud.com>
Signed-off-by: Erik Hollensbe <github@hollensbe.org>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* web: updated bootstrap3-typeahead file to work with bootstrap 4.0.0

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: Replaced bootstrap-3.3.1 with bootstrap 4.0.0

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: Added bootstrap4-glyphicons as 4.0.0 doesn't include bootstrap3 glyphicons

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated js jquery to 3.3.1

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated _base.html to import the new bootstrap 4.0.0 and jquery 3.3.1, and made bootstrap class tags 4.0 compatible

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: _base.html: restored a missing word (Server) in the title tag.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated alerts.html class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated config.html class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated flags.html class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated service-discovery.html class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated status.html class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated targets.html class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated graph_template.handlebar class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: alerts.css fix for button color inheritance on alerts page.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: graph.css fix for color inheritance.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: prometheus.css updated to fix nav bar.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: previous merge conflict not fixed correctly on _base.html

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* menu.lib and prom.lib imports updated

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* bootstrap 4.1.3 imported

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Bootstrap 4.1.3 imported into _base.html

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* bootstrap 4.1.3 imported into prom.lib

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* menu.lib style adjusted to view sidebar

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Alert colour uplifted to bootstrap 4.1.3

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Alerts display code reformatted similarly to config

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Consoles pages adjusted to account for new navbar

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* LHS Menu fixed in console pages

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Minor changes to prom_console to adjust lhs nav

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Prom.lib and some css updated to fix console graph controls

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Bootstrap 4.0.0 files removed

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Consoles configured so that the graph fits with the new side bar, css files also adjusted

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Import popper.min.js for dropdowns

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Popper.min.js imported locally

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Re-added prometheus#4764 and fixed css

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Removed .DS_Store

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Rebuilt assets

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Spaces between buttons and inputs on graph page removed

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* fixed spacing in buttons on /targets

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* Updated vfsdata.go

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* fixed typeahead issue

Signed-off-by: James Ritchie <james.g.ritchie@baesystems.com>

* added css for dropdown

Signed-off-by: James Ritchie <james.g.ritchie@baesystems.com>

* changed order of css imports

Signed-off-by: James Ritchie <james.g.ritchie@baesystems.com>

* tinkered with CSS changes to make keyboard select and mouseover match

Signed-off-by: James Ritchie <james.g.ritchie@baesystems.com>
Signed-off-by: Minh-Long  Do <minhlong.langos@gmail.com>
…enarios (prometheus#5189)

Signed-off-by: tariqibrahim <tariq181290@gmail.com>
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
1. Added an ability to resize text area on mouseclick
2. Remember selected target status button on page reload

Signed-off-by: Maria Nemtinova <nemtinovamasha@gmail.com>
This change switches the remote_write API to use the TSDB WAL.  This should reduce memory usage and prevent sample loss when the remote end point is down.

We use the new LiveReader from TSDB to tail WAL segments.  Logic for finding the tracking segment is included in this PR.  The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.

Enqueuing a sample for sending via remote_write can now block, to provide back pressure.  Queues are still required to achieve parallelism and batching.  We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible.  The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.

As part of this change, we attempt to guarantee that samples are not lost; however, this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (e.g. 400s).

This change also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for a given series to reduce GC pressure

Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics

Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
- Remove datarace in the exported highest scrape timestamp.
- Backoff on enqueue should be per-sample - reset the result for each sample.
- Remove diffKeys, unused ctx and cancelfunc in WALWatcher, 'name' from writeTo interface, and pass it to constructor.
- Reorder functions in WALWatcher depth-first according to call graph.
- Fix vendor/modules.txt.
- Split out the various timer periods into consts at the top of the file.
- Move w.currentSegmentMetric.Set close to where we set the currentSegment.
- Combine r.Next() and isClosed(w.quit) into a single loop.
- Unnest some ifs in WALWatcher.watch, propagate errors in decodeRecord, add some new lines to make it easier to read.
- Reorganise checkpoint handling to reduce nesting and make it easier to follow.
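A sketch of the `isClosed` idiom and the combined loop condition mentioned above, using a hypothetical in-memory reader in place of the WAL reader. Note that when `quit` is closed, `r.Next()` has already advanced past one record that is then left unhandled.

```go
package main

import "fmt"

// isClosed reports, without blocking, whether c has been closed.
func isClosed(c chan struct{}) bool {
	select {
	case <-c:
		return true
	default:
		return false
	}
}

// recordReader is a hypothetical interface shaped like a WAL reader.
type recordReader interface {
	Next() bool
	Record() []byte
}

// sliceReader is a hypothetical in-memory reader for the example.
type sliceReader struct {
	recs [][]byte
	cur  []byte
}

func (s *sliceReader) Next() bool {
	if len(s.recs) == 0 {
		return false
	}
	s.cur, s.recs = s.recs[0], s.recs[1:]
	return true
}

func (s *sliceReader) Record() []byte { return s.cur }

// readRecords folds the "more input?" and "asked to stop?" checks into a
// single loop condition instead of two nested checks.
func readRecords(r recordReader, quit chan struct{}, handle func([]byte)) int {
	n := 0
	for r.Next() && !isClosed(quit) {
		handle(r.Record())
		n++
	}
	return n
}

func main() {
	r := &sliceReader{recs: [][]byte{[]byte("a"), []byte("b"), []byte("c")}}
	fmt.Println(readRecords(r, make(chan struct{}), func(b []byte) { fmt.Printf("%s ", b) }))
}
```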

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
- Use the queue name in WAL watcher logging.
- Don't return from watch if the reader error was EOF.
- Fix sample timestamp check logic regarding what samples we send.
- Refactor so we don't need readToEnd/readSeriesRecords
- Fix wal_watcher tests since readToEnd no longer exists

Signed-off-by: Callum Styan <callumstyan@gmail.com>
* scrape: catch errors when creating HTTP clients

This change makes sure that no scrape pool is created with a nil HTTP
client.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Tariq's comment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Brian's comment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Propose myself (Ganesh, @codesome) as 2.8 release shepherd
* storage/remote: adapt tests for Travis CI

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Check filesystems on Travis environment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Run remote/storage tests on CircleCI for troubleshooting

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Try using tmpfs partition

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Revert "Try using tmpfs partition"

This reverts commit 85a30de.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Don't store labels in writeToMock

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Fix data race

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Bump retries to 100, meaning that the total timeout is 10s

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* clean up .travis.yml

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* code fixup

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Remove unneeded empty line

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Julius Volz <julius.volz@gmail.com>
beorn7 and others added 9 commits May 21, 2019 01:28
* web/api/v1: alert value as string in alert/rules endpoints

Signed-off-by: Alexander Saltykov <alexander-s@yandex-team.ru>
Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com>
Signed-off-by: beorn7 <bjoern@rabenste.in>
Signed-off-by: bevisy <binbin36520@gmail.com>
Signed-off-by: bevisy <binbin36520@gmail.com>
Signed-off-by: beorn7 <bjoern@rabenste.in>
@simonpasquier
Author

/hold

@openshift-ci-robot added the do-not-merge/hold label May 27, 2019
@openshift-ci-robot added the approved and size/XXL labels May 27, 2019
@brancz

brancz commented May 28, 2019

/lgtm

@openshift-ci-robot added the lgtm label May 28, 2019
v2.10.0

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
@openshift-ci-robot removed the lgtm label May 28, 2019
@simonpasquier
Author

Alert console seems to be ok:

[screenshot of the alerting console]

@brancz

brancz commented May 28, 2019

Nice!

@brancz

brancz commented May 28, 2019

/hold cancel

@openshift-ci-robot removed the do-not-merge/hold label May 28, 2019
@brancz

brancz commented May 29, 2019

/lgtm

@openshift-ci-robot added the lgtm label May 29, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brancz, simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot merged commit e08bbb2 into openshift:master May 29, 2019
openshift-merge-bot bot pushed a commit that referenced this pull request Oct 9, 2025
Doing a config reload that needs to stop some providers while SIGTERM is sent to Prometheus at the same time can sometimes hang:

1: sync.WaitGroup.Wait [83 minutes] [Created by run.(*Group).Run in goroutine 1 @ group.go:37]
    sync         sema.go:110              runtime_SemacquireWaitGroup(*uint32(#166))
    sync         waitgroup.go:118         (*WaitGroup).Wait(*WaitGroup(#23))
    discovery    manager.go:276           (*Manager).ApplyConfig(#23, #167)
    main         main.go:964              main.func5(#120)
    main         main.go:1505             reloadConfig({#183, 0x1b}, 1, #40, #43, #50, {#31, 0xa, 0})
    main         main.go:1182             main.func22()
    run          group.go:38              (*Group).Run.func1(*Group(#26), #51)

Add a test for it.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
