[branch-2.10] [fix][broker] Fix isolated group not work#21097

Closed
horizonzy wants to merge 1026 commits into apache:master from
horizonzy:branch-2.10-fix-isolated-group-not-work

Conversation

@horizonzy

Motivation

This is a cherry-pick commit for #21096.

Documentation

- [ ] `doc`
- [ ] `doc-required`
- [ ] `doc-not-needed`
- [ ] `doc-complete`

Matching PR in forked repository

PR in forked repository:

zymap and others added 30 commits February 6, 2023 14:47
…tent topic timeout (apache#19454)

Co-authored-by: Tao Jiuming <95597048+tjiuming@users.noreply.github.com>
…ache#12615)" (apache#19439)

This reverts commit 62e2547.

### Motivation

The motivation for apache#12615 relies on an incorrect understanding of Netty's threading model. `ctx.executor()` is the channel's event loop thread, the same thread used to process inbound messages. The `waitingForPingResponse` variable is only ever updated and read from that event loop, so there is no need to make the variable `volatile`.
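To illustrate the threading argument, here is a minimal sketch using a plain single-threaded executor in place of Netty's actual `EventLoop`; the class and member names below are hypothetical, not Pulsar's real code:

```java
import java.util.concurrent.*;

// Hypothetical sketch: a field confined to a single-threaded executor,
// mirroring how Netty pins a channel's handlers to one event loop thread.
// Because every read and write happens on that one thread, the Java memory
// model guarantees visibility without `volatile`.
class PingState {
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "event-loop");
        t.setDaemon(true);
        return t;
    });

    private boolean waitingForPingResponse; // no volatile: only touched on eventLoop

    void sendPing()       { eventLoop.execute(() -> waitingForPingResponse = true); }
    void onPingResponse() { eventLoop.execute(() -> waitingForPingResponse = false); }

    // Reads also go through the event loop, preserving single-thread confinement.
    Future<Boolean> isWaiting() { return eventLoop.submit(() -> waitingForPingResponse); }
}
```

The `volatile` keyword would only be needed if another thread read or wrote the field directly, which is exactly what the confinement above rules out.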

### Modifications

* Remove `volatile` keyword for `waitingForPingResponse`

### Verifying this change

Read through all references to the variable.

### Documentation

- [x] `doc-not-needed`

### Matching PR in forked repository

PR in forked repository: Skipping for this trivial PR.

(cherry picked from commit fb28d83)
…ot aware rack info problem. (apache#18672)

(cherry picked from commit 43335fb)
… txn race condition. (apache#19201)

Fixes apache#19200

A transaction that lasts a long time and is never aborted prevents the transaction buffer's `maxReadPosition` from moving forward, so no new snapshot is taken. With an old snapshot, the transaction buffer has to read a large number of entries during recovery. In the worst cases, topics were unavailable for 30 minutes.

The fix is to avoid concurrent execution.

(cherry picked from commit 96f4161)
When a Pulsar topic is unloaded from a broker, certain metrics related to that topic will appear to remain active for the broker for 5 minutes. This is confusing for troubleshooting because it makes the topic appear to be owned by multiple brokers for a short period of time. See below for a way to reproduce this behavior.

In order to solve this "zombie" metric problem, I propose we remove the timestamps that get exported with each Prometheus metric served by the broker.

Since we introduced Prometheus metrics in apache#294, we have exported a timestamp along with most metrics. This is an optional, valid part of the spec defined [here](https://prometheus.io/docs/instrumenting/exposition_formats/#comments-help-text-and-type-information). However, after our adoption of Prometheus metrics, the Prometheus project released version 2.0 with a significant improvement to its concept of staleness. In short, before 2.0, a metric that was in the last scrape but not the next one (this often happens for topics that are unloaded) will essentially inherit the most recent value for the last 5 minute window. If there isn't one in the past 5 minutes, the metric becomes "stale" and isn't reported. Starting in 2.0, there was new logic to consider a value stale the very first time that it is not reported in a scrape. Importantly, this new behavior is only available if you do not export timestamps with metrics, as documented here: https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness. We want to use the new behavior because it gives better insight into all topic metrics, which are subject to move between brokers at any time.

This presentation https://www.youtube.com/watch?v=GcTzd2CLH7I and slide deck https://promcon.io/2017-munich/slides/staleness-in-prometheus-2-0.pdf document the feature in detail. This blog post was also helpful: https://www.robustperception.io/staleness-and-promql/.

Additional motivation comes from mailing list threads like this one https://groups.google.com/g/prometheus-users/c/8OFAwp1OEcY. It says:

> Note, however, that adding timestamps is an extremely niche use case. Most of the users who think they need it should actually not do it.
>
> The main usecases within that tiny niche are federation and mirroring the data from another monitoring system.

The Prometheus Go client also indicates a similar motivation: https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#NewMetricWithTimestamp.

The OpenMetrics project also recommends against exporting timestamps: https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#exposing-timestamps.

As such, I think we are not a niche use case, and we should not add timestamps to our metrics.
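Concretely, the change drops the trailing millisecond timestamp from each exposed sample. The metric value, topic name, and timestamp below are made up for illustration:

```
# Before: a timestamp is exported, so Prometheus falls back to the pre-2.0
# 5-minute lookback staleness rule
pulsar_in_messages_total{topic="persistent://public/default/t"} 42 1693000000000

# After: no timestamp, so Prometheus 2.x marks the series stale on the first
# scrape that omits it
pulsar_in_messages_total{topic="persistent://public/default/t"} 42
```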

1. Run any 2.x version of Prometheus (I used 2.31.0) along with the following scrape config:
```yaml
  - job_name: broker
    honor_timestamps: true
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    follow_redirects: true
    static_configs:
      - targets: ["localhost:8080"]
```
2. Start pulsar standalone on the same machine. I used a recently compiled version of master.
3. Publish messages to a topic.
4. Observe the `pulsar_in_messages_total` metric for the topic in the Prometheus UI (localhost:9090)
5. Stop the producer.
6. Unload the topic from the broker.
7. Optionally, `curl` the metrics endpoint to verify that the topic’s `pulsar_in_messages_total` metric is no longer reported.
8. Watch the metrics get reported in prometheus for 5 additional minutes.

When you set `honor_timestamps: false`, the metric stops getting reported right after the topic is unloaded, which is the desired behavior.

* Remove all timestamps from metrics
* Fix affected tests and test files (some of those tests were in the proxy and the function worker, but no code was changed for those modules)

This change is accompanied by updated tests.

This is technically a breaking change to the metrics, though I would consider it a bug fix at this point. I will discuss it on the mailing list to ensure it gets proper visibility.

Given how frequently Pulsar changes which metrics are exposed between each scrape, I think this is an important fix that should be cherry picked to older release branches. Technically, we can avoid cherry picking this change if we advise users to set `honor_timestamps: false`. However, I think it is better to just remove them.

- [x] `doc-not-needed`

(cherry picked from commit 0bbc4e1)
…19412)

Signed-off-by: Zixuan Liu <nodeces@gmail.com>
(cherry picked from commit 016e7f0)
Fixes: apache#19478

### Motivation

See issue for additional context. Essentially, we are doing a shallow clone when we needed a deep clone. The consequence is leaked labels, annotations, and tolerations.

### Modifications

* Add a `deepClone` method to the `BasicKubernetesManifestCustomizer.RuntimeOpts` class. Note that this method is not technically a deep clone of the k8s objects. However, based on the way we "merge" these objects, it is sufficient to copy references to them.
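A minimal sketch of the bug and the fix; the class below is a hypothetical, heavily simplified stand-in (the real `RuntimeOpts` also carries annotations and tolerations):

```java
import java.util.*;

// Hypothetical, simplified stand-in for RuntimeOpts: a shallow clone copies
// the reference to the nested map, so a later "merge" into the clone mutates
// the original and leaks labels across deployments.
class RuntimeOpts {
    Map<String, String> labels = new HashMap<>();

    RuntimeOpts shallowClone() {
        RuntimeOpts copy = new RuntimeOpts();
        copy.labels = this.labels;                 // shared reference: leaks
        return copy;
    }

    RuntimeOpts deepClone() {
        RuntimeOpts copy = new RuntimeOpts();
        copy.labels = new HashMap<>(this.labels);  // fresh map: merge stays isolated
        return copy;
    }
}
```

Copying one level deep is enough here because the merge step only mutates the top-level collections, not the k8s objects stored inside them.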

### Verifying this change

Added a test that fails before the change and passes afterwards.

### Documentation

- [x] `doc-not-needed`

This is an internal bug fix. No docs needed.

### Matching PR in forked repository

PR in forked repository: michaeljmarshall#27

(cherry picked from commit 0205148)
Fixes apache#19431

`authenticationData` is already `volatile`. We use `originalAuthData` when set, so we should match the style. In apache#19431, I proposed that we find a way to not use `volatile`. I still think this might be a "better" approach, but it will be a larger change, and since we already use `volatile` for `authenticationData`, I think this is the right change for now.

It's possible that this is not a bug, given that the `originalAuthData` does not change frequently. However, we always want to use up to date values for authorization.

* Add `volatile` keyword to `ServerCnx#originalAuthData`.

This change is a trivial rework / code cleanup without any test coverage.

- [x] `doc-not-needed`

PR in forked repository: skipping test in fork.

(cherry picked from commit c4c1744)
(cherry picked from commit e1d9941)
I broke all release branches when I cherry picked 2847dd1 to them. This change takes some of the underlying logic from apache#19409, without taking the async logic.

* Make changes to `ServerCnx` to make tests pass

Tests are currently failing, so passing tests will show that this solution is correct.

- [x] `doc-not-needed`

(cherry picked from commit 8246da2)
(cherry picked from commit 15e4198)
shibd and others added 27 commits July 18, 2023 15:50
…egistered if there has no message was sent (apache#20888)

Motivation: In the replication scenario, we want to produce messages on the native cluster and consume them on the remote cluster. The producer and consumer both use the same schema, but the consumer cannot be registered if no messages have been sent to the topic yet. The root cause is that, on the remote cluster, a producer has already been registered with the `AUTO_PRODUCE_BYTES` schema, so there is no schema against which to check compatibility.

Modifications: If there is no schema and only the replicator producer was registered, skip the compatibility check.
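A sketch of the decision (hypothetical names; Pulsar's actual schema-compatibility code path is more involved):

```java
import java.util.*;

// Hypothetical sketch: allow a consumer to register its schema when the topic
// has no schema yet and the only registered producers are replicators using
// AUTO_PRODUCE_BYTES (which carry no schema to check against).
class SchemaCompatibility {
    static class Producer {
        final boolean isReplicator;
        Producer(boolean isReplicator) { this.isReplicator = isReplicator; }
    }

    static boolean skipCompatibilityCheck(Object existingSchema, List<Producer> producers) {
        boolean onlyReplicators = producers.stream().allMatch(p -> p.isReplicator);
        return existingSchema == null && onlyReplicators;
    }
}
```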
(cherry picked from commit 9be0b52)
- The task `trim ledgers` runs in the thread `BkMainThreadPool.choose(ledgerName)`
- The task `write entries to BK` runs in the thread `BkMainThreadPool.choose(ledgerId)`

So the two tasks above may run concurrently.

The task `trim ledgers` work as the flow below:
- find the ledgers that no longer need to be read; the result is `{Ledgers before the slowest read}`.
- check if the `{Ledgers before the slowest read}` is out of retention policy, the result is `{Ledgers to be deleted}`.
  - if the create time of the ledger is earlier than the earliest retention time, mark it for deletion
  - if, after deleting this ledger, the remaining ledgers still exceed the retention size, mark it for deletion
- delete the `{Ledgers to be deleted}`

**(Highlight)** There is a scenario that causes the `trim ledgers` task to delete a non-contiguous set of ledgers, resulting in a gap in the messages consumers receive:
- context:
  - ledgers: `[{id=1, size=100}, {id=2,size=100}]`
  - retention size: 150
  - no cursor there
- Check `ledger 1`, skip by retention check `(200 - 100) < 150`
- One in-flight writing is finished, the `calculateTotalSizeWrited()` would return `300` now.
- Check `ledger 2`, retention check `(300 - 100) > 150`, mark `ledger-2` for deletion.
- Delete the `ledger 2`.
- Create a new consumer. It will receive messages from `[ledger-1, ledger-3]`, but `ledger-2` will be skipped.

The fix: once the retention constraint has been met, break the loop, so only a contiguous prefix of ledgers is ever deleted.
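The fix can be sketched as an oldest-first scan that stops at the first retained ledger (hypothetical and heavily simplified; the retention check mirrors the scenario above):

```java
import java.util.*;

// Hypothetical, simplified retention pass: ledgers are scanned oldest-first,
// and the scan breaks at the first ledger that must be retained. The deleted
// set is therefore always a contiguous prefix, so consumers never see a gap.
class RetentionTrimmer {
    static List<Long> ledgersToDelete(List<long[]> ledgers /* {id, size} */,
                                      long totalSize, long retentionSize) {
        List<Long> toDelete = new ArrayList<>();
        long remaining = totalSize;
        for (long[] ledger : ledgers) {
            if (remaining - ledger[1] < retentionSize) {
                break; // retention constraint met: stop instead of skipping ahead
            }
            toDelete.add(ledger[0]);
            remaining -= ledger[1];
        }
        return toDelete;
    }
}
```

With the scenario above (`ledgers = [{1,100},{2,100}]`, total size 200, retention 150), `ledger-1` fails the retention check and the loop stops, so `ledger-2` is never considered on its own.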

(cherry picked from commit 782e91f)
The C++ and Python clients are no longer maintained in the main repo.
Fixes: apache#20997

Update the expired certs to get tests passing.

* Update all certs. See `README.md` in files for detailed steps.

This change is covered by tests.

- [x] `doc-not-needed`

(cherry picked from commit d6734b7)
…r. (apache#20880)

Main Issue: apache#20851
### Motivation
When the protocol version does not allow us to send `TcClientConnectRequest` to the broker, we should add a log to make this easier to debug.

### Modifications

Add a warning log.
(cherry picked from commit d06cda6)
(cherry picked from commit c644849)
…pache#21035)

Motivation: After [PIP-118: reconnect broker when ZooKeeper session expires](apache#13341), the broker no longer shuts down after losing its connection to the local metadata store in the default configuration. However, before the ZK client reconnects, BK online/offline events are lost, leaving incorrect BK info in memory. You can reproduce the issue with the test `BkEnsemblesChaosTest.testBookieInfoIsCorrectEvenIfLostNotificationDueToZKClientReconnect` (it reproduces roughly 90% of the time; run it again if the issue does not occur).

Modifications: Refresh BK info in memory after the ZK client is reconnected.
(cherry picked from commit db20035)
…ageId read reaches lastReadId (apache#20988)

(cherry picked from commit 9e2195c)
@github-actions

@horizonzy Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@horizonzy horizonzy closed this Aug 30, 2023