@tjiuming
Contributor

Motivation

This PR is based entirely on #17618. The original PR had a memory leak, so it was reverted.
This PR fixes that memory leak.

Why the original PR leaked memory:
In #17618, PrometheusMetricsGenerator.java:

    public static void generate(PulsarService pulsar, boolean includeTopicMetrics, boolean includeConsumerMetrics,
                                boolean includeProducerMetrics, boolean splitTopicAndPartitionIndexLabel,
                                OutputStream out,
                                List<PrometheusRawMetricsProvider> metricsProviders)
            throws IOException {
        ByteBuf buf = ByteBufAllocator.DEFAULT.heapBuffer();
        boolean exceptionHappens = false;
        //Used in namespace/topic and transaction aggregators as share metric names
        PrometheusMetricStreams metricStreams = new PrometheusMetricStreams();
        try {
            SimpleTextOutputStream stream = new SimpleTextOutputStream(buf);

            generateSystemMetrics(stream, pulsar.getConfiguration().getClusterName());

            NamespaceStatsAggregator.generate(pulsar, includeTopicMetrics, includeConsumerMetrics,
                    includeProducerMetrics, splitTopicAndPartitionIndexLabel, metricStreams);

            if (pulsar.getWorkerServiceOpt().isPresent()) {
                pulsar.getWorkerService().generateFunctionsStats(stream);
            }

            if (pulsar.getConfiguration().isTransactionCoordinatorEnabled()) {
                TransactionAggregator.generate(pulsar, metricStreams, includeTopicMetrics);
            }

            metricStreams.flushAllToStream(stream);

            generateBrokerBasicMetrics(pulsar, stream);

            generateManagedLedgerBookieClientMetrics(pulsar, stream);

            if (metricsProviders != null) {
                for (PrometheusRawMetricsProvider metricsProvider : metricsProviders) {
                    metricsProvider.generate(stream);
                }
            }
            out.write(buf.array(), buf.arrayOffset(), buf.readableBytes());
        } finally {
            //release all the metrics buffers
            metricStreams.releaseAll();
            //if exception happens, release buffer
            if (exceptionHappens) {
                buf.release();
            }
        }
    }

In the finally block, buf.release() is called only when exceptionHappens == true. But exceptionHappens is initialized to false and is never set to true anywhere, so buf is never released and the memory leaks.
This was probably a small mistake made during the cherry-pick.
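
The leak and the fix can be reproduced in isolation. The following is a minimal sketch, not the Pulsar code: CountedBuffer is a hypothetical stand-in for a reference-counted buffer such as Netty's ByteBuf, and both methods return the buffer only so that the caller can inspect its reference count.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ReleaseInFinallySketch {
    // Same shape as the leaky code: the flag is never set, so release() is unreachable.
    static CountedBuffer generateLeaky(ByteArrayOutputStream out) {
        CountedBuffer buf = new CountedBuffer();
        boolean exceptionHappens = false;
        try {
            buf.writeAscii("example_metric 1\n");
            byte[] bytes = buf.toBytes();
            out.write(bytes, 0, bytes.length);
        } finally {
            if (exceptionHappens) {
                buf.release();
            }
        }
        return buf; // returned only for inspection in this sketch
    }

    // Fixed shape: the buffer never leaves the method's ownership, so always release it.
    static CountedBuffer generateFixed(ByteArrayOutputStream out) {
        CountedBuffer buf = new CountedBuffer();
        try {
            buf.writeAscii("example_metric 1\n");
            byte[] bytes = buf.toBytes();
            out.write(bytes, 0, bytes.length);
        } finally {
            buf.release();
        }
        return buf; // returned only for inspection in this sketch
    }

    public static void main(String[] args) {
        System.out.println("leaky refCnt: " + generateLeaky(new ByteArrayOutputStream()).refCnt());
        System.out.println("fixed refCnt: " + generateFixed(new ByteArrayOutputStream()).refCnt());
    }
}

// Hypothetical reference-counted buffer; it starts with one reference, as pooled buffers usually do.
final class CountedBuffer {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private int refCnt = 1;

    void writeAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        data.write(bytes, 0, bytes.length);
    }

    byte[] toBytes() {
        return data.toByteArray();
    }

    int refCnt() {
        return refCnt;
    }

    void release() {
        if (refCnt == 0) {
            throw new IllegalStateException("buffer already released");
        }
        refCnt--;
    }
}
```

In this sketch the leaky variant leaves the reference count at 1 and the fixed variant brings it to 0.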

Documentation

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

@marksilcox
Contributor

@tjiuming thanks for fixing this, struggling to find time at the moment

            out.write(buf.array(), buf.arrayOffset(), buf.readableBytes());
        } finally {
            //release all the metrics buffers
            metricStreams.releaseAll();
Contributor

@codelipenghui commented on Sep 28, 2022

@tjiuming I noticed you removed exceptionHappens. Without that check, will we release buf multiple times?

From the master branch:

        } catch (Throwable t) {
            exceptionHappens = true;
            throw t;
        } finally {
            //release all the metrics buffers
            metricStreams.releaseAll();
            //if exception happens, release buffer
            if (exceptionHappens) {
                buf.release();
            }
        }

Contributor Author

@codelipenghui No, this check is for #14453, and that feature was not cherry-picked to 2.9.

Contributor

@tjiuming Could buf be released multiple times, since metricStreams.releaseAll() might also release buf?

Contributor Author

@codelipenghui buf and metricStreams are released only in the finally block, and metricStreams.releaseAll() does not release buf.
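
To illustrate why the two cleanup calls do not overlap, here is a minimal sketch under stated assumptions. RefBuf and MetricStreamsSketch are hypothetical stand-ins, not Pulsar or Netty classes; the sketch assumes the metric streams keep one private buffer per metric name and copy its content into the output buffer on flush. The metric name is illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SeparateBuffersDemo {
    // Returns the output buffer's reference count after both cleanup steps have run.
    static int run() {
        RefBuf buf = new RefBuf();
        MetricStreamsSketch metricStreams = new MetricStreamsSketch();
        try {
            metricStreams.write("example_topics_count", "example_topics_count 3\n");
            metricStreams.flushAllTo(buf);
        } finally {
            metricStreams.releaseAll(); // releases only the per-metric buffers
            buf.release();              // releases the output buffer exactly once
        }
        return buf.refCnt();
    }

    public static void main(String[] args) {
        System.out.println("output buffer refCnt: " + run());
    }
}

// Hypothetical analogue of the metric streams: one private buffer per metric name.
final class MetricStreamsSketch {
    private final Map<String, RefBuf> perMetric = new LinkedHashMap<>();

    void write(String metricName, String sample) {
        perMetric.computeIfAbsent(metricName, k -> new RefBuf()).append(sample);
    }

    // Copies content into the target; the target buffer itself is never retained here.
    void flushAllTo(RefBuf target) {
        perMetric.values().forEach(b -> target.append(b.content()));
    }

    void releaseAll() {
        perMetric.values().forEach(RefBuf::release);
        perMetric.clear();
    }
}

// Hypothetical reference-counted buffer that rejects a double release.
final class RefBuf {
    private final StringBuilder data = new StringBuilder();
    private int refCnt = 1;

    void append(String s) {
        data.append(s);
    }

    String content() {
        return data.toString();
    }

    int refCnt() {
        return refCnt;
    }

    void release() {
        if (refCnt == 0) {
            throw new IllegalStateException("double release");
        }
        refCnt--;
    }
}
```

If releaseAll() also released the output buffer, the second release() in the finally block would throw in this sketch; because the buffers are separate, run() completes with a reference count of 0.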

Contributor

@asafm commented on Sep 28, 2022

Sorry for missing this bug. I wasn't aware that PrometheusMetricsGenerator doesn't look the same in master and in apache-2.9, specifically generate0, and the performance improvement was backported, so it confused me as well.

The fix looks solid: this buffer is not returned as it was in generate0, so it should always be released.

Contributor

@tjiuming Oh, I see. They are different buffers.

@codelipenghui merged commit 976a318 into apache:branch-2.9 on Sep 29, 2022
@tjiuming deleted the dev/group_prometheus_metrics branch on November 1, 2022 17:38
@congbobo184 added the cherry-picked/branch-2.9 Archived: 2.9 is end of life release/2.9.4 labels on Nov 14, 2022