@heesung-sohn
Contributor

Master Issue: #15207

Motivation

Pulsar Server Default GC Update

As Java 17 will be officially required for pulsar-2.11+, it is worth revisiting Pulsar’s default GC configuration
and considering a newer GC, ZGC or ShenandoahGC, as the new default.

ZGC:

Introductory articles on ZGC are easy to find[1][2][3]. I personally found the following passage persuasive.

“The primary goals of ZGC are low latency, scalability, and ease of use. To achieve this, ZGC allows a Java application to continue running while it performs all garbage collection operations except thread stack scanning. It scales from a few hundred MB to TB-size Java heaps, while consistently maintaining very low pause times—typically within 2 ms.
The implications of predictably low pause times could be profound for both application developers and system architects. Developers will no longer need to worry about designing elaborate ways to avoid garbage collection pauses. And system architects will not require specialized GC performance tuning expertise to achieve the dependably low pause times that are very important for so many use cases. This makes ZGC a good fit for applications that require large amounts of memory, such as with big data. However, ZGC is also a good candidate for smaller heaps that require predictable and extremely low pause times.”[3]

The fewer settings, the better

One might tune G1GC flags further to outperform ZGC, but our goal is for the default GC to perform well enough to cover general use cases; it should be rare for users to tune GC flags further. ZGC is promoted as requiring less tuning.

  1. ZGC is designed to keep pause times low
  2. ZGC scales well regardless of application heap size, from hundreds of MB to TBs

ShenandoahGC:

ShenandoahGC shares a similar design with ZGC and also promotes low pause times. However, because ShenandoahGC is not officially supported by Oracle, it is unavailable in Oracle-built OpenJDK distributions[4][5]. Hence, between ShenandoahGC and ZGC, Pulsar should probably take the more widely available option, ZGC, also considering future support.
Still, individual Pulsar users can override this default GC, depending on their use case and OpenJDK version.

[1] https://wiki.openjdk.java.net/display/zgc/Main

[2] https://developers.redhat.com/articles/2021/11/02/how-choose-best-java-garbage-collector

[3] https://blogs.oracle.com/javamagazine/post/understanding-the-jdks-new-superfast-garbage-collectors

[4] https://developers.redhat.com/blog/2019/04/19/not-all-openjdk-12-builds-include-shenandoah-heres-why

[5] https://bugs.openjdk.java.net/browse/JDK-8215030

Performance Tests:

To confirm the performance benefits, we ran the OpenMessaging Benchmark.
In this test, we skipped journaling to put more pressure on the JVM GC.

Max Throughput Test

  • Workload: 1-topic-100-partitions-1kb-4p-4c-2000k

| Metric | Java11 G1GC | Java17 G1GC | Java17 ZGC | Java17 ShenandoahGC |
|---|---|---|---|---|
| Avg Pub rate (MB/s) | 1784 | 1703 | 1711 | 1618 |
| Avg Cons rate (MB/s) | 1778 | 1701 | 1711 | 1619 |
| Avg Backlog cnt (k) | 3159 | 121 | 30 | 64 |
| Avg Pub latency (ms) | 286 | 299 | 296 | 294 |

Latency Test

  • Workload: 100-partitions-1kb-4p-4c-500k

| Metric | Java11 G1GC | Java17 G1GC | Java17 ZGC | Java17 ShenandoahGC |
|---|---|---|---|---|
| P999 Pub latency (ms) | 1.8 | 2.1 | 2.1 | 2.0 |
| P9999 Pub latency (ms) | 37.7 | 32.9 | 20.2 | 37 |

Test Result Analysis

ZGC performs well

In the Max Throughput Test, ZGC kept the lowest backlog (30k on average)
while maintaining an average throughput of 1711 MB/s.

In the Latency Test, although the latency differences are not very significant,
ZGC showed the lowest p9999 Pub latency, 20.2 ms.
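
To put these numbers in proportion, the relative differences can be computed directly from the table figures. The snippet below is only a convenience sketch; `rel_change` is a hypothetical helper name, and the inputs are the averages reported in the tables above.

```shell
#!/bin/sh
# rel_change BASE CUR: percentage change of CUR relative to BASE.
# (Hypothetical helper, not part of Pulsar or the benchmark tool.)
rel_change() {
  awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (cur - base) / base * 100 }'
}

# p9999 publish latency (ms): Java17 G1GC 32.9 vs Java17 ZGC 20.2
rel_change 32.9 20.2    # prints -38.6
# Average backlog (k messages): Java17 G1GC 121 vs Java17 ZGC 30
rel_change 121 30       # prints -75.2
# Average publish rate (MB/s): Java11 G1GC 1784 vs Java17 ZGC 1711
rel_change 1784 1711    # prints -4.1
```

A negative value means the second figure is lower than the first, which is an improvement for latency and backlog but a regression for throughput.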

Modifications

Pulsar Default GC Flag Update Proposal

Before:

https://github.com/apache/pulsar/blob/master/conf/pulsar_env.sh#L48

  • -XX:+UseG1GC
  • -XX:MaxGCPauseMillis=10
  • -XX:+ParallelRefProcEnabled
  • -XX:+UnlockExperimentalVMOptions
  • -XX:+DoEscapeAnalysis
  • -XX:ParallelGCThreads=32
  • -XX:ConcGCThreads=32
  • -XX:G1NewSizePercent=50
  • -XX:+DisableExplicitGC

After:

  • -XX:+UseZGC
  • -XX:+PerfDisableSharedMem
  • -XX:+AlwaysPreTouch
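
In shell form, the proposed default could look like the sketch below. It assumes that conf/pulsar_env.sh keeps the GC flags in a `PULSAR_GC` variable (the name used in the benchmark configuration later in this thread) and that the default is wrapped in `${PULSAR_GC:-...}` so that a value exported by the operator wins; the exact line in the merged file may differ.

```shell
# Sketch of the GC section of conf/pulsar_env.sh (assumed layout).
# A PULSAR_GC value already present in the environment takes precedence
# over the ZGC default proposed in this PR.
PULSAR_GC=${PULSAR_GC:-"-XX:+UseZGC -XX:+PerfDisableSharedMem -XX:+AlwaysPreTouch"}
echo "GC flags: ${PULSAR_GC}"
```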

Update Details

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, namely the full CI suite.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API: (yes / no)
  • The schema: (yes / no / don't know)
  • The default values of configurations: (yes / no)
  • The wire protocol: (yes / no)
  • The rest endpoints: (yes / no)
  • The admin cli options: (yes / no)
  • Anything that affects deployment: (yes / no / don't know)

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • doc-required
    (Your PR needs to update docs and you will update later)

  • no-need-doc
    This is pulsar's internal default GC setting change, but we probably need to mention this in the release note.

  • doc
    (Your PR contains doc changes)

  • doc-added
    (Docs have been already added)

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label May 24, 2022
@lhotari
Member

lhotari commented May 25, 2022

some tests seem to be hitting https://bugs.openjdk.java.net/browse/JDK-8257534 with ZGC.

example failure

  Error:  Tests run: 4, Failures: 1, Errors: 0, Skipped: 3, Time elapsed: 0.17 s <<< FAILURE! - in org.apache.pulsar.client.impl.ClientCnxTest
  Error:  testClientCnxTimeout(org.apache.pulsar.client.impl.ClientCnxTest)  Time elapsed: 0.022 s  <<< FAILURE!
  java.lang.NoClassDefFoundError: Could not initialize class org.apache.pulsar.common.protocol.Commands
  	at org.apache.pulsar.client.impl.ClientCnx.<init>(ClientCnx.java:209)
  	at org.apache.pulsar.client.impl.ClientCnxTest.testClientCnxTimeout(ClientCnxTest.java:55)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
  	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
  	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
  	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
  	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  	at java.base/java.lang.Thread.run(Thread.java:833)

"It may be that the initialization failed due to OOME but is being reported as a NoClassDefFoundError. " (comment link)

@nicoloboschi has been fixing some memory leaks in the tests. There are open PRs #15513 and #15638.

@dave2wave
Member

Would you please describe the OpenMessaging Benchmark setup you used. Driver files, workload, and terraform.tfvars. I would like to confirm your results.

@heesung-sohn
Contributor Author

heesung-sohn commented May 25, 2022

> "It may be that the initialization failed due to OOME but is being reported as a NoClassDefFoundError. " (comment link)
>
> @nicoloboschi has been fixing some memory leaks in the tests. There are open PRs #15513 and #15638.

2022-05-24T21:39:34.4240859Z [INFO] Running org.apache.pulsar.client.impl.MessageTest
2022-05-24T21:39:34.5256249Z [ERROR] Tests run: 18, Failures: 1, Errors: 0, Skipped: 8, Time elapsed: 2.146 s <<< FAILURE! - in org.apache.pulsar.client.impl.MessageImplTest
2022-05-24T21:39:34.5259972Z [ERROR] testMessageBrokerAndEntryMetadataTimestampMissed(org.apache.pulsar.client.impl.MessageImplTest)  Time elapsed: 0.009 s  <<< FAILURE!
2022-05-24T21:39:34.5260648Z java.lang.OutOfMemoryError: test
2022-05-24T21:39:34.5261831Z 	at org.apache.bookkeeper.common.allocator.impl.ByteBufAllocatorImpl.buffer(ByteBufAllocatorImpl.java:134)
2022-05-24T21:39:34.5262751Z 	at org.apache.pulsar.client.impl.MessageImplTest.testMessageBrokerAndEntryMetadataTimestampMissed(MessageImplTest.java:433)
2022-05-24T21:39:34.5374488Z 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2022-05-24T21:39:34.5375408Z 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
2022-05-24T21:39:34.5376465Z 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-05-24T21:39:34.5376986Z 	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
2022-05-24T21:39:34.5377450Z 	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
2022-05-24T21:39:34.5377953Z 	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
2022-05-24T21:39:34.5378410Z 	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
2022-05-24T21:39:34.5378855Z 	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
2022-05-24T21:39:34.5379272Z 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
2022-05-24T21:39:34.5379714Z 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2022-05-24T21:39:34.5380203Z 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2022-05-24T21:39:34.5380586Z 	at java.base/java.lang.Thread.run(Thread.java:833)

I also see the above OOM in one of the attempts in the example failure. G1GC appears to be more stable for running the Pulsar unit tests. Should we use G1GC for the unit tests?

@heesung-sohn
Contributor Author

heesung-sohn commented May 25, 2022

> Would you please describe the OpenMessaging Benchmark setup you used. Driver files, workload, and terraform.tfvars. I would like to confirm your results.

Hi dave2wave,

I used the following for the test. I hit some trivial setup failures with the current OpenMessaging Benchmark tool, so I made some local changes to get it working. I probably need to raise a PR to update the OpenMessaging Benchmark tool.

# workload:  1-topic-100-partitions-1kb-4p-4c-2000k, 100-partitions-1kb-4p-4c-500k
# pulsar version: branch-2.9

# Garbage collection options

# g1gc
PULSAR_GC=" -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=12 -XX:ConcGCThreads=12 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB"
PULSAR_GC="${PULSAR_GC} -XX:+PerfDisableSharedMem -XX:+AlwaysPreTouch -XX:-UseBiasedLocking"

# zgc
#PULSAR_GC="-XX:+UseZGC -XX:+PerfDisableSharedMem -XX:+AlwaysPreTouch"

# ShenandoahGC
#PULSAR_GC="-XX:+UseShenandoahGC -XX:+PerfDisableSharedMem -XX:+AlwaysPreTouch"

 # Extra options to be passed to the jvm
-PULSAR_EXTRA_OPTS="${PULSAR_EXTRA_OPTS} ${PULSAR_MEM} ${PULSAR_GC} -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.maxCapacity.default=1000 -Dio.netty.recycler.linkCapacity=1024"
+PULSAR_EXTRA_OPTS="${PULSAR_EXTRA_OPTS} -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.maxCapacity.default=1000 -Dio.netty.recycler.linkCapacity=1024"
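
Switching between the three collectors by commenting lines in and out is error-prone when repeating runs. One possible way to parameterize it is sketched below; `select_gc` and `GC_KIND` are hypothetical names introduced for illustration, and the flag strings are copied from the configuration above.

```shell
#!/bin/sh
# select_gc KIND: print the benchmark flag set for one collector.
# Hypothetical helper; KIND is one of g1gc, zgc, shenandoah.
select_gc() {
  common="-XX:+PerfDisableSharedMem -XX:+AlwaysPreTouch"
  case "$1" in
    g1gc)
      echo "-XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=12 -XX:ConcGCThreads=12 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB ${common} -XX:-UseBiasedLocking"
      ;;
    zgc)
      echo "-XX:+UseZGC ${common}"
      ;;
    shenandoah)
      echo "-XX:+UseShenandoahGC ${common}"
      ;;
    *)
      echo "unknown GC kind: $1" >&2
      return 1
      ;;
  esac
}

# GC_KIND is an assumed environment variable; it defaults to zgc here.
PULSAR_GC="$(select_gc "${GC_KIND:-zgc}")"
echo "${PULSAR_GC}"
```

Running the script with `GC_KIND=shenandoah` would then produce the ShenandoahGC flag set without editing the file.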

@lhotari
Member

lhotari commented Jun 2, 2022

> I also see the above OOM in one of the attempts in the example failure. G1GC appears to be more stable for running the Pulsar unit tests. Should we use G1GC for the unit tests?

@heesung-sn It should be fine to use ZGC for unit tests as well. A bad test has been causing those issues. It will be fixed by #15911.

@heesung-sohn
Contributor Author

We also need to review PR #15692, as ZGC will expose this JVM metrics bug.

@heesung-sohn
Contributor Author

I see the following test failures. I am trying to debug them in my local environment via heesung-sohn#3.

test failure logs

[INFO] Tests run: 266, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 439.468 s - in org.apache.pulsar.client.api.ConsumerBatchReceiveTest
 Error: The operation was canceled.

  Error:  Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M3:test (default-test) on project pulsar-presto-connector-original: There are test failures.
  Error: 
  Error:  Please refer to /home/runner/work/pulsar/pulsar/pulsar-sql/presto-pulsar/target/surefire-reports for the individual test results.
  Error:  Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
  Error:  ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
  Error:  Command was /bin/sh -c cd /home/runner/work/pulsar/pulsar/pulsar-sql/presto-pulsar && /usr/lib/jvm/temurin-17-jdk-amd64/bin/java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -XX:+ExitOnOutOfMemoryError -Xmx1G -XX:+UseZGC -Dpulsar.allocator.pooled=true -Dpulsar.allocator.leak_detection=Advanced -Dpulsar.allocator.exit_on_oom=false -Dpulsar.allocator.out_of_memory_policy=FallbackToHeap -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/jdk.internal.loader=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED --add-opens java.base/sun.net=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED -jar /home/runner/work/pulsar/pulsar/pulsar-sql/presto-pulsar/target/surefire/surefirebooter17710588017893554135.jar /home/runner/work/pulsar/pulsar/pulsar-sql/presto-pulsar/target/surefire 2022-06-08T17-48-50_487-jvmRun4 surefire2791316321757151587tmp surefire_312249355835236009864tmp
  Error:  Error occurred in starting fork, check output in log
  Error:  Process Exit Code: 3
  Error:  Crashed tests:
  Error:  org.apache.pulsar.sql.presto.TestPulsarRecordCursor
  Error:  org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
  Error:  Command was /bin/sh -c cd /home/runner/work/pulsar/pulsar/pulsar-sql/presto-pulsar && /usr/lib/jvm/temurin-17-jdk-amd64/bin/java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -XX:+ExitOnOutOfMemoryError -Xmx1G -XX:+UseZGC -Dpulsar.allocator.pooled=true -Dpulsar.allocator.leak_detection=Advanced -Dpulsar.allocator.exit_on_oom=false -Dpulsar.allocator.out_of_memory_policy=FallbackToHeap -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/jdk.internal.loader=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED --add-opens java.base/sun.net=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED -jar /home/runner/work/pulsar/pulsar/pulsar-sql/presto-pulsar/target/surefire/surefirebooter17710588017893554135.jar /home/runner/work/pulsar/pulsar/pulsar-sql/presto-pulsar/target/surefire 2022-06-08T17-48-50_487-jvmRun4 surefire2791316321757151587tmp surefire_312249355835236009864tmp
  Error:  Error occurred in starting fork, check output in log
  Error:  Process Exit Code: 3
  Error:  Crashed tests:
  Error:  org.apache.pulsar.sql.presto.TestPulsarRecordCursor
  Error:  	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:511)
  Error:  	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:383)
  Error:  	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
  Error:  	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:247)
  Error:  	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1161)
  Error:  	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1002)
  Error:  	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:848)
  Error:  	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
  Error:  	at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute(MojoExecutor.java:301)
  Error:  	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:211)
  Error:  	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:165)
  Error:  	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:157)
  Error:  	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:121)
  Error:  	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
  Error:  	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
  Error:  	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:127)
  Error:  	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:294)
  Error:  	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
  Error:  	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
  Error:  	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:960)
  Error:  	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:293)
  Error:  	at org.apache.maven.cli.MavenCli.main(MavenCli.java:196)
  Error:  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  Error:  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  Error:  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  Error:  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
  Error:  	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
  Error:  	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
  Error:  	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
  Error:  	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)
  Error:  Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
  Error:  Command was /bin/sh -c cd /home/runner/work/pulsar/pulsar/pulsar-sql/presto-pulsar && /usr/lib/jvm/temurin-17-jdk-amd64/bin/java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -XX:+ExitOnOutOfMemoryError -Xmx1G -XX:+UseZGC -Dpulsar.allocator.pooled=true -Dpulsar.allocator.leak_detection=Advanced -Dpulsar.allocator.exit_on_oom=false -Dpulsar.allocator.out_of_memory_policy=FallbackToHeap -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/jdk.internal.loader=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED --add-opens java.base/sun.net=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED -jar /home/runner/work/pulsar/pulsar/pulsar-sql/presto-pulsar/target/surefire/surefirebooter17710588017893554135.jar /home/runner/work/pulsar/pulsar/pulsar-sql/presto-pulsar/target/surefire 2022-06-08T17-48-50_487-jvmRun4 surefire2791316321757151587tmp surefire_312249355835236009864tmp
  Error:  Error occurred in starting fork, check output in log
  Error:  Process Exit Code: 3
  Error:  Crashed tests:
  Error:  org.apache.pulsar.sql.presto.TestPulsarRecordCursor
  Error:  	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:670)
  Error:  	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:116)
  Error:  	at org.apache.maven.plugin.surefire.booterclient.ForkStarter$1.call(ForkStarter.java:372)
  Error:  	at org.apache.maven.plugin.surefire.booterclient.ForkStarter$1.call(ForkStarter.java:348)
  Error:  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  Error:  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  Error:  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  Error:  	at java.base/java.lang.Thread.run(Thread.java:833)
  Error:  -> [Help 1]
  Error: 
  Error:  To see the full stack trace of the errors, re-run Maven with the -e switch.
  Error:  Re-run Maven using the -X switch to enable full debug logging.
  Error: 
  Error:  For more information about the errors and possible solutions, please read the following articles:
  Error:  [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
  Error: 
  Error:  After correcting the problems, you can resume the build with the command
  Error:    mvn <args> -rf :pulsar-presto-connector-original
  Warning: Command failed. Attempt 2/3:

Error: Tests run: 6, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 513.638 s <<< FAILURE! - in org.apache.pulsar.sql.presto.TestPulsarRecordCursor
Error: testTopics(org.apache.pulsar.sql.presto.TestPulsarRecordCursor) Time elapsed: 201.803 s <<< FAILURE!
java.lang.AssertionError: expected [105995] but found [105722]
	at org.testng.Assert.fail(Assert.java:99)

@heesung-sohn heesung-sohn force-pushed the default-gc branch 2 times, most recently from 57f7628 to 546bedd Compare June 10, 2022 03:30
@heesung-sohn
Contributor Author

heesung-sohn commented Jun 10, 2022

Raised PR to fix the test issues: #16011.

With this test fix PR, the ZGC update passed the CI tests (from my local repo): https://github.com/heesung-sn/pulsar/runs/6834730213

@merlimat merlimat merged commit 3eadbc3 into apache:master Jun 14, 2022
@yangl
Contributor

yangl commented Dec 13, 2022

Hi @heesung-sn, adding -XX:+PerfDisableSharedMem makes the jps and jstat commands unusable. What was the purpose of turning on this parameter in the first place?

@heesung-sohn
Contributor Author

> Hi @heesung-sn, adding -XX:+PerfDisableSharedMem makes the jps and jstat commands unusable. What was the purpose of turning on this parameter in the first place?

As I explained above, the -XX:+PerfDisableSharedMem flag is there to avoid possible high GC pause latencies caused by I/O blocking when the garbage collector tries to write to /tmp (hsperfdata).

I think this SO discussion is a good reference too.
https://stackoverflow.com/questions/66806890/is-there-any-performance-downsides-to-using-the-xxperfdisablesharedmem-jvm

We can still collect performance counters with jcmd <pid> PerfCounter.print. Of course, users can override this default setting if they want jps and jstat.
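
For operators who prefer jps and jstat, such an override could look like the sketch below. It assumes that the broker start script honors a `PULSAR_GC` value exported in the environment instead of applying its default; check conf/pulsar_env.sh in your release before relying on it.

```shell
# Assumption: an exported PULSAR_GC replaces the default flag set.
# Same flags as the new default, minus -XX:+PerfDisableSharedMem,
# so the JVM keeps publishing hsperfdata for jps and jstat.
export PULSAR_GC="-XX:+UseZGC -XX:+AlwaysPreTouch"

# With the default flags left in place, counters remain readable via:
#   jcmd <pid> PerfCounter.print
```

The trade-off is the one described above: with hsperfdata enabled again, the pause-time risk from writes to /tmp returns.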

@heesung-sohn heesung-sohn deleted the default-gc branch April 2, 2024 17:44