Add common alias cores (x) for grdlandmask by weiji14 · Pull Request #2944 · GenericMappingTools/pygmt

weiji14 · 2024-01-02T07:53:30Z

Description of proposed changes

Add alias cores (-x) for grdlandmask, and set limit of cores=2 for test_grdlandmask_no_outgrid to make benchmark execution deviate less.

Preview at https://pygmt-dev--2944.org.readthedocs.build/en/2944/api/generated/pygmt.grdlandmask.html

Extends #1423, may address #2942 partially.

Reminders

Run make format and make check to make sure the code follows the style guide.
Add tests for new features or tests that would have caught the bug that you're fixing.
Add new public functions/methods/classes to doc/api/index.rst.
Write detailed docstrings for all functions/methods.
If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
If adding new functionality, add an example to docstrings or tutorials.
Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

/format: automatically format and lint the code
/test-gmt-dev: run full tests on the latest GMT development version

weiji14 · 2024-01-02T07:59:33Z

pygmt/tests/test_grdlandmask.py

    Test grdlandmask with no set outgrid.
    """
-    result = grdlandmask(spacing=1, region=[125, 130, 30, 35])
+    result = grdlandmask(spacing=1, region=[125, 130, 30, 35], cores=2)


The Linux runner for GitHub Actions has 2 CPU cores according to https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources, but not sure if we should set cores to 2 or 1 for consistent benchmarking.

Using two cores is faster and can help check the OpenMP feature in GMT, so I prefer to use two cores if possible.

BTW, the macOS runner for GitHub Actions has 3 cores, so instead of cores=2, cores=True will use all available cores.

Is only 1 core being used if we don't set cores, or use cores=None? I'm reading https://docs.generic-mapping-tools.org/6.4/gmt.html#core-full which says [default is to use all available cores], but is that only with cores=True (i.e. -x in GMT)?

The benchmark.yml workflow is running on Linux, but I guess we could set cores=True since this test would also run on macOS in ci_test.yml. My question was more around whether multi-threading with multiple-cores (2 or more) would provide a consistent benchmark compared to using only 1 core, but I guess we need to merge this PR to find out!

Is only 1 core being used if we don't set cores, or use cores=None? I'm reading https://docs.generic-mapping-tools.org/6.4/gmt.html#core-full which says [default is to use all available cores], but is that only with cores=True (i.e. -x in GMT)?

Yes.

Is only 1 core being used if we don't set cores, or use cores=None? I'm reading https://docs.generic-mapping-tools.org/6.4/gmt.html#core-full which says [default is to use all available cores], but is that only with cores=True (i.e. -x in GMT)?

Yes.

Actually it uses all available cores if -x is not used.

$ gmt grdlandmask -R0/10/0/10 -Gabc.nc -I1/1 -Vi grdlandmask [INFORMATION]: Enable all available threads (up to xxx)

Maybe set cores=1 and see what the flame graph diff shows?

The test_grdlandmask_no_outgrid benchmark speedup went from 3x to 9.6x 🤣 It might be that there is some overhead with using threading (2 cores) vs no threading (1 core). Here's the flame graph diff from https://codspeed.io/GenericMappingTools/pygmt/branches/limit-cores again:

The extra gomp_team_barrier_wait_* calls are still there, but also some other stuff (in blue).

This reminds me of the warnings message I saw when running the tests in random order (see https://github.com/GenericMappingTools/pygmt/actions/runs/7396374248/job/20121388367?pr=2936):

pygmt/tests/test_session_management.py::test_session_multiprocessing pygmt/tests/test_session_management.py::test_session_multiprocessing /home/runner/micromamba/envs/pygmt/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=2433) is multi-threaded, use of fork() may lead to deadlocks in the child. self.pid = os.fork()

There are some related discussions at https://discuss.python.org/t/concerns-regarding-deprecation-of-fork-with-alive-threads/33555 and related PR at python/cpython#100228.

os.fork() in a multi-threaded application is a likely source of deadlocks on many platforms. We should raise a warning when people call os.fork() from a process that we know has other threads running.

Not sure if it's related to the issue here.

pygmt/tests/test_session_management.py::test_session_multiprocessing pygmt/tests/test_session_management.py::test_session_multiprocessing /home/runner/micromamba/envs/pygmt/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=2433) is multi-threaded, use of fork() may lead to deadlocks in the child. self.pid = os.fork()

That test_session_multiprocessing function is only running basemap and savefig right? It shouldn't have any multi-threading going on, so not sure what's happening.

I tried running some benchmarks using cores from -1 (all cores) to 0 (no-multithreading) to 16 (all cores on my laptop). This was running grdlandmask with a spacing of 0.05 arc degrees. The results do vary between runs, but in general, more cores means less time:

Code to reproduce:

import time import pandas as pd import pygmt import tqdm timings = {} for cores in tqdm.tqdm(range(-1, 16)): tic = time.perf_counter() pygmt.grdlandmask(spacing="0.05d", region=[0, 180, 0, 90], cores=cores) toc = time.perf_counter() timings[cores] = toc - tic df = pd.DataFrame.from_dict(data=timings, orient="index", columns=["time"]) # %% fig = pygmt.Figure() region = pygmt.info(data=df.reset_index(), per_column=True, spacing=0.05) fig.plot( x=df.index, y=df.time, region=region, pen="thick", frame=[ "WSne+tBenchmark grdlandmask", "xaf+lCores", "yaf+lTime (s), lower is better", ], ) fig.savefig("benchmark_grdlandmask.png") fig.show()

If I change the spacing from 0.05 arc degrees to 1 arc degree, more cores actually take more time:

For that test_grdlandmask_no_outgrid unit test which is using spacing=1 (i.e. 1 arc degree), I think we should just use cores=2, since cores=3 might not be much faster on such a low resolution. As long as we are benchmarking with a fixed number of core counts, it should be ok?

Ok, back to using cores=2 in 97637d4.

codspeed-hq · 2024-01-02T08:04:55Z

CodSpeed Performance Report

Merging #2944 will improve performances by ×3

_{Comparing limit-cores (97637d4) with main (3f9d14b)}

Summary

⚡ 3 improvements
✅ 61 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`limit-cores`	Change
⚡	`test_grdlandmask_no_outgrid`	2,473.4 ms	819.2 ms	×3
⚡	`test_grdsample_dataarray_out`	104.1 ms	94.5 ms	+10.11%
⚡	`test_sph2grd_no_outgrid`	4.7 s	2.2 s	×2.1

seisman · 2024-01-02T08:28:20Z

CodSpeed Performance Report

Merging #2944 will improve performances by ×3

Comparing limit-cores (137a04f) with main (88ab1ca)

Summary

⚡ 3 improvements ✅ 61 untouched benchmarks

Benchmarks breakdown

Benchmark main limit-cores Change
⚡ test_grdlandmask_no_outgrid 2,531.2 ms 842.1 ms ×3
⚡ test_grdsample_dataarray_out 106.3 ms 91.6 ms +16.05%
⚡ test_sph2grd_no_outgrid 4.6 s 2.3 s ×2

The benchmark report is quite confusing. It seems that test_sph2grd_no_outgrid also speedup by 2 times. Does this mean the -x option is a session-level option (similar to GMT's configuration settings) and affect all tests?

weiji14 · 2024-01-02T08:36:17Z

The benchmark report is quite confusing. It seems that test_sph2grd_no_outgrid also speedup by 2 times. Does this mean the -x option is a session-level option (similar to GMT's configuration settings) and affect all tests?

Yeah, it seems to confirm the hypothesis in #2942 (comment) though. Does the setting affect GMT_MAX_CORES somehow?

weiji14 · 2024-01-04T01:22:36Z

pygmt/tests/test_grdlandmask.py

    Test grdlandmask with no set outgrid.
    """
-    result = grdlandmask(spacing=1, region=[125, 130, 30, 35])
+    result = grdlandmask(spacing=1, region=[125, 130, 30, 35], cores=1, verbose="i")


Hmm, the debug info at https://github.com/GenericMappingTools/pygmt/actions/runs/7403072686/job/20142175894?pr=2944#step:8:111 is showing that there are 4 threads on GitHub Actions??

grdlandmask [INFORMATION]: Enable 1 threads of 4 available grdlandmask [INFORMATION]: Selected ice front line as Antarctica coastline grdlandmask [INFORMATION]: GSHHG version 2.3.7 grdlandmask [INFORMATION]: Derived from World Vector Shoreline, CIA WDB-II, and Atlas of the Cryosphere grdlandmask [INFORMATION]: Processed by Paul Wessel and Walter H. F. Smith, 1994-2017 grdlandmask [INFORMATION]: Nodes in the oceans will be set to 0 grdlandmask [INFORMATION]: Nodes on land will be set to 1 grdlandmask [INFORMATION]: Nodes in lakes will be set to 0 grdlandmask [INFORMATION]: Nodes in islands will be set to 1 grdlandmask [INFORMATION]: Nodes in ponds will be set to 0 grdlandmask [INFORMATION]: Apply linear scale to avoid wrap-around: -Jx100id; this temporarily changes the domain grdlandmask [INFORMATION]: Writing grid to file /tmp/pygmt-0c0skk20.nc grdlandmask [INFORMATION]: Level 0 set for 33 nodes grdlandmask [INFORMATION]: Level 1 set for 3 nodes grdlandmask [INFORMATION]: Enable 1 threads of 4 available grdlandmask [INFORMATION]: Selected ice front line as Antarctica coastline grdlandmask [INFORMATION]: GSHHG version 2.3.7 grdlandmask [INFORMATION]: Derived from World Vector Shoreline, CIA WDB-II, and Atlas of the Cryosphere grdlandmask [INFORMATION]: Processed by Paul Wessel and Walter H. F. Smith, 1994-2017 grdlandmask [INFORMATION]: Nodes in the oceans will be set to 0 grdlandmask [INFORMATION]: Nodes on land will be set to 1 grdlandmask [INFORMATION]: Nodes in lakes will be set to 0 grdlandmask [INFORMATION]: Nodes in islands will be set to 1 grdlandmask [INFORMATION]: Nodes in ponds will be set to 0 grdlandmask [INFORMATION]: Apply linear scale to avoid wrap-around: -Jx100id; this temporarily changes the domain grdlandmask [INFORMATION]: Writing grid to file /tmp/pygmt-i3n6zkcg.nc grdlandmask [INFORMATION]: Level 0 set for 33 nodes grdlandmask [INFORMATION]: Level 1 set for 3 nodes

So the 2 CPUs mentioned at https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources might have hyperthreading, and we could potentially set cores=4, though see #2944 (comment) on how more cores might not actually improve runtime.

This reverts commit 35e2d61.

weiji14 added 2 commits January 2, 2024 20:40

Add common alias cores (x) for grdlandmask

54577a8

Run test_grdlandmask_no_outgrid with 2 cores

137a04f

weiji14 added the enhancement Improving an existing feature label Jan 2, 2024

weiji14 added this to the 0.11.0 milestone Jan 2, 2024

weiji14 self-assigned this Jan 2, 2024

weiji14 commented Jan 2, 2024

View reviewed changes

weiji14 mentioned this pull request Jan 2, 2024

Investigate large (>10%) deviations in benchmark of test_grdlandmask_no_outgrid #2942

Closed

weiji14 added 2 commits January 4, 2024 11:18

Merge branch 'main' into limit-cores

b8b10fd

Try setting cores=1 and turn on verbose info mode

35e2d61

weiji14 commented Jan 4, 2024

View reviewed changes

weiji14 added 2 commits January 4, 2024 14:25

Merge branch 'main' into limit-cores

ce4901c

Revert "Try setting cores=1 and turn on verbose info mode"

97637d4

This reverts commit 35e2d61.

weiji14 marked this pull request as ready for review January 4, 2024 01:29

seisman approved these changes Jan 4, 2024

View reviewed changes

weiji14 enabled auto-merge (squash) January 4, 2024 01:38

weiji14 disabled auto-merge January 4, 2024 01:54

weiji14 merged commit 8be628e into main Jan 4, 2024

weiji14 deleted the limit-cores branch January 4, 2024 01:55

Conversation

weiji14 commented Jan 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codspeed-hq bot commented Jan 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging #2944 will improve performances by ×3

Summary

Benchmarks breakdown

Uh oh!

seisman commented Jan 2, 2024

Merging #2944 will improve performances by ×3

Summary

Benchmarks breakdown

Uh oh!

weiji14 commented Jan 2, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

weiji14 commented Jan 2, 2024 •

edited

Loading

codspeed-hq bot commented Jan 2, 2024 •

edited

Loading