clib: Fix the bug when passing multiple columns of strings with variable lengths to the GMT C API by seisman · Pull Request #2719 · GenericMappingTools/pygmt

seisman · 2023-10-07T10:19:57Z

Description of proposed changes

Found this bug when working on PR #2720.

The changed code is meant to join two string arrays element-wise with a space, but it doesn't work if the strings have different lengths.

Here is a simple example to reproduce the issue:

>>> import numpy as np
>>> string_arrays = [np.array(["ABC", "DEF"]), np.array(["ABC", "DEFGHIJK"])]
>>> np.apply_along_axis(func1d=" ".join, axis=0, arr=string_arrays)
array(['ABC ABC', 'DEF DEF'], dtype='<U7')

Obviously, the result is incorrect, although I don't fully understand why it doesn't work.

The new code works as expected.

>>> np.array([" ".join(vals) for vals in zip(*string_arrays)])
array(['ABC ABC', 'DEF DEFGHIJK'], dtype='<U12')
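The same join can be wrapped in a small helper that accepts any number of columns. This is only a minimal sketch: the name `join_string_columns` and its `sep` parameter are hypothetical and are not taken from the PyGMT code base.

```python
import numpy as np


def join_string_columns(columns, sep=" "):
    """Join equally long string columns row by row (hypothetical helper)."""
    # zip(*columns) yields one tuple per row. Building the array from the
    # finished Python strings lets NumPy size the dtype to the longest one.
    return np.array([sep.join(row) for row in zip(*columns)])


string_arrays = [np.array(["ABC", "DEF"]), np.array(["ABC", "DEFGHIJK"])]
joined = join_string_columns(string_arrays)
print(joined)        # ['ABC ABC' 'DEF DEFGHIJK']
print(joined.dtype)  # <U12
```

Because the output array is created only after every row has been joined, its width does not depend on which row comes first.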

BTW: a test related to this bug will be added in PR #2720.

yvonnefroehlich · 2023-10-08T15:34:20Z

I just tried different versions of @seisman's code example:

import numpy as np

string_arrays01 = [np.array(["ABC", "DEF"]), np.array(["ABC", "DEFGHIJK"])]
test01 = np.apply_along_axis(func1d=" ".join, axis=0, arr=string_arrays01)
test01
# array(['ABC ABC', 'DEF DEF'], dtype='<U7')  -> truncated after 7 characters

string_arrays02 = [np.array(["ABC", "DEFGHIJK"]), np.array(["ABC", "DEF"])]
test02 = np.apply_along_axis(func1d=" ".join, axis=0, arr=string_arrays02)
test02
# array(['ABC ABC', 'DEFGHIJ'], dtype='<U7')  -> truncated after 7 characters

string_arrays03 = [np.array(["ABC", "DEF"]), np.array(["DEFGHIJK", "ABC"])]
test03 = np.apply_along_axis(func1d=" ".join, axis=0, arr=string_arrays03)
test03
# array(['ABC DEFGHIJK', 'DEF ABC'], dtype='<U12')  -> NOT truncated after 7 characters; dtype is now <U12 instead of <U7

string_arrays04 = [np.array(["DEFGHIJK", "ABC"]), np.array(["ABC", "DEF"])]
test04 = np.apply_along_axis(func1d=" ".join, axis=0, arr=string_arrays04)
test04
# array(['DEFGHIJK ABC', 'ABC DEF'], dtype='<U12')  -> NOT truncated after 7 characters; dtype is now <U12 instead of <U7

Here, it looks like the length of the first concatenated string sets the maximum length of all following concatenated strings.

This occurs identically when using axis=1 instead of axis=0:

string_arrays05 = [np.array(["ABC", "DEF"]), np.array(["ABC", "DEFGHIJK"])]
test05 = np.apply_along_axis(func1d=" ".join, axis=1, arr=string_arrays05)
test05
# array(['ABC DEF', 'ABC DEF'], dtype='<U7')  -> truncated after 7 characters

string_arrays06 = [np.array(["ABC", "DEFGHIJK"]), np.array(["ABC", "DEF"])]
test06 = np.apply_along_axis(func1d=" ".join, axis=1, arr=string_arrays06)
test06
# array(['ABC DEFGHIJK', 'ABC DEF'], dtype='<U12')  -> NOT truncated after 7 characters; dtype is now <U12 instead of <U7
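The pattern in these tests matches how fixed-width NumPy string arrays behave: a value longer than the array's width is cut silently on assignment. The sketch below shows only that truncation, using a hand-made `<U7` array. That `np.apply_along_axis` takes its output dtype from the first result is an assumption here; the thread itself does not confirm it.

```python
import numpy as np

# A fixed-width array sized for the first joined string (7 characters).
out = np.zeros(2, dtype="<U7")
out[0] = "ABC ABC"
out[1] = "DEF DEFGHIJK"  # 12 characters, silently cut to 7 on assignment

print(out)  # ['ABC ABC' 'DEF DEF']
```

No error or warning is raised, which is why the problem only shows up when the joined strings are compared against the expected values.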

seisman · 2023-10-08T15:41:56Z

If you do a Google search for "apply_along_axis string arrays", you will see many posts reporting exactly the same error, as well as the upstream issue report numpy/numpy#8352.

seisman · 2023-10-08T16:02:31Z

>>> import numpy as np
>>> string_arrays = [np.array(["ABC", "DEF"]), np.array(["ABC", "DEFGHIJK"])]
>>> %timeit np.apply_along_axis(func1d=" ".join, axis=0, arr=string_arrays)
37.3 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit np.array([" ".join(vals) for vals in zip(*string_arrays)])
3.02 µs ± 278 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

The new and correct version is roughly 12x faster than the old and incorrect version (3.02 µs vs. 37.3 µs per loop)!

weiji14 · 2023-10-08T20:55:49Z

Thanks @seisman for spotting the bug, fixing it and adding the benchmark!

> BTW: a test related to this bug will be added in PR #2720.

Could you try to add a unit test to test_clib.py under test_virtualfile_from_vectors_*? Want to make sure we're capturing this on the clib level and not just via fig.text.

seisman · 2023-10-09T02:02:27Z

> Thanks @seisman for spotting the bug, fixing it and adding the benchmark!
>
> > BTW: a test related to this bug will be added in PR #2720.
>
> Could you try to add a unit test to test_clib.py under test_virtualfile_from_vectors_*? Want to make sure we're capturing this on the clib level and not just via fig.text.

Modified the existing test test_virtualfile_from_vectors_two_string_or_object_columns to catch this bug.

With the main branch, the test fails:

E           AssertionError: assert '0\t5\ta pqrs...t9\tklmnolo\n' == '0\t5\ta pqrs...mnolooong $\n'
E               0	5	a pqrst
E               1	6	bc uvwx
E               2	7	def yz!
E               3	8	ghij @#
E             - 4	9	klmnolooong $
E             ?  	 	       ------
E             + 4	9	klmnolo

Done in 80e500c.
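The failure above can be checked without GMT. The following is a sketch in plain NumPy, not the actual test from test_clib.py; the two string columns are inferred from the assertion output by splitting each row at the space, which is an assumption.

```python
import numpy as np

# Columns inferred from the assertion output above (an assumption).
strings1 = np.array(["a", "bc", "def", "ghij", "klmnolooong"])
strings2 = np.array(["pqrst", "uvwx", "yz!", "@#", "$"])

# Element-wise join as in the fixed code path.
joined = np.array([" ".join(vals) for vals in zip(strings1, strings2)])

expected = ["a pqrst", "bc uvwx", "def yz!", "ghij @#", "klmnolooong $"]
assert joined.tolist() == expected

# The first four rows are all 7 characters long, so only the last row
# can reveal an output width that was taken from the first result.
assert [len(s) for s in expected] == [7, 7, 7, 7, 13]
```

The last row is what makes the test data useful: with joined strings of equal length, a width fixed by the first row would go unnoticed.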

weiji14

> Modified the existing test test_virtualfile_from_vectors_two_string_or_object_columns to catch this bug.

Brilliant! Wish I had used a different string length back in #520 (you actually asked me to make variable length strings in #520 (comment), but I didn't make sure that string1 + string2 were different lengths) 😆

seisman · 2023-10-09T02:21:36Z

I've revised the PR title to make it more descriptive as a changelog entry.

Fix the bug when concatenating strings arrays with spaces

f83f21d

seisman added the bug label (Something isn't working) Oct 7, 2023

seisman added this to the 0.11.0 milestone Oct 7, 2023

seisman changed the title ~~Fix the bug when concatenating strings arrays with spaces~~ Fix the bug when element-wisely join string arrays with spaces Oct 7, 2023

seisman changed the title ~~Fix the bug when element-wisely join string arrays with spaces~~ Fix the bug of elementwisely joining string arrays with spaces Oct 7, 2023

seisman changed the title ~~Fix the bug of elementwisely joining string arrays with spaces~~ Fix the bug of elementwisely joining string arrays with a spce Oct 7, 2023

seisman changed the title ~~Fix the bug of elementwisely joining string arrays with a spce~~ Fix the bug of elementwisely joining string arrays with a space Oct 7, 2023

seisman added the needs review label (This PR has higher priority and needs review.) Oct 7, 2023

seisman mentioned this pull request Oct 7, 2023

Figure.text: Support passing in a list of angle/font/justify values #2720

Merged

7 tasks

yvonnefroehlich approved these changes Oct 8, 2023

seisman added 2 commits October 9, 2023 09:56

Merge branch 'main' into put-string

b5567ce

Improve the existing test to catch the bug

80e500c

weiji14 approved these changes Oct 9, 2023

seisman removed the needs review label (This PR has higher priority and needs review.) Oct 9, 2023

seisman changed the title ~~Fix the bug of elementwisely joining string arrays with a space~~ clib: Fix the bug when passing two columns of strings with variable lengths to the GMT C API Oct 9, 2023

seisman changed the title ~~clib: Fix the bug when passing two columns of strings with variable lengths to the GMT C API~~ clib: Fix the bug when passing multiple columns of strings with variable lengths to the GMT C API Oct 9, 2023

seisman added the final review call label (This PR requires final review and approval from a second reviewer) Oct 9, 2023

seisman merged commit 3d401bb into main Oct 9, 2023

seisman deleted the put-string branch October 9, 2023 03:58

seisman removed the final review call label (This PR requires final review and approval from a second reviewer) Oct 9, 2023

seisman merged 3 commits into main from put-string
