[CI][R] Resolve Valgrind errors  #42234

@jonkeane

Describe the bug, including details regarding any error messages, version, and platform.

We have been seeing Valgrind errors in the R CI build for a while now:

==774== HEAP SUMMARY:
==774==     in use at exit: 351,781,345 bytes in 69,559 blocks
==774==   total heap usage: 16,807,335 allocs, 16,737,776 frees, 9,804,514,696 bytes allocated
==774== 
==774== 400 bytes in 1 blocks are possibly lost in loss record 252 of 3,243
==774==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==774==    by 0x40147D9: calloc (rtld-malloc.h:44)
==774==    by 0x40147D9: allocate_dtv (dl-tls.c:375)
==774==    by 0x40147D9: _dl_allocate_tls (dl-tls.c:634)
==774==    by 0x4DA67B4: allocate_stack (allocatestack.c:430)
==774==    by 0x4DA67B4: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
==774==    by 0x11D614D3: je_arrow_private_je_pthread_create_wrapper (background_thread.c:47)
==774==    by 0x11D614D3: background_thread_create_signals_masked (background_thread.c:287)
==774==    by 0x11D614D3: background_thread_create_locked (background_thread.c:495)
==774==    by 0x11D6275C: je_arrow_private_je_background_thread_create (background_thread.c:520)
==774==    by 0x400647D: call_init.part.0 (dl-init.c:70)
==774==    by 0x4006567: call_init (dl-init.c:33)
==774==    by 0x4006567: _dl_init (dl-init.c:117)
==774==    by 0x4E85AF4: _dl_catch_exception (dl-error-skeleton.c:182)
==774==    by 0x400DFF5: dl_open_worker (dl-open.c:808)
==774==    by 0x400DFF5: dl_open_worker (dl-open.c:771)
==774==    by 0x4E85A97: _dl_catch_exception (dl-error-skeleton.c:208)
==774==    by 0x400E34D: _dl_open (dl-open.c:883)
==774==    by 0x4DA163B: dlopen_doit (dlopen.c:56)
==774== 
==774== 723 (144 direct, 579 indirect) bytes in 1 blocks are definitely lost in loss record 288 of 3,243
==774==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==774==    by 0x12F2B64D: CRYPTO_zalloc (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==774==    by 0x12F10997: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==774==    by 0x12F008F9: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==774==    by 0x12F205E8: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==774==    by 0x12F2050B: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==774==    by 0x12F3FB2A: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==774==    by 0x13009227: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==774==    by 0x1300987D: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==774==    by 0x12F0D392: EVP_MAC_fetch (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==774==    by 0x1183F6B2: Aws::Utils::Crypto::Sha256HMACOpenSSLImpl::Calculate(Aws::Utils::Array<unsigned char> const&, Aws::Utils::Array<unsigned char> const&) (in /usr/local/RDvalgrind/lib/R/site-library/arrow/libs/arrow.so)
==774==    by 0x11B6CF4F: Aws::Utils::Crypto::Sha256HMAC::Calculate(Aws::Utils::Array<unsigned char> const&, Aws::Utils::Array<unsigned char> const&) (in /usr/local/RDvalgrind/lib/R/site-library/arrow/libs/arrow.so)
==774== 
==774== 1,248 bytes in 3 blocks are possibly lost in loss record 330 of 3,243
==774==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==774==    by 0x40147D9: calloc (rtld-malloc.h:44)
==774==    by 0x40147D9: allocate_dtv (dl-tls.c:375)
==774==    by 0x40147D9: _dl_allocate_tls (dl-tls.c:634)
==774==    by 0x4DA67B4: allocate_stack (allocatestack.c:430)
==774==    by 0x4DA67B4: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
==774==    by 0x11D62513: je_arrow_private_je_pthread_create_wrapper (background_thread.c:47)
==774==    by 0x11D62513: background_thread_create_signals_masked (background_thread.c:287)
==774==    by 0x11D62513: check_background_thread_creation (background_thread.c:332)
==774==    by 0x11D62513: background_thread0_work (background_thread.c:370)
==774==    by 0x11D62513: background_work (background_thread.c:412)
==774==    by 0x11D62513: background_thread_entry (background_thread.c:444)
==774==    by 0x4DA5AC2: start_thread (pthread_create.c:442)
==774==    by 0x4E36A03: clone (clone.S:100)
==774== 
==774== 12,048 bytes in 4 blocks are possibly lost in loss record 1,495 of 3,243
==774==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==774==    by 0x4013E4D: malloc (rtld-malloc.h:56)
==774==    by 0x4013E4D: allocate_dtv_entry (dl-tls.c:684)
==774==    by 0x4013E4D: allocate_and_init (dl-tls.c:709)
==774==    by 0x4013E4D: tls_get_addr_tail (dl-tls.c:907)
==774==    by 0x401820B: __tls_get_addr (tls_get_addr.S:55)
==774==    by 0x11D616D3: tsd_state_get (tsd.h:269)
==774==    by 0x11D616D3: tsd_fetch_impl (tsd.h:421)
==774==    by 0x11D616D3: tsd_fetch_min (tsd.h:433)
==774==    by 0x11D616D3: tsd_internal_fetch (tsd.h:439)
==774==    by 0x11D616D3: background_thread_entry (background_thread.c:444)
==774==    by 0x4DA5AC2: start_thread (pthread_create.c:442)
==774==    by 0x4E36A03: clone (clone.S:100)
==774== 
==774== LEAK SUMMARY:
==774==    definitely lost: 144 bytes in 1 blocks
==774==    indirectly lost: 579 bytes in 11 blocks
==774==      possibly lost: 13,696 bytes in 8 blocks
==774==    still reachable: 351,098,631 bytes in 69,538 blocks
==774==                       of which reachable via heuristic:
==774==                         length64           : 456 bytes in 2 blocks
==774==                         newarray           : 4,264 bytes in 1 blocks
==774==         suppressed: 668,295 bytes in 1 blocks
==774== Reachable blocks (those to which a pointer was found) are not shown.
==774== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==774== 
==774== For lists of detected and suppressed errors, rerun with: -s
==774== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 1 from 1)

The log above is from a recent build.
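
To compare these numbers across nightly runs, the LEAK SUMMARY lines of a Valgrind log can be extracted mechanically. The following is a minimal sketch in Python, not something Arrow's CI is known to do. `parse_leak_summary` is a hypothetical helper name, and the regular expression assumes the exact line layout seen in the log above.

```python
import re

# Matches lines such as "==774==    definitely lost: 144 bytes in 1 blocks".
# Assumption: the layout is the one shown in the log quoted in this issue.
_LEAK_LINE = re.compile(
    r"^==\d+==\s+"
    r"(definitely lost|indirectly lost|possibly lost|still reachable|suppressed):"
    r"\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def parse_leak_summary(log_text):
    """Return {category: (bytes, blocks)} for each LEAK SUMMARY line found."""
    summary = {}
    for line in log_text.splitlines():
        match = _LEAK_LINE.match(line)
        if match:
            category, n_bytes, n_blocks = match.groups()
            # Valgrind prints thousands separators, so strip the commas.
            summary[category] = (
                int(n_bytes.replace(",", "")),
                int(n_blocks.replace(",", "")),
            )
    return summary


# Sample copied from the LEAK SUMMARY in this issue.
SAMPLE = """\
==774== LEAK SUMMARY:
==774==    definitely lost: 144 bytes in 1 blocks
==774==    indirectly lost: 579 bytes in 11 blocks
==774==      possibly lost: 13,696 bytes in 8 blocks
==774==    still reachable: 351,098,631 bytes in 69,538 blocks
==774==         suppressed: 668,295 bytes in 1 blocks
"""

if __name__ == "__main__":
    for category, (n_bytes, n_blocks) in parse_leak_summary(SAMPLE).items():
        print(f"{category}: {n_bytes} bytes in {n_blocks} blocks")
```

A check that fails the job whenever the "definitely lost" byte count is nonzero would be one way to use this, though whether that threshold suits this build is a judgment call.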

Judging from when these started, I suspect one of these PRs is what introduced this:

* #41419
* #41295
* #41421
* #41366
* #41434

It turns out that something changed in how we look for binaries when setting up the Valgrind run, and that was causing these issues. The strange thing is that there were no code changes around this that should have caused this build to start using libarrow binaries, yet it seemingly did:

The last success, on 30 April:

*** No nightly binaries were found for version 16.0.0.9000: falling back to libarrow build from source

The first failure, on 1 May:

*** Latest available nightly for 16.0.0.9000: 16.0.0.100000045

I've hardcoded the build to not download binaries in #42249, which resolves the issue, but I'm curious whether you know what changed around then to start this, @assignUser? We might also need to check the other builds that we want to be source builds and confirm that they still are.
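
To confirm that other builds are still source builds, their logs could be scanned for the two installer messages quoted above. The following is a rough sketch in Python under stated assumptions. `libarrow_origin` is a hypothetical helper name, and it recognises only the two messages that appear in this issue. A "Latest available nightly" line is treated as a sign that a nightly binary was selected, which is an inference from this issue and not a documented guarantee.

```python
import re

# The only two messages known from this issue; other wordings may exist.
_SOURCE_FALLBACK = re.compile(r"falling back to libarrow build from source")
_NIGHTLY_FOUND = re.compile(r"Latest available nightly for (\S+): (\S+)")


def libarrow_origin(log_text):
    """Classify a build log as 'source', 'nightly', or 'unknown'.

    Returns the classification for the first matching line, scanning
    the log from top to bottom.
    """
    for line in log_text.splitlines():
        if _SOURCE_FALLBACK.search(line):
            return "source"
        if _NIGHTLY_FOUND.search(line):
            return "nightly"
    return "unknown"


if __name__ == "__main__":
    last_success = (
        "*** No nightly binaries were found for version 16.0.0.9000: "
        "falling back to libarrow build from source"
    )
    first_failure = "*** Latest available nightly for 16.0.0.9000: 16.0.0.100000045"
    print(libarrow_origin(last_success))
    print(libarrow_origin(first_failure))
```

Any build that is meant to compile libarrow from source but is classified as `nightly` (or `unknown`) would be a candidate for a closer look.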

Component(s)

C++, R
