Skip to content

Conversation

@daviesrob
Copy link
Member

UBSan on clang complains about dereferencing unaligned uint16_tpointers even when the dereference is being done by memmove(). Strictly it's correct as even setting the pointer to such a value is undefined behaviour according to the standard.

Silence the noise by changing the pointers to uint8_t and adjusting the arithmetic on them as necessary.

Some bounds checks to ensure fast code won't read beyond the end of its input are also adjusted to prevent any possibility of generating an address beyond the limits of the input memory.

(configure CFLAGS='-g -O2 -fsanitize=address,undefined -fno-sanitize-recover=all' LDFLAGS='-g -O2 -fsanitize=address,undefined' to highlight the problem this solves.)

Comment on lines 492 to 493
ptr16[-2] = ransN[z-k];
ptr16[-1] = ransN[z-k]>>8;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect.

If you make htscodecs_endian.h a nop so the #ifdef HTSCODECS_LITTLE_ENDIAN checks fail, this no longer builds.

It should be:

                ptr[-2] = ransN[z-k];                                           
                ptr[-1] = ransN[z-k]>>8;                                        

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now.

daviesrob and others added 2 commits February 3, 2026 15:07
UBSan on clang complains about dereferencing unaligned uint16_t
pointers even when the dereference is being done by memmove().
Strictly it's correct as even setting the pointer to such a
value is undefined behaviour according to the standard.

Silence the noise by changing the pointers to uint8_t and
adjusting the arithmetic on them as necessary.

Some bounds checks to ensure fast code won't read beyond the
end of its input are also adjusted to prevent any possibility
of generating an address beyond the limits of the input memory.
The O0 and O1 encoder had incorrect arguments to *storeu_epi16.
The (1<<pc2)-1 is creating a bit-mask of which 16-bit quantities to
store (it's the "_mask_" part of the function call), so we can't
naively double it.

The O1 decoder missed doubling the 'sp' increment when we're decoding
a TF_SHIFT_O1_FAST data-stream.
@jkbonfield
Copy link
Collaborator

If you're happy that my changes to the AVX512 didn't reintroduce any issues (it doesn't look like it would trigger ubsan as the changes are minimal, but I didn't retest it), then I'm happy to accept this.

I noted it has a small performance hit in the scalar encoding function (ie rans_compress_O0_32x16). It's about 7% slower on clang-21 and less with gcc. However this isn't such a commonly targetted platform type as most modern x86_64 systems have AVX2 capabilities and 7% isn't disastrous.

I did try improving it, but everything I tried slowed up down so I'll not spend more time fiddling with it.

@daviesrob
Copy link
Member Author

I'm happy with your changes.

@jkbonfield jkbonfield merged commit 1682b5f into samtools:master Feb 3, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants