Skip to content

Lot of unsafe code could probably be replaced by safe code without perf. impact #18

@Ten0

Description

@Ten0

Looking at the source code:
https://docs.rs/slice-group-by/0.2.6/src/slice_group_by/linear_group/linear_group_by.rs.html#131-136

It seems this crate makes extensive use of unsafe.
However, it also looks like this unsafe is unnecessary as it creates no performance improvement:

Representation with the two pointers is exactly the memory representation of a slice, so it looks like this:

pub struct LinearGroupBy<'a, T: 'a, P> {
    ptr: *const T,
    end: *const T,
    predicate: P,
    _phantom: marker::PhantomData<&'a T>,
}

could be replaced by this:

pub struct LinearGroupBy<'a, T: 'a, P> {
    elements_left: &'a [T],
    predicate: P,
}

while this:

                let mut i = 0;
                let mut ptr = self.ptr;

                // we use an unsafe block to avoid bounds checking here.
                // this is safe because the only thing we do here is to get
                // two elements at `ptr` and `ptr + 1`, bounds checking is done by hand.

                // we need to get *two* contiguous elements so we check that:
                //  - the first element is at the `end - 1` position because
                //  - the second one will be read from `ptr + 1` that must
                //    be lower or equal to `end`
                unsafe {
                    while ptr != self.end.sub(1) {
                        let a = &*ptr;
                        ptr = ptr.add(1);
                        let b = &*ptr;

                        i += 1;

                        if !(self.predicate)(a, b) {
                            let slice = $mkslice(self.ptr, i);
                            self.ptr = ptr;
                            return Some(slice)
                        }
                    }
                }

could be rewritten using for (i,s) in self.elements_left.windows(2).enumerate() and slice::split_at/slice::split_at_mut.

(and obviously this:

                // `i` is either `0` or the `slice length - 1` because either:
                //  - we have not entered the loop and so `i` is equal to `0`
                //    the slice length is necessarily `1` because we ensure it is not empty
                //  - we have entered the loop and we have not early returned
                //    so `i` is equal to the slice `length - 1`
                let slice = unsafe { $mkslice(self.ptr, i + 1) };
                self.ptr = self.end;
                Some(slice)

by this:

Some(std::mem::replace(&mut self.elements_left, &[]))

)

I don't think this change would create additional runtime cost, as most double-bounds-checking (or constant-bounds-checking in particular with the constants 2 and 1 here) usually gets optimized-out by the compiler, and I would feel much more comfortable using this crate, as I wouldn't like to introduce so much unsafe to our codebase for such a simple algorithm.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions