Bitwise cleanup: Fix opSlice index error. Add unittests and changelog entry #5002

Merged: andralex merged 2 commits into dlang:master from edi33416:fix_bitwise on Jan 14, 2017
Conversation

@edi33416 (Contributor)

This PR brings the following:

  • Fixed the opSlice bug.
  • More unittests.
  • A changelog entry.
  • Fixed typos.
  • Replaced the mixed tabs and spaces in changelog.dd; now there are only spaces.

@edi33416 (Contributor, Author)

@wilzbach could you please have a look and make sure I did not miss any of your previous points?

@wilzbach wilzbach added this to the 2.073.0-b0 milestone Dec 29, 2016
@wilzbach wilzbach (Contributor) left a comment

Endianness

Ehm, apparently no one has mentioned that Intel's CPUs, for example, are little-endian, so on my machine core.bitop and bitwise don't work well together:

import core.bitop;
import std.range : bitwise;
import std.stdio : writeln;

auto arr = [3UL];
bt(arr.ptr, 1).writeln; // 1
bts(arr.ptr, 2);
arr[0].writeln; // 7

arr = [3UL]; // reset
auto r = arr.bitwise;
r[1].writeln; // false
r[2] = true;
arr[0].writeln; // 2305843009213693955

Shouldn't the bitwise stream behave similarly to the BitRange or at least allow this behavior?

CC @schveiguy @MartinNowak
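The mismatch can be stated as a property (an illustrative sketch, assuming the LSB-first numbering that core.bitop uses and that this PR ends up adopting): for any in-word index, bitwise's opIndex should agree with core.bitop.bt.

```d
// Sketch: with LSB-first consumption, bitwise's opIndex agrees with
// core.bitop.bt, which numbers bits starting from the LSB.
import core.bitop : bt;
import std.range : bitwise;

void main()
{
    size_t[] arr = [3UL];
    auto r = arr.bitwise;
    foreach (i; 0 .. 8 * size_t.sizeof)
        assert((bt(arr.ptr, i) != 0) == r[i]); // both count from the LSB
}
```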

Performance

I threw together a small benchmark to demonstrate my earlier point that recalculating the mask on every access is slow; the test is a simple count of set bits:

  • the first example (filter) is just to show the cost of the range abstraction as it uses filter and sum
  • all other examples use a foreach for better comparability
  • bitwise.foreach is the bitwise taken from this PR (and thus Phobos)
  • bitwise2 is a simpler bitwise that shifts the mask directly (in big-endian and little-endian variants)
  • bitwise3 uses BitRange to calculate the positions
  • BitRange just returns the positions of all set bits
> ldc -O5 -release -boundscheck=off bitarray.d bitwise.d && ./bitarray
bitwise.filter  = 29 secs, 259 ms, 840 μs, and 9 hnsecs
bitwise.foreach = 24 secs, 668 ms, 420 μs, and 2 hnsecs
bitwise2Big.foreach = 16 secs, 778 ms, 923 μs, and 5 hnsecs
bitwise2Little.foreach = 16 secs, 849 ms, 760 μs, and 2 hnsecs
bitwise3.foreach = 21 secs, 905 ms, 458 μs, and 8 hnsecs
bitrange.foreach = 3 secs, 580 ms, and 864 μs

source here
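For context, the two indexing strategies compared above can be sketched roughly as follows (illustrative code only; CachedMask and bitAt are made-up names, not the benchmarked implementations):

```d
// "bitwise2"-style: cache the mask as range state and shift it once per step.
struct CachedMask
{
    ulong word;
    ulong mask = 1;                         // cached state, LSB first
    bool front() const { return (word & mask) != 0; }
    void popFront() { mask <<= 1; }         // one shift per advance
    bool empty() const { return mask == 0; }
}

// Recalculation style: derive the mask from the position on every access.
bool bitAt(ulong word, size_t pos)
{
    return ((word >> pos) & 1) != 0;        // mask rebuilt each time
}
```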

> @wilzbach could you please have a look and make sure I did not miss any of your previous points?

Thanks a lot! Looks good, but I think you still have to handle some special cases in opSlice; have a look at this small snippet:

import std.range : bitwise, only, repeat;
import std.stdio : writeln;

auto arr = only(4, 1);
writeln(arr[1 .. $]); // [1]
auto r = arr.bitwise;
r[29].writeln; // error
r[29 .. 33].writeln; // error
auto r2 = 4.repeat.bitwise;
r2[0 .. 8].writeln; // error

import std.algorithm.comparison : equal;

auto rb = rndGen.bitwise;
assert(isInfinite!(typeof(rb)));
Contributor:

static assert

Member:

correct

assert(isInfinite!(typeof(rb)));

auto rb2 = rndGen.bitwise;
// Don't forget that structs are passed by value
@wilzbach wilzbach (Contributor) Dec 29, 2016

OT: this is why random ranges should never be allowed to be passed by value (it can easily lead to randomness errors); see e.g. the reference shell that Mir uses.

See also this Dconf 2015 talk: http://dconf.org/2015/talks/wakeling.html

else static if (isBidirectionalRange!R)
{
bool isOverlapping = parent.empty;
if (!parent.empty)
@wilzbach wilzbach (Contributor) Dec 29, 2016

how about:

if (parent.empty)
    return true;
else
    return parent.save.dropOne.empty && maskPos < backMaskPos;

Member:

@wilzbach redundant flow

@9il (Member)

9il commented Jan 2, 2017

> Shouldn't the bitwise stream behave similarly to the BitRange or at least allow this behavior?

It should.

@wilzbach (Contributor)

wilzbach commented Jan 4, 2017

> Shouldn't the bitwise stream behave similarly to the BitRange or at least allow this behavior?
>
> It should.

The 2.073 release is coming up soon (today is the scheduled feature freeze), so what do we do with bitwise?
I vote for reverting before 2.073 to avoid exposing half-finished work and thus running into deprecation problems.

@MartinNowak (Member)

The endianness must definitely be fixed.

@MartinNowak MartinNowak removed this from the 2.073.0-b0 milestone Jan 7, 2017
@edi33416 (Contributor, Author)

edi33416 commented Jan 9, 2017

This commit addresses only the endianness issue. I am now consuming the bits from LSB to MSB.
This is the desired behavior, right?

@andralex (Member)

Yes, bit "zero" is traditionally the LSB and bit 31 or 63 etc. is the MSB. So it's expected we consume from the LSB regardless of endianness.
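In other words (a small illustrative snippet, not taken from the PR): bit numbering is a property of the value, not of memory layout, so the same shift-and-mask indexing works on little- and big-endian machines.

```d
void main()
{
    ulong x = 0b101;                 // bits 0 and 2 set
    assert(((x >> 0) & 1) == 1);     // bit 0 is the LSB
    assert(((x >> 2) & 1) == 1);
    assert(((x >> 63) & 1) == 0);    // bit 63 is the MSB of a ulong
    // Byte order only matters when reinterpreting memory (e.g. via a cast);
    // shifting and masking a value is endianness-independent.
}
```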

@andralex (Member)

@wilzbach you lost me with the perf measurements. What is the tl;dr? I'm mostly confused because the code should be identical for little- and big-endian machines: in all cases, consume from LSB through MSB.

Performance is not a criterion for acceptance, so it shouldn't stop this, lest we spend the rest of the year getting the perfect shade. Perf improvements can always come later. And I'm not sure caching the mask is a good idea anyway. Generally, all other things being approximately equal, a little computation is to be preferred to a little extra state to maintain.

@andralex andralex (Member) left a comment

@edi33416 replace that assert with static assert pls - thx!

else static if (isBidirectionalRange!R)
{
bool isOverlapping = parent.empty;
if (!parent.empty)
Member:

@wilzbach redundant flow

static if (isBidirectionalRange!R)
{
len -= (backMaskPos - 1);
len -= (bitsNum - backMaskPos);
Member:

why parens?

@wilzbach (Contributor)

wilzbach commented Jan 13, 2017

> @wilzbach you lost me with the perf measurements. What is the tl;dr?

Recalculating the mask isn't the most performant way:

  • bitwise with mask recalculation: 25s
  • bitwise with mask caching: 17s
  • bitwise using core.bitop.BitRange: 21s
  • just BitRange: 3.5s (core.bitop.BitRange iterates only over set bits)

> I'm mostly confused because the code should be identical for little- and big-endian machines: in all cases, consume from LSB through MSB.

I just included it because bitwise was MSB -> LSB before.

> Performance is not a criterion for acceptance, so it shouldn't stop this, lest we spend the rest of the year getting the perfect shade. Perf improvements can always come later.

Yes, but

  1. I had a bit of time and wanted to prove my point that calculating the mask is slower
  2. as far as I understand, this was intended to be an optimization for std.random.uniform, and my point was just that at 3 bits or more it's already faster to just throw away the generated bits:
> ldc -O5 -release -boundscheck=off test.d && ./test 1 2 8 256 1024 65536
benchmark: 1 (bits: 1)
random.uniform  = 742 ms, 824 μs, and 5 hnsecs
random.bitwise  = 218 ms, 972 μs, and 9 hnsecs
random.uniform.fixed = 592 ms, 647 μs, and 2 hnsecs
benchmark: 2 (bits: 2)
random.uniform  = 746 ms, 839 μs, and 5 hnsecs
random.bitwise  = 497 ms and 189 μs
random.uniform.fixed = 594 ms, 196 μs, and 1 hnsec
benchmark: 7 (bits: 3)
random.uniform  = 747 ms, 499 μs, and 6 hnsecs
random.bitwise  = 712 ms, 791 μs, and 7 hnsecs
random.uniform.fixed = 595 ms, 432 μs, and 4 hnsecs
benchmark: 255 (bits: 8)
random.uniform  = 746 ms, 915 μs, and 1 hnsec
random.bitwise  = 1 sec, 787 ms, 508 μs, and 2 hnsecs
random.uniform.fixed = 592 ms, 198 μs, and 7 hnsecs
benchmark: 1023 (bits: 10)
random.uniform  = 743 ms and 597 μs
random.bitwise  = 2 secs, 208 ms, 421 μs, and 5 hnsecs
random.uniform.fixed = 593 ms, 368 μs, and 6 hnsecs
benchmark: 65535 (bits: 16)
random.uniform  = 743 ms, 132 μs, and 7 hnsecs
random.bitwise  = 3 secs, 483 ms, 936 μs, and 3 hnsecs
random.uniform.fixed = 592 ms, 316 μs, and 4 hnsecs

random.uniform.fixed: If k is known at compile-time, ldc can do a couple of optimizations.
(benchmark code)

edit: It would be interesting to use BitRange for this benchmark as well.
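The "throw away the generated bits" alternative can be sketched like this (a hypothetical helper, not a Phobos API; lowBits is a made-up name): draw one full word from the generator per sample and keep only the low bits, instead of metering the stream out bit by bit through bitwise.

```d
import std.random : rndGen;

// Hypothetical: consume one whole generator word and keep the n low bits,
// discarding the rest of the word's randomness.
uint lowBits(uint n)() if (n >= 1 && n < 32)
{
    auto x = rndGen.front;
    rndGen.popFront();
    return cast(uint) x & ((1u << n) - 1); // mask off all but n low bits
}
```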

as a user-friendly way to query for function attributes.))
$(LI $(RELATIVE_LINK2 bitwise, Added `std.range.bitwise` to create
a bitwise adapter over an integral type range, consuming the range elements
bit by bit.))
Contributor:

The changelog.dd wasn't removed yet, but it will be soon.
Could you please create a file in the new changelog folder instead? This will also avoid merge conflicts with the changelog file.

@andralex andralex merged commit 04e3e11 into dlang:master Jan 14, 2017
@andralex (Member)

Cool, @wilzbach you may want to now add the bitmask caching. Thx!

@9il (Member)

9il commented Feb 27, 2017

Another issue: front and back are not mutable.

@edi33416 edi33416 deleted the fix_bitwise branch October 9, 2017 14:15
6 participants