Skip to content
This repository was archived by the owner on Oct 12, 2022. It is now read-only.

add core.int128#3663

Merged
dlang-bot merged 1 commit intodlang:masterfrom
WalterBright:int128
Jan 15, 2022
Merged

add core.int128#3663
dlang-bot merged 1 commit intodlang:masterfrom
WalterBright:int128

Conversation

@WalterBright
Copy link
Copy Markdown
Member

This is the opening salvo to support the cent type in D. It'll be needed by both the compiler and the generated executables. This version is not optimized, it just establishes a baseline of functionality.

It's built to be used as a standalone package, or have the compiler generate calls to, or have the compiler use it internally to bootstrap cent support.

It's byte endianness independent.

@dlang-bot
Copy link
Copy Markdown
Contributor

Thanks for your pull request, @WalterBright!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + druntime#3663"

@rikkimax
Copy link
Copy Markdown
Contributor

rikkimax commented Jan 7, 2022

Right now this wouldn't work without extra work for people using -betterC due to not being compiled in.

An option might be to specify the signedness via a template parameter, which would allow for both signed and unsigned working and it working in -betterC too.

alias U = ulong;
enum Ubits = U.sizeof * 8;

struct Cent
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a ucent

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ucent is a keyword, cannot use that as a struct name.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the Cent struct will work with both signed and unsigned operations.

* complemented value
*/
pure
Cent com(Cent c)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all these free functions should be operator overloads

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided against that because:

  1. better to pass arguments by value rather than by ref
  2. easier to interface with C, will probably make them extern (C) functions

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't need to be extern(C), especially with three letter names.

@WalterBright
Copy link
Copy Markdown
Member Author

Right now this wouldn't work without extra work for people using -betterC due to not being compiled in.

I know, but have to start somewhere.

An option might be to specify the signedness via a template parameter, which would allow for both signed and unsigned working and it working in -betterC too.

The underlying signedness depends on which operation is done, just like the CPU. As for templates, I wanted to keep the name mangling out of such low level stuff.

@rikkimax
Copy link
Copy Markdown
Contributor

rikkimax commented Jan 7, 2022

Right now this wouldn't work without extra work for people using -betterC due to not being compiled in.

I know, but have to start somewhere.

Me an @thewilsonator talked about this on Discord briefly, what we are looking for is more along the lines of my code of: (Boost licensed) https://gist.github.com/rikkimax/a31a3e085f4aeb00ce7206804a9f2a56

But you seem to have something very different in mind. Perhaps you are treating this as compiler hooks rather than a usable type in itself?

@WalterBright
Copy link
Copy Markdown
Member Author

Now, why would adding a new module to druntime, which nobody uses, result in the message:

Test 'runnable/sdtor.d' failed. The logged output:
/tmp/dmd/generated/linux/release/32/dmd -conf= -m32 -Irunnable -preview=dtorfields  -fPIC  -odtest_results/runnable -oftest_results/runnable/sdtor_0  runnable/sdtor.d 
runnable/sdtor.d(3872): Error: declaration `(S[2] sb = __error__;)` is not yet implemented in CTFE

from the test suite? Are things being pulled that didn't pass?

@WalterBright
Copy link
Copy Markdown
Member Author

@rikkimax I am looking for something designed to be used internally, not so much at user level, hence no operator overloading.

@Geod24
Copy link
Copy Markdown
Member

Geod24 commented Jan 7, 2022

If you want to design it iteratively, which I'm sure you do, put it under core.internal.int128, and it will alleviate some of the issue reviews might have, because it's an accepted place to put internal stuff (more hooks are there nowadays).
As long as it doesn't break the build for other people (e.g. other compilers or other archs), then you are free to do whatever you want.

struct Cent
{
U lsl; // low 64 bits
U msl; // high 64 bits
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The high bits should be signed. This simplifies testing for negative numbers to msl < 0.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, but then unsigned operations become more complicated and require a cast.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the most common use cases, this field is either going to be 0 or -1 anyway, so I see no reason not to streamline that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should stay unsigned for simplicity. Signed arithmetic is inherently harder, and so should be where the added work is.

@WalterBright
Copy link
Copy Markdown
Member Author

@Geod24 I view this as not just a compiler thing. It can be used as a building block for many purposes, like the overflow checks in bitop.d. Note that it doesn't import anything, this is deliberate.

@Geod24
Copy link
Copy Markdown
Member

Geod24 commented Jan 7, 2022

@WalterBright : All the more reasons to put it in core.internal. It's a lower barrier of entry and doesn't prevent us from moving it to another location later.

You mention that many of your initiative PRs are blocked, but from what I've seen they are always held back on the same disagreement:

  • You favor an iterative approach with small PRs;
  • Reviewers want to know what is the end goal;

In my opinion, there's two ways to move forward:

  1. Put that iterative stuff behind a switch / in an internal module, so it isn't public / it's clear it shouldn't yet be used;
  2. Submit a fully finished feature as a single, large PR, and then submit individual chunks as PRs, so one can review both the end goal and the steps, not just the steps;

My understanding is that option 2 doesn't work well for you, hence why I suggest option 1.

Trust me, I want to merge your PRs. But for any initiative PR, there is no way for me to tell if it's going in the right direction, I must take you on faith, lest I implement it myself to see why you are taking this specific step.

Copy link
Copy Markdown
Member

@ibuclaw ibuclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Divide operations should be distinguished from Divide+Modulus operations.

@ibuclaw
Copy link
Copy Markdown
Member

ibuclaw commented Jan 7, 2022

Now, why would adding a new module to druntime, which nobody uses, result in the message:

Test 'runnable/sdtor.d' failed. The logged output:
/tmp/dmd/generated/linux/release/32/dmd -conf= -m32 -Irunnable -preview=dtorfields  -fPIC  -odtest_results/runnable -oftest_results/runnable/sdtor_0  runnable/sdtor.d 
runnable/sdtor.d(3872): Error: declaration `(S[2] sb = __error__;)` is not yet implemented in CTFE

from the test suite? Are things being pulled that didn't pass?

@WalterBright you are 58 commits behind dlang/druntime:master. There's been a bunch of new front-end lowerings to templates added in the meantime, so you need to rebase.

@RazvanN7 are you merging these? Why is there no tests for when druntime implementations are absent (i.e: empty object.d)?

@RazvanN7
Copy link
Copy Markdown
Contributor

RazvanN7 commented Jan 7, 2022

@ibuclaw yes, i have been merging some lowerings lately, however, I don't understand what you are referring to? Do you mean betterC tests?

@ibuclaw
Copy link
Copy Markdown
Member

ibuclaw commented Jan 7, 2022

@ibuclaw yes, i have been merging some lowerings lately, however, I don't understand what you are referring to? Do you mean betterC tests?

__error__ regression is caused by dlang/dmd#13476

@RazvanN7
Copy link
Copy Markdown
Contributor

RazvanN7 commented Jan 7, 2022

Maybe rebasing will fix the issue.

Copy link
Copy Markdown
Member

@ljmf00 ljmf00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add missing tests to fully cover gt, dec, shl, shr, sar and mul.

@WalterBright
Copy link
Copy Markdown
Member Author

Trying a rebase.

@WalterBright WalterBright force-pushed the int128 branch 5 times, most recently from 74a00d2 to bdc10f0 Compare January 8, 2022 03:28
@WalterBright WalterBright force-pushed the int128 branch 4 times, most recently from c410808 to 8b2a7d1 Compare January 8, 2022 06:46
@WalterBright
Copy link
Copy Markdown
Member Author

@ibuclaw ready to go

Copy link
Copy Markdown
Member

@ibuclaw ibuclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having one last sweep through, there's two more easy optimizations/simplifications that can be done to neg and gt - the latter can be made to be identical to ugt, but cast the high part to signed. Also spotted one documentation typo in or.

I'll defer improvements to multiplication, division, rotate, and arithmetic shift for a follow up PR.

Side note, would having and_not be useful alongside and as a shortcut for the c1 & ~c2 case?

pure
Cent neg(Cent c)
{
return inc(com(c)); // ~c + 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return inc(com(c)); // ~c + 1
if (c.lo == 0)
c.hi = -c.hi;
else
{
c.lo = -c.lo;
c.hi = ~c.hi;
}
return c;

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplicity of the first for a baseline implementation. There are a lot of optimizations that can and should be done, but they are all harder to understand. Having a baseline gives something to judge it against.

Comment on lines +570 to +579
if (cast(I)c1.hi >= 0)
{
if (cast(I)c2.hi >= 0)
return ugt(c1, c2);
return true;
}
if (cast(I)c2.hi >= 0)
return false;
return ugt(c1, c2);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (cast(I)c1.hi >= 0)
{
if (cast(I)c2.hi >= 0)
return ugt(c1, c2);
return true;
}
if (cast(I)c2.hi >= 0)
return false;
return ugt(c1, c2);
return (c1.hi == c2.hi) ? (c1.lo > c2.lo) : (cast(I)c1.hi > cast(I)c2.hi);

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am all in favor of these improvements. But what would really help is to approve this PR. That way, I can begin incorporating cent support into the compiler, while you can submit PRs for these optimizations. That way, you get the credit for the optimizations, and we can be working in parallel. Currently, not having this PR pulled is a blocker.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted, I have nothing further to add after these remaining corrections.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic I wrote is easier to understand and visually determine to be correct. That's why I prefer it as the baseline implementation. Your version is faster, but is hard to read, and so should go in a followup PR. That way, it the PR breaks things, we know where to look.

* shifted value
*/
pure
Cent sar(Cent c, uint n)
Copy link
Copy Markdown
Member

@ibuclaw ibuclaw Jan 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd certainly improve usability if all these functions weren't named after x86 mnemonics. For instance, lshift, rshift, and arshift instead of shl, shr, and sar. Similarly, the rotate functions could be lrotate and rrotate, and the comparison functions be prefixed with s to make it clear that they are signed versions of the unsigned counterparts (eg: sgt and ugt).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't exactly follow the x86 mnemonics, and I've seen those mnemonics more or less used by various CPUs my entire career. They're second nature :-)

I used a u prefix for unsigned, and no prefix for signed, to match D's byte/ubyte, short/ushort, cent/ucent, etc. I deliberately did not use the x86 mul/imul, div/idiv mnemonics.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this is a public interface and so the naming conventions should really resemble something suitable for use outside of the compiler run-time.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

byte/ubyte, etc., are a public interface.

Comment on lines +38 to +42
pure
bool tst(Cent c)
{
return c.hi || c.lo;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every single use of this function is !tst(c), so it'd be much better just naming it bool isZero(Cent c) and inverting the condition.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reminded of isEmpty() for ranges, and every time I use it I have to stop and think a moment.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numbers aren't ranges though. Whether you want an is prefix or not, I'm not particularly opinionated on, but tst is an awful name which I guarantee will confuse newcomers to the language based of my experience of both new D programmers and new programmers.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True means non-zero and False means zero. Anything that flips that about, like isZero, is inherently confusing to me.

tst is not inherently confusing. It's been in wide use for at least 50 years now to mean !=0, probably more, and I've never heard of anyone being confused by it. I was a newbie once, too, and don't recall any confusion with it.

I have used inverted logic (isZero is inverted logic), and the only reason it is used (NAND gates, NOR gates) is because one less transistor is required. NAND logic is inherently confusing, and hard to get used to. I'd design a circuit using positive logic, and then mechanistically convert it to inverted logic using DeMorgan's Theorem.

@RazvanN7
Copy link
Copy Markdown
Contributor

@WalterBright once you address @ibuclaw 's review, we can get this in. Specifically:

#3663 (comment)
#3663 (comment)
#3663 (comment)

@ljmf00
Copy link
Copy Markdown
Member

ljmf00 commented Jan 15, 2022

I would like to vectorize this if this is worth vectorizing.

@maxhaton
Copy link
Copy Markdown
Member

I would like to vectorize this if this is worth vectorizing.

Probably not worth the hassle for now.

@ibuclaw
Copy link
Copy Markdown
Member

ibuclaw commented Jan 15, 2022

I would like to vectorize this if this is worth vectorizing.

Probably not worth the hassle for now.

And even then, the compiler would be able to detect the pattern an do the right thing for all cases except for division. But why vectorize only division? :-)

@ljmf00
Copy link
Copy Markdown
Member

ljmf00 commented Jan 15, 2022

I would like to vectorize this if this is worth vectorizing.

Probably not worth the hassle for now.

And even then, the compiler would be able to detect the pattern an do the right thing for all cases except for division. But why vectorize only division? :-)

Is DMD able to do it? Yes, I'm talking more about the division, but there are other functions that can be hardly vectorized (multiply and others that rely on big foreach loops, which can't be unrolled) due to the lack of a loop vectorization pass, maybe there is a better approach to dit more directly tho. What I'm trying to say is: for such critical operations, relying on complex compiler optimizations to do some operations can (a) hurt compile-time performance (b) the compiler may not be able to do a good job at all.

There is also a case I'm thinking of, where custom functions could be beneficial to sum 4 cents using 2 __vector(ulong[2]). Is the compiler able to figure that out?

Another thing I would like to consider here: currently in C, the alignment of __int128 is 16, and AFAIK, the default alignment of a struct in D is 8, right? At least appears to be: https://gcc.godbolt.org/z/65Kj5qdTK .

Copy link
Copy Markdown
Member

@ibuclaw ibuclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least make this change.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants