Skip to content
This repository was archived by the owner on Oct 12, 2022. It is now read-only.
/ druntime Public archive

[RFC] Add the core.sentinel module#2123

Closed
marler8997 wants to merge 1 commit intodlang:masterfrom
marler8997:sentinel
Closed

[RFC] Add the core.sentinel module#2123
marler8997 wants to merge 1 commit intodlang:masterfrom
marler8997:sentinel

Conversation

@marler8997
Copy link
Contributor

@marler8997 marler8997 commented Mar 2, 2018

REQUEST FOR COMMENT

In D we have APIs that accept both D arrays and pointers with sentinel values (https://en.wikipedia.org/wiki/Sentinel_value). The classic example is the string, where some functions accept D's string type and others accept null-terminated strings (i.e. the C standard library functions). This change proposes the addition of the SentinelPtr and SentinelArray types.

// A pointer to an array with elements of type `T` terminated by `sentinelValue`
template SentinelPtr(T, T sentinelValue = defaultSentinel!T) { ... }

// An standard D array that, along with a length is also terminated by `sentinelValue`.
// This type can safely be used as a standard D array and its `ptr` field is also
// a valid SentinelPtr.
template SentinelArray(T, T sentinelValue = defaultSentinel!T) { ... }

// C string
alias cstring = SentinelPtr!(const(char));

A SentinelPtr is a pointer to an array that contains a sentinel value to terminate its elements (i.e. a C string). A SentinelArray is a D array whose pointer is a SentinelPtr. D's "string literal" is an example of a SentinelArray since it contains both a length and guarantees null-termination. Note that the new cstring type (an alias to SentinelPtr!(const(char)) is an exact definition of "c-style" strings. This allows the semantics of c-strings to exist within the D type system, which comes with alot of advantages (see below).

Adding a SentinelPtr and SentinelArray type would allow an API to explicitly declare when a sentinel value is required. This allows the type system to enforce this requirement rather than relying on documentation/trust that the user only passes sentinel pointers when they are required, i.e.

// Params:
//     str = a null-terminated string
void foo1(const(char)* str);

// Params:
//     str = a null-terminated string
void foo2(cstring str);

// NOTE: cstring == SentinelPtr!(const(char))

void bar(string s)
{
    foo1(s.ptr); // Even though this is invalid, the type system doesn't catch it
    foo2(s.ptr); // Error: `immutable(char)*` is not convertible to SentinelPtr!(const(char))
}

Expanding on the example a bit further, the correct thing to do there would have been:

// Copies str to a temporary buffer so it can add a sentinel value and return it
const(char)* tempCString(const(char)[] str);

void bar(string s)
{
    foo1(s.tempCString); // OK
    foo2(s.tempCString); // OK
}

This pattern is seen all over phobos. The problem here is that if bar had been called with a string literal in the first place, then the copy in tempCString would have been completely unnecessary, but the bar function has no way of knowing whether or not s already has a sentinel value. This is why SentinelArray comes in handy because now bar can know this, i.e.

// If `T` is a SentinelArray this is a no-op that simply returns `str`. Otherwise,
// it copies str to a temporary buffer so it can add a sentinel value and return it
auto tempCString(T)(T str);

void bar(T)(T s)
{
    foo1(s.tempCString); // OK
    foo2(s.tempCString); // OK
}

bar("hello"); // Assuming the string literal type is now `SentinelString`,
              // this means `tempCString` will be a no op!

It's important to note that SentinelArray allows the type to indicate that the array not only "owns" all the elements between array[0..$], it also owns the sentinel element array[$]. The array type itself has no way to indicate this.

One reason for adding this to druntime (instead of phobos) is so that it could take advantage of these types in its C bindings. In cases where functions currently use pointers to accept sentinel arrays, they can be replaced with SentinelPtr to explicitly declare this requirement.

The overall impact should be an increase in safety, better template introspection and provides the opportunity to avoid unnecessary copy/allocations of arrays in certain situations. Safety would be increased because functions can declare when sentinel values are required allowing compile-time and runtime checks to enforce this. This also allows templates to take advantage of a common type to know when an array has a sentinel value. This can result in selecting more optimized algorithms when they apply (see https://www.youtube.com/watch?v=AxnotgLql0k&t=14m41s). It also allows functions to omit an extra copy/allocation when a sentinel array is required. The classic example of this is usage of the tempCString function which becomes a no-op for SentinelArray!char.

Note that with this change the string literal type could be redefined as SentinelArray!(immutable(char)), (whose alias is SentinelString) which would explicitly guarantee it is null-terminated allowing functions to know this as well. This prevents the need to copy the string to a temporary buffer when it needs to be passed to a function accepting "c-style" strings.

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @marler8997! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

@jmdavis
Copy link
Member

jmdavis commented Mar 2, 2018

The reason for adding this to druntime (instead of phobos) is so that it could take advantage of these types in its C bindings. In cases where functions currently use pointers to accept sentinel arrays, they can be replaced with SentinelPtr to explicitly declare this requirement.

Aside from whether these are a good idea in general, that won't work for C bindings. The C bindings must match the C declarations. And we don't do wrappers for the C bindings in druntime.

@marler8997
Copy link
Contributor Author

marler8997 commented Mar 2, 2018

Aside from whether these are a good idea in general, that won't work for C bindings. The C bindings must match the C declarations. And we don't do wrappers for the C bindings in druntime.

C doesn't mangle their function names...so the bindings should just work :) Also I'm not proposing adding wrappers, the proposal is to use cstring in place of const(char)* and SentinelPtr!char in place of char* for C functions that require null-terminated strings, i.e.

extern(C) size_t strlen(const(char)* str); // Current
extern(C) size_t strlen(cstring str);      // Proposed

// NOTE: cstring is an alias to SentinelPtr!(const(char))

I believe we could also make this work for C++ by telling the compiler to mangle the new SentinelPtr type as a raw pointer.

@marler8997 marler8997 force-pushed the sentinel branch 7 times, most recently from 64bd50e to 22c0a49 Compare March 2, 2018 06:41
@wilzbach
Copy link
Contributor

wilzbach commented Mar 2, 2018


edit: I didn't see the previous responses on my phone at that time.

@marler8997
Copy link
Contributor Author

The reason for adding this to druntime (instead of phobos [or dub]) is so that it could take advantage of these types in its C bindings. In cases where functions currently use pointers to accept sentinel arrays, they can be replaced with SentinelPtr to explicitly declare this requirement.

...This also allows templates to take advantage of a common type to know when an array has a sentinel value. This can result in selecting more optimized algorithms when they apply

@marler8997 marler8997 force-pushed the sentinel branch 2 times, most recently from 8640218 to 1c793d2 Compare March 2, 2018 07:36
@marler8997
Copy link
Contributor Author

marler8997 commented Mar 2, 2018

Also if this was integrated, at some point we would probably want to change the type of string literals from string to SentinelString. If this was the case, then these sentinel types would need to be "built-in" as much as the string type currently is (hence why I'm putting it in druntime for now).

Note this extra change to the "string literal" type may require a DIP, however, it can be done as an addition to adding the types to druntime first. However, I wouldn't go around changing all the types in druntime to use cstring/SentinelPtr unless the "string literal" type was changed first.

Also note that if the string literal type was changed to SentinelString then we would first need to have "multiple alias this" since that is the only way I can find to make SentinelString (which is SentinelArray!(immutable(char))) implicitly convertible to both SentinelArray!(const(char)) and immutable(char)[].

if the given array does not contain the sentinel value at `array.ptr[array.length]`.
*/
this(T[] array)
in { assert(array.ptr[array.length] == sentinelValue,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Using a contract/assert means this is only enforced in debug builds. This does not improve the safety of the program in terms of the @safe attribute.
  2. You're going out of bounds here. You can't assume that array.ptr[array.length] is supposed to be a sentinel, even if it happens to have the right value.

Same throughout.

Copy link
Contributor Author

@marler8997 marler8997 Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Using a contract/assert means this is only enforced in debug builds. This does not improve the safety of the program in terms of the @safe attribute.

I'm not sure, you could either go with assert or enforce. It's synonymous with array bounds checking, you may only want it turned on in debug mode. Maybe this check should be "tied into" the same setting, only enabled when array bounds checking is enabled.

  1. You're going out of bounds here. You can't assume that array.ptr[array.length] is supposed to be a sentinel, even if it happens to have the right value.

Right that's the point of this type. You can't go outside of the bounds of an array (or a "slice") unless you know the array also owns the elements you are checking. The point of SentinalArray is that you are explicitly declaring it "owns" the sentinel element at array.ptr[array.length].

This particular constructor you've commented on "coercing" a normal array to a SentinalArray, and whenever you convert from one type to another, it is up to the developer to make sure the conversion is valid, there's nothing the compiler can do about that. However, if SentinalArray is builtin, meaning that string literals will already be of type SentinalArray, then this coercion will only be used as much as normal casting (and should only be allowed in @system/@trusted code).

Note: after your comment I've added @system to these constructors to indicate the operation is not @safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's synonymous with array bounds checking, you may only want it turned on in debug mode.

Note that @safe needs bounds checking to work, which is why -release keeps bounds checking on in @safe code.

The point of SentinalArray is that you are explicitly declaring it "owns" the sentinel element at array.ptr[array.length].
[...]
it is up to the developer to make sure the conversion is valid

So getting anything @safe is not the plan here? It's just about adding sanity checks to @system code?

How about expecting the sentinel in arr[$ - 1] (or maybe anywhere in arr)? With that and with a run-time check that's always there, you could maybe make it impossible to construct an invalid SentinelArray. Then it could be @safe:

extern(C) cstring getenv(cstring name) @safe;
extern(C) int puts(cstring s) @safe;
    /* No idea if the char*s are really the only thing keeping these functions from being @safe. */

void main() @safe
{
    cstring p = getenv("PATH\0".asSentinelPtr);
    puts(p);
}

(And if "PATH" were already a SentinelArray as you're planning, it wouldn't need the ugly explicit terminator.)

Copy link
Contributor Author

@marler8997 marler8997 Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So getting anything @safe is not the plan here? It's just about adding sanity checks to @System code?

the grand plan would be to change the string literal type to SentinalString so you wouldn't have to "coerce" them. So it should all work in @safe code. The only code that would not be safe would be converting normal D arrays to SentinalArray since there's no way to know if an array owns the value array[$]. Actually I also added the StringLiteral template which is @trusted so you could also use that in @safe code, i.e.

puts(StringLiteral!"hello"); // safe

How about expecting the sentinel in arr[$ - 1] (or maybe anywhere in arr)? With that and with a run-time check that's always there, you could maybe make it impossible to construct an invalid SentinelArray. Then it could be @safe:

Good idea. This is an additional use case that could be used in @safe. I'll see if I can find a good way to add this in.

I know I already answered this, but your getenv example could currently be written as:

    cstring p = getenv(StringLiteral!"PATH");

Copy link
Contributor Author

@marler8997 marler8997 Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok here it is:

/**
This function converts an array to a SentinelArray.  It requires that the last element `array[$-1]`
be equal to the sentinel value. This differs from the function `asSentinelArray` which requires
the first value outside of the bounds of the array `array[$]` to be equal to the sentinel value.
This function does not require the array to "own" elements outside of its bounds.
*/
@property auto reduceToSentinelArray(T)(T[] array) @trusted
in {
    assert(array.length > 0);
    assert(array[$ - 1] == defaultSentinel!T);
   } do
{
    return asSentinelArrayUnchecked(array[0 .. $-1]);
}

///
@safe unittest
{
    auto s = "abc\0".reduceToSentinelArray;
    assert(s.length == 3);
    () @trusted {
        assert(s.ptr[s.length] == '\0');
    }();
}

@marler8997 marler8997 force-pushed the sentinel branch 10 times, most recently from 8eadec2 to 19033d8 Compare March 2, 2018 21:46
@marler8997 marler8997 force-pushed the sentinel branch 16 times, most recently from ff61fe0 to 91897bc Compare March 3, 2018 09:11
@@ -0,0 +1,109 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this file be part of the benchmark suite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't planning on keeping it in the finalized change. Where is the benchmark suite? Maybe it would make sense to add this to it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In /benchmark

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(though that one is specifically about comparing different druntime versions, but I think just moving the file into this folder should be good enough as it will be a lot easier to find when it's checked in the source code.)

ConstPtr asConst;
}
alias asConst this; // facilitates implicit conversion to const type
// alias ptr this; // NEED MULTIPLE ALIAS THIS!!!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dlang/dmd#3998 though it will require quite an effort to revive this.

Returns:
the length of the array
*/
size_t findLength() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In phobos this is called walkLength

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants