Skip to content

[New DIP] add an 'in' operator for arrays#101

Closed
moon-chilled wants to merge 7 commits intodlang:masterfrom
moon-chilled:master
Closed

[New DIP] add an 'in' operator for arrays#101
moon-chilled wants to merge 7 commits intodlang:masterfrom
moon-chilled:master

Conversation

@moon-chilled
Copy link

No description provided.

@JackStouffer
Copy link
Contributor

JackStouffer commented Nov 22, 2017

I understand the utility with the syntax similarity with Python. However, I'm skeptical of any built-in operator doing anything but O(1) evaluations. I think it can encourage poor behavior.

In your example in the DIP, what should really be happening is a hash set (I know there isn't one in Phobos, but there are on DUB) should be used to store the elements and check for presence. Doing a linear search for every input is really costly. I know it's just an example, but IMO we shouldn't be making it more simple to do the wrong thing.

@moon-chilled
Copy link
Author

Redblacktrees have O(log(n)) in checks, and yet they have an in operator. And it's arguable that unless we want to add a hashset as a builtin feature (a la python), the simplicity gained by O(n) lookup in arrays is worth the performance hit.

@wilzbach
Copy link
Contributor

FYI Andrei approved the in operator for iota recently: dlang/phobos#5629

(You might want to mention this precedent in your DIP)

@JackStouffer
Copy link
Contributor

JackStouffer commented Nov 22, 2017

Difference with iota is that all in can be found quickly because all elements are known ahead of time. No search is required.

@rjframe
Copy link

rjframe commented Nov 23, 2017

This proposal should probably remove AAs as an example because the in operator works differently for AAs and RedBlackTrees (which the proposal seems to be based on).

  1. The in operator for arrays returns a pointer to the element, not true/false (which is what RedBlackTree returns).

  2. The AA searches the index, rather than the stored element (int[string] var; if ("somestring" in var) {}). For an array this is just if (3 < var.length) {}.

@moon-chilled
Copy link
Author

@wilzbach Good idea I'll add that later

@rjframe it's true that the behaviour is different, but the comparison is still relevant and should be made -- they're both basic language features, why does one let you check inside it but not the other?

@PetarKirov
Copy link
Member

PetarKirov commented Nov 23, 2017

The upper time complexity bound for elem in container is O(log(n)) - see the bottom of https://dlang.org/phobos/std_container.

If we were to go with allowing elem in array, the function should return a pointer to the position of the first matching element or null. For example:

int[] arr = [7, 1, 3, 2, 6, 2, 4];
int x = 2;

if (auto ptr = x in arr)
    // prints: "found: 2, at position: 3"
    writefln("found: %s, at position: %s", *ptr, ptr - arr.ptr); 
else
    assert (p is null); // not reached

assert (11 !in arr);

I'd in favor of adding the in operator to SortedRange, as an alias to contains though.

@moon-chilled
Copy link
Author

moon-chilled commented Nov 23, 2017

The problem with elem in array returning a pointer is that -- as in your example -- there could be more than one of the item, so the intent is kinda ambiguous. Perhaps it could return an array of pointers to those values? I don't really like that solution, but it could work.

@mdparker
Copy link
Member

mdparker commented Mar 7, 2018

@Elronnd Sorry for making you wait so long. Thanks for your patience!

With the new procedures approved, I'm starting up the DIP process again. This one is second in line, so if you are still willing to push it through the process (and I hope you are), I ask that you familiarize yourself with the new procedure document. I've got a blog post coming in a few days that will explain the motivation for the changes. I also ask that you make sure the document conforms to the new template. Just some minor structural changes.

We'll start making progress on your DIP as soon as I know what the status is of the one in front of yours.

@mdparker
Copy link
Member

@Elronnd You're next up! If you want to get started, I need to know soon, otherwise I'll have to move on to the next in line.

@moon-chilled
Copy link
Author

Hi, sorry for the delay, I've started working on the next version. It should be ready today or tomorrow. If it's more convenient to, then by all means, do the next one in line first.

@mdparker
Copy link
Member

Alright then. I'll come back and look at this again when you've got your new version ready. Thanks!

@mdparker
Copy link
Member

mdparker commented Aug 9, 2018

@Elronnd Is this ready to start moving forward?

@moon-chilled
Copy link
Author

Yep!

@andralex
Copy link
Member

In the interest of time - unless this DIP presents a fresh argument/angle and has broad support, it's quite unlikely it will get approved.

@moon-chilled
Copy link
Author

That sounds reasonable. I'm closing this now, and will probably make a new dip in the near future.


The `in` operator for arrays will take a needle, of type `T`, and a haystack,
of type `T[]`. If any of the elements of the haystack are equal to the needle,
then it will return true. Otherwise, it will return false. No new grammar
Copy link
Member

@PetarKirov PetarKirov Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the right signature would be:

inout(T)* opIn_impl(T)(inout(T)[] haystack, auto ref const(T) needle)
if (__traits(compiles, { bool ok = haystack[0] == needle; }))

This is would be consistent with testing membership in AAs and will allow easy use of the found element, if any.

Example implementation: https://run.dlang.io/is/elciqv

Edit: sorry for the noise, I didn't read the comments and forgot that I even reviewed this DIP :O

Copy link
Author

@moon-chilled moon-chilled Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that returning a pointer is a bad idea since array values aren't guaranteed to be unique, so if I do 5 in [5, 6, 5, 4], I get a pointer only to the first 5. A solution could be to return an array of pointers or (I prefer this) an array of indices. Both of those have the advantage that arrays convert implicitly to bool (which might be a bad thing but that's a separate discussion), meaning that if (5 in arr) will still work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's wrong with simply returning a pointer to the first occurrence, or returning an advanced slice, ala std.algorithm.find?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants