Skip to content

Add operator kwarg to Cols#58

Merged
bkamins merged 9 commits intomainfrom
bkamins-patch-2
Dec 16, 2022
Merged

Add operator kwarg to Cols#58
bkamins merged 9 commits intomainfrom
bkamins-patch-2

Conversation

@bkamins
Copy link
Member

@bkamins bkamins commented Dec 4, 2022

@codecov
Copy link

codecov bot commented Dec 4, 2022

Codecov Report

Base: 96.36% // Head: 96.36% // No change to project coverage 👍

Coverage data is based on head (6fc6ae4) compared to base (69313ee).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #58   +/-   ##
=======================================
  Coverage   96.36%   96.36%           
=======================================
  Files           1        1           
  Lines          55       55           
=======================================
  Hits           53       53           
  Misses          2        2           
Impacted Files Coverage Δ
src/DataAPI.jl 96.36% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@bkamins bkamins requested a review from nalimilan December 4, 2022 22:25
@bkamins
Copy link
Member Author

bkamins commented Dec 4, 2022

CC @jar @pdeffebach @yjunechoe

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@adienes
Copy link

adienes commented Dec 11, 2022

I think we discussed this in the previous issue but I forgot---could you remind me why we use keywords here rather than a function? Then it would have Cols(...; operation=any) and Cols(...; operation=all) using built in functions any and all, and could use user-defined predicates.

@bkamins
Copy link
Member Author

bkamins commented Dec 11, 2022

We have not discussed it previously. Actually what you propose is a good idea. But any and all are not correct functions to be passed. We can say that union function is the default and intersect is a function to pass if one wants intersection. However, potentially user could pass any function operating on sets and returning a set as a result.

Then the definition in DataFrames.jl would be:

@inline Base.getindex(x::AbstractIndex, idx::Cols) =
    isempty(idx.cols) ? Int[] : idx.operation(getindex.(Ref(x), idx.cols)...)

It would be more error prone than the current implementation (potentially leading to crazy error stack traces) but more flexible indeed.
@nalimilan - what do you think?

@adienes
Copy link

adienes commented Dec 11, 2022

Makes sense. Then we'd also get setdiff and symdiff for free

@jariji
Copy link

jariji commented Dec 11, 2022

Yep, I agree with adienes.

@nalimilan
Copy link
Member

Yeah why not. Maybe we need a more specific name than "operation" then for the argument. Again the same discussion as the one about "combine" in unstack? :-)

@bkamins
Copy link
Member Author

bkamins commented Dec 11, 2022

AFAICT this fits well. In mathematics union and intersect etc. are typically called "operations", c.f. e.g. https://en.wikipedia.org/wiki/Set_(mathematics)#Basic_operations

@bkamins
Copy link
Member Author

bkamins commented Dec 11, 2022

I have updated the PR.

@adienes
Copy link

adienes commented Dec 12, 2022

Perhaps operator is very slightly a better choice than operation for some very tenuous reasons:

  1. often the arguments in Cols will be indicator/characteristic functions of sets, and a function that acts on other functions is often called an operator
  2. to me it scans slightly more clearly that what follows operator= should be a verb (i.e. a function)
  3. it is 1 character shorter :)

The wikipedia page for the Algebra of Sets uses both terms interchangeably, so I'm not sure which is more standard.

@bkamins
Copy link
Member Author

bkamins commented Dec 12, 2022

I checked several references, and the border between operation and operator is thin. Operation (mathematics) gives a most clear distinction:

An operator is similar to an operation in that it refers to the symbol or the process used to denote the operation, hence their point of view is different. For instance, one often speaks of "the operation of addition" or "the addition operation", when focusing on the operands and result, but one switches to "addition operator" (rarely "operator of addition"), when focusing on the process

Given this explanation indeed operator seems better, but I am not sure.

@adienes
Copy link

adienes commented Dec 12, 2022

especially when the symbols and are used instead of intersect and union then operator definitely seems like the better word there

@bkamins bkamins changed the title Add operation kwarg to Cols Add operator kwarg to Cols Dec 12, 2022
@bkamins
Copy link
Member Author

bkamins commented Dec 12, 2022

changed

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@bkamins bkamins merged commit 7fa794b into main Dec 16, 2022
@bkamins bkamins deleted the bkamins-patch-2 branch December 16, 2022 12:29
@bkamins
Copy link
Member Author

bkamins commented Dec 16, 2022

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants