Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for Not with multiple positional indices #3302

Merged
merged 12 commits into from
Apr 4, 2023
Merged

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Mar 14, 2023

No description provided.

@bkamins bkamins requested a review from nalimilan March 14, 2023 22:28
@bkamins bkamins added this to the 1.6 milestone Mar 14, 2023
@bkamins bkamins mentioned this pull request Mar 17, 2023
NEWS.md Outdated Show resolved Hide resolved
@bkamins
Copy link
Member Author

bkamins commented Mar 29, 2023

@nalimilan - when doing a final check before merging I realized that:

select(df, Not(1, 1])

will error, while

select(df, Not(:a, :a])

will work. The reason is that former is translated to Not([1, 1]) which is disallowed (duplicate column selected), while the latter to Not(Cols(:a, :a)) which is OK, as Cols allows duplicates (and deduplicates them).

I have documented the difference, but I do not fully like that we have such a corner case when we have the difference (however, it seems that it is best what we can have so I have not proposed a change as I do not see any improvement here). What do you think?

@nalimilan
Copy link
Member

Ah. Can't we avoid using Cols implicitly?

@bkamins
Copy link
Member Author

bkamins commented Apr 2, 2023

I think it is the opposite. We would want to use Cols, but when integers are passed we cannot. However, now I see that what we can (and should) do is to run unique in case of integers. The reason is that Not-negation drops duplicates. I will push an update to the PR soon.

The current behavior is:

julia> using DataFrames

julia> df = DataFrame(x=1:3, y=2:4, z=3:5)
3×3 DataFrame
 Row │ x      y      z
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3
   2 │     2      3      4
   3 │     3      4      5

julia> df[Not(2, 2), :]
2×3 DataFrame
 Row │ x      y      z
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3
   2 │     3      4      5

julia> df[:, Not(2, 2)]
ERROR: ArgumentError: Elements of [2, 2] must be unique

and it is inconsistent. df[:, Not(2, 2)] should just drop second column and all is good then.

src/other/index.jl Outdated Show resolved Hide resolved
docs/src/lib/indexing.md Outdated Show resolved Hide resolved
src/other/index.jl Show resolved Hide resolved
@bkamins bkamins merged commit a36bbbf into main Apr 4, 2023
@bkamins bkamins deleted the bk/flexible_not branch April 4, 2023 10:32
@bkamins
Copy link
Member Author

bkamins commented Apr 4, 2023

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants