Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Remove find type assertion to allow other iterables #16110

Merged
merged 3 commits into from
May 4, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions base/array.jl
Original file line number Diff line number Diff line change
Expand Up @@ -777,7 +777,7 @@ function findprev(testf::Function, A, start::Integer)
end
findlast(testf::Function, A) = findprev(testf, A, length(A))

function find(testf::Function, A::AbstractArray)
function find(testf::Function, A)
# use a dynamic-length array to store the indexes, then copy to a non-padded
# array for the return
tmpI = Array(Int, 0)
Expand All @@ -791,9 +791,9 @@ function find(testf::Function, A::AbstractArray)
I
end

function find(A::AbstractArray)
function find(A)
nnzA = countnz(A)
I = similar(A, Int, nnzA)
I = Vector{Int}(nnzA)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this change the dimensionality of the output when the input is an AbstractArray ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think find always returns a vector: this is what similar did as a single dimension argument was passed. Anyway it doesn't make sense to keep the size of the input array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah these methods with size arguments to similar always make me scratch my head in terms of what they're for. I guess similar(::SparseVector, T, n) could return something other than Vector, but wouldn't make too much sense to use that for find.

count = 1
for (i,a) in enumerate(A)
if a != 0
Expand Down
8 changes: 8 additions & 0 deletions test/arrayops.jl
Original file line number Diff line number Diff line change
Expand Up @@ -362,6 +362,14 @@ a = [0,1,2,3,0,1,2,3]
@test findprev(isodd, [2,4,5,3,9,2,0], 7) == 5
@test findprev(isodd, [2,4,5,3,9,2,0], 2) == 0

# find with general iterables
s = "julia"
@test find(s) == [1,2,3,4,5]
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test conflicts directly with #16024. I also think that this is bizarre behavior:

julia> find("foo\0bar")
6-element Array{Int64,1}:
 1
 2
 3
 5
 6
 7

Sure, that's what falls out of the definition, but how is this a useful or meaningful operation?

Copy link
Sponsor Member

@StefanKarpinski StefanKarpinski May 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential solutions:

  • change a != 0 to !isequal(a, 0); won't find -0.0
  • change a != 0 to !(isequal(a, 0) | isequal(a, -0.0)); weird and maybe inefficient
  • change a != 0 to a != convert(eltype(A), 0)

The last retains the bizarre behavior for strings of finding the indices of all characters but '\0'; the former two would at least lose that weird behavior by definition, which is nice, I guess.

Copy link
Member Author

@ararslan ararslan May 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah crap, I didn't realize this was getting the non-null indices; I thought it was getting all indices. (Though in retrospect it should have been obvious that it would do this. 😕) I definitely should have added a test for that. This behavior isn't what I had intended.

Would it make sense to add a separate method for strings that just does something like collect(1:endof(s))? For things like Base.UTF8proc.GraphemeIterator it would still do the string-to-int comparison and return all indices. That would only become an issue if the NotComparableError thing happens.

What do you think about that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for either keeping the current behaviour, or adding a method just to raise an error. Do we have a use case for this (it didn't work until now...)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have a use case in mind for find("string"); I was primarily interested in opening up find(testf::Function, A) for strings. Returning all indices for a string makes sense to me though, at least more so than returning indices of non-null bytes. I suppose that isn't so bad, just kind of weird.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but returning all valid indices isn't the definition of find, so I don't think it's a good idea to use it for that. If needed, we could imagine using the new function from #16260 for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, find finds the indices of all numerically nonzero values; the only reason why it currently gets non-null indices for strings is that 0 == '\0' is true. If/when integer-character comparison goes away, it will fail for strings anyway. I'd be fine making find("string") a MethodError. What do you think, @StefanKarpinski?

@test find(c -> c == 'l', s) == [3]
g = graphemes("日本語")
@test find(g) == [1,2,3]
@test find(isascii, g) == Int[]

## findn ##

b = findn(ones(2,2,2,2))
Expand Down