Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

figure what to do about find("string") #16269

Closed
StefanKarpinski opened this issue May 9, 2016 · 19 comments
Closed

figure what to do about find("string") #16269

StefanKarpinski opened this issue May 9, 2016 · 19 comments
Labels
domain:strings "Strings!"

Comments

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented May 9, 2016

See #16110 (diff). Summary, after merging #16110, we have find("abc") == 1:3. This has some issues. First, it finds the non-NUL indices of characters in the string argument, which is a weird and probably bad operation. Second, it conflicts with #16024, which I want to merge.

@nalimilan
Copy link
Member

The second point is actually a possible solution: just remove that test, and have find fail on strings just like it always did before the recent generalization to iterables. I find it quite appealing.

@StefanKarpinski
Copy link
Sponsor Member Author

In that case, why are strings the only iterable that we don't allow calling find on?

@rfourquet
Copy link
Member

I understood that it would fail only because a char won't be comparable to 0 (and find(::String) would not be special cased).

@StefanKarpinski
Copy link
Sponsor Member Author

Currently, it's a deprecation warning, not an error.

@rfourquet
Copy link
Member

A more explicit failure could be to compare an element x to zero(x) in find.

@nalimilan
Copy link
Member

nalimilan commented May 9, 2016

@rfourquet We discussed that at #16024 already. zero(::Char) isn't defined because it would be the additive identity, i.e. 0.

@nalimilan nalimilan reopened this May 9, 2016
@nalimilan
Copy link
Member

@StefanKarpinski It's a deprecation warning, and previously it didn't work at all. So nobody should use it, and when it will be removed we will be fine. Everything is perfectly consistent here, and so far nobody has expressed the need for find(::AbstractString), only for find(::Function, ::AbstractString).

@rfourquet
Copy link
Member

@nalimilan yes precisely, so that would be another way to rule out find(::String), for lack of zero method (but other than that, I don't know if using zero would be a good idea).

@StefanKarpinski
Copy link
Sponsor Member Author

My issue is that given the current definition of find this is the behavior that falls out. It's one thing to say that find only makes sense for AbstractArrays but if we're going to lift that restriction, why are strings the only thing it doesn't make sense for? Because we don't like the behavior that happens to fall out of the definition? That's arbitrary.

@nalimilan
Copy link
Member

Because find is based on x != 0, and that doesn't make sense for Chars?

@mbauman
Copy link
Sponsor Member

mbauman commented May 9, 2016

There's that problem, but I think the bigger trouble is that find is documented to return linear indices, but it happens to use iteration in its implementation. So it'll be wrong for any object that doesn't have matched iteration and indexing behaviors:

julia> s = "αω"
       idxs = find(s)
       s[idxs[end]]
ERROR: UnicodeError: invalid character index

@ararslan
Copy link
Member

ararslan commented May 9, 2016

Just thinking aloud here, but why is find defined to return the indices of numerically nonzero values? I get that it works well for locating the true indices in boolean arrays, but it seems to me that finding nonzeros should be a job for findnz or similar.

@ararslan
Copy link
Member

ararslan commented May 9, 2016

Also, maybe it's just me, but I find this incredibly bizarre:

julia> find(Any["0", :(0), 0.0, 0, false, '\0'])
1-element Array{Int64,1}:
 1

@pabloferz
Copy link
Contributor

pabloferz commented May 9, 2016

An option would be to change the definition of find as the valid indices of a non-linear iterable and the current find otherwise.

function Base.find(s::String)
    i = 0
    v = Int64[]
    while length(v) < length(s)
        j = nextind(s, i)
        push!(v, j)
        i = j
    end
    v
end

@nalimilan
Copy link
Member

nalimilan commented May 9, 2016

@mbauman @pabloferz Yeah, it could be more useful to go over indices and return these, with a fallback to linear indices. This would be in line with the functions @timholy is working on. This is related to the question of whether to return a CartesianIndex for LinearSlow arrays.

@ararslan That wouldn't be bizarre if you got an error for "0" and '\0'.Indeed, we could restrict find to booleans, and use findnz to find non-zero entries. In that case, it would be obvious that neither can work on strings.

@ararslan
Copy link
Member

ararslan commented May 9, 2016

Actually, as @rfourquet mentioned, comparing against zero(x) may not be such a bad idea if we're okay with find("string") failing. The result will be a MethodError since zero has no method matching zero(::Char). If you know you're looking for nonzeros when using find, I actually think that's a reasonably descriptive error.

@ararslan
Copy link
Member

ararslan commented May 9, 2016

@nalimilan Granted, find(Any["0", :(0), 0.0, 0, false, '\0']) is using the find(A) method, so the only way that "0" or '\0' would cause an error is if the NotComparableError thing happens.

Regardless, I would be in favor of restricting find to booleans and using findnz for nonzeros.

@ararslan
Copy link
Member

Hm. Using findnz like that would take some reworking though, since currently findnz is only defined for matrices and it returns a tuple rather than a simple array of indices, as find currently does.

@JeffBezanson
Copy link
Sponsor Member

Happily, subsumed by #23120.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:strings "Strings!"
Projects
None yet
Development

No branches or pull requests

8 participants