Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make searchsorted*/findnext/findprev return values of keytype #32978

Merged
merged 13 commits into from
Apr 28, 2020

Conversation

sostock
Copy link
Contributor

@sostock sostock commented Aug 20, 2019

Fixes #32568.

  • The returntype of searchsorted* is now always the keytype of the container (and does not depend on the type of the value being searched).
  • The returntype of findnext/findprev is now always the keytype of the container or Nothing (and does not depend on the type and value of the start argument).

Even though strings don’t have a keytype, I changed the respective findnext/findprev methods as well. They now always return values of type Union{Int,Nothing} or Union{UnitRange{Int},Nothing}.

Copy link
Sponsor Member

@mbauman mbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great PR, thank you! It looks like we need keytype defined a bit earlier in the bootstrapping order… but I don't fully understand why because to my eyes it looks like it should have already been defined at the point it errors.

@@ -1385,7 +1385,7 @@ end
function findnext(B::BitArray, start::Integer)
start > 0 || throw(BoundsError(B, start))
start > length(B) && return nothing
unsafe_bitfindnext(B.chunks, start)
unsafe_bitfindnext(B.chunks, Int(start))
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsafe_bitfindnext (and prev, too) accepts a start::Integer. I was about to make a comment that we should just move this there and/or restrict its signature, but then I realized it's also used by BitSet. BitSet demands an Int64 result, whereas its use for BitArray demands an Int result. So this seems like the right way to go about it. 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it changed, but my impression is that BitSet calls unsafe_bitfindnext by feeding it an Int, not an Int64, so I would also favor restricting unsafe_bitfindnext to Int input, as this is an internal function. But the PR is also fine as is!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now restricted unsafe_bitfindnext to Int.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sorry for the nitpick, but please change also unsafe_bitfindprev accordingly too :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was fast!

base/sort.jl Outdated Show resolved Hide resolved
@sostock
Copy link
Contributor Author

sostock commented Aug 20, 2019

I also encountered the bootstrap error during development, although I managed to build Julia successfully in the end by applying my changes incrementally. I tried to debug it (following the documentation), but I couldn’t even get the build process running in gdb.

Due to the bootstrap issue, I used oftype(l, start) instead of convert(keytype(A), start) in base/array.jl. Once the bootstrap issue is resolved, this could be changed back (although it probably doesn’t make a difference).

@sostock
Copy link
Contributor Author

sostock commented Feb 3, 2020

I just tried to debug this again (following the documentation), but it does not seem to be working. Unfortunately, I have no experience with gdb, so I don’t know what could be done about this.

GNU gdb (GDB) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
/usr/lib/../share/gcc-9.2.0/python/libstdcxx/v6/xmethods.py:731: SyntaxWarning: list indices must be integers or slices, not str; perhaps you missed a comma?
  refcounts = ['_M_refcount']['_M_pi']
ERROR: unknown option `--build`
[Inferior 1 (process 5168) exited with code 01]
../contrib/debug_bootstrap.gdb:3: Error in sourced command file:
No stack.
(gdb)

@rfourquet rfourquet added kind:bugfix This change fixes an existing bug domain:collections Data structures holding multiple items, e.g. sets labels Apr 21, 2020
@rfourquet
Copy link
Member

@sostock thanks for this great PR! I believe you can fix the compilation issue by adding keytype to the list of imported functions in "sort.jl" (in the list starting with using .Base: copymutable, ... at the top). At least I managed to build your branch this way, was there another issue?

@sostock
Copy link
Contributor Author

sostock commented Apr 21, 2020

Thanks, @rfourquet! Importing keytype works for me as well. I think there are no other issues with this PR.

Copy link
Member

@rfourquet rfourquet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo the seemingly unnecessary change to searchsortedfirst testing for x isa Unsigned.

@@ -1385,7 +1385,7 @@ end
function findnext(B::BitArray, start::Integer)
start > 0 || throw(BoundsError(B, start))
start > length(B) && return nothing
unsafe_bitfindnext(B.chunks, start)
unsafe_bitfindnext(B.chunks, Int(start))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it changed, but my impression is that BitSet calls unsafe_bitfindnext by feeding it an Int, not an Int64, so I would also favor restricting unsafe_bitfindnext to Int input, as this is an internal function. But the PR is also fine as is!

base/bitarray.jl Outdated
@@ -1411,8 +1411,9 @@ function findnextnot(B::BitArray, start::Integer)
l = length(Bc)
l == 0 && return nothing

chunk_start = _div64(start-1)+1
within_chunk_start = _mod64(start-1)
st = Int(start)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, note that for this kind of thing, there is no problem writing start = Int(start), which saves you from having to find another name :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then I will change it.

base/sort.jl Outdated
@@ -285,14 +285,15 @@ function searchsortedfirst(a::AbstractRange{<:Integer}, x::Real, o::DirectOrderi
lastindex(a) + 1
else
if o isa ForwardOrdering
-fld(floor(Integer, -x) + first(a), h) + 1
y = isa(x, Unsigned) ? floor(-Signed(x)) : floor(Integer, -x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When x isa Unsigned, this method looks like it won't be called, but rather the next method below (I guess this split didn't exist when you put up this PR).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the method below is used for x::Unsigned. I have restored the original, except for using Signed(first(a)) instead of first(a) (which is needed for passing the tests added in this PR).

@@ -130,7 +130,7 @@ function findnext(testf::Function, s::AbstractString, i::Integer)
@inbounds i == z || isvalid(s, i) || string_index_err(s, i)
for (j, d) in pairs(SubString(s, i))
if testf(d)
return i + j - 1
return Int(i + j - 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a conversion is needed somewhere in this function, my personal preference would favor converting at the begining (i = Int(i), which might help having to compile less function specializations, like isvalid or SubString, but this also might not matter). But keep it as you prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have changed it.

@@ -285,14 +285,14 @@ function searchsortedfirst(a::AbstractRange{<:Integer}, x::Real, o::DirectOrderi
lastindex(a) + 1
else
if o isa ForwardOrdering
-fld(floor(Integer, -x) + first(a), h) + 1
-fld(floor(Integer, -x) + Signed(first(a)), h) + 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the Signed here is for the case where x is not an Integer ? Just by curiosity, without this conversion here, the type conversion attached to the return value of the function (::keytype(a)) is not enough?

Copy link
Contributor Author

@sostock sostock Apr 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is for the case when x is Signed and first(a) is Unsigned. For example, if floor(Integer, -x) == Int64(-5) and first(a) == UInt64(1), adding them yields the huge number 0xfffffffffffffffc (the negative Int64 gets promoted to UInt64). The result of the complete line is then a UInt64 that is too large to convert to keytype(a).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right I was off as Signed is not applied to x, and ok, the ::keytype(a) fails then, thanks.

@rfourquet rfourquet merged commit 6248170 into JuliaLang:master Apr 28, 2020
mbauman added a commit that referenced this pull request Apr 28, 2020
* origin/master: (833 commits)
  Improve typesubtract for tuples (#35600)
  Make searchsorted*/findnext/findprev return values of keytype (#32978)
  fix buggy rand(RandomDevice(), Bool) (#35590)
  remove `Ref` allocation on task switch (#35606)
  Revert "partr: fix multiqueue resorting stability" (#35589)
  exclude types with free variables from `type_morespecific` (#35555)
  fix small typo in NEWS.md (#35611)
  enable inline allocation of structs with pointers (#34126)
  SparseArrays: Speed up right-division by diagonal matrices (#35533)
  Allow hashing 1D OffsetArrays
  NEWS item for introspection macros (#35594)
  Special case empty covec-diagonal-vec product (#35557)
  [GCChecker] fix a few tests by looking through casts
  Use norm instead of abs in generic lu factorization (#34575)
  [GCChecker,NFC] run clang-format -style=llvm
  [GCChecker] fix tests and add Makefile
  Add introspection macros support for dot syntax (#35522)
  Specialize `union` for `OneTo` (#35577)
  add pop!(vector, idx, [default]) (#35513)
  bump Pkg version (#35584)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:collections Data structures holding multiple items, e.g. sets kind:bugfix This change fixes an existing bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Returntype of searchsorted* should always be the keytype
3 participants