Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a bitstype just for IANA Variable TimeZones #335

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

oxinabox
Copy link
Contributor

@oxinabox oxinabox commented Apr 15, 2021

This is yet another PR relating to #271
It doesn't solve it totally,
It needs #327 to handling FixedTimeZones, and #332 to make the ZonedDateTime parametric.
But with those 3 it would solve everything.

It is most similar to #287 / #323
but with rather than a dynamically created lookup table as timezones are used, it has a static (but lazy) look up table.
This table always maps the same integer id, to the same timezone.
This means we don't need to worry about serialization.

Here is the C code for the perfect hashing that gperf generated.

Right now this code is kinda gross.
Especially around loading it.
We have 2 caches, and idk we probably only need 1.

src/types/ianatimezone.jl Outdated Show resolved Hide resolved
src/types/ianatimezone.jl Outdated Show resolved Hide resolved
@@ -46,7 +46,10 @@ function TimeZone(str::AbstractString, mask::Class=Class(:DEFAULT))
tz, class = get!(TIME_ZONE_CACHE, str) do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@omus input on the best way to rewrite this function would apprecated.

name(a::IANATimeZone) = name(backing_timezone(a))
transitions(tz::IANATimeZone) = transitions(backing_timezone(tz))

# TODO: should i just make this check the fields of VariableTimeZone and just delegate all?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be addressed

src/types/ianatimezone.jl Outdated Show resolved Hide resolved
13, 5, 43, 3, 355, 12, 168, 148, 90, 179,
4, 1, 315, 3, 3, 3, 2, 37, 5, 65,
22, 320, 245, 1, 1681, 1681, 1681,
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should i rewrap this?
10 per line?
Up to the line limit?
1 per line?


transits = transitions(tz)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be premature optimization to pull this out to be done before.
Idea is to avoid relooking up the backing_timezone for IANATimeZone multiple times

@@ -163,3 +163,7 @@ function last_valid(local_dt::DateTime, tz::VariableTimeZone)
possible = interpret(local_dt, tz, Local)
return isempty(possible) ? first(shift_gap(local_dt, tz)) : last(possible)
end


first_valid(dt::DateTime, tz::IANATimeZone, args...) = _do_and_rewrap(first_valid, dt, tz, args...)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probnably _do_and_rewrap needs to be renamed

src/types/ianatimezone.jl Outdated Show resolved Hide resolved
src/types/ianatimezone.jl Outdated Show resolved Hide resolved
src/types/ianatimezone.jl Outdated Show resolved Hide resolved
src/types/ianatimezone.jl Outdated Show resolved Hide resolved

const IANA_TIMEZONES = Vector{VariableTimeZone}(undef, IANA_TABLE_SIZE)

# TODO: maybe fill this during build(), probably by generating a julia file.
Copy link
Contributor Author

@oxinabox oxinabox Apr 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it is fine, since it only queries what files exist, using timezone_names rather than reading them.
Maybe timezone_names (or __init__) should do the validation that TimeZones.jl has build properly and has the TZDATA.COMPILED_DIR.
Which would maybe clean up the TimeZone constructor a bit.

src/types/ianatimezone.jl Outdated Show resolved Hide resolved
Comment on lines 34 to 43
len >= 19 && (hval += asso_values[units[19]])
len >= 12 && (hval += asso_values[units[12]])
len >= 11 && (hval += asso_values[units[11]])
len >= 9 && (hval += asso_values[units[9] + 1])
len >= 8 && (hval += asso_values[units[8]])
len >= 6 && (hval += asso_values[units[6] + 1])
len >= 4 && (hval += asso_values[units[4]])
len >= 2 && (hval += asso_values[units[2] + 1])
len >= 1 && (hval += asso_values[units[1]])
len > 0 && (hval += asso_values[units[end]]) # add the last
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
len >= 19 && (hval += asso_values[units[19]])
len >= 12 && (hval += asso_values[units[12]])
len >= 11 && (hval += asso_values[units[11]])
len >= 9 && (hval += asso_values[units[9] + 1])
len >= 8 && (hval += asso_values[units[8]])
len >= 6 && (hval += asso_values[units[6] + 1])
len >= 4 && (hval += asso_values[units[4]])
len >= 2 && (hval += asso_values[units[2] + 1])
len >= 1 && (hval += asso_values[units[1]])
len > 0 && (hval += asso_values[units[end]]) # add the last
@inbounds begin
len >= 19 && (hval += asso_values[units[19]])
len >= 12 && (hval += asso_values[units[12]])
len >= 11 && (hval += asso_values[units[11]])
len >= 9 && (hval += asso_values[units[9] + 1])
len >= 8 && (hval += asso_values[units[8]])
len >= 6 && (hval += asso_values[units[6] + 1])
len >= 4 && (hval += asso_values[units[4]])
len >= 2 && (hval += asso_values[units[2] + 1])
len >= 1 && (hval += asso_values[units[1]])
len > 0 && (hval += asso_values[units[end]]) # add the last
end

This saves some time and simplifies the native code but relies on each code unit being in the range 1:125.

Might be worth it just to pad the array of asso_values to cover the domain of UInt8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This saves some time and simplifies the native code but relies on each code unit being in the range 1:125

I tried this and didn't notice any improvement.
How were you benchmarking it?

Might be worth it just to pad the array of asso_values to cover the domain of UInt8

I tried this and it made it way worse. I think because the tuple gets too lange for julia to be willing to do some optimizations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

julia> @btime for x in ["America/Winnipeg", "Etc/UTC", "Canada/Central"]
           perfect_hash(x)
       end
  71.438 ns (1 allocation: 112 bytes)

julia> @btime for x in ["America/Winnipeg", "Etc/UTC", "Canada/Central"]
           perfect_hash_inbounds(x)
       end
  66.137 ns (1 allocation: 112 bytes)

Copy link
Contributor Author

@oxinabox oxinabox Apr 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running @btime on the whole testset in test/types/ianatimezone.jl
With the benchmark

@btime for x in ("America/Winnipeg", "Etc/UTC", "Canada/Central")
                TimeZones.perfect_hash(x)
             end

Noting that i got rid of the vector in the benchmark code since that was dominating.
In julia 1.6

  • current code without the inbounds annotation|: 38.097 ns (0 allocations: 0 bytes)
  • with inbounds (not safe): 33.217 ns (0 allocations: 0 bytes)
  • with inbounds + make assoc_values bigger: 79.305 ns (0 allocations: 0 bytes)

Copy link
Contributor Author

@oxinabox oxinabox Apr 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • with inbounds and assoc to 255: 44.585 ns (0 allocations: 0 bytes)

growing assoc to 127 seems to help: fikkiwubg wutgh tgat:

  • with just assoc to 127 : 28.334 ns (0 allocations: 0 bytes)
  • with inbounds ((Not safe): 23.236 ns (0 allocations: 0 bytes)
  • with inbounds and checking based on bitmask: 47.035 ns (0 allocations: 0 bytes)
  • with inbounds and checking based in <: 36.359 ns (0 allocations: 0 bytes)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm i don't trust my benchmarks
Timing for 127 + checking based on <: sometimes is 36ns sometimes is 48ns.

I think we should do that though, and i will push that code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants