Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tighten leap second handling when parsing from RFC 3339 #406

Closed
mzabaluev opened this issue Dec 3, 2021 · 22 comments · Fixed by #412
Closed

Tighten leap second handling when parsing from RFC 3339 #406

mzabaluev opened this issue Dec 3, 2021 · 22 comments · Fixed by #412
Labels
A-parsing Area: parsing A-well-known-format-description Area: well known format descriptions C-enhancement Category: an enhancement with existing code

Comments

@mzabaluev
Copy link
Contributor

mzabaluev commented Dec 3, 2021

The current implementation of format_description::well_known::Rfc3339 allows second 60 to occur in any time, which is then rolled back to the last nanosecond of 59. As leap seconds are only really allowed at 23:59:60 UTC, this behaviour feels way too loose.

Fortunately, there is always the full parsed date for the Rfc3339 format parser to look at. So it could compute the UTC time and check if the hour is 23 and the minute is 59, otherwise it should reject the parsed time. It might make sense to also check if it's the last day of the month, though I don't know if the language in ITU-R TF.460-6 is considered normative by the IERS.

@mzabaluev
Copy link
Contributor Author

mzabaluev commented Dec 3, 2021

In a later breaking change an added API, I'd prefer the parsing rules to be restricted so that an RFC 3339 string with the second part of 60 is rejected with an error variant that would allow the application to perform its own recovery, perhaps with a method to retrieve the library-recommended date approximation.

@jhpratt
Copy link
Member

jhpratt commented Dec 4, 2021

This shouldn't be too difficult. The main thing to keep note of is that the time is parsed in the local time zone, while leap seconds are always xxxx-12-31 23:59:60Z.

With regard to rejecting the leap second, that will not happen, even in a breaking change. If it's possible to do something reasonable (in this case using the a value one nanosecond prior) then we will do so. Leap second support may be implemented in the future, mind you.

@jhpratt jhpratt added A-parsing Area: parsing A-well-known-format-description Area: well known format descriptions C-enhancement Category: an enhancement with existing code labels Dec 4, 2021
@mzabaluev
Copy link
Contributor Author

mzabaluev commented Dec 8, 2021

With regard to rejecting the leap second, that will not happen, even in a breaking change. If it's possible to do something reasonable (in this case using the a value one nanosecond prior) then we will do so.

The rollback may be up to a full second, as 23:59:60.999999999 is a valid leap second time. Depending on the application, the meaning of "reasonable" here may feel... stretched.

Leap second support may be implemented in the future, mind you.

I see. The desired behaviour depends on the application:

  • most will probably accept the current behaviour and not worry about this obscure issue;
  • some would expect a time realization with a leap smear and want to reject non-compliant timestamps;
  • others will want support for leap seconds in the data model in some way (not necessarily the one chrono thrusts onto its not always suspecting users).

@jhpratt
Copy link
Member

jhpratt commented Dec 8, 2021

Sure, it could be up to a full second. Either way, I'm not comfortable rejecting an otherwise valid moment just because it's a leap second. Every other restriction comes from either the specification or functional limits in the code — namely restricting the UTC offset to ±23:59:59 at most, which is not an issue in practice as no location has ever observed such an extreme offset to my knowledge. I've no problem limiting it to be on June 30 or December 31 at 23:59:60, as the spec requires values to be semantically valid. Anything further restriction is unlikely.

If leap seconds are "properly" supported in the future, algorithms that work around a leap second wouldn't be. That would include leap smear, which inherently distorts the objectively correct time. Barring international agreement on changing how leap seconds work (or eliminating them altogether), my ultimate goal is to acknowledge reality within the proleptic Gregorian calendar.

@mzabaluev
Copy link
Contributor Author

mzabaluev commented Dec 8, 2021

reality within the proleptic Gregorian calendar.

With computers on the internet, that's usually subject to local clock drift and some approximation of globally synchronized time usually achieved over NTP.

Regardless of international agreements, popular protocols already exist that require their time representation to use leap smear and avoid over-59 seconds. Time-rs implementing workarounds or a future IERS-compliant support for UTC, with no alternatives, is actually in disagreement with the protobuf specification.

I think Google, AWS, and others are trying to establish a de-facto workaround that would make time accounting friendlier to computing. Furthermore, their NTP servers already provide smeared time, so anyone relying on those is in on the workaround whether or not their time stack expects otherwise.

I think the above calls for a separate Rfc3339NoLeap format parser at least, to give the application choice on how to handle the matter.

@jhpratt
Copy link
Member

jhpratt commented Dec 8, 2021

Yes, there is clock drift and synchronization concerns, but that is an unavoidable reality that has nothing to do with this crate. I am taking the clock time on the user's machine as ground truth. NTP is not handled here, nor will it be in the future.

Leap seconds do exist, at least for the next few years until IERS meets with regard to its future (I wouldn't be surprised if they just delay again). Leap smear is not universally accepted, and even where it is used the form it takes varies. Once tzdb handling is implemented, I'm absolutely not going to make it so that adding one second doesn't actually add one second of clock time — imagine the bugs that could cause. Proper handling of leap seconds is possible and will eventually be implemented.

I am currently using a workaround that uses the closest value within the same calendar day (and hour and minute). The fact that certain protocols use leap smearing doesn't change the goal of this crate: to provide as accurate information as possible. Google's protocol using Google's leap smear isn't surprising. If dedicated protobuf handling is implemented in the future (unlikely), tzdb information can be used to leap smear then. Until that point, I'm providing the best information I can within the confines of reality, RFCs, and constraints of the time crate. I can't be in disagreement with a specification (provided by a company, not a standards body nonetheless) that I never claim to support.

@jhpratt
Copy link
Member

jhpratt commented Dec 8, 2021

If you truly want to handle it yourself, I can add a method to the Parsed struct that accepts well known formats. That is low-level and not recommended for the overwhelming majority of users, however.

@mzabaluev
Copy link
Contributor Author

mzabaluev commented Dec 8, 2021

I can't be in disagreement with a specification (provided by a company, not a standards body nonetheless) that I never claim to support.

Fortunately, there is no need to do any breaking changes to support the protobuf well-known types specification, other than adding a variant of the RFC 3339 format parser that would reject leap seconds. Even that is not even essential to the binary data format in protobuf, but less strict parsing of string inputs may result in unwanted ambiguity in mapping. Formatting and the internal data do not currently produce leap second times.

@mzabaluev
Copy link
Contributor Author

mzabaluev commented Dec 8, 2021

If you truly want to handle it yourself, I can add a method to the Parsed struct that accepts well known formats.

How about adding an Option member with the original subsecond part in Some, set in case when the rollback to 59th second has been applied to the existing second and subsecond values? This should be enough for advanced users and is backward-compatible.

@mzabaluev
Copy link
Contributor Author

I'll try to submit a PR with proposed changes some time later.

@jhpratt
Copy link
Member

jhpratt commented Dec 8, 2021

How about adding an Option member with the original subsecond part in Some, set in case when the rollback to 59th second has been applied to the existing second and subsecond values?

This would implicitly guarantee the current behavior, which is not desirable. I can investigate delaying the leap second normalization, but this would require some internal information that I'm not sure I want to store.

I'll try to submit a PR with proposed changes some time later.

Be aware that you are likely wasting your time. Adding another format for the sole purpose of rejecting a valid format will not be accepted. 23:59:60Z is a valid time and will always be accepted where the specification requires it. Duplicating all well-known formats would be a significant overhead and result in code and binary bloat, in addition to increased maintenance burden.

My goal is not to support every method of handling time, it's to support the correct method. Leap smear is not how time currently works despite what Google may want others to think.

@mzabaluev
Copy link
Contributor Author

This would implicitly guarantee the current behavior, which is not desirable.

A change of behavior to expose leap seconds without a rollback would break the current API, as the same input string will map to different time component values. So why not add an escape hatch for 0.3, which can be deprecated or removed when leap seconds are properly supported in the revised API?

@jhpratt
Copy link
Member

jhpratt commented Dec 9, 2021

How would it break the current API? The result of the parsing is currently inaccurate and will be so until tzdb handling is implemented. Making something more accurate isn't a breaking change. That's the exact reasoning I used to change the parser from using 59 to 59.999_999_999. Yes, the return value changed, but it's more accurate so it's fine.

@mzabaluev
Copy link
Contributor Author

The result of the parsing is currently inaccurate

Yes, and the API user is left completely unaware about this, with no recourse. This can make the library unsuitable for some applications, with real world examples I tried to point out above. No matter how we may dislike Google, Amazon, et al., the alternative where every computer system in the world has to independently follow the pronouncements of a committee with 6 months advance notice is no longer universally accepted.

That's the exact reasoning I used to change the parser from using 59 to 59.999_999_999

The behavior pre-change was faulty considering the intent, so that arguably was a bug fix.

@mzabaluev
Copy link
Contributor Author

The more strict parser would also have the benefit of making the map between OffsetDateTime values and RFC 3339 strings bijective, up to the equivalences in both domains.

@jhpratt
Copy link
Member

jhpratt commented Dec 9, 2021

I proposed a way that would let you handle it but received no direct answer. Please address this if you would like the ability to handle the leap second case manually. I am not willing to take on the maintenance burden of duplicating all well-known formats, especially when the specification requires succeeding on a valid value. The only time I will knowingly fail on an otherwise valid value is when the UTC offset is beyond ±24 hours, as no such time zone has ever existed. But that is not relevant here. Leap seconds are valid, so I parse them.

If Google, Amazon, etc. want to use leap smear, that's on them because they are the ones not following reality. They're lying about what time it actually is, whereas my end goal is to not do so. If I have to choose between following some big corporation who thinks they can change things unilaterally and an internationally accepted standard, I'm choosing the latter ten times out of ten. It has nothing to do with "dislike"; it's just that I prefer reality over things that are made up in the name of "simplicity". You can continue yelling into the void about how some large corporations think we should do it this way, but the simple reality is that's not how clocks currently operate.

To be perfectly clear, I am not and will not implement leap smear until it becomes an internationally recognized standard. Nor will I accept a PR that does so.

The behavior pre-change was faulty considering the intent, so that arguably was a bug fix.

My point exactly. Improving something is considered a bug fix because the previous behavior was not fully correct.


I feel like we're going in circles here, so take everything I say in this comment at face value. If you continue to push for something I've already rejected, I'll just ignore that point from here forwards.

@mzabaluev
Copy link
Contributor Author

I can add a method to the Parsed struct that accepts well known formats.

Without more detail, I can't see how that would help. It seems more complicated to me than addressing it in a specific well-known format parser.

@mzabaluev
Copy link
Contributor Author

mzabaluev commented Dec 9, 2021

To be perfectly clear, I am not and will not implement leap smear until it becomes an internationally recognized standard.

That's not what I'm proposing to do. I want to give the programmer a choice to reject leap second times if their application expressly forbids dealing with them. How they receive their time inputs should be completely outside the scope of the time library.

@jhpratt
Copy link
Member

jhpratt commented Dec 9, 2021

Without more detail, I can't see how that would help. It seems more complicated to me than addressing it in a specific well-known format parser.

It would let you handle it manually. The vast majority of users will not need to do this, which is why going through the Parsed struct is reasonable. Is it more verbose? Yes, but that's the price you pay for explicitly not following the spec.

That's not what I'm proposing to do. I want to give the user a choice to reject leap second times if their application expressly forbids dealing with them. How they receive their time inputs should be completely outside the scope of the time library.

Adding a method to Parsed would let you reject any value for any reason.

@mzabaluev
Copy link
Contributor Author

mzabaluev commented Dec 9, 2021

the simple reality is that's not how clocks currently operate.

Well, the way this crate currently approaches this reality when parsing RFC 3339 is "pretend all these leap second time inputs refer to a single nearby moment, because otherwise we'd end up like chrono." It can be twice as inaccurate as the NTP servers that implement a leap second smear. If there is a plan to fix it that would NOT force every system to update a database from a remote source at least every few months, I'd like to learn about it.

@jhpratt
Copy link
Member

jhpratt commented Dec 9, 2021

The current approach is the one that will be used until tzdb handling is implemented as it is the closest to reality within the constraints of the current data structures. I have considered the tradeoffs and came to the conclusion that this is the appropriate behavior.

I am not interested in discussing leap smear further as I have made clear. I don't even know why you're bringing it up given that you've said it's not what you want implemented. I will lock this thread if you insist on continuing discussion about leap smear. This issue is currently open solely to track more accurate validity checks.

@mzabaluev
Copy link
Contributor Author

#412 fixes the issue as reported. For further discussion, I suggest using #413.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parsing Area: parsing A-well-known-format-description Area: well known format descriptions C-enhancement Category: an enhancement with existing code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants