Need expiry time / time to live #440
+1 I think this is something we should address before the interim.
Looking back to -00, we've never had TTLs in the draft; perhaps there were pre-adoption versions that had it? We do have issue #249, so perhaps this can be closed as a duplicate?
I think this is a dupe of #249, but most comments in #249 predate the current forwarding preference Object model, so we should close one of the two issues; I don't have a strong opinion on which. @fluffy, can you clarify why this is a significant implementation blocker? That might motivate us to ensure we create something that's at least good enough for now.

Some questions:

Possible use cases:
Great questions. Agreed this is a dup of #249, but there is so much there that I think it might be easier to collect requirements on a clean issue. I'll loosely use "TTL", but read that as just some sort of time; it could be absolute or a delta. More on that later.

For interactive applications like Webex, the TTL tends to be very low: think numbers in the range of a few hundred to a few thousand milliseconds. We send the P-frames with a lower TTL than the I-frames, as that can significantly reduce cache sizes without much impact on user experience. We send video with group-per-track and audio with datagrams. Clearly many other cases would have much higher TTLs, but I'm just talking about this case.

Requirements:

- Can represent TTL at roughly 10 ms resolution or better.
- An immutable property of the object, set by the end publisher / origin.
- Relays SHOULD cache the object for at least the time in the TTL. We would expect to pay a CDN for storage based on this time as well as bandwidth. Clearly there are failure cases and other operational cases where it would be dropped, but we would expect the CDN to provide some sort of reasonable SLA on not dropping it before that time. As far as the spec goes, I think it should say that relays that cache SHOULD keep the object for at least the time in the TTL.
- The relay SHOULD NOT forward the object after the TTL has expired. It is no longer useful at that point, and we don't want bandwidth used by data that will be thrown out by the client because it is too old.

On the topic of whether this is a TTL-style delta or an absolute expiry time: it would be nice to have a solution where the client did not need NTP-synchronized time, but we could easily live with requiring synchronized time if that was the best direction. The end publisher would put in a time when it created the object. If we go with the delta-style solution, the publisher might put in 500 ms; each relay that received this object would look at the local time it was received and expire the object 500 ms after that.
People talk about decrementing the delta when it gets sent to other relays, but that gets complicated and does not add much value. If we do deltas, I think the delta should just be relative to the time the relay received the object, and the delta should not be changed when the object is sent downstream.

Note that if we go with absolute time, I imagine clients would get the current time not from the operating system but from the relay. In that case, I think the client would add an absolute timestamp of when to expire the object, and we would assume the clients and all the relays are NTP-synced. I prefer the time-delta approach because it will be smaller on the wire and does not require synchronized time.

This is starting to become an implementation blocker because we want to move our existing pre-MoQ stuff we did with quicr over to match the moq spec, but there is no way we can transition without some way to send a TTL to the relays.
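To make the delta scheme above concrete, here is a minimal Python sketch of a relay expiring and forwarding an object with an unchanged delta. All names here are illustrative; the draft does not define any of this API.

```python
class CachedObject:
    """Hypothetical relay cache entry for a delta-style TTL."""

    def __init__(self, payload: bytes, ttl_ms: int, received_at: float):
        self.payload = payload
        self.ttl_ms = ttl_ms  # immutable, set by the end publisher
        # Expiry is relative to when THIS relay received the object.
        self.expires_at = received_at + ttl_ms / 1000.0

    def expired(self, now: float) -> bool:
        return now >= self.expires_at


def maybe_forward(obj: CachedObject, now: float):
    """Forward downstream only while the TTL has not elapsed.

    The delta itself is NOT decremented before sending: the next relay
    restarts its own clock from its own receive time.
    """
    if obj.expired(now):
        return None  # too old; the client would discard it anyway
    return (obj.payload, obj.ttl_ms)
```

The key design point from the comment is visible in `maybe_forward`: the original `ttl_ms` goes downstream unmodified, so no relay needs synchronized time.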
Actually, with live video streaming (such as sports), the TTL would need to equal the DVR window duration, i.e. the time window over which users are able to access any part of the stream. For a 4-hour American football game, for example, you may want a 5-hour TTL window to include the pre-game show and the whole game, all while the player has only a 12 s forward buffer.
I agree with this interpretation. Relay caching is a performance optimization enacted by a relay operator, and it should be decoupled from the core pub/sub behaviors. A relay is still a valid relay if it never caches and instead retrieves everything from upstream.
This is a departure from HTTP semantics, and while I'm not against it, we need to be careful with inferred behavior. In HTTP land, an object is always available until the origin returns a 404, irrespective of the TTL signaled in any cache header. For moqt, we would be overloading TTL to convey both desired cache duration and object availability. Is there ever a use case where we would want to differentiate the two? I can think of one. A sports provider has a 4-hour sports game. Most users will be at the live edge, but during the live broadcast users can skip back and watch prior highlights. The distributor has to pay the relay CDN for cache space. They want to cache the live edge and the highlights for performance reasons, but not the entirety of the 4-hour event. So they may set a 5-minute cache TTL (figuring that they will receive repeated requests for live-edge and highlight content within a 5-minute window) but a 4-hour availability window. If we use TTL to signal both, we can't do that.
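The split this comment argues for can be sketched as two separate fields. Neither field exists in the current draft; the names below are hypothetical, purely to show how the two signals would behave differently at a relay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectLifetime:
    """Hypothetical split of the two signals discussed above."""
    cache_ttl_s: float      # suggested relay cache duration (cost-driven hint)
    availability_s: float   # how long the object may be served at all


def serve_from_cache(age_s: float, lt: ObjectLifetime) -> bool:
    # A relay may answer from its own cache only within the cache TTL.
    return age_s < lt.cache_ttl_s


def fetch_from_upstream(age_s: float, lt: ObjectLifetime) -> bool:
    # After the cache TTL but within the availability window, the relay
    # must go upstream, much like an HTTP cache miss against an origin.
    return lt.cache_ttl_s <= age_s < lt.availability_s
```

In the sports example, `ObjectLifetime(cache_ttl_s=300, availability_s=4 * 3600)` would keep the live edge hot in cache while still letting viewers scrub back through the whole event via upstream fetches.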
Thanks for the comments. I believe the reason to require that a cache no longer serve an Object is policy/legal/etc. oriented, since the content of Objects doesn't change? It feels like there could be a better mechanism to remove content than a TTL, so maybe TTL is better if it's restricted to being a suggested amount of time to cache the content?
This doesn't feel like an Object property, because if you played a live stream live you might get one TTL, and if you played it the next day I would expect the TTL to be different or not even present. As such, would it be OK to add a TTL field to SUBSCRIBE_OK?
Individual comment: If the TTL is relative to when the relay receives it, then in Will's example the cache may expire some content, but on a later request it can go upstream again, and the content will either exist or not at that time. If it exists, it gets refreshed; if not, its non-existence is cacheable. Do we want/need an optimization that allows the relay to transition object status from "normal" to "expired/permanently gone" without another trip upstream? Just typing this makes me think we should take a long look at the HTTP caching functions (e.g. revalidation) and pick the ones we think make sense for moq. If moq is successful, CDNs will want to adapt HTTP caches to serve these objects, or possibly serve the same objects over both moq and HTTP, so we should align where it makes sense.
Updating from a conversation with Will today. We have the use case where, on a 2-hour sports event, the DVR window is 30 minutes and a viewer cannot scrub back more than 30 minutes from the live edge. This leads to wanting to be able to say "relays, don't send this object more than 30 minutes after it was produced." We also have, for the real-time cases, "don't send this object more than 500 ms after it was received by the relay."
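Those two bounds, one absolute from production time and one relative from relay receipt, can be combined by taking whichever expires first. The sketch below is hypothetical; the draft defines neither parameter, and the names are invented for illustration.

```python
def expiry_time(produced_at: float, max_age_s,
                received_at: float, ttl_s):
    """Earliest of two optional bounds:
    - produced_at + max_age_s: absolute bound (e.g. a 30-minute DVR window)
    - received_at + ttl_s:     relative bound (e.g. 500 ms real-time freshness)
    Returns None if neither bound is set."""
    bounds = []
    if max_age_s is not None:
        bounds.append(produced_at + max_age_s)
    if ttl_s is not None:
        bounds.append(received_at + ttl_s)
    return min(bounds) if bounds else None
```

For the sports case only the first bound would be set; for the interactive case only the second; an application could in principle set both.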
I wonder if it would make sense to express the TTL in groups instead of wall time? E.g. TTL=1 means "stop sending the previous group as soon as the beginning of the new group arrives", TTL=2 means "keep the current group and the one right before it", and so on. This has the advantage of working with variable-size groups, and it also compresses much better.
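A group-count TTL could be enforced with a simple eviction rule on the relay. This is a sketch under assumed data shapes (a cache mapping group IDs to their objects, in arrival order); nothing here is specified by the draft.

```python
from collections import OrderedDict


def on_new_group(cache: OrderedDict, group_id: int, ttl_groups: int) -> None:
    """When a new group starts, keep only the newest `ttl_groups` groups.

    `cache` maps group_id -> list of objects, with groups inserted in
    arrival order, so the first entry is always the oldest group.
    """
    cache.setdefault(group_id, [])
    while len(cache) > ttl_groups:
        cache.popitem(last=False)  # evict the oldest group
```

With `ttl_groups=1` this reproduces the "drop the previous group as soon as a new one starts" behavior described above, regardless of how long each group lasted in wall time.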
That idea makes sense with Stream per Group, but I'm not sure how it applies to the other two Object encodings. |
Based on discussion last week and on Slack, I believe this is where the WG is heading. It is possible this could be a per-track or per-subscription value to save bytes on the wire, particularly for the Object-per-Stream or Object-per-Datagram modes. Fixes #440
I guess the general observation is that the delivery deadline mechanism should probably be mapping-specific? If you're doing stream-per-group/object, you'd want to be resetting streams, whereas with stream-per-track you don't really have that option.
I feel that framing cache duration as a timeline-based delivery deadline is somewhat confusing. One way of thinking about it is: do I, as the publisher, need a given object to be stored in my relay network beyond a certain point in time in order to meet my application requirements? This makes it independent of the transport mapping or mechanisms.
The timeouts in this PR are defined to ensure that the invariant properties implied by the object delivery preference are maintained; e.g., if stream-per-group is used, it is impossible to accidentally time out an object in the middle of a group. Fixes moq-wg#440
I think once we get the basic mechanism down, we then need to look at if / how it interacts with stream reset.
A tweaked version of #448. Fixes #440. Fixes #249. Closes #450. Closes #448. I'll note that we might want to add params to TRACK_STATUS and allow this to be returned from there as well, which would allow a relay to keep it in its cache. This attempts to write down where we ended up at the end of the June 12th interim.
At some point we removed the time-to-live that governed how long objects exist in the cache before they are removed.
It is becoming a significant implementation blocker to not have some version of this. I would like to add some version of it back soon, even if we later change the exact details of how the times are calculated and represented.