careful selection of hitpass vs hitmiss #2865

Closed
dridi opened this issue Dec 11, 2018 · 8 comments

dridi commented Dec 11, 2018

Loosely related to #2864.

In all cases, I'm talking about vcl_backend_response here.

With the (now somewhat old) switch from hitpass to hitmiss when beresp.uncacheable is set, and the reintroduction of hitpass via return(pass), we have witnessed an increase in the number of objects. It is of course a known behavior, but we think we could do better.

First, there's a significant difference between return(pass) and return(pass(DURATION)). The former keeps the current state of beresp.{ttl,grace,keep} and can lead to #2864, while the latter uses the DURATION as the TTL and disables grace and keep altogether.

Another difference is that specifying a duration for a hitpass produces an HFP TTL log record, but a plain return(pass) doesn't. I think it should, and in both cases we probably don't want grace or keep.
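
For illustration, a hedged sketch of the two forms; the X-Example-* trigger headers are hypothetical and 120s is an arbitrary duration:

sub vcl_backend_response {
    if (beresp.http.X-Example-Plain-Pass) {
        # Plain hitpass: keeps whatever beresp.{ttl,grace,keep}
        # currently hold as the hitpass lifetime, and today produces
        # no HFP TTL log record.
        return (pass);
    }
    if (beresp.http.X-Example-Timed-Pass) {
        # Timed hitpass: 120s becomes the hitpass TTL, and grace and
        # keep are disabled.
        return (pass(120s));
    }
    return (deliver);
}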

Below is the current built-in VCL:

sub vcl_backend_response {
    if (bereq.uncacheable) {
        return (deliver);
    } else if (beresp.ttl <= 0s ||
      beresp.http.Set-Cookie ||
      beresp.http.Surrogate-control ~ "no-store" ||
      (!beresp.http.Surrogate-Control &&
        beresp.http.Cache-Control ~ "no-cache|no-store|private") ||
      beresp.http.Vary == "*") {
        # Mark as "Hit-For-Miss" for the next 2 minutes
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;
    }
    return (deliver);
}

Using hitmiss for all cases leads to an increased number of objects during normal operations. The body of such an object is properly cleaned up once delivered to the client (like a pass or a hitpass), but that's still a problem for backend-trusting setups: hitmiss objects pile up.

The suggestion is to only use hitmiss for its purpose: enabling opportunistic fetches on the assumption that the current transaction is not cacheable but similar transactions should be. In most cases, HTTP-compliant backends are telling us that an object should not be cacheable. In fact, the only case that should lead to a hitmiss is when a client shows up without a cookie and the backend assigns one while serving a cacheable resource.

Suggested built-in:

sub vcl_backend_response {
    if (bereq.uncacheable) {
        return (deliver);
    }

    if (beresp.ttl <= 0s ||
      beresp.http.Surrogate-control ~ "no-store" ||
      beresp.http.Cache-Control ~ "no-cache|no-store" ||
      beresp.http.Vary == "*") {
        # Mark as "Hit-For-Pass" for the next 2 minutes
        return (pass(120s));
    }

    if (beresp.http.Set-Cookie) {
        # Mark as "Hit-For-Miss" for the next 2 minutes
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;
        set beresp.storage = storage.Transient;
        return (deliver);
    }

    if (beresp.http.Cache-Control ~ "private") {
        # Mark as "Hit-For-Pass" for the next 2 minutes
        return (pass(120s));
    }

    return (deliver);
}

You may notice that I also changed the Surrogate-Control logic: even if we, as an edge server, aren't told not to store, we should still look at the Cache-Control header to figure out whether the response is cacheable.

A hitpass also leaves the opportunity to later fetch something cacheable, but that's only true once the underlying resource changes altogether. A well-behaving backend, when setting a cookie on a cacheable response, should say that the response is private (cacheable only by the original client); only then, and in the absence of no-cache or no-store directives, can we assume a response is hitmiss material.
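
To make that concrete, here is a hypothetical condition capturing the "hitmiss material" case just described (the 120s window mirrors the built-in):

sub vcl_backend_response {
    # Hypothetical: Set-Cookie on an otherwise cacheable response that
    # the backend marks private, without no-cache or no-store.
    if (beresp.http.Set-Cookie &&
      beresp.http.Cache-Control ~ "private" &&
      beresp.http.Cache-Control !~ "no-cache|no-store") {
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;
    }
    return (deliver);
}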

The current all-hitmiss behavior is a problem for setups with well-behaving backends that properly drive the cache policy directly from response headers.

dridi commented Dec 11, 2018

I edited the VCL to add a missing return(deliver), sorry.

dridi commented Dec 11, 2018

Another point I forgot to convey is that we ran into situations where we don't want a hitmiss to land in the regular storage, hence sending it to Transient in the VCL above. Instead of doing this in core code, it could be part of the built-in VCL.

I will edit the description accordingly.

Reading the Edge Architecture Specification, I'm also wondering whether we should support it in the built-in VCL too. It can be used to drive a TTL+grace for Varnish that is different from the client's TTL, or to convey ESI support. And since we now version the VCL syntax, another open question for such a change is whether to version the built-in too.
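
As a strawman, a minimal sketch of honoring a surrogate-targeted max-age, assuming vmod_std for the duration conversion (an illustration, not a spec-complete implementation):

import std;

sub vcl_backend_response {
    # Hypothetical: let Surrogate-Control's max-age drive the TTL on
    # the surrogate, independently of the client-facing Cache-Control.
    if (beresp.http.Surrogate-Control ~ "max-age=[0-9]+") {
        set beresp.ttl = std.duration(
            regsub(beresp.http.Surrogate-Control,
              ".*max-age=([0-9]+).*", "\1s"),
            120s);
    }
}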

@karptonite

This is a bit off topic, but your change to the Surrogate-Control logic also solves another issue. As you say, the presence of a Surrogate-Control header shouldn't invalidate Cache-Control: no-cache|no-store; I'm using Surrogate-Control: content="ESI/1.0" to indicate that ESI should be processed, and that header was preventing the Cache-Control header from doing its thing before I added some custom VCL.
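
For reference, the custom VCL amounts to something like this sketch (the exact header matching is an assumption on my part):

sub vcl_backend_response {
    # Use Surrogate-Control only to switch on ESI processing, leaving
    # cacheability decisions to Cache-Control.
    if (beresp.http.Surrogate-Control ~ "content=.*ESI/1.0") {
        set beresp.do_esi = true;
    }
}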

@slimhazard

+-0 due to mixed experience with hitpass and hitmiss. Really unsure about the right solution.

I too have been alarmed at the large number of objects created by hitmiss. In a production project we tried hitmiss when it became possible, but went back to hitpass due to the memory consumption, especially since the memory usage seemed to keep increasing, so that it wasn't obvious whether it could be kept within the bounds set for a -s allocator. In essence, we used custom VCL to go back to something similar to what the patch proposes (which can be taken as evidence that @dridi has the right idea).
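
Roughly, a hedged approximation of that custom VCL (production details differ):

sub vcl_backend_response {
    # Turn the built-in hitmiss cases back into a timed hitpass so
    # that no full objects are inserted for uncacheable content.
    if (beresp.ttl <= 0s || beresp.http.Set-Cookie) {
        return (pass(120s));
    }
    return (deliver);
}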

OTOH, hitmiss was introduced because hitpass was notoriously hard to understand: many users ran into problems with objects that became uncacheable, and there was nothing they could do about it until the hitpass TTL expired. To many users it seemed to be an inexplicable bug in Varnish -- maybe they got an explanation on varnish-misc or #varnish, but there's no telling how many came away thinking that Varnish is broken.

The patch returns hitpass with a 2-minute TTL for most cases -- it becomes the rule rather than the exception. So it would bring us back to trying to explain why objects become uncacheable for two minutes, no matter what Cache-Control says.

I guess it has a lot to do with whether @dridi is right about this:

In fact, the only case that should lead to a hitmiss is when a client shows up without a cookie and the backend assigns one while serving a cacheable resource.

Or maybe there's a way to reduce the number of objects created by hitmiss?

BTW, the use of Transient sort of works around the problem of memory consumption, because Transient is unbounded by default. But in a setup where you use -s Transient to set bounds, so as to prevent running out of memory if Transient gets excessively large, you can still hit the limits.
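
For instance, something along these lines bounds Transient, and is then exactly where the limits can be hit:

# Hypothetical example: cap Transient at 256 MB instead of the
# unbounded default.
varnishd -s Transient=malloc,256m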

bsdphk commented Jan 21, 2019

We clearly need to think more about this. Will bring it back at a near-future bugwash.

nigoroll commented Feb 4, 2019

notes on bugwash:

  • I fail to see the issue this ticket hinges on, explained by @dridi as a high number of HFM objects when hitmiss objects are inserted concurrently. We should understand if/how this is different from the high number of objects with hit-for-miss reported in #2754.
  • We should change hfm and hfp such that grace and keep are always zero (see the sketch after this list).
  • We should use a low hfm ttl in builtin.vcl and our examples, because a long ttl has only minor benefits (basically, the hfm ttl is just the interval for which we prevent request coalescing).
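
A sketch of the second point, assuming it stays in VCL rather than moving into core code (the 120s value is arbitrary):

sub vcl_backend_response {
    if (beresp.ttl <= 0s) {
        # Short hitmiss window with grace and keep explicitly zeroed,
        # so nothing outlives the hitmiss decision.
        set beresp.ttl = 120s;
        set beresp.grace = 0s;
        set beresp.keep = 0s;
        set beresp.uncacheable = true;
    }
    return (deliver);
}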

bsdphk commented Feb 4, 2019

@dridi moves this to VIP (with #2864) until actionable consensus.

bsdphk closed this on Mar 4, 2019
