new_audit(cache-headers): detects savings from leveraging caching #3531
brendankenny merged 14 commits into master
Conversation
// Ignore assets that have low likelihood for cache miss.
const IGNORE_THRESHOLD_IN_PERCENT = 0.1;
// Discount the wasted bytes by some multiplier to reflect that these savings are only for repeat visits.
const WASTED_BYTES_DISCOUNT_MULTIPLIER = 0.1;
see the screenshot for why this was necessary (even at 1/10th, Leverage Browser Caching reports 2s of savings); a 10% chance of a repeat visit seems reasonable-ish?
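For context, a minimal sketch of how the discount might combine with cache miss likelihood (the helper name and the way the two factors multiply are assumptions for illustration, not the merged PR code):

```javascript
// Sketch (assumed composition, not the actual PR code): raw byte savings are
// scaled down by the likelihood of a cache miss and again by the assumed 10%
// chance that any given visit is a repeat visit.
const WASTED_BYTES_DISCOUNT_MULTIPLIER = 0.1;

function discountedWastedBytes(totalBytes, cacheHitProbability) {
  // Only the cache-miss fraction is re-downloaded, and only on repeat visits.
  return totalBytes * (1 - cacheHitProbability) * WASTED_BYTES_DISCOUNT_MULTIPLIER;
}
```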
force-pushed from b8fd5f7 to 5b59b98
PTAL :)
paulirish
left a comment
didn't look at tests yet. some comments so far.
// Ignore assets that have low likelihood for cache miss.
const IGNORE_THRESHOLD_IN_PERCENT = 0.1;
// Discount the wasted bytes by some multiplier to reflect that these savings are only for repeat visits.
// As this savings is only for repeat visits, we discount the savings considerably.
// Basically we assume a 10% chance of repeat visit
 * @param {number} maxAgeInSeconds
 * @return {string}
 */
static toDurationDisplay(maxAgeInSeconds) {
seems good if we move this stuff into lighthouse-core/report/v2/renderer/util.js yah?
const upperDecileIndex = RESOURCE_AGE_IN_HOURS_DECILES.findIndex(
  decile => decile >= maxAgeInHours
);
if (upperDecileIndex === 11) return 1;
11 => RESOURCE_AGE_IN_HOURS_DECILES.length - 1 ?
/**
 * Computes the user-specified cache lifetime, 0 if explicit no-cache policy is in effect, and null if not
 * user-specified. See https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html.
can you nuke the period at the end here. it breaks autolinkers
/**
 * Computes the percent likelihood that a return visit will be within the cache lifetime, based on
 * Chrome UMA stats see the note above.
I know it's been our policy to have consts at the top, but in this case I think it hurts readability. I'd prefer to have the relevant consts defined right here, inside getCacheHitLikelihood.
doing that would avoid ping-ponging between the top of the file and down here when reading it.
wdyt?
guess you can drop "see the note above." now
}

if (headers.has('expires')) {
  const expires = new Date(headers.get('expires')).getTime();
yay for standards that enable this parser to handle the http date format. \o/
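A sketch of why that works: `Date`'s parser accepts the HTTP-date format used by `Expires` (e.g. `Wed, 21 Oct 2015 07:28:00 GMT`), so the remaining lifetime falls out of plain date arithmetic. The helper name and the NaN handling here are assumptions, not the PR's code:

```javascript
// Sketch: new Date(...) handles the HTTP-date format used by Expires, so the
// remaining lifetime is simple arithmetic. Malformed dates parse to NaN,
// which we treat as "already expired" (an assumption for this sketch).
function lifetimeFromExpires(expiresHeader, nowMs) {
  const expiresMs = new Date(expiresHeader).getTime();
  if (Number.isNaN(expiresMs)) return 0;
  return Math.max((expiresMs - nowMs) / 1000, 0); // lifetime in seconds
}
```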
return (
  CACHEABLE_STATUS_CODES.has(record.statusCode) &&
  STATIC_RESOURCE_TYPES.has(record._resourceType) &&
  !resourceUrl.includes('?') &&
why exclude all of these? there was voodoo around these not being cached at the proxy level, but i don't know of a reason a browser has a different policy.
PSI does it based on the claim that resources with query strings are not heuristically cacheable. That seems like a reasonable assumption: if the asset has a query string and no explicit cache policy, don't cache it. people do weird changes on GETs with query strings all the time
Innnnteresting. I went and found the history of that line. (took a while).
It goes back to here: pagespeed/page-speed@4c4f031#diff-2478b085708a8d438d5057d0365f067fR384
It originally landed with " This is a debatable policy. " :)
I have a feeling folks' use of query strings is different in 2017 than in 2010, but who knows.
Can we leave some comments here that provide some context?
And perhaps a TODO to explore including these records and see what it tells us.
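The debated heuristic could be sketched like this (`CACHEABLE_STATUS_CODES` values and the query-string exclusion come from the diff and the comments above; the function name is made up for illustration):

```javascript
// Sketch of the inherited PageSpeed heuristic: only statically cacheable
// status codes, and (debatably) no query string in the URL.
const CACHEABLE_STATUS_CODES = new Set([200, 203, 206]);

function passesQueryStringHeuristic(statusCode, resourceUrl) {
  // TODO: explore including query-string assets; usage differs in 2017 vs 2010.
  return CACHEABLE_STATUS_CODES.has(statusCode) && !resourceUrl.includes('?');
}
```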
helpText:
  'A well-defined cache policy can speed up repeat visits to your page. ' +
  '[Learn more](https://developers.google.com/speed/docs/insights/LeverageBrowserCaching).',
description: 'Leverage browser caching',
Leverage browser caching for static assets
return artifacts.requestNetworkRecords(devtoolsLogs).then(records => {
  const results = [];
  for (const record of records) {
    if (!CacheHeaders.isCacheableAsset(record)) continue;
are redirects just filtered out in that fn?
yup, only 200, 203, 206 allowed
headers.set(header.name, header.value);
}

// Ignore assets that have an etag since they will not be re-downloaded as long as they are valid.
i'm no etag expert, but i'm not consistently seeing the described behavior
for example: this (tooootally random) page... https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag the html resource has an etag but refreshing it (various ways) does refetch it and get a 200.
then again on http://music.com/ the logo.png has an etag and no other caching headers and it gets served from memory cache on reloads.
so perhaps the document is handled differently?
what does "as they are valid" mean? you mean "match the server's etag"?
this comment could probably afford to break into two lines
having an etag and the server actually acting on it and doing the right thing are different things :)
when the if-none-match: <etag> header is sent, the server should reply with a 304 if it matches the resource it's planning on sending. perhaps file an issue for a separate audit that checks for improper server handling of etag'd assets?
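The correct server-side behavior being described reduces to a one-line decision (a toy sketch, not Lighthouse or server code):

```javascript
// Toy sketch of ETag revalidation: when the client's If-None-Match value
// matches the current entity tag, the server should answer 304 Not Modified
// instead of resending the full body with a 200.
function revalidationStatus(ifNoneMatch, currentEtag) {
  return ifNoneMatch === currentEtag ? 304 : 200;
}
```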
clarified the comment a bit
feedback addressed, labels switched. PTAL :)
paulirish
left a comment
the linearInterpolation is amazingly good. thank you.
I'm still not sold on showing the "Est. Likelihood of Cache Miss" column, but if we do, I think we should flip it to "Cache Hit".
If we omitted the column i think we'd do more in the helpText.. like:
A well-defined cache policy can speed up repeat visits to your page. Estimated Likelihood of Cache Hit is based off collected Chrome statistics, where the median request stays cached for 12 hours and the X is at Y hours. [Learn more]
Regardless we should just show this to some Lighthouse users and see how they interpret it.
/**
 * Computes the percent likelihood that a return visit will be within the cache lifetime, based on
 * Chrome UMA stats see the note above.
guess you can drop "see the note above." now
// are clearly diminishing returns to cache duration i.e. 6 months is not 2x better than 3 months.
// Based on UMA stats for HttpCache.StaleEntry.Validated.Age, see https://www.desmos.com/calculator/7v0qh1nzvh
// Example: a max-age of 12 hours already covers ~50% of cases, doubling to 24 hours covers ~10% more.
const RESOURCE_AGE_IN_HOURS_DECILES = [0, 0.2, 1, 3, 8, 12, 24, 48, 72, 168, 8760, Infinity];
how about adding this guy
console.assert(RESOURCE_AGE_IN_HOURS_DECILES.length === 10, 'deci means 10, yo')
will need https://eslint.org/docs/rules/no-console#options to allow assert
done with require('assert')
if (cacheControl['no-cache'] || cacheControl['no-store']) return 0;
if (Number.isFinite(cacheControl['max-age'])) return Math.max(cacheControl['max-age'], 0);
} else if ((headers.get('pragma') || '').includes('no-cache')) {
  // Pragma can disable caching if cache-control is not set, see https://tools.ietf.org/html/rfc7234#section-5.4
// The HTTP/1.0 Pragma header can disable caching if cache-control is not set, see tools.ietf.org/html/rfc7234#section-5.4
just want to make it clear this shit is from decades ago. :)
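The precedence under discussion, sketched with a `Map` of lowercased header names (the regex-based max-age parsing here is a simplification of real Cache-Control parsing; the function name is an assumption): Cache-Control wins when present, and the HTTP/1.0 Pragma header only disables caching when Cache-Control is absent.

```javascript
// Sketch of RFC 7234 section 5.4 precedence: Cache-Control, when present,
// takes priority; Pragma: no-cache only matters if Cache-Control is missing.
// Returns seconds, 0 for an explicit no-cache policy, or null if unspecified.
function explicitLifetimeInSeconds(headers) {
  const cacheControl = headers.get('cache-control');
  if (cacheControl) {
    if (/no-cache|no-store/.test(cacheControl)) return 0;
    const maxAge = /max-age=(\d+)/.exec(cacheControl);
    if (maxAge) return Math.max(parseInt(maxAge[1], 10), 0);
  } else if ((headers.get('pragma') || '').includes('no-cache')) {
    return 0;
  }
  return null; // no explicit user-specified policy
}
```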
The cache hit distribution looks over the population of all page loads of all sites, which includes sites visited frequently, sites visited rarely, and all sorts of other visit patterns. That means the overall distribution of cache hits might not actually resemble the distribution of cache hits for any particular page. Since the goal is to get more caching out there, that's not necessarily a problem (the cache hit rate function is strictly increasing), but it does mean the effect size will often be out of step with what happens on the specific test site: we're comparing their assets' cache lengths to what is essentially the distribution of asset age from the mean website revisit.
We'd also be giving implicit ideal cache lengths (at least if the user sets a goal of reducing the red line). "Good" if cache length is at least 7 days, "Average" if cache length is at least 12 hours, and "Poor" for somewhere below that. If we had to write that in the audit description, I'm not sure if we'd want to commit to those numbers?
Just to be clear, I don't know a better way to present the data here :) It's just that the cache hit probabilities end up being very specific claims when we really just want to provide general guidance motivated with real data.
/**
 * @return {number}
 */
static get PROBABILITY_OF_RETURN_VISIT() {
does this need a getter? (e.g. vs IGNORE_THRESHOLD_IN_PERCENT)
makes the tests easier than copy pasting
makes the tests easier than copy pasting
but it's not used in a test? :)
is too 😛
// Based on UMA stats for HttpCache.StaleEntry.Validated.Age, see https://www.desmos.com/calculator/7v0qh1nzvh
// Example: a max-age of 12 hours already covers ~50% of cases, doubling to 24 hours covers ~10% more.
const RESOURCE_AGE_IN_HOURS_DECILES = [0, 0.2, 1, 3, 8, 12, 24, 48, 72, 168, 8760, Infinity];
assert.ok(RESOURCE_AGE_IN_HOURS_DECILES.length === 12, '1 for each decile, 1 on each boundary');
if it's 0th, 10th, 20th, ...100th percentiles, shouldn't it be 11?
Looks like Infinity might be for a past-the-upper-bound check (a boundary somehow past the 100th percentile :), but you could also replace the upperDecileIndex === RESOURCE_AGE_IN_HOURS_DECILES.length - 1 check with upperDecileIndex === -1
Looks like Infinity might be for a past the upper bound check (boundary somehow past the 100th percentile
Yeah, because 1 year+ is all basically 100th percentile, but doing linear interpolation with Infinity isn't quite fair :) I guess it doesn't really matter that much though if we ignore 90% and up. How about I replace it with Number.MAX_VALUE and nuke the check? ...I take it back, let's keep this here and just halve the IGNORE_PROBABILITY value so up to 6 months is flagged.
if (upperDecileIndex === 0) return 0;

// Use the two closest decile points as control points
const upperDecile = RESOURCE_AGE_IN_HOURS_DECILES[upperDecileIndex];
nit: maybe switch these names? upperDecile/lowerDecile would refer to 0, 0.1, 0.2, etc while upperDecileValue or whatever would be the entry in the array
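For reference, the interpolation being praised can be sketched end to end. The decile values mirror the array shown in the diff, but the function body below is a reconstruction for illustration, not the merged code:

```javascript
// Each index step in the decile array represents 10 percentage points of
// cumulative probability; an asset's max-age is linearly interpolated
// between its two bracketing control points.
const RESOURCE_AGE_IN_HOURS_DECILES = [0, 0.2, 1, 3, 8, 12, 24, 48, 72, 168, 8760, Infinity];

function getCacheHitProbability(maxAgeInHours) {
  const upperDecileIndex = RESOURCE_AGE_IN_HOURS_DECILES.findIndex(d => d >= maxAgeInHours);
  if (upperDecileIndex === RESOURCE_AGE_IN_HOURS_DECILES.length - 1) return 1;
  if (upperDecileIndex === 0) return 0;

  // Use the two closest decile points as control points for interpolation.
  const upperDecileValue = RESOURCE_AGE_IN_HOURS_DECILES[upperDecileIndex];
  const lowerDecileValue = RESOURCE_AGE_IN_HOURS_DECILES[upperDecileIndex - 1];
  const fraction = (maxAgeInHours - lowerDecileValue) / (upperDecileValue - lowerDecileValue);
  return (upperDecileIndex - 1 + fraction) / 10;
}
```

This makes the "12 hours is the median" example from the comment concrete: a 12-hour max-age lands exactly on the 50th-percentile control point.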
 * 3. It does not have a query string.
 *
 * Ignoring assets with a query string is debatable, PSI considered them non-cacheable with a similar
 * caveat. Consider experimenting with this requirement to see what changes. See discussion
it might be worth counting the assets that pass the other cacheable requirements but have a query string. Possible with an HTTP Archive query but a pain; a lot easier to get $.audits.cache-headers.extendedInfo.value.queryStringCount or whatever :)
discussed more with paul, and we'll just include query string assets for now, count sgtm
cacheLifetimeInSeconds = cacheLifetimeInSeconds || 0;

let cacheHitProbability = CacheHeaders.getCacheHitProbability(cacheLifetimeInSeconds);
if (cacheHitProbability >= IGNORE_THRESHOLD_IN_PERCENT) continue;
does this mean Lighthouse is implicitly saying that the "correct" cache length is 7 days since that's the only way to bring this down to 0?
essentially, it's saying any cache policy >= 7 days has 90% of the benefit so we won't flag it
since their impact on wastedMs is so low though, I'm game to put them in the table as long as we're showing a cache hit likelihood of 92% or whatever :)
// This array contains the hand wavy distribution of the age of a resource in hours at the time of
// cache hit at 0th, 10th, 20th, 30th, etc percentiles. This is used to compute `wastedMs` since there
// are clearly diminishing returns to cache duration i.e. 6 months is not 2x better than 3 months.
// Based on UMA stats for HttpCache.StaleEntry.Validated.Age, see https://www.desmos.com/calculator/7v0qh1nzvh
do we need to look at other cache entry stats too? It seems like this is only stale entries (so biases toward later next visits as non-stale entries would just be loaded and not log here?) and only for assets that qualify for 304 checks
for (const unitLabel of unitLabels) {
  const label = /** @type {string} */ (unitLabel[0]);
  const unit = /** @type {number} */ (unitLabel[1]);
nit: it seems like overkill to do the type casting instead of just two parallel arrays and a regular for loop or an array of objects and use Object.keys() (Object.entries can't come soon enough) or any number of other approaches :)
🔪 🐑 -> 🍖 🎁 -> 🔱
done
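The simplified loop that avoids the `@type` casts might look like this (the exact unit labels and the `'0s'` fallback are assumptions, not the landed code):

```javascript
// Sketch of the suggested simplification: an array of [label, seconds] pairs
// destructured in a plain for-of loop, so no type casting is needed.
function toDurationDisplay(maxAgeInSeconds) {
  let remaining = maxAgeInSeconds;
  const parts = [];
  const unitLabels = [['d', 86400], ['h', 3600], ['m', 60], ['s', 1]];
  for (const [label, unit] of unitLabels) {
    const numberOfUnits = Math.floor(remaining / unit);
    if (numberOfUnits > 0) {
      remaining -= numberOfUnits * unit;
      parts.push(`${numberOfUnits}${label}`);
    }
  }
  return parts.join(' ') || '0s';
}
```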
static get meta() {
  return {
    category: 'Caching',
    name: 'cache-headers',
bikeshedding on name? It's not just 'cache-headers' but also a judgement of them. asset-cache-length? asset-caching-ttl?
hm if we go by consistency with the other byte efficiency audits they basically fall into either <noun of thing being detected> or uses-<best practice we're encouraging>
how about...
uncached-assets
low-cache-ttl
uses-caching
uses-cache-headers
uses-long-cache-ttl
?
uses-long-cache-ttl certainly isn't exactly catchy but describes it well :) I like that since it's not just use, it's (if they're used) that they're long
Spoke with brendan about some of this just now. One argument that seems reasonable to me is... What cache lengths do we recommend? IMO if you completely control the URL your resource is accessed by, then you can always afford a 1yr TTL. If you don't (because you're metrics.biz/analytics.js) then you can only commit to an X-hours TTL. And so if there are only two real cases which get unique TTLs, then we shouldn't overcomplicate.
friendly bump on this :)
🏏 ...🏏 ...
brendankenny
left a comment
Since @kdzwinel added header support to the smokehouse server, maybe add a cache header test to the byte efficiency smoke test? :):)
I'm not sure what to do with the cache hit rate. With recent and upcoming work to make our opportunities better reflect reality, the platonic reality these come from doesn't seem particularly useful for any particular site beyond just "you should have longer caching"
    wastedKb,
    results,
  },
}, result.extendedInfo),
maybe add a comment that this merges in any extendedInfo provided by the derived audit?
/**
 * @return {number}
 */
static get PROBABILITY_OF_RETURN_VISIT() {
makes the tests easier than copy pasting
but it's not used in a test? :)
static get meta() {
  return {
    category: 'Caching',
    name: 'cache-headers',
uses-long-cache-ttl certainly isn't exactly catchy but describes it well :) I like that since it's not just use, it's (if they're used) that they're long
hasn't actually landed yet, but when it does sg :)
It feels like "Repeat visit" is another target where we'll surface savings, and this audit will target that. I'd ideally like to follow the same course of action here that I'm pushing on other audits, which is "let's not agonize over the savings we surface right now since we have it as a high-priority item to redo it all" :)
@brendankenny do you still have requested changes?
please 🙏 :)
The PR looks good to me other than the cache miss column, and nothing has really changed in the discussion since the comment 15 days ago :) Maybe we can talk about this in the Monday meeting so we can work out a consensus on moving forward.
To me this is just an argument for leaving it out now and adding it when we've figured it out :) Why add something that's going to be vague or wrong in the near term and removed in the long term? A middle ground could maybe be to drop the cache hit column but still have the overall savings, and maybe call out that it's specifically for a repeat visitor at x days later, whatever x works out to.
I will update with just a score, not the estimated time savings. |

closes #3460