Skip to content

new_audit: mixed-content https upgradeability#3953

Merged
paulirish merged 15 commits intoGoogleChrome:masterfrom
christhompson:mixed-content
Jan 16, 2018
Merged

new_audit: mixed-content https upgradeability#3953
paulirish merged 15 commits intoGoogleChrome:masterfrom
christhompson:mixed-content

Conversation

@christhompson
Copy link
Contributor

@christhompson christhompson commented Nov 30, 2017

This introduces a new gatherer and audit that tests which requests a page makes can be directly upgraded to use HTTPS, and which fail when upgraded. The report gives a list of resources which were loaded insecurely in the default pass, their initiating document, and whether they are upgradeable. This implements the audit proposed in #3164.

It is currently implemented as a separate custom config, as it requires an additional pass in which to do the upgrading. The new pass uses the mixed-content gatherer, which intercepts each HTTP request, and sends an HTTP 302 Found redirect to upgrade it to HTTPS. The audit compares the network records for the default pass with the upgraded requests form the mixed-content pass.

I've also included some basic tests for the functionality of both the gatherer and the audit.

This can currently be run using the following invocation:
lighthouse-cli/index.js --chrome-flags="--headless" --config-path=./lighthouse-core/config/mixed-content.js http://www.bbc.com/

The gatherer relies on the Network request interception commands, which may require Canary at the moment since they're currently Experimental.

The audit report looks like this (the "Full URL" column uses Lighthouse's URL type, which reveals the full URL on hover):

audit-example

@christhompson
Copy link
Contributor Author

One note is that the audit currently has some potential false-negatives, mostly around requests that involve redirects, where it will say a resource is non-upgradeable when it may in fact be upgradeable to HTTPS.

These tend to be most prevalent in what I sometimes refer to as "second-order requests" -- requests made by a subresource loaded by the main page (often advertising libraries). One approach I have tried is to only check and report resources loaded by the main frame (where the initiator is the same as the URL loaded by the gatherer). It's a bit of a tradeoff of complete details about the resources requested and accuracy -- if anyone feels strongly about false-negatives, I can make that change.

@christhompson christhompson changed the title new-audit: mixed-content https upgradeability new_audit: mixed-content https upgradeability Nov 30, 2017
@adrifelt
Copy link

Two ideas about false negatives:

  1. You could put together a whitelist of common 3P inclusions with known HTTPS support and shortcut the check for those inclusions. We'd save the time for doing the second pass for that resource, and also would be able to avoid a false positive.

  2. We could change the UI so that we emphasize the "Yes" resources and never clearly say "No." The feature would be positioned as helping you find the things that are easy to switch and point out the things that are potentially problematic. For the "No" cases, we could say "Maybe," "Needs Investigation," or something smarter that a UX person could think of. :)

@estark37
Copy link

estark37 commented Dec 7, 2017

@christhompson is this waiting on review/advice from a Lighthouse person? @paulirish could you suggest someone to review? Thanks!

@christhompson
Copy link
Contributor Author

Yes, this should be ready for review.

Also pinging @vinamratasingal who may be able to give more feedback about the UI of the audit report.

Re: False negatives, the whitelist would be a nice addition, but there are still some cases that fall outside that (where the current implementation fails to detect a successful upgrade). We still need to load the entire page a second time, so I don't think we'd be able to short-circuit. I think the "Needs Investigation" change may be the best, especially since it frames the audit more positively overall (maybe combined with a tweak to the main audit explanation text).

@vinamratasingal-zz
Copy link

Hi, just got the chance to look at this. Some quick feedback (also apologies if this had been decided/discussed on a previous thread, I'm just jumping in):

  • My understanding that this was going to show up in the Best Practices section of the report. Is that still the plan or is the plan to put in its own section of the report?
  • It would be great if we could condense the top level description of the category and add some documentation explaining this. Usually we have documentation on developers.google.com for all the audits, not sure if this is something your team has already done/is planning on doing. Suggestion on wording for the description: "These audits check whether mixed content warnings are blocking your page to upgrade to HTTPS. [Learn more]"
  • Can we separate out the upgradable/insecure requests? I believe we actually show insecure requests (see screenshot below) and the specific URLs if the website is not on HTTPS, so perhaps we just list the requests that we know for sure are upgradable.
  • Is there a way that we can help users prioritize/organize the different URLs that we show (perhaps ones that are more important to upgrade than others?)
  • Also, is this audit only applicable for websites already supporting HTTPS? (i.e. if I as a dev haven't even begun migrating my site to HTTPS, this audit isn't really applicable since I need to do that before handling mixed-content warnings?)

screen shot 2017-12-08 at 12 24 39 pm

@christhompson
Copy link
Contributor Author

My understanding that this was going to show up in the Best Practices section of the report. Is that still the plan or is the plan to put in its own section of the report?

I'd be happy if we could include it in the Best Practices section. I'd thought it was not going to be included in the default config at the moment since it causes an extra page load pass.

It would be great if we could condense the top level description of the category and add some documentation explaining this. Usually we have documentation on developers.google.com for all the audits, not sure if this is something your team has already done/is planning on doing. Suggestion on wording for the description: "These audits check whether mixed content warnings are blocking your page to upgrade to HTTPS. [Learn more]"

I can reduce the text at the top of the audit. I am planning on writing documentation that can be linked to, but was unsure of the process/order for doing that.

Can we separate out the upgradable/insecure requests? I believe we actually show insecure requests (see screenshot below) and the specific URLs if the website is not on HTTPS, so perhaps we just list the requests that we know for sure are upgradable.

This may also help resolve the false-negatives issue. The audit could be framed as "there is no harm in upgrading these resources to HTTPS right now, to maximize your readiness to switch to HTTPS in the future".

Is there a way that we can help users prioritize/organize the different URLs that we show (perhaps ones that are more important to upgrade than others?)

I included the Initiator column to try to add a priority level of some sort -- things initiated by the main page are more under the control of the developer, and so they would be higher priority. Maybe it would make sense to only include the resources that were initiated by the main page, especially given the discussion on the previous bullet.

Also, is this audit only applicable for websites already supporting HTTPS? (i.e. if I as a dev haven't even begun migrating my site to HTTPS, this audit isn't really applicable since I need to do that before handling mixed-content warnings?)

This audit is intended for sites that have not yet migrated to HTTPS. The idea is that they can check whether they can just "flip the switch" to HTTPS by simply changing the relevant URLs to HTTPS, or if they need to contact specific third-parties about how to upgrade. If all URLs are upgradeable, then the developer knows they should have minimal problems when making the switch.

@vinamratasingal-zz
Copy link

re: Best Practices- ah, I got confused. No, the plan currently is only to expose it via the full config on the CLI, not via Best Practices.

re: documentation- we have a tech writer (Kayce B), and it would be good to reach out to him to get the documentation written.

re: changes- awesome. Do you mind uploading an updated screenshot so I can take another look?

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the delay here and thanks for the contribution!!

as you note, we'll keep this in a separate config and out of the default until there's a way to do it without adding an additional pass, but overall approach seems reasonable to me.

insecureUrls.forEach(record => {
const resource = {
host: new URL(record.url).hostname,
full: record.url,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: how about fullUrl for clarity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

const upgradePassSecureHosts = new Set();
upgradedRecords.forEach(record => {
upgradePassHosts.add(new URL(record.url).hostname);
if (MixedContent.isSecureRecord(record) && record.finished) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is .finished want we want without checking .failed?

from devtools it seems as though finished will be true even if the request failed, but perhaps in practice we observe something different? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I added another check for !record.failed.


const totalRecords = defaultRecords.length;
const score = 100 *
(secureRecords.length + 0.5 * upgradeableResources.length)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! I like this scoring rubric :)

* @return {string}
*/
const btoa = function(str) {
return Buffer.from(str, 'utf8').toString('base64');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be honest I find this line more readable than good ol' btoa and since we only use it once I'd vote to nix

at the least maybe we can say encodeAsBase64 or something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense. I've pulled this inline here.

return Buffer.from(str, 'base64').toString('utf8');
};

describe('btoa', () =>{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels a bit weird to test a function that gets exercised only in the tests, did you mean to expose the btoa from the gatherer file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I just inlined the b64 function in the other file, I did the same here and got rid of the test. In hindsight don't think it's a particularly useful path to test (and some of the other tests should break if something strange happens to it).

@christhompson
Copy link
Contributor Author

Thanks for the comments. I've updated the PR to address the comments so far.

I've simplified the audit report quite a bit by restricting it to list resources that are upgradeable that are initiated by the main page only. Here's what it looks like currently:

audit-example-2

Copy link
Member

@paulirish paulirish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you've made here looks great. Really like the original UX and the latest revised stuff.

Since this currently relies on interception, we are a little blocked on getting https://bugs.chromium.org/p/chromium/issues/detail?id=764505 sorted out. But since this isn't landing in the default config, we may have some options.

Really nice job putting this one together.

@vinamratasingal-zz
Copy link

This looks great, thanks for working through with us Chris! One last question before giving the LGTM on the UX:

  • Why are we talking about the number of insecure requests total? Should we just focus on the number of upgradable requests instead?

@christhompson
Copy link
Contributor Author

@vinamratasingal Good catch. With the latest changes I didn't really update the main string. The scoring of the audit will still cause failures on any insecure resources being loaded, but given the new output of the audit, I think something like the following would be good:

  • Failure text: "Some resources loaded are insecure: XX requests could be upgraded to HTTPS."
  • Passing text: "All resources loaded are secure."

I can push a quick update making that change.

@paulirish I was a little worried the Experimental nature of request interception would be a pain. I'd been testing the custom config using --headless so I guess I got lucky and never ran into the issue because of that. Let me know what you think the best path going forward is for adding the audit or working around this bug.

This updates the audit failure text to only reference the number
of upgradeable resources. It also simplifies how resources are
tracked now that we don't output details about non-upgradeable
resources. It also only counts upgradeable resources that were
initiated by the main page for the count and score, so that the
count of upgradeable resources in the text matches the number of
items in the audit report table.
@vinamratasingal-zz
Copy link

New strings LGTM. Mind publishing an updated screenshot so I can take a final pass and LGTM?

@christhompson
Copy link
Contributor Author

Here's a screenshot of the latest version of the report:

audit-example3

@vinamratasingal-zz
Copy link

LGTM

@christhompson
Copy link
Contributor Author

Pinging this thread. @paulirish As you mentioned, this currently only works with the Chrome headless flag. Would it still be possible to land this since it isn't included in any default config/category? This would then mostly be a complication to the documentation for how to run the audit.

@paulirish
Copy link
Member

@christhompson yup we can definitely land this. I was playing with this audit today and had a few ideas. Instead of leaving review comments I went ahead and made the changes: christhompson#2
WDYT?

Also now that the Table and URL renderers suck less, I don't think we need the extra Hostname column in the report. Does that work for you? Or would you rather keep it explicit so they're easier to scan. I could go either way.

image

If this all works, for you, lets land it. Then we can write docs on 1) how to run this audit and 2) what the audit is about.

Cheers and thanks for your patience on this one! really appreciate it.

@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this State. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

@christhompson
Copy link
Contributor Author

I like the cleanliness of the new column style in your changes -- it also fixes the minor annoyance of the left/right aligned columns in the old table. Adding the is-on-http audit to the config is also a good idea -- thanks for adding that.

Copy link
Member

@paulirish paulirish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Thanks again for your patience on this PR. Apologies for the delay.

@paulirish paulirish merged commit 1df4565 into GoogleChrome:master Jan 16, 2018
@paulirish
Copy link
Member

paulirish commented Jan 16, 2018

Landed! 🎉 🎉


Some next steps:

  • readme docs on how to run this audit
  • audit docs for what this audit is about
  • an npm script to make running this easier? e.g. yarn run mixedcontent http://www.bbc.com

@vinamratasingal-zz
Copy link

vinamratasingal-zz commented Jan 16, 2018

Chris- are you going to take on the work of the docs? Or do we need help from @kaycebasques here? (Kayce does all of our tech docs on developers.google.com)

@christhompson
Copy link
Contributor Author

I have a draft of some docs for this -- I'd be happy to coordinate with Kayce to get them in proper shape.

@vinamratasingal-zz
Copy link

Sounds good- let me know if you need help with this. Thanks for your patience Chris! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants