new_audit(anchor-text-audit): descriptive anchor text audit#3490
new_audit(anchor-text-audit): descriptive anchor text audit#3490paulirish merged 7 commits intoGoogleChrome:masterfrom
Conversation
| url.origin == pageHost.origin && | ||
| url.pathname == pageHost.pathname && | ||
| url.search == pageHost.search | ||
| ) { |
There was a problem hiding this comment.
we decided to skip all links that do not link to another page (e.g. '#')
| url.protocol.toLowerCase() === 'javascript:' || | ||
| (url.origin == pageHost.origin && | ||
| url.pathname == pageHost.pathname && | ||
| url.search == pageHost.search) |
There was a problem hiding this comment.
we are skipping links that point to the same website (e.g. '#' or 'javascript:void(0)')
There was a problem hiding this comment.
we should have a helper method for the non-javascript case
lighthouse/lighthouse-core/lib/url-shim.js
Lines 116 to 129 in 05030ad
does that work for you?
There was a problem hiding this comment.
Include this as an inline comment in the code if it's purpose might be unclear.
There was a problem hiding this comment.
@patrickhulce that works perfectly!
@rviscomi with equalWithExcludedFragments this will probably won't need more explanation. I'll skip the comment.
patrickhulce
left a comment
There was a problem hiding this comment.
nice! easy win here 🎉 🔍 💣
| 'start', | ||
| 'right here', | ||
| 'more', | ||
| 'learn more', |
There was a problem hiding this comment.
😱 our report would fail like crazy!
There was a problem hiding this comment.
The help text for this audit has a "learn more" link 😆
(as bad of an example as it sets, in our defense this tool will not be crawled by search bots)
| return false; | ||
| } | ||
|
|
||
| return BLACKLIST.includes(anchor.text.trim()); |
There was a problem hiding this comment.
need to toLowerCase this too? maybe add a test case to catch it too :)
There was a problem hiding this comment.
👍 Good point, fixed.
|
|
||
| return BLACKLIST.includes(anchor.text.trim()); | ||
| }) | ||
| .map(anchor => ({ |
There was a problem hiding this comment.
does the anchor have extra data you're trying to trim out here? if not, let's just drop since the prop names are the same
There was a problem hiding this comment.
based on the return value of the gather, looks like they only have these two props... so agree we can drop the .map().
There was a problem hiding this comment.
Correct, these are the only two props. Good catch!
| value: failingAnchors, | ||
| }, | ||
| details, | ||
| displayValue: failingAnchors.length ? `${failingAnchors.length} anchors found` : undefined, |
There was a problem hiding this comment.
we usually add the qualifier here for display value, how about 2 non-descriptive anchors found?
There was a problem hiding this comment.
oh, that's fine then 👍 ignore me :)
|
|
||
| const Audit = require('../audit'); | ||
| const URL = require('../../lib/url-shim'); | ||
| const BLACKLIST = [ |
There was a problem hiding this comment.
not sure it makes a big difference with such a small list but how about a Set since we just basically want .has
There was a problem hiding this comment.
Good point! We will be checking 100s of links and these will be mostly misses against the blocklist, so it turns out it can actually make a difference: https://github.com/GoogleChrome/lighthouse/pull/3490/files#r143292224
| url.protocol.toLowerCase() === 'javascript:' || | ||
| (url.origin == pageHost.origin && | ||
| url.pathname == pageHost.pathname && | ||
| url.search == pageHost.search) |
There was a problem hiding this comment.
Include this as an inline comment in the code if it's purpose might be unclear.
| url.protocol.toLowerCase() === 'javascript:' || | ||
| (url.origin == pageHost.origin && | ||
| url.pathname == pageHost.pathname && | ||
| url.search == pageHost.search) |
There was a problem hiding this comment.
Do we need to compare the search properties? #foo and #bar are the same web page all else being equal. Unless the page is using ajaxy #! URLs :-\
| return false; | ||
| } | ||
|
|
||
| return BLACKLIST.includes(anchor.text.trim()); |
There was a problem hiding this comment.
Performance nit: maybe make BLACKLIST a Set and use Set.has()? The blacklist may grow, and this is inside a loop with potentially many iterations, so it could be significant. Ok to leave it as is if you don't feel it'll be significant.
There was a problem hiding this comment.
i'd leave it alone. we have bigger fish to fry. :)
There was a problem hiding this comment.
I thought it's probably nothing but then I checked… and damn, Array.includes is slow:

This is for 100 elements in array/set and 1000 checks. Bottom two results are probably most important in our case.
https://jsperf.com/set-array-access
| value: failingAnchors, | ||
| }, | ||
| details, | ||
| displayValue: failingAnchors.length ? `${failingAnchors.length} anchors found` : undefined, |
There was a problem hiding this comment.
If the length is 1, we should say "1 anchor found".
| @@ -0,0 +1,82 @@ | |||
| /** | |||
| * @license Copyright 2016 Google Inc. All Rights Reserved. | |||
|
|
||
| const Audit = require('../audit'); | ||
| const URL = require('../../lib/url-shim'); | ||
| const BLACKLIST = [ |
There was a problem hiding this comment.
We'll be slowly doing a s/Whitelist/Safelist/g and s/Blacklist/Blocklist/g, but i'd love to get a headstart here. :)
There was a problem hiding this comment.
CS lingo is the worst. Thanks for pointing that out.
| return false; | ||
| } | ||
|
|
||
| return BLACKLIST.includes(anchor.text.trim()); |
| return false; | ||
| } | ||
|
|
||
| return BLACKLIST.includes(anchor.text.trim()); |
There was a problem hiding this comment.
i'd leave it alone. we have bigger fish to fry. :)
|
|
||
| return BLACKLIST.includes(anchor.text.trim()); | ||
| }) | ||
| .map(anchor => ({ |
There was a problem hiding this comment.
based on the return value of the gather, looks like they only have these two props... so agree we can drop the .map().
| afterPass(options) { | ||
| const expression = `(function() { | ||
| ${DOMHelpers.getElementsInDocumentFnString}; // define function on page | ||
| const selector = 'a[href]:not([rel~="nofollow"])'; |
There was a problem hiding this comment.
would it be better to combine this with anchors-with-no-rel-noopener and sort client-side for nofollow vs noopener into a single anchor gatherer? Would need to collect rel so that audits can filter on it
There was a problem hiding this comment.
could be all-anchors or something (we could do it in a followup PR if you want to keep this one focused)
There was a problem hiding this comment.
all-anchors SGTM, but since this will require changing anchors-with-no-rel-noopener I think we should do that in a followup PR as you suggested
668f28c to
b16c181
Compare
paulirish
left a comment
There was a problem hiding this comment.
Overall this looks great. A few wording changes..
| }); | ||
|
|
||
| const headings = [ | ||
| {key: 'href', itemType: 'url', text: 'URL'}, |
|
|
||
| const headings = [ | ||
| {key: 'href', itemType: 'url', text: 'URL'}, | ||
| {key: 'text', itemType: 'text', text: 'Text'}, |
| return { | ||
| category: 'Content Best Practices', | ||
| name: 'anchor-text', | ||
| description: 'Anchors have descriptive text.', |
There was a problem hiding this comment.
Links have descriptive text
cc @rviscomi you okay with calling these "links" instead of 'anchors' ? IMO its more natural and i dont think the technical correctness is worth it.
There was a problem hiding this comment.
"links" or "<a> elements" SGTM
@kdzwinel please also update the name and failureDescription fields for consistency
There was a problem hiding this comment.
Changing name kinda forced me to get rid of 'anchor' completely: b710137 . @paulirish PTAL
| category: 'Content Best Practices', | ||
| name: 'anchor-text', | ||
| description: 'Anchors have descriptive text.', | ||
| failureDescription: 'Anchors do not have descriptive text', |
| return elements | ||
| .map(node => ({ | ||
| href: node.href, | ||
| text: node.innerText |
There was a problem hiding this comment.
s/innerText/textContent
| return { | ||
| rawValue: failingAnchors.length === 0, | ||
| extendedInfo: { | ||
| value: failingAnchors, |
There was a problem hiding this comment.
in theory this could be a fairly large set of links. Is there a benefit to having a copy in extendedInfo as well as in details? If not we should probably just keep details
There was a problem hiding this comment.
extendedInfo is a bit easier to work with in unit tests:
assert.equal(auditResult.extendedInfo.value.length, 1);
assert.equal(auditResult.extendedInfo.value.includes(invalidAnchor), true);
will have to became:
assert.equal(auditResult.details.items.length, 1);
assert.equal(auditResult.details.items[0][0].text, invalidAnchor.href);
assert.equal(auditResult.details.items[0][1].text, invalidAnchor.text);
But yeah, it doesn't justify keeping it around.
| }, | ||
| 'anchor-text': { | ||
| score: false, | ||
| displayValue: '3 anchors found', |
There was a problem hiding this comment.
maybe details: { items: {length: 3}} as well to make sure they're in there? (you could also assert the actual items contents if you thought that was worthwhile...the only cost is more verbose/harder to scan test expectations)
There was a problem hiding this comment.
Checking length SGTM, but checking all contents gets very verbose (showing just one of three items):
details: {
items: {
length: 3,
0: {
0: {
text: 'https://example.com/',
},
1: {
text: 'click this',
}
},
},
},
I'd leave that to unit tests (in fact, since we are getting rid of extendedInfo, I'll have to use details in unit tests).
There was a problem hiding this comment.
I'd leave that to unit tests
SGTM
| .filter(anchor => { | ||
| if ( | ||
| anchor.href.toLowerCase().startsWith('javascript:') || | ||
| URL.equalWithExcludedFragments(anchor.href, artifacts.URL.finalUrl) |
There was a problem hiding this comment.
why are links within the page not counted?
There was a problem hiding this comment.
As far as I understand, these links have no SEO value and often are handled by JS anyway (e.g. <a href='#'>show more</a> to expand some text inline).
| const auditResult = AnchorTextAudit.audit(artifacts); | ||
| assert.equal(auditResult.rawValue, true); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
add a javascript: unit test?
…t for javascript: links.
b16c181 to
db23054
Compare
|
@brendankenny all done, thank you! @rviscomi I'll need you to sign off on these description changes that Paul suggested. |
|
@rviscomi is calling this "Link text" rather than "Anchor text" ok with you? (we've already made the change) |
|
👍 Anchor -> Link SGTM, reviewed earlier |
ⒸⓁⒾⒸⓀ ⒽⒺⓇⒺ
|
Closes #3179
Failing audit:

Successful audit:
