Skip to content

Improve byline algorithm#40

Merged
leibovic merged 1 commit intomozilla:masterfrom
leibovic:byline
Mar 20, 2015
Merged

Improve byline algorithm#40
leibovic merged 1 commit intomozilla:masterfrom
leibovic:byline

Conversation

@leibovic
Copy link
Contributor

Still a WIP, as I'll have to rebase on top of #37

This fixes at least these two testcases:
http://www.salon.com/2015/01/02/coca_colas_anti_american_outsourcing_scheme_how_big_soda_gets_the_public_to_shoulder_its_costs/
http://alistapart.com/article/designing-for-easy-interaction

Using the author meta tag is definitely a strict improvement, as is checking the nodes before throwing them away. In order to get the salon.com testcase to work, I added logic for preferring a rel="author" element, but I don't know if that's too specific to that testcase.

Also, I don't like the var self = this; crap, but it was the quickest way to getting this to work to test :)

@leibovic
Copy link
Contributor Author

I updated this to not favor the rel="author", since that was causing one of the tests to fail (a valid failure).

@leibovic leibovic force-pushed the byline branch 4 times, most recently from ffd3c76 to f08b286 Compare March 20, 2015 15:01
@leibovic
Copy link
Contributor Author

With the new _removeAndGetNext logic, I realize this still has the problem of not checking the children of nodes that we're about to remove (e.g. digging through a header node to look for a byline). Perhaps we should try recursively looking through nodes before removing them if we haven't found a byline yet.

@gijsk
Copy link
Contributor

gijsk commented Mar 20, 2015

With the new _removeAndGetNext logic, I realize this still has the problem of not checking the children of nodes that we're about to remove (e.g. digging through a header node to look for a byline). Perhaps we should try recursively looking through nodes before removing them if we haven't found a byline yet.

This sounds like a good idea for a followup, as discussed IRL.

Readability.js Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the comment. :-)

@gijsk
Copy link
Contributor

gijsk commented Mar 20, 2015

This generally looks great, just a few nits, for which, see my earlier comments. :-)

leibovic added a commit that referenced this pull request Mar 20, 2015
Improve byline algorithm. r=Gijs
@leibovic leibovic merged commit d0df9d8 into mozilla:master Mar 20, 2015
@leibovic leibovic deleted the byline branch March 20, 2015 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants