
Scrape new Genius song page html #3594

Merged
sampsyo merged 5 commits into beetbox:master from stlutz:master
May 17, 2020

@stlutz
Contributor

@stlutz stlutz commented May 16, 2020

As noted in #3535, Genius no longer always produces HTML pages that the existing code can scrape. The fix in #3554 stops the lyrics plugin from crashing, but the plugin now simply skips these pages, even though they do contain the desired lyrics. I added a few lines to the scraping algorithm to handle this new layout.

While I was at it, I also removed the indirection through the /song API, so we only need to query Genius twice per song instead of three times.

Another change is to include the artist in the search query sent to Genius. This produces much better search results for songs with very common titles but lesser-known artists.
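For illustration, here is a minimal sketch of the two-request flow described above: one search that includes the artist, then the song page URL taken straight from the matching hit. It is not the plugin's actual code. The function names `build_search_query` and `pick_song_url` are hypothetical, the response shape (`response` → `hits` → `result` with `url` and `primary_artist.name`) is an assumption about the Genius search API, and the sample data and URLs are made up.

```python
def build_search_query(artist, title):
    """Combine title and artist into a single search string (hypothetical helper)."""
    parts = (title, artist)
    return " ".join(p.strip() for p in parts if p and p.strip())


def pick_song_url(search_json, artist):
    """Return the page URL of the first hit whose primary artist matches.

    Assumes a search response shaped like
    {"response": {"hits": [{"result": {"url": ..., "primary_artist": {"name": ...}}}]}}.
    Returns None when no hit matches.
    """
    wanted = artist.strip().lower()
    for hit in search_json.get("response", {}).get("hits", []):
        result = hit.get("result", {})
        hit_artist = result.get("primary_artist", {}).get("name", "")
        if hit_artist.strip().lower() == wanted:
            # The URL is already in the search result, so no extra
            # per-song API request is needed before fetching the page.
            return result.get("url")
    return None


# Made-up sample data standing in for a parsed search response.
sample = {
    "response": {
        "hits": [
            {"result": {"url": "https://example.invalid/other-saviour",
                        "primary_artist": {"name": "Some Other Band"}}},
            {"result": {"url": "https://example.invalid/circa-waves-saviour",
                        "primary_artist": {"name": "Circa Waves"}}},
        ]
    }
}

query = build_search_query("Circa Waves", "Saviour")
url = pick_song_url(sample, "Circa Waves")
```

With this shape, the only remaining request after the search is the fetch of the song page itself, which matches the "twice per song" count above.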

stlutz added 4 commits May 16, 2020 13:26
Searching only for the title and verifying the artist afterwards means that songs with very common titles are not found, since Genius limits the number of returned hits.
An example would be 'Saviour' by 'Circa Waves'.
…ng lyrics.

The search results already include the correct song page URL, making it superfluous to do another request via the /song API just to get it.
@sampsyo
Member

sampsyo commented May 16, 2020

Woohoo; looks awesome! Would you mind adding a quick changelog entry describing how this works now?

@sampsyo
Member

sampsyo commented May 17, 2020

Awesome; thanks!!

sampsyo added a commit that referenced this pull request May 17, 2020
@sampsyo sampsyo merged commit 485abb0 into beetbox:master May 17, 2020

2 participants