Add multivalue genre tag support by antlarr · Pull Request #2503 · beetbox/beets

antlarr · 2017-04-04T16:57:13Z

These changes allow beets to store multiple genre tags for multivalues, with the intent of fixing #505 . Note that this is not ready to be merged yet, but it's a beginning of discussion on how to properly fix the issue.

Note that these commits store lists in album.genre and item.genre properties, while storing them as text strings in the database using a separator. Also, I checked that if a user has something like this as path definition:

paths:
    default: $albumartist/$genre/$album%aunique/$genre/$track - $title

then $genre is formatted as a string correctly . In my test case, creating a directory called "Rock;Blues" .

This can break some complex configurations. For example, I had this as album_fields definition to add a album_genre_for_path tag that would contain the most used genre in an album:

album_fields:
    album_genre_for_path: |
        if 'album_genre' in globals():
            return album_genre.title()
        if hasattr(items[0], 'album_genre'):
            return items[0].album_genre.title()
        genres = {}
        cnt, candidate = 0, ''
        for item in items:
            genres[item.genre] = genres.get(item.genre, 0) + 1
            if genres[item.genre] > cnt:
                cnt, candidate = genres[item.genre], item.genre.title()
        return candidate

When evaluating this tag, beet broke sine item.genre is no longer hashable. I think this is a step better than silently breaking the directory structure. I fixed it easily by replacing the code above with:

    album_genre_for_path: |
        if 'album_genre' in globals():
            return album_genre.title()
        if hasattr(items[0], 'album_genre'):
            return items[0].album_genre.title()
        genres = {}
        cnt, candidate = 0, ''
        for item in items:
            if isinstance(item.genre, list):
                values = item.genre
            else:
                values = [item.genre]
            for genre in values:
                genres[genre] = genres.get(genre, 0) + 1
                if genres[genre] > cnt:
                    cnt, candidate = genres[genre], genre.title()
        return candidate

Then I also tested what happens when reading a file written by another application with multiple genre tags:

Python 3.6.1 (default, Mar 23 2017, 13:04:44) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import beets.mediafile
>>> mf=beets.mediafile.MediaFile('01 - Big Brother & the Holding Company -  Bye, Bye Baby.flac')
>>> mf.genre
['Rock', 'Album Rock']

Which is what I would have expected.

mention-bot · 2017-04-04T16:57:15Z

@antlarr, thanks for your PR! By analyzing the history of the files in this pull request, we identified @sampsyo, @Kraymer and @brunal to be potential reviewers.

antlarr · 2017-04-04T17:25:30Z

I tested this with flac and mp3 files (that is, with APE tags, where multiple genre tags are written and ID3 tags, where \0 bytes are used to separate genres in a single ID3 tag). Everything works as expected due to ListMediaField (btw, congratulations to whoever implemented it and the *ListStorageStyle classes, they are really nicely done)

The remaining questions are:

Should a global parameter be added to the configuration? If yes, should it be named multivalue_separator or ... ? Also should it default to ';' ? '; ' ? ', ' ? This should be also set as the default value for tmpl_first, so it behaves correctly when handling the genre tag.
What to use for queries in the StringList type? I think using SubstringQuery is probably "good enough", but that can be discussed as it could return some unexpected matches . I'm thinking on someone querying for "ck;Bl" for example, but on the other side, maybe someone would like to query for "Rock;Blues" to get items with both values (although that wouldn't match genres stored as "Blues;Rock" which should be considered ).
Should a config option be added (in the mood of id3v23), like ... use_single_tags_with_separator (defaulting to False) just in case any user prefers to use the current behavior in case (s)he is using the music files with another application that doesn't support multivalue genre tags yet?

sampsyo

Very cool! This is diff is surprisingly small—I expected this to be a far more disruptive change.

A couple of ideas come to mind for improving backwards compatibility, neither of which we necessarily have to use:

We could keep a field called genre that just gets the joined string, and add a new genres field for the list. That's what MediaFile previously did.
We could somehow allow a string to be assigned into the genre field, which would then be treated as a single-element list.

The only thing that makes me somewhat uncomfortable here is the age-old slippery slope question. How many other fields would it make sense to transform into lists? Should artist be a list? albumartist? If we make too many changes like this, will we disrupt simpler code that just cares about a single value?

Anyway, genre is clearly the best use case for this, so maybe it's not worth worrying too much about other tags.

beets/dbcore/types.py

antlarr · 2017-04-05T17:15:20Z

Btw, I guess you'll doubt that getall('TIPL') can be replaced with get('TIPL') (mainly because I doubted about it too :) ). So here is the test I did with mutagen to check that it doesn't allow two TIPL frames:

>>> import mutagen
>>> m=mutagen.File('full.mp3')
>>> m.tags.get('TIPL')
>>> frame=mutagen.id3.Frames['TIPL'](encoding=mutagen.id3.Encoding.UTF8, people=[['aa','bb'],['cc','dd']])
>>> m.tags.add(frame)
>>> m.save()

Here, I checked with okteta (an hex editor) that the file included the added TIPL frame.

>>> frame2=mutagen.id3.Frames['TIPL'](encoding=mutagen.id3.Encoding.UTF8, people=[['ee','ff'],['gg','hh']])
>>> frame
TIPL(encoding=<Encoding.UTF8: 3>, people=[['aa', 'bb'], ['cc', 'dd']])
>>> frame2
TIPL(encoding=<Encoding.UTF8: 3>, people=[['ee', 'ff'], ['gg', 'hh']])
>>> m.tags.add(frame2)
>>> m.tags.getall('TIPL')
[TIPL(encoding=<Encoding.UTF8: 3>, people=[['ee', 'ff'], ['gg', 'hh']])]
>>> m.save()

and here, I checked again with an hex editor that the frame wasn't just added, but it replaced the previous TIPL frame.

sampsyo

Things are coming along nicely! Here are a few more comments on the latest.

sampsyo · 2017-04-05T17:12:57Z

beetsplug/info.py

-        if value is not None:
+            if value:
+                formatted[key] = u'; '.join(value)
+        elif value is not None:


This, for what it's worth, is one of the things that worries me about making arbitrary fields into lists: now client code will frequently need to have boilerplate for the form if isinstance(value, list):. This sort of thing puts additional weight on the side of having an explicit way to access lists, or some other foolproof conversion to produce a scalar value.

I understand what you mean. Note that list fields (genre, arranger ... ) now are always lists, the problem comes with code like that which tries to work with all fields the same way. Would you prefer that I create a class that inherit from lists and reimplements __str__ so it returns '; '.join(self) ? That way code like that could be changed to:

if value: formatted[key] = str(value)

But still, that's not perfect neither.

beets/mediafile.py

sampsyo · 2017-04-05T17:14:58Z

beets/mediafile.py

        this style.
        """
-        self.set_list(mutagen_file, [value])
+        if value is None:


That's odd—I'm a little surprised that ListStorageStyle is allowed to return None from get. That seems like it makes things more complicated for clients in general. Maybe that's worth fixing?

In fact, I'd say it's ok. get is implemented as:

try: return self.get_list(mutagen_file)[0] except IndexError: return None

So None is returned when get_list returns an empty lists, that is, when the tag doesn't exist in the file. This allows to identify when a tag doesn't exist and allows to remove a tag from a file. Do you think that it would be better to return an empty string and always interpret an empty value as a non-existant tag? (so setting ... say artist to '' would delete the tag entry)

antlarr · 2017-04-05T18:35:16Z

I just noticed I had this written in a browser tab but forgot to click comment. I'll post it for the records, even if it's probably old now.

Very cool! This is diff is surprisingly small—I expected this to be a far more disruptive change.

Yes, I also was surprised, but most of the work was already implemented in ListMediaField, the *ListStorageStyle classes and mutagen.

A couple of ideas come to mind for improving backwards compatibility, neither of which we necessarily have to use:

We could keep a field called genre that just gets the joined string, and add a new genres field for the list. That's what MediaFile previously did.

Yes, I saw that, but the genres tag is (afaik) not used by any other application, but genre is.

We could somehow allow a string to be assigned into the genre field, which would then be treated as a single-element list.

Have a look at the test_write_genre_list_from_string test I added in test_mediafile.py to test fad5809 :). I think that's even better since it allows assigning a string but automatically converts it to a list.

The only thing that makes me somewhat uncomfortable here is the age-old slippery slope question. How many other fields would it make sense to transform into lists? Should artist be a list? albumartist? If we make too many changes like this, will we disrupt simpler code that just cares about a single value?

Hmm, I don't think that would be too much work. I can give it a try if you want. (This is now done)

antlarr · 2017-04-12T09:59:55Z

beets/dbcore/query.py

+    def negated_clause(self):
+        clause, subvals, table = self.col_clause()
+        _table = self.model_cls._table
+        primary_key = '%s.id' % (_table)


@sampsyo Do you think it's worth it to add a get_primary_key_field to Model so instead of hardcoding id I call self.model_cls.get_primary_key_field() which would iterate over its fields and search which has a PRIMARY_ID type? I think that would be more elegant, but I'm not sure if it's worth the effort.

antlarr · 2017-04-12T10:03:07Z

beets/dbcore/db.py

        query = query or TrueQuery()  # A null query.
        sort = sort or NullSort()  # Unsorted.
-        where, subvals = query.clause()
+        where, subvals, new_tables = query.clause()


@sampsyo should I still allow Query objects to maybe return 2-tuples just in case any external plugin still uses the old API? It's easy to check for unpack errors but it's a bit ugly, so I preferred to ask your opinion.

.travis.yml

A StringList type is a list of strings that can be formatted as a string (using a separator character/string) to be stored in the database or used as path component. By default, I used ';' as separator, but I guess that should be made configurable through a global config parameter (though I guess most people would use ';' or ', '). Also, probably query should be something different from query.SubstringQuery, but that can be changed later.

genres is removed, but that's not standard, nor needed anymore.

This removes the need for a separator parameter, which now should be made global.

…s separator - Define a null value as [] - Parse '' as the null value (and not as ['']). - Use the same separator ('; ') as tmpl_first does by default, so tmpl_first keeps working as usual.

Allow to set item.genre = u'Rock; Blues' and automatically convert that to a string, so we keep better backward compatibility. Also added a test for that

…ngly

Make to_sql and from_sql return/accept None values

Convert it to a string before adding the genres list to a string

ListStorageStyle.get can return None, so set should accept it

MP3ListStorageStyle.fetch can return [], so store should accept it.

MP3PeopleStorageStyle now gets/sets lists of people. Note that MP3PeopleStorageStyle should only be used with TIPL tags so there's no need to use getall, since it'll always return only one element (or None).

Also remove an unneeded piece of code I added in a previous commit.

…more Use the `multivalue_separator` config option whenever splitting or joining multivalue fields

This adds a very simple TupleMediaField class that inherits ListMediaField and just converts the result of __get__ to a tuple since ListMediaField is also used for ImageListField, and I didn't know if that should be changed to tuples too. Changed all tests to use tuples accordingly.

The StringList type now uses json to store the list of strings in the database. Also, since sampsyo expressed he tried to avoid using database scheme versioning, I added a migration method that I think should be quick enough to be run on every start (although I think I still would prefer to use versioning, which would allow for this to be run once). This iterates on every entry in a StringList field that sqlite recognizes as "not valid json" and converts it to json. After the database is migrated once, this should return 0 rows so the migration would take only the time it takes sqlite to test JSON_VALID() on StringList types.

There may be cases where genre contains values like `"Symphonic Rock"` since `JSON_VALID('"Symphonic Rock"')` is True for sqlite, we have to check too that the json value is actually an array.

The new JSonSubstringListQuery class implements a SubstringQuery over a StringList type using json sqlite functions in the database. For this, I had to change the clause() method return tuple to add a new element which is a list of tables to add to the FROM in the final sql expression. So now, JSonSubstringListQuery returns something like: ``` ("json_genre1.value like ? escape '\\'", ('Rock',), ('json_each(genre) as json_genre1',) ) ``` Since json_each is a table that has a column named id, This would collide with the id column from items/albums, so I changed Database.fetch from `SELECT * ...` to `SELECT DISTINCT items.id, items.title, items.artist, items.genre ...` I added a DISTINCT since using multiple tables can repeat results. Now it's possible to do things like: `beet ls genre:Classic` which returns Classical music as well as 'Classic Rock' music `beet ls genre:Classic genre:Rock` which returns music tagged as Classic AND Rock (or Classic Rock) `beet ls genre:Classic , genre:Rock` which returns all Classical music and all Rock music All of that, using json data, so: beet ls 'genre:["Rock' doesn't return any result.

Negating json sql queries is not as simple as prefixing the where clause with NOT, so now NotQuery first checks if the subquery has a negated_clause function, and uses it if possible. I implemented JSonSubstringListQuery.negated_clause to return expressions like: ``` items.id NOT IN (SELECT items.id FROM items, JSON_EACH(items.genre) AS json_genre1 WHERE json_genre1.value LIKE 'Rock') ``` This allows to run commands like: `beet ls genre:Blues ^genre:Rock`

Add a new JSonRegexpListQuery class that implements regular expressions queries on StringList fields. To use it without hardcoding any class names, I had to add more flexibility to the query `prefixes` dictionary. Now, a type defines not only what query class is used with it, but what query class is used for regular expressions and the `prefixes` dictionary not only associates each prefix with a Query class, but can associate a prefix with a function that gets a field name and returns a Query class to use.

Sqlite includes the json extension since version 3.9.0 and there aren't many distributions that don't have it, but there are still a few important distributions that have sqlite < 3.9.0 (Debian stable, centOS, openSUSE Leap and Ubuntu Trusty), so in those cases, fallback to use equivalent python code, which is slower, but at least we don't leave any distribution behind. Btw, Arch, Debian stable backports|testing|unstable|experimental, Fedora 24|25|26|Rawhide, FreeBSD, Gentoo, Mageia, Manjaro, openBSD, openSUSE Tumbleweed and Ubuntu 16.04 LTS|16.10|17.04 already contain at least 3.11.0.

`multivalue_separator` is no longer used to store multivalue fields in the database, so remove the sentence that says that.

If a new field is added to the model, the migration happens before it's added, so it can try to recover the values of a non-existing column. In that case, just ignore the error since there's obviously nothing to migrate there.

arcresu · 2019-05-19T00:57:20Z

@antlarr thanks for your work on this PR. It's been a while since it's had any attention, but it seems that multi-valued tags are still a feature that people would like to see!

Just a heads up that we're planning on splitting MediaFile out into a standalone Python module & git repository during the current beets release cycle, and this is clearly going to affect this PR. Your branch is anyway conflicting with master already, but after the split it'll mean that the changes will also need to be split into two separate sets. Are you still interested in working on this? If so it might be sensible to start with the MediaFile changes at beetbox/mediafile.

jtpavlock · 2020-07-10T06:18:10Z

Closing as no response from the original author. There's still a lot of good info to be gained from this, but any further implementation ideas should be directed to #505. Also, I think it'd be ideal to first implement the basis for multi-value tags, and then add genres as an additional PR after.

sampsyo reviewed Apr 4, 2017

View reviewed changes

beets/dbcore/types.py Outdated Show resolved Hide resolved

beets/dbcore/types.py Outdated Show resolved Hide resolved

beets/dbcore/types.py Outdated Show resolved Hide resolved

sampsyo reviewed Apr 5, 2017

View reviewed changes

antlarr mentioned this pull request Apr 6, 2017

Add support for multi-valued tags #505

Closed

antlarr commented Apr 12, 2017

View reviewed changes

sampsyo reviewed Apr 12, 2017

View reviewed changes

.travis.yml Outdated Show resolved Hide resolved

antlarr force-pushed the multivalue-genre-tag branch 3 times, most recently from dcf1c93 to 72ab5c9 Compare April 12, 2017 16:17

antlarr added 17 commits May 30, 2017 18:37

Use the new StringList type for genre and store it using multiple tags

3c2c716

genres is removed, but that's not standard, nor needed anymore.

Make lastgenre return the genres as a list

28a508d

This removes the need for a separator parameter, which now should be made global.

Allow None values to be parsed as null StringList values

1ce4c16

Define a null value for StringList, parse '' correctly and use '; ' a…

091e126

…s separator - Define a null value as [] - Parse '' as the null value (and not as ['']). - Use the same separator ('; ') as tmpl_first does by default, so tmpl_first keeps working as usual.

Update tests to test the new genre as a list

645565c

Allow to set a multivalued string to a ListMediaField and a test for it

35c8fa1

Allow to set item.genre = u'Rock; Blues' and automatically convert that to a string, so we keep better backward compatibility. Also added a test for that

Enforce StringList values to be lists and fix some more tests accordi…

6822c52

…ngly

Store empty lists as null in the database

efe7e31

Make to_sql and from_sql return/accept None values

Allow lyricist,composer(_sort),arranger fields to have multiple values

aa8a33a

Fix bpd plugin with the new genre as a list

55a9e3d

Convert it to a string before adding the genres list to a string

Accept None value in ListStorageStyle.set

57d682d

ListStorageStyle.get can return None, so set should accept it

Accept [] in MP3ListStorageStyle.store

f9de5ec

MP3ListStorageStyle.fetch can return [], so store should accept it.

Inherit MP3PeopleStorageStyle from ListStorageStyle

e41a56b

MP3PeopleStorageStyle now gets/sets lists of people. Note that MP3PeopleStorageStyle should only be used with TIPL tags so there's no need to use getall, since it'll always return only one element (or None).

Don't print fields whose value is an empty list

1ac4014

Also remove an unneeded piece of code I added in a previous commit.

Fix tests for lyricist, composer and arranger which are now lists.

5504f45

Replace "if values == []:" with "if not values:"

1465181

antlarr added 13 commits May 30, 2017 18:42

Add multivalue_separator config option so '; ' is not hardcoded any…

63138e0

…more Use the `multivalue_separator` config option whenever splitting or joining multivalue fields

Fix E501 and a small typo in the documentation

53617dc

Fix E501 and E303 errors

9748df2

Convert valid json strings to json arrays too

05f3476

There may be cases where genre contains values like `"Symphonic Rock"` since `JSON_VALID('"Symphonic Rock"')` is True for sqlite, we have to check too that the json value is actually an array.

Change test_library since genre and composer are now json values in db

f0f3943

Fix documentation related to multivalue_separator

8228067

`multivalue_separator` is no longer used to store multivalue fields in the database, so remove the sentence that says that.

antlarr force-pushed the multivalue-genre-tag branch from 7b0b7fc to 06f7c04 Compare May 30, 2017 16:42

Add composer_sort to the list of fields which are string lists

bbd8809

sampsyo mentioned this pull request Aug 26, 2018

Genre is not written to files on update through beets. #3013

Closed

arcresu mentioned this pull request Apr 27, 2019

mediafile: replace with a re-export of beetbox/mediafile #3237

Merged

7 tasks

jtpavlock closed this Jul 10, 2020

Conversation

antlarr commented Apr 4, 2017

Uh oh!

mention-bot commented Apr 4, 2017

Uh oh!

antlarr commented Apr 4, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sampsyo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

antlarr commented Apr 5, 2017

Uh oh!

sampsyo left a comment

Choose a reason for hiding this comment

Uh oh!

sampsyo Apr 5, 2017

Choose a reason for hiding this comment

Uh oh!

antlarr Apr 5, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sampsyo Apr 5, 2017

Choose a reason for hiding this comment

Uh oh!

antlarr Apr 5, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

antlarr commented Apr 5, 2017

Uh oh!

antlarr Apr 12, 2017

Choose a reason for hiding this comment

Uh oh!

antlarr Apr 12, 2017

Choose a reason for hiding this comment

Uh oh!

Uh oh!

arcresu commented May 19, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jtpavlock commented Jul 10, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

antlarr commented Apr 4, 2017 •

edited

Loading

antlarr Apr 5, 2017 •

edited

Loading

antlarr Apr 5, 2017 •

edited

Loading

arcresu commented May 19, 2019 •

edited

Loading