I tested this with flac and mp3 files (that is, with APE tags, where multiple genre tags are written, and with ID3 tags, where …). The remaining questions are:
sampsyo
left a comment
Very cool! This diff is surprisingly small; I expected this to be a far more disruptive change.
A couple of ideas come to mind for improving backwards compatibility, neither of which we necessarily have to use:
- We could keep a field called `genre` that just gets the joined string, and add a new `genres` field for the list. That's what MediaFile previously did.
- We could somehow allow a string to be assigned to the `genre` field, which would then be treated as a single-element list.
The only thing that makes me somewhat uncomfortable here is the age-old slippery slope question. How many other fields would it make sense to transform into lists? Should artist be a list? albumartist? If we make too many changes like this, will we disrupt simpler code that just cares about a single value?
Anyway, genre is clearly the best use case for this, so maybe it's not worth worrying too much about other tags.
Btw, I guess you'll doubt that `getall('TIPL')` can be replaced with `get('TIPL')` (mainly because I doubted it too :) ). So here is the test I did with mutagen to check that it doesn't allow two TIPL frames: Here, I checked with okteta (a hex editor) that the file included the added TIPL frame. And here, I checked again with a hex editor that the frame wasn't just added, but that it replaced the previous TIPL frame.
sampsyo
left a comment
Things are coming along nicely! Here are a few more comments on the latest.
```diff
-if value is not None:
+if value:
+    formatted[key] = u'; '.join(value)
+elif value is not None:
```
This, for what it's worth, is one of the things that worries me about making arbitrary fields into lists: client code will now frequently need boilerplate of the form `if isinstance(value, list):`. This puts additional weight on the side of having an explicit way to access lists, or some other foolproof conversion that produces a scalar value.
I understand what you mean. Note that list fields (`genre`, `arranger`, ...) are now always lists; the problem comes with code like this, which tries to handle all fields the same way. Would you prefer that I create a class that inherits from `list` and reimplements `__str__` so that it returns `'; '.join(self)`? That way, code like this could be changed to:

```
if value:
    formatted[key] = str(value)
```

But still, that's not perfect either.
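A minimal sketch of the `list` subclass proposed above, assuming the hard-coded `'; '` separator. `JoinedList` is a hypothetical name, not a class in beets or MediaFile:

```python
class JoinedList(list):
    """Hypothetical list of strings whose str() joins the items with '; '.

    Formatting code can call str(value) without first checking
    isinstance(value, list), while the value still behaves like a list.
    """

    separator = u'; '

    def __str__(self):
        return self.separator.join(self)


genre = JoinedList([u'Rock', u'Blues'])
print(str(genre))  # Rock; Blues
print(genre[0])    # Rock
```

Since `list` does not define `__format__`, f-strings and `format()` fall back to `__str__` too; `repr()` and serializers still see a plain list, which is part of why this is "not perfect".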
```diff
 this style.
 """
-self.set_list(mutagen_file, [value])
+if value is None:
```
That's odd; I'm a little surprised that `ListStorageStyle` is allowed to return `None` from `get`. That seems to make things more complicated for clients in general. Maybe that's worth fixing?
In fact, I'd say it's OK. `get` is implemented as:

```
try:
    return self.get_list(mutagen_file)[0]
except IndexError:
    return None
```
So `None` is returned when `get_list` returns an empty list, that is, when the tag doesn't exist in the file. This lets us identify when a tag doesn't exist, and lets us remove a tag from a file. Do you think it would be better to return an empty string and always interpret an empty value as a non-existent tag? (So setting, say, `artist` to `''` would delete the tag entry.)
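To make the contract concrete, here is a self-contained sketch of the `get`/`set` pair described above. `DictListStyle` is hypothetical and stores values in a plain dict instead of a mutagen file; it only models the behavior: `get` returns `None` when the list is empty, and `set(None)` removes the tag by writing an empty list.

```python
class DictListStyle(object):
    """Hypothetical stand-in for a ListStorageStyle backed by a dict."""

    def __init__(self, key):
        self.key = key

    def get_list(self, tags):
        # A missing tag is reported as an empty list.
        return list(tags.get(self.key, []))

    def set_list(self, tags, values):
        if values:
            tags[self.key] = list(values)
        else:
            # Writing an empty list deletes the tag entirely.
            tags.pop(self.key, None)

    def get(self, tags):
        try:
            return self.get_list(tags)[0]
        except IndexError:
            return None  # The tag does not exist.

    def set(self, tags, value):
        # Accept None (what get() returns for a missing tag) as "delete".
        if value is None:
            self.set_list(tags, [])
        else:
            self.set_list(tags, [value])


tags = {}
style = DictListStyle('artist')
style.set(tags, u'Someone')
print(style.get(tags))        # Someone
style.set(tags, None)
print(style.get(tags), tags)  # None {}
```

With this contract, `''` stays a valid value that is distinct from a missing tag.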
I just noticed I had this written in a browser tab but forgot to click "Comment". I'll post it for the record, even if it's probably outdated now.
Yes, I was also surprised, but most of the work was already implemented in `ListMediaField`, the `*ListStorageStyle` classes and mutagen.
Yes, I saw that, but the genres tag is (afaik) not used by any other application, but genre is.
Have a look at the test_write_genre_list_from_string test I added in test_mediafile.py to test fad5809 :). I think that's even better since it allows assigning a string but automatically converts it to a list.
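A minimal sketch of the idea (assign a string, get a list back), assuming the `'; '` separator. `ListField` and `Track` are hypothetical names and this is not the MediaFile implementation:

```python
class ListField(object):
    """Hypothetical descriptor that always stores a list of strings."""

    def __init__(self, separator=u'; '):
        self.separator = separator

    def __set_name__(self, owner, name):
        self.attr = '_' + name

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        return getattr(obj, self.attr, [])

    def __set__(self, obj, value):
        if value is None:
            value = []
        elif isinstance(value, str):
            # Backward compatibility: 'Rock; Blues' -> ['Rock', 'Blues'].
            value = [v.strip() for v in value.split(self.separator.strip())
                     if v.strip()]
        setattr(obj, self.attr, list(value))


class Track(object):
    genre = ListField()


t = Track()
t.genre = u'Rock; Blues'
print(t.genre)  # ['Rock', 'Blues']
```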
Hmm, I don't think that would be too much work. I can give it a try if you want. (This is now done.)
```
def negated_clause(self):
    clause, subvals, table = self.col_clause()
    _table = self.model_cls._table
    primary_key = '%s.id' % (_table)
```
@sampsyo Do you think it's worth adding a `get_primary_key_field` method to `Model`, so that instead of hardcoding `id` I can call `self.model_cls.get_primary_key_field()`, which would iterate over the model's fields and find the one with a `PRIMARY_ID` type? I think that would be more elegant, but I'm not sure it's worth the effort.
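A sketch of the proposed helper, assuming `Model._fields` maps field names to type instances and that a single `PRIMARY_ID` type instance marks the key. The `Type`, `Model` and `Item` classes below are hypothetical stand-ins that only mimic that layout:

```python
class Type(object):
    """Hypothetical stand-in for a dbcore field type."""

    def __init__(self, sql):
        self.sql = sql


PRIMARY_ID = Type(u'INTEGER PRIMARY KEY')  # hypothetical singleton
INTEGER = Type(u'INTEGER')
STRING = Type(u'TEXT')


class Model(object):
    _table = None
    _fields = {}

    @classmethod
    def get_primary_key_field(cls):
        """Return the name of the field whose type is PRIMARY_ID."""
        for name, field_type in cls._fields.items():
            if field_type is PRIMARY_ID:
                return name
        raise LookupError('%s has no primary key field' % cls.__name__)


class Item(Model):
    _table = 'items'
    _fields = {'id': PRIMARY_ID, 'title': STRING, 'track': INTEGER}


primary_key = '%s.%s' % (Item._table, Item.get_primary_key_field())
print(primary_key)  # items.id
```

The lookup is linear in the number of fields; caching the result per class would avoid repeating it for every negated query.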
```diff
 query = query or TrueQuery()  # A null query.
 sort = sort or NullSort()  # Unsorted.
-where, subvals = query.clause()
+where, subvals, new_tables = query.clause()
```
@sampsyo Should I still allow Query objects to return 2-tuples, just in case any external plugin still uses the old API? It's easy to check for unpacking errors, but it's a bit ugly, so I wanted to ask your opinion first.
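One way to keep old plugins working, sketched under the assumption that an old-style `clause()` returns `(where, subvals)` and a new-style one returns `(where, subvals, tables)`. Checking the length explicitly avoids catching a `ValueError` from the unpacking. `normalize_clause` and the two query classes are hypothetical:

```python
import warnings


def normalize_clause(result):
    """Return (where, subvals, tables) for an old- or new-style clause()."""
    if len(result) == 2:
        where, subvals = result
        warnings.warn('Query.clause() returning a 2-tuple is deprecated',
                      DeprecationWarning, stacklevel=2)
        return where, subvals, ()
    where, subvals, tables = result
    return where, subvals, tables


class OldStyleQuery(object):
    def clause(self):
        return 'items.year = ?', (1975,)


class NewStyleQuery(object):
    def clause(self):
        return ("json_genre1.value LIKE ? ESCAPE '\\'", ('%Rock%',),
                ('json_each(items.genre) AS json_genre1',))


for q in (OldStyleQuery(), NewStyleQuery()):
    print(normalize_clause(q.clause()))
```

The deprecation warning gives plugin authors a migration path without breaking them immediately.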
(Branch updated from dcf1c93 to 72ab5c9.)
A StringList type is a list of strings that can be formatted as a string (using a separator character or string) to be stored in the database or used as a path component. By default, I used ';' as the separator, but I guess that should be made configurable through a global config parameter (though most people would probably use ';' or ', '). Also, the query should probably be something other than query.SubstringQuery, but that can be changed later.
`genres` is removed, but it's not a standard tag, nor is it needed anymore.
This removes the need for a separator parameter, which now should be made global.
…s separator
- Define a null value as []
- Parse '' as the null value (and not as ['']).
- Use the same separator ('; ') as tmpl_first does by default, so tmpl_first
keeps working as usual.
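A minimal sketch of a type with the properties listed above: the null value is `[]`, `''` parses to `[]` rather than `['']`, and formatting joins with `'; '` so that the formatted string can still be split on that separator. `StringListType` and its method names are hypothetical, loosely modeled on dbcore types:

```python
class StringListType(object):
    """Hypothetical dbcore-style type for a list of strings."""

    separator = u'; '
    null = []

    def normalize(self, value):
        if value is None:
            return list(self.null)
        if isinstance(value, str):
            return self.parse(value)
        return [str(v) for v in value]

    def parse(self, string):
        # '' is the null value, not [''].
        if not string:
            return list(self.null)
        return [part.strip() for part in string.split(self.separator.strip())]

    def format(self, value):
        return self.separator.join(self.normalize(value))


t = StringListType()
print(t.parse(u''))                   # []
print(t.parse(u'Rock; Blues'))        # ['Rock', 'Blues']
print(t.format([u'Rock', u'Blues']))  # Rock; Blues
```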
Allow setting item.genre = u'Rock; Blues' and automatically convert that to a list, so we keep better backward compatibility. Also added a test for that.
Make to_sql and from_sql return/accept None values
Convert it to a string before adding the genres list to a string
ListStorageStyle.get can return None, so set should accept it
MP3ListStorageStyle.fetch can return [], so store should accept it.
MP3PeopleStorageStyle now gets/sets lists of people. Note that MP3PeopleStorageStyle should only be used with TIPL tags, so there's no need to use getall: there is always at most one TIPL frame, so get returns that frame (or None).
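A TIPL frame holds (role, name) pairs, which is why a single frame can carry several people. A pure-Python sketch of reading and writing the names for one role, assuming the frame's people are available as a list of `[role, name]` pairs (mutagen exposes them through the frame's `people` attribute). The helper names are hypothetical:

```python
def get_people(pairs, role):
    """Return the names credited with `role`, in order."""
    return [name for r, name in pairs if r == role]


def set_people(pairs, role, names):
    """Replace the names for `role`, keeping the pairs for other roles."""
    kept = [[r, name] for r, name in pairs if r != role]
    return kept + [[role, name] for name in names]


people = [[u'arranger', u'A'], [u'producer', u'P'], [u'arranger', u'B']]
print(get_people(people, u'arranger'))  # ['A', 'B']
people = set_people(people, u'arranger', [u'C'])
print(people)  # [['producer', 'P'], ['arranger', 'C']]
```

Rewriting the whole pair list and storing it back as one frame matches the "only one TIPL frame" behavior described above.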
Also remove an unneeded piece of code I added in a previous commit.
…more Use the `multivalue_separator` config option whenever splitting or joining multivalue fields
This adds a very simple TupleMediaField class that inherits from ListMediaField and just converts the result of `__get__` to a tuple. ListMediaField itself is left unchanged since it's also used for ImageListField, and I didn't know whether that should be changed to tuples too. Changed all tests to use tuples accordingly.
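A sketch of that subclass with hypothetical stand-ins for the MediaFile classes (here `ListMediaField` reads from a dict rather than from storage styles):

```python
class ListMediaField(object):
    """Hypothetical stand-in: returns the stored values as a list."""

    def __init__(self, key):
        self.key = key

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        return list(obj.tags.get(self.key, []))


class TupleMediaField(ListMediaField):
    """Same as ListMediaField, but returns an immutable tuple."""

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        return tuple(super(TupleMediaField, self).__get__(obj, owner))


class FakeMediaFile(object):
    genres = TupleMediaField('genre')
    images = ListMediaField('image')

    def __init__(self, tags):
        self.tags = tags


f = FakeMediaFile({'genre': [u'Rock', u'Blues'], 'image': [b'\x89PNG']})
print(f.genres)  # ('Rock', 'Blues')
print(f.images)
```

In this sketch each read builds a fresh container, so returning a tuple makes `f.genres.append(...)` fail loudly instead of silently modifying a copy that is never written back.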
The StringList type now uses JSON to store the list of strings in the database. Also, since sampsyo mentioned that he tries to avoid database schema versioning, I added a migration method that I think should be quick enough to run on every start (although I would still prefer versioning, which would allow this to run only once). It iterates over every entry in a StringList field that SQLite recognizes as not valid JSON and converts it to JSON. After the database has been migrated once, this query should return 0 rows, so the migration only costs the time SQLite needs to run JSON_VALID() on StringList columns.
There may be cases where genre contains values like `"Symphonic Rock"`.
Since `JSON_VALID('"Symphonic Rock"')` is true in SQLite, we also have
to check that the JSON value is actually an array.
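A sketch of such a migration, assuming the old values were joined with `;` and the table has an integer `id` column. `migrate_string_list_column` is a hypothetical name. It needs SQLite's JSON functions (`json_valid`, `json_type`) and uses `CASE` so that `json_type()` is only evaluated on valid JSON:

```python
import json
import sqlite3


def migrate_string_list_column(conn, table, column, separator=u';'):
    """Convert old separator-joined values in `column` to JSON arrays.

    Selects rows whose value is invalid JSON, or valid JSON that is not
    an array (e.g. '"Symphonic Rock"' or '1990'). `table` and `column`
    are interpolated directly, so they must be trusted identifiers.
    Returns the number of rows converted.
    """
    rows = conn.execute(
        'SELECT id, {c} FROM {t} WHERE {c} IS NOT NULL AND '
        "CASE WHEN json_valid({c}) THEN json_type({c}) != 'array' "
        'ELSE 1 END'.format(t=table, c=column)
    ).fetchall()
    for row_id, value in rows:
        values = [v.strip() for v in value.split(separator) if v.strip()]
        conn.execute(
            'UPDATE {t} SET {c} = ? WHERE id = ?'.format(t=table, c=column),
            (json.dumps(values), row_id))
    return len(rows)


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (id INTEGER PRIMARY KEY, genre TEXT)')
conn.executemany('INSERT INTO items (genre) VALUES (?)', [
    (u'Rock; Blues',), (u'["Jazz"]',), (u'"Symphonic Rock"',), (u'1990',)])
first_run = migrate_string_list_column(conn, 'items', 'genre')
second_run = migrate_string_list_column(conn, 'items', 'genre')
print(first_run, second_run)  # 3 0
```

The second call finding no rows is the cheap steady state that makes running the check on every start acceptable.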
The new JSonSubstringListQuery class implements a SubstringQuery over a
StringList type using json sqlite functions in the database. For this,
I had to change the clause() method return tuple to add a new
element which is a list of tables to add to the FROM in the final
sql expression. So now, JSonSubstringListQuery returns something like:
```
("json_genre1.value like ? escape '\\'",
('Rock',),
('json_each(genre) as json_genre1',)
)
```
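A sketch of how such a clause could be built, assuming each query gets a unique alias (`json_genre1`, `json_genre2`, ...) so that several conditions on the same field can be combined. Unlike the bare value in the tuple above, this sketch escapes LIKE wildcards in the pattern and wraps it in `%` so it performs a substring match. `json_substring_clause` is a hypothetical helper:

```python
def json_substring_clause(table, field, pattern, index):
    """Build (where, subvals, tables) for a substring match on a list field."""
    alias = 'json_%s%d' % (field, index)
    # Escape LIKE metacharacters, then match anywhere in the element.
    escaped = (pattern.replace('\\', '\\\\')
                      .replace('%', '\\%')
                      .replace('_', '\\_'))
    where = "%s.value LIKE ? ESCAPE '\\'" % alias
    tables = ('json_each(%s.%s) AS %s' % (table, field, alias),)
    return where, ('%' + escaped + '%',), tables


print(json_substring_clause('items', 'genre', 'Rock', 1))
```

Note that SQLite's `LIKE` is case-insensitive for ASCII characters by default.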
Since json_each is a table that has a column named id, this would
collide with the id column from items/albums, so I changed
Database.fetch from `SELECT * ...` to
`SELECT DISTINCT items.id, items.title, items.artist, items.genre ...`.
I added DISTINCT since joining multiple tables can repeat results.
Now it's possible to do things like:
`beet ls genre:Classic`
which returns Classical music as well as 'Classic Rock' music
`beet ls genre:Classic genre:Rock`
which returns music tagged as Classic AND Rock (or Classic Rock)
`beet ls genre:Classic , genre:Rock`
which returns all Classical music and all Rock music
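The following runnable sketch (plain `sqlite3`, no beets code; the rows and the `fetch_titles` helper are made up for illustration) shows why each condition gets its own `json_each()` alias and why `DISTINCT` is needed. It requires an SQLite build with the JSON functions:

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, '
             'genre TEXT)')
conn.executemany('INSERT INTO items (title, genre) VALUES (?, ?)', [
    ('Symphony', json.dumps(['Classical'])),
    ('Anthem', json.dumps(['Classic Rock'])),
    ('Crossover', json.dumps(['Classical', 'Rock'])),
    ('Shuffle', json.dumps(['Blues'])),
    ('Riff', json.dumps(['Rock', 'Blues'])),
])


def fetch_titles(conditions, joiner):
    """Run one substring condition per json_each() alias, joined by AND/OR.

    Each condition gets its own alias, so different conditions can match
    different elements of the same array. DISTINCT removes the duplicate
    rows produced when several element combinations match.
    """
    aliases = ['json_genre%d' % (i + 1) for i in range(len(conditions))]
    tables = ['items'] + ['json_each(items.genre) AS %s' % a for a in aliases]
    where = (' %s ' % joiner).join('%s.value LIKE ?' % a for a in aliases)
    sql = ('SELECT DISTINCT items.id, items.title FROM %s WHERE %s '
           'ORDER BY items.id' % (', '.join(tables), where))
    subvals = ['%' + c + '%' for c in conditions]
    return [title for _, title in conn.execute(sql, subvals)]


print(fetch_titles(['Classic'], 'AND'))          # genre:Classic
print(fetch_titles(['Classic', 'Rock'], 'AND'))  # genre:Classic genre:Rock
print(fetch_titles(['Classic', 'Rock'], 'OR'))   # genre:Classic , genre:Rock
print(fetch_titles(['["Rock'], 'AND'))           # genre:["Rock
```

Because `LIKE` is matched against the decoded array elements rather than the stored JSON text, a pattern containing JSON syntax such as `["Rock` finds nothing, even though the raw column value for "Riff" contains that substring.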
All of that uses the JSON data itself, so:
`beet ls 'genre:["Rock'`
doesn't return any results.
Negating json sql queries is not as simple as prefixing the where clause
with NOT, so now NotQuery first checks if the subquery has a
negated_clause function, and uses it if possible.
I implemented JSonSubstringListQuery.negated_clause to return
expressions like:
```
items.id NOT IN (SELECT items.id
FROM items,
JSON_EACH(items.genre) AS json_genre1
WHERE json_genre1.value LIKE 'Rock')
```
This makes it possible to run commands like:
`beet ls genre:Blues ^genre:Rock`
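A runnable comparison (made-up rows; requires SQLite's JSON functions) between simply prefixing the clause with `NOT` and the `NOT IN (subquery)` form shown above. The query strings and the `titles` helper are illustrative only:

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, '
             'genre TEXT)')
conn.executemany('INSERT INTO items (title, genre) VALUES (?, ?)', [
    ('Shuffle', json.dumps(['Blues'])),
    ('Riff', json.dumps(['Rock', 'Blues'])),
    ('Anthem', json.dumps(['Classic Rock'])),
    ('Silence', json.dumps([])),
])

# Naive negation: NOT is applied per json_each() row, so an item is returned
# if *any* element fails to match, and items with an empty list vanish.
NAIVE_NOT_ROCK = """
SELECT DISTINCT items.id, items.title
FROM items, json_each(items.genre) AS json_genre1
WHERE NOT (json_genre1.value LIKE ?)
ORDER BY items.id"""

# negated_clause()-style negation: exclude every item that has a match.
NOT_ROCK = """
SELECT items.id, items.title FROM items
WHERE items.id NOT IN (SELECT items.id
                       FROM items, json_each(items.genre) AS json_genre1
                       WHERE json_genre1.value LIKE ?)
ORDER BY items.id"""

# beet ls genre:Blues ^genre:Rock
BLUES_NOT_ROCK = """
SELECT DISTINCT items.id, items.title
FROM items, json_each(items.genre) AS json_genre1
WHERE json_genre1.value LIKE ?
  AND items.id NOT IN (SELECT items.id
                       FROM items, json_each(items.genre) AS json_genre2
                       WHERE json_genre2.value LIKE ?)
ORDER BY items.id"""


def titles(sql, *params):
    return [title for _, title in conn.execute(sql, params)]


print(titles(NAIVE_NOT_ROCK, '%Rock%'))             # ['Shuffle', 'Riff']
print(titles(NOT_ROCK, '%Rock%'))                   # ['Shuffle', 'Silence']
print(titles(BLUES_NOT_ROCK, '%Blues%', '%Rock%'))  # ['Shuffle']
```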
Add a new JSonRegexpListQuery class that implements regular expression queries on StringList fields. To use it without hardcoding any class names, I had to make the query `prefixes` dictionary more flexible. Now a type defines not only which query class is used with it, but also which query class is used for regular expressions, and the `prefixes` dictionary can associate each prefix either with a Query class or with a function that takes a field name and returns the Query class to use.
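A sketch of the more flexible prefix table. The names `JSonSubstringListQuery` and `JSonRegexpListQuery` come from the commits; everything else (the `regexp_query` attribute, `regexp_query_for`, `parse_part`, `ExactQuery` and the `'='` prefix) is hypothetical. The `':'` prefix mirrors the `field::pattern` regexp syntax:

```python
class Query(object):
    """Hypothetical base: just records the field and pattern."""

    def __init__(self, field, pattern):
        self.field = field
        self.pattern = pattern


class SubstringQuery(Query):
    pass


class RegexpQuery(Query):
    pass


class ExactQuery(Query):
    pass


class JSonSubstringListQuery(Query):
    pass


class JSonRegexpListQuery(Query):
    pass


class StringType(object):
    query = SubstringQuery
    regexp_query = RegexpQuery


class StringListType(object):
    query = JSonSubstringListQuery
    regexp_query = JSonRegexpListQuery


FIELD_TYPES = {'title': StringType(), 'genre': StringListType()}


def regexp_query_for(field):
    """Let the field's type decide which regexp query class to use."""
    return FIELD_TYPES[field].regexp_query


# A prefix maps either to a Query class or to a function(field) -> class.
PREFIXES = {
    ':': regexp_query_for,
    '=': ExactQuery,
}


def parse_part(field, value):
    for prefix, target in PREFIXES.items():
        if value.startswith(prefix):
            query_cls = target if isinstance(target, type) else target(field)
            return query_cls(field, value[len(prefix):])
    return FIELD_TYPES[field].query(field, value)


q = parse_part('genre', ':^Rock$')
print(type(q).__name__, q.pattern)  # JSonRegexpListQuery ^Rock$
```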
SQLite includes the JSON extension since version 3.9.0, and most distributions ship it, but a few important distributions still have SQLite < 3.9.0 (Debian stable, CentOS, openSUSE Leap and Ubuntu Trusty). In those cases, fall back to equivalent Python code, which is slower, but at least no distribution is left behind. By the way, Arch, Debian stable backports|testing|unstable|experimental, Fedora 24|25|26|Rawhide, FreeBSD, Gentoo, Mageia, Manjaro, OpenBSD, openSUSE Tumbleweed and Ubuntu 16.04 LTS|16.10|17.04 already ship at least 3.11.0.
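A sketch of the feature check and a Python fallback for the substring list match, assuming genres are stored as JSON arrays. Probing with `json('[]')`, rather than only comparing `sqlite3.sqlite_version_info` against 3.9.0, also catches builds that are new enough but compiled without the JSON extension. `sqlite_has_json`, `substring_list_match` and the SQL function name `list_contains` are hypothetical; registering the fallback with `create_function` keeps the filtering inside the SQL query:

```python
import json
import sqlite3


def sqlite_has_json(conn):
    """Return True if this SQLite build provides the JSON functions."""
    if sqlite3.sqlite_version_info < (3, 9, 0):
        return False
    try:
        conn.execute("SELECT json('[]')").fetchone()
    except sqlite3.OperationalError:
        return False  # e.g. built without the JSON extension
    return True


def substring_list_match(stored, pattern):
    """Python fallback: does any list element contain `pattern`?

    Case-insensitive, to approximate SQLite's default LIKE for ASCII text.
    """
    values = json.loads(stored) if stored else []
    pattern = pattern.lower()
    return any(pattern in value.lower() for value in values)


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (id INTEGER PRIMARY KEY, genre TEXT)')
conn.executemany('INSERT INTO items (genre) VALUES (?)',
                 [('["Classic Rock", "Blues"]',), ('["Jazz"]',)])
print(sqlite_has_json(conn))

# Slower, but works on any SQLite: filter with a Python function.
conn.create_function('list_contains', 2, substring_list_match)
matches = conn.execute(
    'SELECT id FROM items WHERE list_contains(genre, ?)', ('rock',)).fetchall()
print(matches)  # [(1,)]
```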
`multivalue_separator` is no longer used to store multivalue fields in the database, so remove the sentence that says that.
If a new field is added to the model, the migration happens before the column is added, so it can try to read the values of a non-existent column. In that case, just ignore the error, since there's obviously nothing to migrate there.
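A sketch of that guard, assuming the migration is any callable that reads the column. `migrate_if_column_exists` is a hypothetical name. Only SQLite's "no such column" error is swallowed, so other database errors still propagate. Checking `PRAGMA table_info(...)` up front would avoid matching on the error message, at the cost of an extra query.

```python
import sqlite3


def migrate_if_column_exists(conn, table, column, migrate):
    """Run `migrate(conn, table, column)`, ignoring a missing column.

    A field that was just added to the model has no column yet, so there
    is nothing to migrate and 0 is returned.
    """
    try:
        return migrate(conn, table, column)
    except sqlite3.OperationalError as exc:
        if 'no such column' in str(exc):
            return 0
        raise


def count_rows(conn, table, column):
    # Stand-in migration: just reads the column and counts the rows.
    sql = 'SELECT id, %s FROM %s' % (column, table)
    return len(conn.execute(sql).fetchall())


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (id INTEGER PRIMARY KEY, genre TEXT)')
conn.execute("INSERT INTO items (genre) VALUES ('Rock')")
print(migrate_if_column_exists(conn, 'items', 'genre', count_rows))     # 1
print(migrate_if_column_exists(conn, 'items', 'newfield', count_rows))  # 0
```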
(Branch updated from 7b0b7fc to 06f7c04.)
@antlarr thanks for your work on this PR. It's been a while since it's had any attention, but it seems that multi-valued tags are still a feature that people would like to see! Just a heads-up: we're planning on splitting MediaFile out into a standalone Python module and git repository during the current beets release cycle, and this is clearly going to affect this PR. Your branch already conflicts with master anyway, but after the split the changes will also need to be divided into two separate sets. Are you still interested in working on this? If so, it might be sensible to start with the MediaFile changes at beetbox/mediafile.
Closing due to no response from the original author. There's still a lot of useful information here, but any further implementation ideas should be directed to #505. Also, I think it would be ideal to first implement the basis for multi-value tags, and then add genres in a separate PR afterwards.
These changes allow beets to store multiple genres (as multivalued tags), with the intent of fixing #505. Note that this is not ready to be merged yet; it's meant as the beginning of a discussion on how to properly fix the issue.
Note that these commits store lists in the album.genre and item.genre properties, while storing them in the database as text strings joined with a separator. Also, I checked that if a user has something like this as a path definition:
then $genre is correctly formatted as a string. In my test case, this creates a directory called "Rock;Blues".
This can break some complex configurations. For example, I had this album_fields definition to add an album_genre_for_path tag containing the most used genre in an album:
When evaluating this tag, beets broke, since item.genre is no longer hashable. I think this is better than silently breaking the directory structure. I fixed it easily by replacing the code above with:
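The original `album_fields` snippets are not reproduced here, so this is only a sketch of the failure and one possible fix: a list can't be used as a `Counter` (or dict) key, but its elements can. `FakeItem` and `items` are hypothetical stand-ins for an album's items:

```python
from collections import Counter


class FakeItem(object):
    def __init__(self, genre):
        self.genre = genre  # now a list of strings


items = [FakeItem(['Rock', 'Blues']), FakeItem(['Rock']), FakeItem(['Jazz'])]

try:
    Counter(item.genre for item in items)
except TypeError as exc:
    print('fails:', exc)  # lists are unhashable

# Count individual genres instead of whole lists.
counts = Counter(genre for item in items for genre in item.genre)
album_genre_for_path = counts.most_common(1)[0][0] if counts else u''
print(album_genre_for_path)  # Rock
```

Counting `tuple(item.genre)` instead would also be hashable, but it would treat each genre combination as a distinct value.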
Then I also tested what happens when reading a file written by another application with multiple genre tags:
Which is what I would have expected.