Full text search migrations extensions#353
Conversation
0c265c8 to
14fe639
Compare
|
Thanks @rwasef1830! I'll take a deeper look as soon as I can, but one comment right off the bat is that it's a bit strange for this to be an extension on Of course, this would require more work. You typically have the model builder extensions add Npgsql-specific annotations (on entities, indices, even the entire model), which are then picked up by NpgsqlMigrationsAnnotationProvider for scaffolding into the migrations. Then, NpgsqlMigrationsSqlGenerator picks up the annotations on the migration operations and renders the SQL accordingly. There are many examples of this, and while it requires changes in several different places it isn't too hard. |
|
I'm going to work on porting it to the model builder interface. I want to get this in 2.1 final. |
|
@rwasef1830 OK, good luck. The preview2 release is probably right around the corner, but I think there's still time before RC. FWIW the important part is what you've already done (the type and operation mapping). The model/migration setup is a bit secondary - users can always use raw SQL in migrations to get the same effect. I also suggest you submit some code for review even if it's not finished, just so we can discuss the general direction before you go too far etc. |
|
I'm envisioning an API like this: What do you think ? @roji As an aside, it feels verbose in the code that lots of classes are prefixed with Npgsql. |
|
@roji quick question, is there a requirement for entity framework annotations I can see that the code was written in 2016, maybe that was a requirement back then ? |
|
Sorry for not being available, will try to find some time tonight. Yes, the point of annotations is that they get serialized into migrations, which are generated C# code. So you basically want to serialize to simple types (bool, int, string). For a complex annotation that gets a bit annoying but that's how it works. |
|
After rethinking it with a clear head, I realized that I can just drop/recreate all triggers and functions whenever anything changes, so I will implement it with that strategy instead. |
|
@rwasef1830, I'm sorry I'm not more available for discussion or guidance - I'm at the end of a busy few weeks and should hopefully have some time to dedicate to Npgsql. The truth is that I need to get more familiar with PostgreSQL's full text support. As I understand it (from the docs), you have two general ways of doing FTS with indices:
Both seem to have their advantages and disadvantages: the separate column is apparently faster when using GiST but not when using GIN, the expression index takes less disk space etc. There doesn't seem to be a clear winner between the two. I'd be careful before going into any EF Core migrations feature which attempts to set up and maintain triggers automatically. As you wrote in a previous comment, this can be quite complex and get out of hand - I may not understand exactly what you have in mind to me it seems very non-trivial. Any migration touching a column may potentially need to check/know about the trigger (e.g. DropColumn), and at the moment there's no concept of a trigger in EF Core migrations. One could be added (much like how sequences are modeled, or how we model PostgreSQL extensions), but that's again a sizable project with little added value over the user just doing things with raw SQL. In addition, calculated/computed columns are (finally!) under development for PostgreSQL 11. This removes many use cases which previously relied on triggers - including probably this one. In other words, if you work hard on a triggered-based solution now, that may very well become obsolete in a few months when PostgreSQL 11 comes out. So I definitely would think carefully before jumping in here. Fully modeling FTS indices or search columns seems like a big task to me. As an example, let's not forget that sometimes we want to search across several tables, so a full API would allow defining the index outside of any specific table, not to mention if you want a column, in which you need to create a table to hold it... I'm really not sure I'd expect EF Core to set all that up. Anyway, let me know what you think... |
|
@roji I thought of only supporting the simple case where there is one (or more) tsvector column with dependent columns all being in the same table and all being simple properties. I store my info as annotations on the IEntity annotations (and corresponding EntityBuilder).. ie: on the table level. I will "serialize" the entire tree of dependents (and their labels, and default values) as a string per tsvector property in an annotation. In case of create table operation that has my annotation, I'll create the trigger + trigger function + run an update to populate the tsvector column, and also have EF Core create an index using the already existing In case someone modifies or drops a dependent column or even the tsvector itself, I will allow the operation to fail (ie: migration will be non-working), it will be up to the user to update the corresponding In theory, if they were using the Expression-based API (ie: specifying column with lambdas) their code would not compile anyway and they would be forced to update it, but for the case where the user was using string property names... I have to let it fail. Thus, by having the user modify my call, the serialized annotation will change... which I am hoping will trigger EF Core to emit an AlterTable operation (please correct me if I'm wrong in my assumptions), in which case I'll just drop the function and trigger and recreate everything again. "Properly" handling it would be too much work, that's why I considered simply closing this pull request, but I thought of getting something minimum workable. To be honest, I'm still on the fence here, but since I had at least my sample migration helper that was working for me, I thought of translating it to something built-in in Npgsql with the same kind of semantics. If you think it's better, we can postpone this and see what Postgres 11 brings to the table, and we can alternatively just post my sample migration helper in a "samples" folder in the repo for people to use or build on instead of continuing with my plan. I'm interested to know your opinion here. |
14fe639 to
dc23138
Compare
dc23138 to
1a9a103
Compare
|
@roji check latest commit for implementation as I mentioned and let me know what you think |
|
@rwasef1830, I'll try to review your code a bit later (don't have time at this moment), but here are some general notes. First off, I'll just say I've already gone down the route of trying to model database features that are a bit too complex in migrations, and it ended up being a complex, brittle component that didn't add that much value (since users always have raw SQL). This is why my initial reaction is a bit negative. It's a bit reinforced by seeing the complexity in the string representation of your annotation - it's certainly possible to go down this route, but the complex mini-language with delimiters and special chars may be telling us that we're overdoing it. More to the point. First, I definitely think this should wait until PG 11 introduces generated columns. I mean, if you had that feature today, you'd surely implement FTS migrations with it rather than managing triggers, right? Second, once generated columns do become a reality, setting up a vector column becomes much simpler, perhaps simple enough to not require a special EF Core API to manage it... Based on the current discussion, it seems like creating a ts_vector column would be as simple as: CREATE TABLE foo (
...
search_vector ts_vector GENERATED ALWAYS AS (ts_vector(...))
)If this becomes a reality, then I think the need for special migration support simply disappears, and we can expect users to do this themselves (with some doc guidance of course). After all, it's just as simple as creating an FTS expression today, but we're not discussing a special migration API for that. What do you think? |
|
@roji I think you're right. Perhaps having my incompletely modeled approach is worse than leaving people with the freedom of defining things directly as they need, and the need will disappear soon anyway. I'll close this pull request. |
|
Ok, let's revisit this after PG 11 is released and the new computed column feature becomes more clear. |
Hello @roji
This is a proposal for full text search migrations extensions. This still lacks unit tests, but I wanted to show you what I had so far (I'm using this code in production for a while since EF6 with no issues). Let me know what you think and what could be done to make this more friendly to use.
This is a follow-up to #315