Skip to content

Full text search migrations extensions#353

Closed
rwasef1830 wants to merge 1 commit intonpgsql:devfrom
rwasef1830:full-text-search-migrations
Closed

Full text search migrations extensions#353
rwasef1830 wants to merge 1 commit intonpgsql:devfrom
rwasef1830:full-text-search-migrations

Conversation

@rwasef1830
Copy link
Copy Markdown
Contributor

@rwasef1830 rwasef1830 commented Mar 29, 2018

Hello @roji
This is a proposal for full text search migrations extensions. This still lacks unit tests, but I wanted to show you what I had so far (I'm using this code in production for a while since EF6 with no issues). Let me know what you think and what could be done to make this more friendly to use.

This is a follow-up to #315

@rwasef1830 rwasef1830 force-pushed the full-text-search-migrations branch 2 times, most recently from 0c265c8 to 14fe639 Compare March 29, 2018 10:21
@roji
Copy link
Copy Markdown
Member

roji commented Mar 29, 2018

Thanks @rwasef1830! I'll take a deeper look as soon as I can, but one comment right off the bat is that it's a bit strange for this to be an extension on MigrationBuilder, as opposed to the model builders. This would require people to open the generated migration code and add your extensions, which isn't great. It would be ideal if people could add extensions onto their model (in OnModelCreating()) which would lead to the appropriate SQL being generated in NpgsqlMigrationsSqlGenerator.

Of course, this would require more work. You typically have the model builder extensions add Npgsql-specific annotations (on entities, indices, even the entire model), which are then picked up by NpgsqlMigrationsAnnotationProvider for scaffolding into the migrations. Then, NpgsqlMigrationsSqlGenerator picks up the annotations on the migration operations and renders the SQL accordingly. There are many examples of this, and while it requires changes in several different places it isn't too hard.

@rwasef1830
Copy link
Copy Markdown
Contributor Author

rwasef1830 commented Apr 8, 2018

I'm going to work on porting it to the model builder interface. I want to get this in 2.1 final.

@roji
Copy link
Copy Markdown
Member

roji commented Apr 8, 2018

@rwasef1830 OK, good luck. The preview2 release is probably right around the corner, but I think there's still time before RC.

FWIW the important part is what you've already done (the type and operation mapping). The model/migration setup is a bit secondary - users can always use raw SQL in migrations to get the same effect.

I also suggest you submit some code for review even if it's not finished, just so we can discuss the general direction before you go too far etc.

@rwasef1830
Copy link
Copy Markdown
Contributor Author

rwasef1830 commented Apr 8, 2018

I'm envisioning an API like this:

modelBuilder.Entity<Customer>().ForNpgsqlHasFullTextSearchVector(
    c => c.SearchVector, // an NpgsqlTsVector property in the model
    c => c.Language, // or "english" and an implicit operator will convert both to a TextSearchRegConfig struct
    b => 
    {
        b.Add(c => c.Name, 'D'); // or maybe make the label an enum ? or const strings.
        b.Add(c => c.Address); // default label
        
        // Gin is public const string to allow flexibility for future PostgreSQL versions.
        b.UseIndexMethod(NpgsqlIndexMethods.Gin);

        // Above would be method chainable of course.
    }
);

What do you think ? @roji

As an aside, it feels verbose in the code that lots of classes are prefixed with Npgsql.
For me NpgsqlFullTextSearch is quite verbose, but I have to use it to fit with the rest of the codebase.

@rwasef1830
Copy link
Copy Markdown
Contributor Author

rwasef1830 commented Apr 10, 2018

@roji quick question, is there a requirement for entity framework annotations Annotations.SetAnnotation value to be "simple" types (eg: serializable). I'm asking because I see that for StorageParameters you went through the trouble of flattening the structure and assembling it back together in GetStorageParameters (btw, that return value should be IReadOnlyDictionary since the returned dictionary will be discarded and changes to it won't make it back to annotations). This is in the NpgsqlEntityTypeAnnotations class.

I can see that the code was written in 2016, maybe that was a requirement back then ?

@roji
Copy link
Copy Markdown
Member

roji commented Apr 10, 2018

Sorry for not being available, will try to find some time tonight.

Yes, the point of annotations is that they get serialized into migrations, which are generated C# code. So you basically want to serialize to simple types (bool, int, string). For a complex annotation that gets a bit annoying but that's how it works.

@rwasef1830 rwasef1830 closed this Apr 20, 2018
@rwasef1830
Copy link
Copy Markdown
Contributor Author

After rethinking it with a clear head, I realized that I can just drop/recreate all triggers and functions whenever anything changes, so I will implement it with that strategy instead.

@rwasef1830 rwasef1830 reopened this Apr 21, 2018
@roji
Copy link
Copy Markdown
Member

roji commented Apr 21, 2018

@rwasef1830, I'm sorry I'm not more available for discussion or guidance - I'm at the end of a busy few weeks and should hopefully have some time to dedicate to Npgsql.

The truth is that I need to get more familiar with PostgreSQL's full text support. As I understand it (from the docs), you have two general ways of doing FTS with indices:

  1. Simply set up an expression index with CREATE INDEX
  2. Set up a separate tsvector column which needs to get updated via triggers as dependent content changes.

Both seem to have their advantages and disadvantages: the separate column is apparently faster when using GiST but not when using GIN, the expression index takes less disk space etc. There doesn't seem to be a clear winner between the two.

I'd be careful before going into any EF Core migrations feature which attempts to set up and maintain triggers automatically. As you wrote in a previous comment, this can be quite complex and get out of hand - I may not understand exactly what you have in mind to me it seems very non-trivial. Any migration touching a column may potentially need to check/know about the trigger (e.g. DropColumn), and at the moment there's no concept of a trigger in EF Core migrations. One could be added (much like how sequences are modeled, or how we model PostgreSQL extensions), but that's again a sizable project with little added value over the user just doing things with raw SQL.

In addition, calculated/computed columns are (finally!) under development for PostgreSQL 11. This removes many use cases which previously relied on triggers - including probably this one. In other words, if you work hard on a triggered-based solution now, that may very well become obsolete in a few months when PostgreSQL 11 comes out.

So I definitely would think carefully before jumping in here. Fully modeling FTS indices or search columns seems like a big task to me. As an example, let's not forget that sometimes we want to search across several tables, so a full API would allow defining the index outside of any specific table, not to mention if you want a column, in which you need to create a table to hold it... I'm really not sure I'd expect EF Core to set all that up.

Anyway, let me know what you think...

@rwasef1830
Copy link
Copy Markdown
Contributor Author

rwasef1830 commented Apr 21, 2018

@roji I thought of only supporting the simple case where there is one (or more) tsvector column with dependent columns all being in the same table and all being simple properties. I store my info as annotations on the IEntity annotations (and corresponding EntityBuilder).. ie: on the table level.

I will "serialize" the entire tree of dependents (and their labels, and default values) as a string per tsvector property in an annotation.

In case of create table operation that has my annotation, I'll create the trigger + trigger function + run an update to populate the tsvector column, and also have EF Core create an index using the already existing ForNgpsqlHasMethod.

In case someone modifies or drops a dependent column or even the tsvector itself, I will allow the operation to fail (ie: migration will be non-working), it will be up to the user to update the corresponding ForNpgsqlHasTextSearchVector call in their DbContext config to remove the offending references.

In theory, if they were using the Expression-based API (ie: specifying column with lambdas) their code would not compile anyway and they would be forced to update it, but for the case where the user was using string property names... I have to let it fail.

Thus, by having the user modify my call, the serialized annotation will change... which I am hoping will trigger EF Core to emit an AlterTable operation (please correct me if I'm wrong in my assumptions), in which case I'll just drop the function and trigger and recreate everything again.

"Properly" handling it would be too much work, that's why I considered simply closing this pull request, but I thought of getting something minimum workable.

To be honest, I'm still on the fence here, but since I had at least my sample migration helper that was working for me, I thought of translating it to something built-in in Npgsql with the same kind of semantics.

If you think it's better, we can postpone this and see what Postgres 11 brings to the table, and we can alternatively just post my sample migration helper in a "samples" folder in the repo for people to use or build on instead of continuing with my plan.

I'm interested to know your opinion here.

@rwasef1830 rwasef1830 force-pushed the full-text-search-migrations branch from 14fe639 to dc23138 Compare April 22, 2018 00:52
@rwasef1830 rwasef1830 force-pushed the full-text-search-migrations branch from dc23138 to 1a9a103 Compare April 22, 2018 00:55
@rwasef1830
Copy link
Copy Markdown
Contributor Author

@roji check latest commit for implementation as I mentioned and let me know what you think

@roji
Copy link
Copy Markdown
Member

roji commented Apr 22, 2018

@rwasef1830, I'll try to review your code a bit later (don't have time at this moment), but here are some general notes.

First off, I'll just say I've already gone down the route of trying to model database features that are a bit too complex in migrations, and it ended up being a complex, brittle component that didn't add that much value (since users always have raw SQL). This is why my initial reaction is a bit negative. It's a bit reinforced by seeing the complexity in the string representation of your annotation - it's certainly possible to go down this route, but the complex mini-language with delimiters and special chars may be telling us that we're overdoing it.

More to the point. First, I definitely think this should wait until PG 11 introduces generated columns. I mean, if you had that feature today, you'd surely implement FTS migrations with it rather than managing triggers, right?

Second, once generated columns do become a reality, setting up a vector column becomes much simpler, perhaps simple enough to not require a special EF Core API to manage it... Based on the current discussion, it seems like creating a ts_vector column would be as simple as:

CREATE TABLE foo (
...
search_vector ts_vector GENERATED ALWAYS AS (ts_vector(...))
)

If this becomes a reality, then I think the need for special migration support simply disappears, and we can expect users to do this themselves (with some doc guidance of course). After all, it's just as simple as creating an FTS expression today, but we're not discussing a special migration API for that.

What do you think?

@rwasef1830
Copy link
Copy Markdown
Contributor Author

@roji I think you're right. Perhaps having my incompletely modeled approach is worse than leaving people with the freedom of defining things directly as they need, and the need will disappear soon anyway. I'll close this pull request.

@rwasef1830 rwasef1830 closed this May 4, 2018
@roji
Copy link
Copy Markdown
Member

roji commented May 4, 2018

Ok, let's revisit this after PG 11 is released and the new computed column feature becomes more clear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants