[druid] Updating refresh logic#4655
Conversation
7c69592 to
36bccde
Compare
Codecov Report
@@ Coverage Diff @@
## master #4655 +/- ##
==========================================
+ Coverage 71.37% 71.49% +0.12%
==========================================
Files 190 190
Lines 14918 14911 -7
Branches 1102 1102
==========================================
+ Hits 10648 10661 +13
+ Misses 4267 4247 -20
Partials 3 3
Continue to review full report at Codecov.
|
fabianmenges
left a comment
There was a problem hiding this comment.
Looks good to me. I'll check how the constraints for the table you mentioned are.
There was a problem hiding this comment.
Are we not testing aggregators anymore?
There was a problem hiding this comment.
@fabianmenges I'm not the original author of this code so I don't have complete context, but from what I observed whilst debugging was these aggregators were not being tested. I've re-added this in case I was wrong.
There was a problem hiding this comment.
Running a loop to issue a query for each metric/column means that many queries have to be made, as opposed to just one to grab them all at once. If you've got tons of metrics and such, this adds up and can be reasonably slow
There was a problem hiding this comment.
@Mogball that was one of my concerns and I agree with your comment. I've refactored the logic to do one query per DruidColumn.
51564ee to
75e886e
Compare
There was a problem hiding this comment.
could go col_obj.is_num() that comes from the base column class and includes all these
acd8e5a to
ca91e67
Compare
There was a problem hiding this comment.
I mean the same could apply here as well, where it's possible to combine all of the columns' metrics into one query
ca91e67 to
a334220
Compare
(cherry picked from commit f9d85bd)
| # Add the missing uniqueness constraints. | ||
| for table, column in names.items(): | ||
| with op.batch_alter_table(table, naming_convention=conv) as batch_op: | ||
| batch_op.create_unique_constraint( |
There was a problem hiding this comment.
Hit an issue on this line while upgrading our staging. I wrapped the statement in a try block locally so that I could move forward
There was a problem hiding this comment.
For the record it was something to the effect of the constraint existing already
|
@mistercrunch so you recall if the constraint which existed previously was added manually? As far as I can tell this constraint never existed and thus the upgrade/downgrade logic should be sound. |
|
Can't recall creating it. Could also be that it failed half way through before or timed out and got this error the next time around... Dunno. |
Though the term "refresh" is somewhat vague from a Druid metadata perspective I sense this translates to create or update. Previously we were creating or updates Druid columns but only creating Druid metrics when the Druid metadata was synced/refreshed.
This PR ensures that refreshing is consistent for both Druid columns and metrics and specifically addresses the following:
DruidMetricclass. Previously there was somewhat duplicate logic for both theDruidColumnandDruidMetricclass.generate_metricswithrefresh_metricsto imply that we're both creating and updating.INrather than a series ofORs.columnsandmetricstables which were added in Adding YAML Import-Export for Datasources to CLI #3978. Note to avoid this MySQL issue themetric_namecolumn was reduced from 512 to 255 characters.--mergeoptions for therefresh_druidcommand, which should be a flag (true/false) rather than an option.get_metricsin terms of checking whether said metric already exists to ensure consistency with SQLA, hence why the merging logic is handled inrefresh_metrics.@fabianmenges I only added the missing Druid migrations however I believe there are additional migrations from your PR (#3978) which are missing for the following tables:
table_columnstablesto: @mistercrunch @Mogball