Conversation
Force-pushed 8654b03 to ba2f39a

Added frontend support.
Force-pushed 4ae2fe6 to 80d9ab6
Force-pushed ea9c225 to e2e66c9
jezdez left a comment
This goes in the right direction and needs some wording and consistency fixes. I've left some questions where I didn't know what the code is intended to do. I think leaving some code comments and docstrings for the new code would be useful.
I think this needs to be more descriptive: "Number of query results to keep (leave empty to not keep more than the most recent result)", or something like this?
Kind of a small space. I've added a check so that entering 1 or leaving it blank do the same thing.
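The equivalence check described above could look something like this (a minimal plain-Python sketch; `normalize_resultset_size` is an illustrative name, not the PR's actual code):

```python
# Hypothetical sketch: a blank value and 1 are treated the same,
# i.e. "keep only the most recent result".
def normalize_resultset_size(value):
    if value in (None, "", 1):
        return None  # keep only the latest result
    return int(value)

print(normalize_resultset_size(""))  # None
print(normalize_resultset_size(5))   # 5
```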
This is the directive used in schedule-dialog.html above?
I think we should keep the terminology straight, let's use QueryResultSetResource instead of QueryAggregateResultResource to stay consistent with the Python API and 'api/queries/:id/resultset' for the URL. I'm not sure if the term "aggregated" wouldn't confuse people that it's some kind of database aggregation involved while it's really just a list of query results we refer to.
I think this should just be query, no need for the extra suffix.
The suffix is to avoid collision with db.Model.query.
I don't understand why there is another loop here?
Force-pushed 4ec4ac6 to 11be740
I've addressed the issues you mentioned. Before merging this we'll need to roll the staging db back by one migration.
Force-pushed 11be740 to 16f730a
emtwo left a comment
Should we make the dropdown for number of query results to keep disabled when a refresh schedule isn't set?
I'm also wondering if maybe we can put some constraint on that value because there is no maximum for it at the moment.
Huh. Did this change make it into this PR by accident?
... that revision shouldn't be in this branch, you're right. Rebase gets things right most of the time and I wasn't vigilant last time I pushed this branch.
Was schedule_keep_results an unused field before? I don't see any other references to it.
This migration went into master before it should have -- the field was misnamed. We're going to roll back the misnamed field on stage before this PR gets merged.
Wouldn't offset + total potentially index the query.query_results list outside of its length if offset > 0 since total = len(query.query_results)? Would [offset:] make more sense?
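For reference, a quick plain-Python illustration of the slicing in question (hypothetical values): with `total = len(results)`, the upper bound `offset + total` points past the end of the list whenever `offset > 0`. Python clamps slice bounds, so no error occurs, but `[offset:]` states the intent directly.

```python
results = ["r1", "r2", "r3", "r4", "r5"]
total = len(results)  # 5
offset = 2

# offset + total == 7 exceeds the list length; Python clamps the bound,
# so both expressions yield the same tail of the list.
assert results[offset:offset + total] == results[offset:]
print(results[offset:])  # ['r3', 'r4', 'r5']
```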
I see why you copied the first result only here because you later replace the data attribute with the newly computed results above. Perhaps this can be less confusing to read with just a brief comment to explain this?
I assume when you say queries with the same text you mean same SQL?
Yes. (Though of course this is used for non-SQL data sources as well.)
New to sqlalchemy here, how is queries.first() different from queries[0]? Can we use the first_query variable throughout instead?
[Return the first result of this Query or None if the result doesn’t contain any row.](http://docs.sqlalchemy.org/en/rel_1_1/orm/query.html#sqlalchemy.orm.query.Query.first)
You're right that it's a bit odd to then use queries[0] here; fixed.
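A plain-Python sketch of the difference the sqlalchemy docs describe (the `first_row` helper is an illustrative stand-in, not ORM code): `Query.first()` returns the first row or `None`, while indexing an empty result raises.

```python
def first_row(rows):
    # Mimics Query.first(): first element, or None when there are no rows.
    return rows[0] if rows else None

assert first_row(["q1", "q2"]) == "q1"
assert first_row([]) is None  # rows[0] would raise IndexError here
```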
What if schedule_resultset_size was set by a user and we captured a bunch of results, and then it was unset by the user? Would this mean we wouldn't look at this stale data here?
Correct. Those are handled before this is called, in cleanup_query_results.
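A sketch of that split (illustrative dicts, not the actual ORM models): queries whose schedule_resultset_size was unset fall back to the ordinary cleanup_query_results path, so only queries still requesting extra results reach the resultset-trimming code.

```python
queries = [
    {"id": 1, "schedule_resultset_size": 3},
    {"id": 2, "schedule_resultset_size": None},  # user unset it
]

# Stale results for unset queries are handled by the ordinary cleanup...
ordinary = [q["id"] for q in queries if q["schedule_resultset_size"] is None]
# ...so the resultset trimming only ever sees the rest.
trimming = [q["id"] for q in queries if q["schedule_resultset_size"] is not None]

print(ordinary, trimming)  # [2] [1]
```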
Force-pushed ddd9e8f to 49e6c5d
Fixed a few of the things you mentioned.
```python
delete_count = 0
queries = Query.query.filter(Query.schedule_resultset_size != None).order_by(Query.schedule_resultset_size.desc())
# Multiple queries with the same text may request multiple result sets
# be kept. We start with the one that keeps the most, and delete both
```
Do we start with the one that keeps the most because we want to limit how much deleting is done in cleanup_query_results()? If so, maybe add a comment about that? I was confused at first about why we only look at the first query, but the deletion limit in cleanup_query_results() is my best guess at why.
We start with the one that keeps the most because if we start with any others it'd delete results that need to be kept.
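A toy illustration of that ordering argument (hypothetical numbers): two queries share the same result list, and trimming by the largest schedule_resultset_size first guarantees nothing a larger-sized query still needs is deleted.

```python
result_ids = [1, 2, 3, 4, 5]               # shared results, oldest first
keep_sizes = sorted([2, 4], reverse=True)  # per-query schedule_resultset_size

# Trim by the largest size first: only one result is deletable.
n_to_delete = len(result_ids) - keep_sizes[0]  # 5 - 4 = 1
surviving = result_ids[n_to_delete:]
assert surviving == [2, 3, 4, 5]  # the size-4 query keeps all it needs

# Trimming by the smaller size first would delete too much:
wrong_order = result_ids[len(result_ids) - keep_sizes[1]:]
assert wrong_order == [4, 5]  # the size-4 query just lost results it needed
```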
```python
n_to_delete = resultset_count - first_query.schedule_resultset_size
r_ids = [r.result_id for r in resultsets][:n_to_delete]
delete_count = QueryResultSet.query.filter(QueryResultSet.result_id.in_(r_ids)).delete(synchronize_session=False)
QueryResult.query.filter(QueryResult.id.in_(r_ids)).delete(synchronize_session=False)
```
Is there a reason why these deletes don't contribute to the delete_count but the QueryResultSet deletes do? It seems that in cleanup_query_results() originally only QueryResult deletes were being counted so maybe these need to be counted too?
No idea how I made that mistake. Fixed.
```python
delete_count = QueryResultSet.query.filter(QueryResultSet.result_id.in_(r_ids)).delete(synchronize_session=False)
QueryResult.query.filter(QueryResult.id.in_(r_ids)).delete(synchronize_session=False)
# Delete unneeded bridge rows for the remaining queries.
for q in queries[1:]:
```
I'm not sure I follow what's happening in this loop. My understanding is that we look at the query with the largest allowed number of query results and delete its stale results and bridge rows, like you said; that's what happens in the code above.
Since other queries may have pointed to the stale results that were deleted, the bridge rows should be deleted for those too, and that's what I expected this loop to be doing. But it seems to be checking whether the remaining queries have more stale results, similar to the check done for first_query?
In general, some of the code in this loop is similar to what's done for first_query. I'm wondering whether factoring it out into a separate function might make this clearer, or whether the operations on first_query could be included in this loop.
By this point, when the loop starts, there are no stale result sets left. (I'll add a comment pointing this out.)
So all that has to be deleted are bridge rows for queries that have requested fewer resultsets be kept than the first one.
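That remaining step could be sketched like this (illustrative dicts, not the actual ORM code): by the time the loop starts, the shared QueryResult rows are already trimmed to the largest requested size, so each remaining query only needs its excess bridge (QueryResultSet) rows dropped down to its own smaller schedule_resultset_size.

```python
bridge = {                    # query -> result ids its bridge rows link to
    "q_big": [2, 3, 4, 5],    # keeps 4, already correct
    "q_small": [2, 3, 4, 5],  # keeps 2, has excess bridge rows
}
keep = {"q_big": 4, "q_small": 2}

for qid, ids in bridge.items():
    excess = len(ids) - keep[qid]
    if excess > 0:
        # Drop bridge rows for the oldest results; the results themselves
        # survive because the largest query still references them.
        bridge[qid] = ids[excess:]

print(bridge["q_small"])  # [4, 5]
```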
This provides UI for keeping multiple query results and backend support for storing and aggregating them (and deleting them when no longer needed).
Still needs frontend work for choosing to display aggregated results.
fixes #35