[f] VER-261 - Optimize get snippets with labels #18
```diff
@@ -29,7 +29,33 @@ BEGIN
 
   user_is_admin := COALESCE('admin' = ANY(user_roles), FALSE);
 
-  CREATE TEMP TABLE filtered_snippets AS
+  CREATE TEMP TABLE filtered_snippets AS (
+    -- Pre-compute all label data with upvote counts
+    WITH label_data AS (
+      SELECT
+        sl.snippet,
+        COALESCE(jsonb_agg(
+          jsonb_build_object(
+            'id', l.id,
+            'text', CASE
+              WHEN p_language = 'spanish' THEN l.text_spanish
+              ELSE l.text
+            END,
+            'upvote_count', COALESCE(upvote_counts.count, 0),
+            'upvoted_by_me', COALESCE(upvote_counts.upvoted_by_current_user, false)
+          )
+        ), '[]'::jsonb) AS labels
+      FROM public.snippet_labels sl
+      JOIN public.labels l ON sl.label = l.id
+      LEFT JOIN LATERAL (
+        SELECT
+          COUNT(*) AS count,
+          BOOL_OR(lu.upvoted_by = current_user_id) AS upvoted_by_current_user
+        FROM public.label_upvotes lu
+        WHERE lu.snippet_label = sl.id
+      ) upvote_counts ON TRUE
+      GROUP BY sl.snippet
+    )
   SELECT
     s.id,
     s.recorded_at,
```

```diff
@@ -55,24 +81,22 @@ BEGIN
     s.confidence_scores,
     s.language,
     s.context,
-    (get_snippet_labels(s.id, p_language) -> 'labels') AS labels,
+    COALESCE(ld.labels, '[]'::jsonb) AS labels,
     jsonb_build_object(
       'id', a.id,
       'radio_station_name', a.radio_station_name,
       'radio_station_code', a.radio_station_code,
       'location_state', a.location_state,
       'location_city', a.location_city
     ) AS audio_file,
-    CASE
-      WHEN us.id IS NOT NULL THEN true
-      ELSE false
-    END AS starred_by_user,
+    us.id IS NOT NULL AS starred_by_user,
     ul.value AS user_like_status,
+    uhs.snippet IS NOT NULL AS hidden,
     like_counts.likes AS like_count,
-    like_counts.dislikes AS dislike_count,
-    uhs.snippet IS NOT NULL AS hidden
+    like_counts.dislikes AS dislike_count
   FROM snippets s
   LEFT JOIN audio_files a ON s.audio_file = a.id
+  LEFT JOIN label_data ld ON ld.snippet = s.id
   LEFT JOIN user_star_snippets us ON us.snippet = s.id AND us."user" = current_user_id
   LEFT JOIN user_like_snippets ul ON ul.snippet = s.id AND ul."user" = current_user_id
   LEFT JOIN user_hide_snippets uhs ON uhs.snippet = s.id
```

```diff
@@ -88,11 +112,7 @@ BEGIN
       -- If user is admin, show all snippets (including hidden ones)
       -- If user is not admin, only show non-hidden snippets
       user_is_admin OR
-      NOT EXISTS (
-        SELECT 1
-        FROM user_hide_snippets uhs
-        WHERE uhs.snippet = s.id
-      )
+      uhs.snippet IS NULL
     )
     AND (
       p_filter IS NULL OR
```

```diff
@@ -273,14 +293,13 @@ BEGIN
       WHEN p_order_by = 'upvotes' THEN s.upvote_count + s.like_count
```

Contributor: This … You will need to either calculate the total upvotes for a snippet to sort by it (e.g., by processing the …

```diff
       WHEN p_order_by = 'comments' THEN s.comment_count
       WHEN p_order_by = 'activities' THEN
-        CASE
+      CASE
           WHEN s.user_last_activity IS NULL THEN 0
           ELSE EXTRACT(EPOCH FROM s.user_last_activity)
         END
-      WHEN p_order_by IS NULL OR p_order_by = 'latest' OR p_order_by = '' THEN EXTRACT(EPOCH FROM s.recorded_at)
-      ELSE EXTRACT(EPOCH FROM s.recorded_at)
     END DESC,
-    s.recorded_at DESC;
+    s.recorded_at DESC -- Default for all other cases, including p_order_by = 'latest'
+  );
 
   SELECT COUNT(*) INTO total_count
   FROM filtered_snippets;
```
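The `label_data` CTE replaces a per-row call to `get_snippet_labels(s.id, p_language)` with one grouped aggregation that is LEFT JOINed to `snippets`, and `COALESCE(ld.labels, '[]'::jsonb)` gives unlabeled snippets an empty list. The sketch below mimics that shape with Python's built-in `sqlite3` rather than PostgreSQL. SQLite has no `LATERAL` or `jsonb_agg`, so, as an assumption of this sketch, upvotes are pre-aggregated in a grouped subquery and the per-snippet lists are assembled in Python. The `p_language` switch is left out, and the sample data is made up.

```python
import sqlite3
from collections import defaultdict


def labels_by_snippet(conn, current_user_id):
    """Build each snippet's label list in one pass, like the label_data CTE."""
    rows = conn.execute(
        """
        SELECT sl.snippet,
               l.id,
               l.text,
               COALESCE(uc.cnt, 0)  AS upvote_count,
               COALESCE(uc.mine, 0) AS upvoted_by_me
        FROM snippet_labels sl
        JOIN labels l ON sl.label = l.id
        LEFT JOIN (
            -- Stand-in for the LATERAL subquery: one row per snippet_label
            SELECT snippet_label,
                   COUNT(*) AS cnt,
                   MAX(upvoted_by = ?) AS mine
            FROM label_upvotes
            GROUP BY snippet_label
        ) uc ON uc.snippet_label = sl.id
        ORDER BY sl.snippet, l.id
        """,
        (current_user_id,),
    ).fetchall()
    grouped = defaultdict(list)
    for snippet, label_id, text, count, mine in rows:
        grouped[snippet].append({
            "id": label_id,
            "text": text,
            "upvote_count": count,
            "upvoted_by_me": bool(mine),
        })
    return grouped


def snippets_with_labels(conn, current_user_id):
    """LEFT JOIN the pre-aggregated labels; unlabeled snippets get []."""
    labels = labels_by_snippet(conn, current_user_id)
    ids = [row[0] for row in conn.execute("SELECT id FROM snippets ORDER BY id")]
    return {sid: labels.get(sid, []) for sid in ids}


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snippets (id INTEGER PRIMARY KEY);
    CREATE TABLE labels (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE snippet_labels (id INTEGER PRIMARY KEY, snippet INTEGER, label INTEGER);
    CREATE TABLE label_upvotes (snippet_label INTEGER, upvoted_by TEXT);
    INSERT INTO snippets VALUES (1), (2);
    INSERT INTO labels VALUES (10, 'economy'), (11, 'health');
    INSERT INTO snippet_labels VALUES (100, 1, 10), (101, 1, 11);
    INSERT INTO label_upvotes VALUES (100, 'alice'), (100, 'bob');
""")
result = snippets_with_labels(conn, "alice")
print(result[1])
print(result[2])  # []
```

The design point is the same as in the diff: labels are aggregated once and joined, instead of running a separate lookup for every output row.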
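The visibility filter swaps a correlated `NOT EXISTS` subquery for `uhs.snippet IS NULL`. This only works because `user_hide_snippets uhs` is already LEFT JOINed on `uhs.snippet = s.id` (it also feeds the `hidden` column). If a snippet could have several hide rows, that join would repeat the snippet with or without this change. The sketch below uses Python's `sqlite3` with made-up data to check that both filters return the same rows for admins and non-admins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snippets (id INTEGER PRIMARY KEY);
    CREATE TABLE user_hide_snippets (snippet INTEGER);
    INSERT INTO snippets VALUES (1), (2), (3);
    INSERT INTO user_hide_snippets VALUES (2);
""")

# Old filter: correlated NOT EXISTS subquery
NOT_EXISTS_SQL = """
    SELECT s.id FROM snippets s
    WHERE ? OR NOT EXISTS (
        SELECT 1 FROM user_hide_snippets uhs WHERE uhs.snippet = s.id
    )
    ORDER BY s.id
"""

# New filter: anti-join through the LEFT JOIN that is already in the query
LEFT_JOIN_SQL = """
    SELECT s.id FROM snippets s
    LEFT JOIN user_hide_snippets uhs ON uhs.snippet = s.id
    WHERE ? OR uhs.snippet IS NULL
    ORDER BY s.id
"""


def visible(sql, user_is_admin):
    """Return visible snippet ids; the bool is bound as SQLite 0/1."""
    return [row[0] for row in conn.execute(sql, (user_is_admin,))]


for is_admin in (False, True):
    print(is_admin, visible(NOT_EXISTS_SQL, is_admin), visible(LEFT_JOIN_SQL, is_admin))
```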
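The ordering hunk drops the `'latest'` and `ELSE` branches. For any unmatched `p_order_by` (including `'latest'`, `''` and `NULL`) the `CASE` then yields `NULL` for every row, so all rows tie on the first key and `s.recorded_at DESC` decides the order. The sketch below reproduces that fall-through in SQLite through Python's `sqlite3`, keeping only a `'comments'` branch for brevity. One caveat: the two engines place NULLs differently (PostgreSQL defaults to `NULLS FIRST` for `DESC`, SQLite puts NULLs last). This only matters if some rows have `NULL` in a matched branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snippets (id INTEGER PRIMARY KEY, comment_count INTEGER, recorded_at TEXT);
    INSERT INTO snippets VALUES
        (1, 5, '2024-01-01'),
        (2, 9, '2024-01-02'),
        (3, 1, '2024-01-03');
""")


def ordered_ids(p_order_by):
    # No ELSE branch: unmatched values make the CASE NULL for every row,
    # so the secondary key recorded_at DESC decides the order.
    sql = """
        SELECT id FROM snippets s
        ORDER BY
            CASE
                WHEN ? = 'comments' THEN s.comment_count
            END DESC,
            s.recorded_at DESC
    """
    return [row[0] for row in conn.execute(sql, (p_order_by,))]


print(ordered_ids("comments"))  # [2, 1, 3]
print(ordered_ids("latest"))    # [3, 2, 1]
print(ordered_ids(None))        # [3, 2, 1]
```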
```diff
@@ -0,0 +1,10 @@
+-- Indexes to optimize the CTE-based label aggregation query
+
+-- 1. Additional index for label_upvotes to optimize the LATERAL join
+CREATE INDEX IF NOT EXISTS idx_label_upvotes_snippet_label_upvoted_by
+  ON public.label_upvotes USING btree (snippet_label, upvoted_by);
+
+-- 2. Index for main snippets filtering
+CREATE INDEX IF NOT EXISTS idx_snippets_status_confidence
+  ON public.snippets USING btree (status, (((confidence_scores->>'overall'))::INTEGER))
+  WHERE (status = 'Processed'::processing_status);
```
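One rough way to see what these two indexes are for is SQLite's `EXPLAIN QUERY PLAN`, sketched below through Python's `sqlite3`. This is an approximation, not PostgreSQL. SQLite has no `USING btree` clause or `processing_status` enum. The `overall_confidence` column is a hypothetical stand-in for `(confidence_scores->>'overall')::INTEGER`. The sketch shows two things. First, the per-label lookup done by the `LATERAL` subquery can be answered from the composite index alone. Second, a partial index is only usable when the query repeats its predicate (`status = 'Processed'`), which also holds in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE label_upvotes (snippet_label INTEGER, upvoted_by TEXT);
    -- Composite index: equality on snippet_label, upvoted_by stored alongside
    CREATE INDEX IF NOT EXISTS idx_label_upvotes_snippet_label_upvoted_by
        ON label_upvotes (snippet_label, upvoted_by);

    -- overall_confidence is a hypothetical stand-in for the JSON expression
    CREATE TABLE snippets (id INTEGER PRIMARY KEY, status TEXT, overall_confidence INTEGER);
    CREATE INDEX IF NOT EXISTS idx_snippets_status_confidence
        ON snippets (status, overall_confidence)
        WHERE status = 'Processed';
""")


def plan(sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# The per-label lookup performed by the LATERAL subquery
lateral_plan = plan(
    "SELECT COUNT(*), MAX(upvoted_by = ?) FROM label_upvotes WHERE snippet_label = ?",
    ("alice", 100),
)
# The partial index applies only when the query implies status = 'Processed'
processed_plan = plan(
    "SELECT id FROM snippets WHERE status = 'Processed' AND overall_confidence >= 50"
)
unfiltered_plan = plan("SELECT id FROM snippets WHERE overall_confidence >= 50")

print(lateral_plan)
print(processed_plan)
print(unfiltered_plan)
```

On PostgreSQL itself, running `EXPLAIN` on the real query against representative data is the reliable check.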
This file was deleted.
The `label_data` CTE calculates label information for all snippets by scanning the entire `public.snippet_labels` table. This could be a performance regression compared to the previous implementation, which invoked the `get_snippet_labels` function only for the snippets that passed the filters. If the `snippet_labels` table is large and the filters are selective, this change could slow down the query.

Consider restructuring the query to filter snippets before aggregating label data. One approach is to use a preliminary CTE to select the IDs of filtered snippets and then use those IDs to constrain the data processed in the `label_data` CTE.
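The suggested restructuring can be sketched as two CTEs: `filtered_ids` first narrows the snippets, and `label_data` then aggregates only label rows whose snippet survived. The sketch uses SQLite through Python's `sqlite3`, with made-up data. The `status = 'Processed'` filter is a hypothetical stand-in for the function's real `WHERE` clause, and `label_count` stands in for the `jsonb_agg` payload. The SQL fragments are fixed strings, not user input, so composing them with f-strings is safe here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snippets (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE snippet_labels (id INTEGER PRIMARY KEY, snippet INTEGER, label INTEGER);
    INSERT INTO snippets VALUES (1, 'Processed'), (2, 'Pending'), (3, 'Processed');
    INSERT INTO snippet_labels (snippet, label) VALUES
        (1, 10), (1, 11), (2, 10), (2, 12), (3, 11);
""")

# Hypothetical filter standing in for the function's real WHERE clause
FILTERED_IDS = """
    filtered_ids AS (
        SELECT s.id FROM snippets s WHERE s.status = 'Processed'
    )
"""

# Aggregate labels only for snippets that passed the filters
LABEL_DATA = """
    label_data AS (
        SELECT sl.snippet, COUNT(*) AS label_count
        FROM snippet_labels sl
        JOIN filtered_ids f ON f.id = sl.snippet
        GROUP BY sl.snippet
    )
"""


def prefiltered_results():
    """Final result: filtered snippets LEFT JOINed to their label data."""
    return conn.execute(f"""
        WITH {FILTERED_IDS}, {LABEL_DATA}
        SELECT f.id, COALESCE(ld.label_count, 0) AS label_count
        FROM filtered_ids f
        LEFT JOIN label_data ld ON ld.snippet = f.id
        ORDER BY f.id
    """).fetchall()


def aggregated_snippets(prefilter=True):
    """Which snippets the label aggregation actually touched."""
    if prefilter:
        sql = f"WITH {FILTERED_IDS}, {LABEL_DATA} SELECT snippet FROM label_data ORDER BY snippet"
    else:
        sql = "SELECT snippet FROM snippet_labels GROUP BY snippet ORDER BY snippet"
    return [row[0] for row in conn.execute(sql)]


print(prefiltered_results())            # [(1, 2), (3, 1)]
print(aggregated_snippets())            # [1, 3]
print(aggregated_snippets(prefilter=False))  # [1, 2, 3]
```

Whether this pays off on PostgreSQL depends on table sizes and filter selectivity. Comparing `EXPLAIN ANALYZE` for both versions on realistic data would settle it.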