[f] VER-261 - Optimize get snippets with labels#18
Conversation
WalkthroughRefactors label aggregation and upvote computation, updates get_snippets to precompute and join label data with visibility flags, adds two indexes to support the new queries, and removes the public search-related snippets function. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant C as Client
participant FN as get_snippets()
participant DB as DB: snippets/joins
participant LD as CTE: label_data
C->>FN: call get_snippets(p_language,…)
FN->>LD: build labels per snippet (text, upvote_count, upvoted_by_me)
FN->>DB: left join snippets with LD, stars, hides
Note over FN,DB: Derive starred_by_user (exists) and hidden (exists)
FN-->>C: rows with labels, starred_by_user, hidden
sequenceDiagram
autonumber
participant FN as get_snippet_labels(snippet_id, language)
participant SL as snippet_labels SL
participant LU as label_upvotes LU (LATERAL)
FN->>SL: select labels for snippet_id
SL->>LU: LATERAL: count upvotes, check current user upvoted
Note over LU: upvote_count, upvoted_by_me
FN-->>FN: build JSON per label without creator/applier fields
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @quancao-ea, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the performance and efficiency of snippet retrieval by optimizing key SQL functions and introducing new database indexes. The changes focus on streamlining how label and upvote data are processed, moving from individual function calls to a more efficient pre-computation approach. Additionally, the codebase is cleaned up by removing an obsolete function, contributing to a more robust and performant data access layer. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed everything up to 27b403a in 1 minute and 30 seconds. Click for details.
- Reviewed
247lines of code in4files - Skipped
0files when reviewing. - Skipped posting
6draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. supabase/database/sql/get_snippet_labels_function.sql:21
- Draft comment:
Refactored upvote aggregation: now returns only 'upvote_count' and 'upvoted_by_me' instead of detailed upvote user info. Ensure that downstream consumers are updated accordingly. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%The comment is asking the PR author to ensure that downstream consumers are updated, which violates the rule against asking the author to ensure things are tested or verified. It doesn't provide a specific code suggestion or ask for a specific test to be written.
2. supabase/database/sql/get_snippets_function.sql:32
- Draft comment:
Pre-computing label data via a WITH-cte and joining it into the main query reduces repetitive subquery calls to get_snippet_labels. Verify that the aggregated label structure remains consistent with previous behavior. - Reason this comment was not posted:
Comment looked like it was already resolved.
3. supabase/database/sql/get_snippets_function.sql:292
- Draft comment:
ORDER BY clause now explicitly defaults to 's.recorded_at DESC' for unspecified or 'latest' p_order_by values. Confirm that this change meets sorting requirements. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. supabase/database/sql/optimize_label_indexes.sql:4
- Draft comment:
Creating a composite index on (snippet_label, upvoted_by) in label_upvotes should improve the lateral join performance. Validate that query plans benefit from this index. - Reason this comment was not posted:
Comment looked like it was already resolved.
5. supabase/database/sql/optimize_label_indexes.sql:8
- Draft comment:
The index on snippets for status and confidence_scores extraction appears useful for filtering processed snippets. Ensure that the expression and partial condition match query usage. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =30%<= threshold50%The comment suggests ensuring that the expression and partial condition match query usage, which is a request for confirmation or verification. This violates the rule against asking the PR author to confirm or ensure behavior. However, it does mention the usefulness of the index, which could be considered a suggestion for optimization. Overall, the comment leans more towards asking for confirmation, which is not allowed.
6. supabase/database/sql/search_related_public_snippets.sql:1
- Draft comment:
The public snippet search function has been removed. Confirm that its functionality is either deprecated or replaced and that no dependent services rely on it. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_2k13yI2XBwyaUHVu
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Code Review
This pull request introduces significant optimizations for snippet retrieval by pre-calculating label data using a CTE and adding database indexes. The changes in get_snippet_labels_function.sql are a good improvement. However, I've found a few issues in get_snippets_function.sql. There's a critical issue where the ORDER BY clause references non-existent columns, which will cause the query to fail. Additionally, the placement of the ORDER BY clause can lead to incorrect pagination, and the new CTE for labels might cause a performance regression under certain conditions. My review includes detailed comments on these points.
| @@ -273,14 +293,13 @@ BEGIN | |||
| WHEN p_order_by = 'upvotes' THEN s.upvote_count + s.like_count | |||
There was a problem hiding this comment.
This ORDER BY clause will fail because s.upvote_count and s.like_count are not available as columns in this context.
like_countis available aslike_counts.likesfrom thelike_countslateral join.upvote_countis now calculated per-label inside thelabelsJSON array and is not available as a top-level column on the snippet for sorting.
You will need to either calculate the total upvotes for a snippet to sort by it (e.g., by processing the labels JSON) or remove this sorting option if it's no longer supported.
| WITH label_data AS ( | ||
| SELECT | ||
| sl.snippet, | ||
| COALESCE(jsonb_agg( | ||
| jsonb_build_object( | ||
| 'id', l.id, | ||
| 'text', CASE | ||
| WHEN p_language = 'spanish' THEN l.text_spanish | ||
| ELSE l.text | ||
| END, | ||
| 'upvote_count', COALESCE(upvote_counts.count, 0), | ||
| 'upvoted_by_me', COALESCE(upvote_counts.upvoted_by_current_user, false) | ||
| ) | ||
| ), '[]'::jsonb) AS labels | ||
| FROM public.snippet_labels sl | ||
| JOIN public.labels l ON sl.label = l.id | ||
| LEFT JOIN LATERAL ( | ||
| SELECT | ||
| COUNT(*) AS count, | ||
| BOOL_OR(lu.upvoted_by = current_user_id) AS upvoted_by_current_user | ||
| FROM public.label_upvotes lu | ||
| WHERE lu.snippet_label = sl.id | ||
| ) upvote_counts ON TRUE | ||
| GROUP BY sl.snippet | ||
| ) |
There was a problem hiding this comment.
The label_data CTE calculates label information for all snippets by scanning the entire public.snippet_labels table. This could be a performance regression compared to the previous implementation, which invoked the get_snippet_labels function only for the snippets that passed the filters. If the snippet_labels table is large and the filters are selective, this change could slow down the query.
Consider restructuring the query to filter snippets before aggregating label data. One approach is to use a preliminary CTE to select the IDs of filtered snippets and then use those IDs to constrain the data processed in the label_data CTE.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
supabase/database/sql/get_snippet_labels_function.sql (1)
1-2: SECURITY DEFINER: set a safe search_path.Prevent hijacking via attacker-defined objects by forcing search_path within the function.
BEGIN + PERFORM set_config('search_path', 'public', true);supabase/database/sql/get_snippets_function.sql (4)
102-116: Hidden flag is applied globally; missing per‑user filter.LEFT JOIN user_hide_snippets lacks user scoping, so one user hiding a snippet hides it for all users. Filter by current_user_id.
- LEFT JOIN user_hide_snippets uhs ON uhs.snippet = s.id + LEFT JOIN user_hide_snippets uhs + ON uhs.snippet = s.id AND uhs."user" = current_user_id @@ - user_is_admin OR - uhs.snippet IS NULL + user_is_admin OR uhs.snippet IS NULL
16-23: SECURITY DEFINER: set a safe search_path.Same rationale as in get_snippet_labels().
BEGIN + PERFORM set_config('search_path', 'public', true);
316-324: Guard against division by zero for page_size.Add validation before computing total_pages.
BEGIN current_user_id := auth.uid(); @@ - total_pages := CEIL(total_count::FLOAT / page_size); + IF page_size IS NULL OR page_size <= 0 THEN + RAISE EXCEPTION 'page_size must be > 0'; + END IF; + total_pages := CEIL(total_count::FLOAT / page_size);
1-1: Dropping then recreating the function may reset GRANTs.Prefer CREATE OR REPLACE without DROP to preserve privileges, unless you explicitly re‑grant.
🧹 Nitpick comments (8)
supabase/database/sql/optimize_label_indexes.sql (3)
4-6: Composite index is good; consider enforcing uniqueness to prevent duplicate upvotes.If business rules allow only one upvote per user per snippet_label, add a unique index (after dedup/backfill).
CREATE INDEX IF NOT EXISTS idx_label_upvotes_snippet_label_upvoted_by ON public.label_upvotes USING btree (snippet_label, upvoted_by); + +-- Optional: enforce one upvote per user per snippet_label +-- (run a cleanup migration first if duplicates exist) +CREATE UNIQUE INDEX IF NOT EXISTS uq_label_upvotes_snippet_label_upvoted_by +ON public.label_upvotes (snippet_label, upvoted_by);
8-10: Make the partial index sargable by removing the redundant leading key.Since the index is already partial on status = 'Processed', include only the expression to better match the WHERE and enable smaller index.
-CREATE INDEX IF NOT EXISTS idx_snippets_status_confidence -ON public.snippets USING btree (status, (((confidence_scores->>'overall'))::INTEGER)) -WHERE (status = 'Processed'::processing_status); +CREATE INDEX IF NOT EXISTS idx_snippets_confidence_processed +ON public.snippets USING btree (((confidence_scores->>'overall')::INTEGER)) +WHERE status = 'Processed'::processing_status;
3-10: Avoid blocking writes during index builds.On large tables, prefer CREATE INDEX CONCURRENTLY in a migration that is not wrapped in a single transaction (Supabase migrations default to a single transaction).
supabase/database/sql/get_snippet_labels_function.sql (2)
21-23: Stabilize label array ordering.Add ORDER BY to jsonb_agg so clients get deterministic label order (e.g., by upvotes desc then text).
- 'labels', COALESCE(jsonb_agg(jsonb_build_object( + 'labels', COALESCE(jsonb_agg( + jsonb_build_object( 'id', l.id, 'text', CASE WHEN p_language = 'spanish' THEN l.text_spanish ELSE l.text END, 'upvote_count', COALESCE(upvote_counts.count, 0), 'upvoted_by_me', COALESCE(upvote_counts.upvoted_by_current_user, false) - )), '[]'::jsonb) + ) + ORDER BY COALESCE(upvote_counts.count, 0) DESC, l.text + ), '[]'::jsonb)
18-20: Add language fallback to avoid NULL label text.If text_spanish is NULL, fall back to English.
- 'text', CASE - WHEN p_language = 'spanish' THEN l.text_spanish - ELSE l.text - END, + 'text', CASE + WHEN p_language = 'spanish' THEN COALESCE(l.text_spanish, l.text) + ELSE l.text + END,supabase/database/sql/get_snippets_function.sql (3)
32-58: Precompute snippet‑level upvote flags to remove repeated EXISTS filters.You already aggregate label data; also aggregate per‑snippet flags (any_upvotes, upvoted_by_me, upvoted_by_others) and use them in the labeledBy/upvotedBy filters to cut repeated scans.
- WITH label_data AS ( + WITH upvote_flags AS ( + SELECT + sl.snippet, + COUNT(*) FILTER (WHERE lu.upvoted_by = current_user_id) > 0 AS upvoted_by_me, + COUNT(*) FILTER (WHERE lu.upvoted_by != current_user_id) > 0 AS upvoted_by_others, + COUNT(*) > 0 AS any_upvotes + FROM public.snippet_labels sl + LEFT JOIN public.label_upvotes lu ON lu.snippet_label = sl.id + GROUP BY sl.snippet + ), label_data AS ( SELECT sl.snippet, COALESCE(jsonb_agg( jsonb_build_object( 'id', l.id, 'text', CASE WHEN p_language = 'spanish' THEN l.text_spanish ELSE l.text END, 'upvote_count', COALESCE(upvote_counts.count, 0), 'upvoted_by_me', COALESCE(upvote_counts.upvoted_by_current_user, false) ) ), '[]'::jsonb) AS labels FROM public.snippet_labels sl JOIN public.labels l ON sl.label = l.id LEFT JOIN LATERAL ( SELECT COUNT(*) AS count, BOOL_OR(lu.upvoted_by = current_user_id) AS upvoted_by_current_user FROM public.label_upvotes lu WHERE lu.snippet_label = sl.id ) upvote_counts ON TRUE GROUP BY sl.snippet ) @@ - LEFT JOIN label_data ld ON ld.snippet = s.id + LEFT JOIN upvote_flags uf ON uf.snippet = s.id + LEFT JOIN label_data ld ON ld.snippet = s.idThen rewrite the labeledBy/upvotedBy CASE branches to use uf.any_upvotes / uf.upvoted_by_me / uf.upvoted_by_others instead of EXISTS subqueries.
84-85: Deterministic label order.If client relies on stable order, emit ld.labels sorted (see comment in get_snippet_labels()).
291-302: ORDER BY fallback is fine; consider explicit ELSE to avoid NULL sort key.Optional: set ELSE 0 to make intent explicit and avoid NULLs on unknown p_order_by.
- CASE + CASE WHEN p_order_by = 'upvotes' THEN s.upvote_count + s.like_count WHEN p_order_by = 'comments' THEN s.comment_count WHEN p_order_by = 'activities' THEN CASE WHEN s.user_last_activity IS NULL THEN 0 ELSE EXTRACT(EPOCH FROM s.user_last_activity) END - END DESC, + ELSE 0 + END DESC,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
supabase/database/sql/get_snippet_labels_function.sql(1 hunks)supabase/database/sql/get_snippets_function.sql(4 hunks)supabase/database/sql/optimize_label_indexes.sql(1 hunks)supabase/database/sql/search_related_public_snippets.sql(0 hunks)
💤 Files with no reviewable changes (1)
- supabase/database/sql/search_related_public_snippets.sql
🔇 Additional comments (7)
supabase/database/sql/optimize_label_indexes.sql (1)
8-10: Validate data quality for the integer cast.If confidence_scores->>'overall' can be NULL/non‑numeric, the cast will fail at query time. Consider a CHECK constraint or a backfill that guarantees numeric text.
supabase/database/sql/get_snippet_labels_function.sql (1)
27-34: LATERAL upvote aggregation: LGTM; matches new index.This will leverage (snippet_label, upvoted_by). Good use of BOOL_OR and COUNT with COALESCE.
Please confirm idx_label_upvotes_snippet_label_upvoted_by exists in production before deploying this function.
supabase/database/sql/get_snippets_function.sql (5)
110-111: Index usage: good alignment with new partial index.The filter s.status = 'Processed' AND (s.confidence_scores->>'overall')::INTEGER >= 95 matches the proposed partial index.
32-58: Nice: label precomputation reduces per-row work.The CTE + join to ld is a solid improvement for throughput.
Also applies to: 99-99
283-289: Search operator &@: ensure supporting index exists.If using PGroonga, confirm indexes cover these columns to avoid sequential scans.
95-97: Counts naming: dislike_count vs value = -1.Confirm API semantics (net vs absolute). Looks consistent here; just flagging for client alignment.
102-116: Find all JOINs to user_hide_snippets that lack per-user filteringPrevious rg command failed with a PCRE2 compile error — run the corrected commands below and inspect matches for per-user filters (e.g., current_user_id, uhs.user, uhs.user_id).
#!/bin/bash # list join occurrences with context rg -n -C3 --type=sql 'JOIN\s+user_hide_snippets\s+uhs\b' || true # for each file with a join, show the join and nearby lines that mention per-user filters rg --type=sql -l 'JOIN\s+user_hide_snippets\s+uhs\b' | sort -u | while read -r f; do echo "=== $f ===" rg -n -C5 -- 'JOIN\s+user_hide_snippets\s+uhs\b|current_user_id|uhs\.user|uhs\.user_id|"user"' "$f" || true done
Important
Optimize snippet retrieval by refactoring SQL functions for label data aggregation and adding indexes for improved query performance.
get_snippet_labels_function.sqlto replace user-specific upvote details with aggregated upvote counts and a boolean for current user upvote status.get_snippets_function.sqlto pre-compute label data with upvote counts using a CTE, improving performance by reducing repeated calculations.optimize_label_indexes.sqlto create indexes onlabel_upvotesandsnippetsto optimize query performance.search_related_public_snippets.sql, removing the functionsearch_related_snippets_publicwhich handled snippet similarity searches.This description was created by
for 27b403a. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit