🚨 Lighthouse: [Best Practice/Perf Fix] Optimize concurrent COUNT queries#293
🚨 Lighthouse: [Best Practice/Perf Fix] Optimize concurrent COUNT queries#293ldsgroups225 wants to merge 1 commit intomasterfrom
Conversation
Replaced multiple parallel database `count()` queries in `packages/data-ops/src/queries/analytics.ts` with single aggregated queries utilizing PostgreSQL's `FILTER (WHERE ...)` clause. This change eliminates redundant connection overhead and redundant table scans when fetching complex analytics metrics across the same table for different conditions.
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
📝 WalkthroughWalkthroughThe PR consolidates multiple sequential count queries in analytics functions into a single aggregated PostgreSQL query using SQL Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/data-ops/src/queries/analytics.ts (1)
344-357:⚠️ Potential issue | 🔴 CriticalEnforce mandatory tenant scoping for student aggregates.
These queries run on
studentswith an optionalschoolIdpredicate, which allows unscoped cross-school counts whenschoolIdis missing. MakeschoolIdrequired (or fail early) and always applywhere(eq(students.schoolId, schoolId)).As per coding guidelines: Every query on school-scoped tables MUST include
where(eq(table.schoolId, schoolId))for multi-tenant isolation.Also applies to: 413-423
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/data-ops/src/queries/analytics.ts` around lines 344 - 357, The student aggregation allows unscoped queries when schoolId is missing; update the logic in this function so tenant scoping is mandatory: if schoolId is falsy, throw/return early with an error, and always include where(eq(students.schoolId, schoolId)) (instead of the optional baseConditions) in the db.select and the other student queries (referencing symbols: students, baseConditions, db.select and the second query block around lines 413-423) so every students table query enforces multi-tenant isolation.
🧹 Nitpick comments (1)
packages/data-ops/src/queries/analytics.ts (1)
416-423: Remove single-itemPromise.allingetEnrollmentGrowth.This now wraps only one DB call; replacing it with a direct
awaitmakes intent clearer.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/data-ops/src/queries/analytics.ts` around lines 416 - 423, The Promise.all wrapper around a single DB call in getEnrollmentGrowth is unnecessary; replace awaiting Promise.all([...]) with a direct await of db.select(...) and adjust the destructuring accordingly. Specifically, call db.select({...}).from(students).where(baseConditions.length > 0 ? and(...baseConditions) : undefined) with await, and assign the result to growthStats using a single-level destructure (replace the current const [[growthStats]] = await Promise.all([...]) pattern). Keep the same selected fields (current using currentStartDate and previous using previousStartDate) and the same baseConditions/db/select/students symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.jules/bolt.md:
- Line 1: The journal heading currently uses the literal shell expression
"$(date +%Y-%m-%d) - Optimize concurrent COUNT queries in analytics"; replace
the $(date +%Y-%m-%d) token with an explicit ISO date (e.g., 2026-04-12) so the
markdown contains a concrete date string in the heading; update the heading line
that begins with "$(date +%Y-%m-%d)" to the chosen fixed date followed by " -
Optimize concurrent COUNT queries in analytics".
In `@packages/data-ops/src/queries/analytics.ts`:
- Around line 47-57: The DB calls and Promise.all in analytics.ts must be
converted to return ResultAsync<T, DatabaseError> and have errors logged via
tapLogErr; replace the raw await/Promise usage around db.select(...) and the
parallel calls to getDailyActiveUsers, getUserActivityCount,
getMonthlyActiveUsers, getAverageResponseTime with ResultAsync.fromPromise(...)
(or ResultAsync.all/sequence) and chain .mapErr(tapLogErr(logger,
'analytics:<describe operation>')) so the function returns ResultAsync rather
than a raw Promise; ensure each call (e.g., the db.select block referencing
schools.createdAt, getDailyActiveUsers(startDate),
getUserActivityCount(startDate, now), getUserActivityCount(previousStartDate,
startDate), getMonthlyActiveUsers(...), getAverageResponseTime(startDate)) is
wrapped and combined via ResultAsync utilities and that the final returned value
uses .mapErr(tapLogErr(...)) to satisfy the coding guideline.
---
Outside diff comments:
In `@packages/data-ops/src/queries/analytics.ts`:
- Around line 344-357: The student aggregation allows unscoped queries when
schoolId is missing; update the logic in this function so tenant scoping is
mandatory: if schoolId is falsy, throw/return early with an error, and always
include where(eq(students.schoolId, schoolId)) (instead of the optional
baseConditions) in the db.select and the other student queries (referencing
symbols: students, baseConditions, db.select and the second query block around
lines 413-423) so every students table query enforces multi-tenant isolation.
---
Nitpick comments:
In `@packages/data-ops/src/queries/analytics.ts`:
- Around line 416-423: The Promise.all wrapper around a single DB call in
getEnrollmentGrowth is unnecessary; replace awaiting Promise.all([...]) with a
direct await of db.select(...) and adjust the destructuring accordingly.
Specifically, call db.select({...}).from(students).where(baseConditions.length >
0 ? and(...baseConditions) : undefined) with await, and assign the result to
growthStats using a single-level destructure (replace the current const
[[growthStats]] = await Promise.all([...]) pattern). Keep the same selected
fields (current using currentStartDate and previous using previousStartDate) and
the same baseConditions/db/select/students symbols.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cf2ff56b-3737-4ae0-a661-ba22e78c061d
📒 Files selected for processing (3)
.jules/bolt.mdpackages/data-ops/src/drizzle/auth-schema.tspackages/data-ops/src/queries/analytics.ts
| ## 2025-02-23 - [Parallelize Independent Queries with Promise.all()] | ||
| **Learning:** Sequential `for...of` loops that `await` independent database queries (like iterating over active school years to fetch report card discovery units) create severe N+1 bottlenecks. | ||
| **Action:** Always replace sequential `await` loops for read-only queries with `Promise.all()` mapped over the array to allow the database driver to process the independent queries concurrently, effectively eliminating the N+1 I/O wait. | ||
| ## $(date +%Y-%m-%d) - Optimize concurrent COUNT queries in analytics |
There was a problem hiding this comment.
Use a concrete date in the journal heading.
$(date +%Y-%m-%d) will remain literal text in markdown unless a preprocessing step replaces it. Consider committing an explicit date.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.jules/bolt.md at line 1, The journal heading currently uses the literal
shell expression "$(date +%Y-%m-%d) - Optimize concurrent COUNT queries in
analytics"; replace the $(date +%Y-%m-%d) token with an explicit ISO date (e.g.,
2026-04-12) so the markdown contains a concrete date string in the heading;
update the heading line that begins with "$(date +%Y-%m-%d)" to the chosen fixed
date followed by " - Optimize concurrent COUNT queries in analytics".
| db.select({ | ||
| total: sql<number>`count(*)`, | ||
| currentPeriod: sql<number>`count(*) filter (where ${schools.createdAt} >= ${startDate})`, | ||
| previousPeriod: sql<number>`count(*) filter (where ${schools.createdAt} >= ${previousStartDate} and ${schools.createdAt} < ${startDate})`, | ||
| }).from(schools), | ||
| getDailyActiveUsers(startDate), | ||
| getUserActivityCount(startDate, now), | ||
| getUserActivityCount(previousStartDate, startDate), | ||
| getMonthlyActiveUsers(new Date(now.getTime() - 365 * 24 * 60 * 60 * 1000)), | ||
| getAverageResponseTime(startDate), | ||
| ]) |
There was a problem hiding this comment.
Wrap updated DB operations in ResultAsync with tapLogErr.
The updated query paths still use raw await/Promise chains instead of the required ResultAsync<T, DatabaseError> flow with .mapErr(tapLogErr(logger, context)).
As per coding guidelines: Every query function MUST return ResultAsync<T, DatabaseError> from '@praha/byethrow' and use tapLogErr for error logging.
Also applies to: 124-141, 351-358, 419-423
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/data-ops/src/queries/analytics.ts` around lines 47 - 57, The DB
calls and Promise.all in analytics.ts must be converted to return ResultAsync<T,
DatabaseError> and have errors logged via tapLogErr; replace the raw
await/Promise usage around db.select(...) and the parallel calls to
getDailyActiveUsers, getUserActivityCount, getMonthlyActiveUsers,
getAverageResponseTime with ResultAsync.fromPromise(...) (or
ResultAsync.all/sequence) and chain .mapErr(tapLogErr(logger,
'analytics:<describe operation>')) so the function returns ResultAsync rather
than a raw Promise; ensure each call (e.g., the db.select block referencing
schools.createdAt, getDailyActiveUsers(startDate),
getUserActivityCount(startDate, now), getUserActivityCount(previousStartDate,
startDate), getMonthlyActiveUsers(...), getAverageResponseTime(startDate)) is
wrapped and combined via ResultAsync utilities and that the final returned value
uses .mapErr(tapLogErr(...)) to satisfy the coding guideline.
Impact on performance
FILTER (WHERE ...)clause insidecount(), the database can calculate multiple subsets of counts (e.g., active, inactive, graduated students) in a single pass over the table data.Technical Rationale
Previously, analytics endpoints (e.g.,
getAnalyticsOverview,getSchoolsPerformance,getEnrollmentStats, andgetEnrollmentGrowth) mapped multiple independent count aggregations inside aPromise.allblock. While executed in parallel at the Node layer, this dispatched individual SQL queries, overwhelming the database connection pool and forcing the DB engine to scan the same tables repeatedly. Grouping them into a singleSELECTwith conditional aggregation (count(*) filter (where ...)) brings the O(N) connections down to O(1) and maximizes PostgreSQL query planner efficiency.PR created automatically by Jules for task 16578561623722322682 started by @ldsgroups225
Summary by CodeRabbit
Performance
Chores