
🚨 Lighthouse: [Best Practice/Perf Fix] Optimize concurrent COUNT queries#293

Open
ldsgroups225 wants to merge 1 commit into master from lighthouse/optimize-analytics-counts-16578561623722322682


@ldsgroups225 ldsgroups225 commented Apr 12, 2026

Impact on performance

  • Reduced Database Roundtrips: Consolidates multiple concurrent connections to the database into a single query per metric group.
  • Eliminated Redundant Table Scans: By leveraging PostgreSQL's FILTER (WHERE ...) clause inside count(), the database can calculate multiple subsets of counts (e.g., active, inactive, graduated students) in a single pass over the table data.
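
To make the single-pass claim concrete, here is a minimal in-memory sketch of what `count(*) filter (where ...)` computes. It is not the code from `analytics.ts`: `StudentRow`, `countWithFilters`, and the sample rows are hypothetical names chosen for illustration, and each predicate stands in for one `FILTER (WHERE ...)` clause.

```typescript
// Hypothetical row shape; the real `students` table is not shown on this page.
type StudentStatus = "active" | "inactive" | "graduated";
interface StudentRow {
  status: StudentStatus;
}

// Visits every row exactly once and tests it against every named predicate,
// which is the behaviour several FILTER clauses give inside one SELECT.
function countWithFilters<Row>(
  rows: readonly Row[],
  filters: Record<string, (row: Row) => boolean>,
): Record<string, number> {
  const counts: Record<string, number> = { total: 0 };
  for (const name of Object.keys(filters)) counts[name] = 0;

  for (const row of rows) {
    counts["total"] = (counts["total"] ?? 0) + 1; // plain count(*)
    for (const [name, matches] of Object.entries(filters)) {
      if (matches(row)) counts[name] = (counts[name] ?? 0) + 1;
    }
  }
  return counts;
}

const sampleRows: StudentRow[] = [
  { status: "active" },
  { status: "active" },
  { status: "inactive" },
  { status: "graduated" },
];

const studentCounts = countWithFilters(sampleRows, {
  active: (row: StudentRow) => row.status === "active",
  inactive: (row: StudentRow) => row.status === "inactive",
  graduated: (row: StudentRow) => row.status === "graduated",
});
```

The loop over `rows` runs once regardless of how many predicates are supplied, which is the property the bullet above relies on.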

Technical Rationale

Previously, analytics endpoints (e.g., getAnalyticsOverview, getSchoolsPerformance, getEnrollmentStats, and getEnrollmentGrowth) issued multiple independent count aggregations inside a Promise.all block. Although these ran in parallel at the Node layer, each one was dispatched as a separate SQL query, so a single metric group could occupy several pooled connections at once and made the database scan the same table repeatedly. Grouping them into a single SELECT with conditional aggregation (count(*) filter (where ...)) reduces the queries per metric group from N to one and lets PostgreSQL compute all of the counts in one scan.
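
The difference in round trips can be sketched with a fake client that only counts the queries it receives. Everything here is an assumption made for illustration (`createFakeDb`, `countWhere`, `countGrouped`, the sample data); it does not use Drizzle or a real connection pool.

```typescript
// Hypothetical stand-in for a database client that records round trips.
interface FakeDb {
  roundTrips: number;
  countWhere(status: string): Promise<number>;
  countGrouped(groups: readonly string[]): Promise<Record<string, number>>;
}

function createFakeDb(statuses: readonly string[]): FakeDb {
  const db: FakeDb = {
    roundTrips: 0,
    // Models one `select count(*) ... where status = ?` query per call.
    async countWhere(status: string): Promise<number> {
      db.roundTrips += 1;
      return statuses.filter((s) => s === status).length;
    },
    // Models one query carrying a `count(*) filter (where ...)` per group.
    async countGrouped(groups: readonly string[]): Promise<Record<string, number>> {
      db.roundTrips += 1;
      const counts = new Map<string, number>();
      for (const group of groups) counts.set(group, 0);
      for (const status of statuses) {
        const current = counts.get(status);
        if (current !== undefined) counts.set(status, current + 1);
      }
      return Object.fromEntries(counts);
    },
  };
  return db;
}

async function compareRoundTrips(): Promise<{ before: number; after: number }> {
  const groups = ["active", "inactive", "graduated"];
  const data = ["active", "active", "inactive", "graduated"];

  // Old shape: one query per metric, dispatched concurrently.
  const beforeDb = createFakeDb(data);
  await Promise.all(groups.map((group) => beforeDb.countWhere(group)));

  // New shape: one aggregated query for the whole metric group.
  const afterDb = createFakeDb(data);
  await afterDb.countGrouped(groups);

  return { before: beforeDb.roundTrips, after: afterDb.roundTrips };
}
```

With three groups, the first strategy records three round trips and the second records one. How much that saves in practice depends on pool size and table size, which this sketch does not model.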


PR created automatically by Jules for task 16578561623722322682 started by @ldsgroups225

Summary by CodeRabbit

  • Performance

    • Optimized analytics query performance to compute metrics more efficiently with reduced database load.
  • Chores

    • Updated internal schema formatting and documentation.

Replaced multiple parallel database `count()` queries in `packages/data-ops/src/queries/analytics.ts` with single aggregated queries utilizing PostgreSQL's `FILTER (WHERE ...)` clause. This change eliminates redundant connection overhead and redundant table scans when fetching complex analytics metrics across the same table for different conditions.
@google-labs-jules

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

📝 Walkthrough


The PR consolidates multiple concurrent count queries (previously dispatched through Promise.all) in the analytics functions into single aggregated PostgreSQL queries using SQL FILTER (WHERE ...) expressions through Drizzle. It also standardizes quote formatting in the auth schema definitions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Analytics Query Consolidation<br>`packages/data-ops/src/queries/analytics.ts`, `.jules/bolt.md` | Refactored three analytics query functions (schools, students, growth stats) to replace multiple concurrent Promise.all count queries with single aggregated queries using SQL count(*) filter (where ...) expressions, reducing database round-trips and eliminating separate table scans. |
| Auth Schema Formatting<br>`packages/data-ops/src/drizzle/auth-schema.ts` | Standardized quote style from double to single quotes throughout table/column/index names, reordered imports from drizzle-orm, and reformatted pgTable definitions without altering schema structure or exported entity signatures. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 One hop beats many hops, we've learned,
Aggregate queries—efficiency earned!
Single scans where many once thrived,
Database bliss—our metrics now arrived! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title references optimizing concurrent COUNT queries, which is the central change across analytics.ts and the journal entry. However, it uses an emoji and categorical label that add noise rather than clarity. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/data-ops/src/queries/analytics.ts (1)

344-357: ⚠️ Potential issue | 🔴 Critical

Enforce mandatory tenant scoping for student aggregates.

These queries run on students with an optional schoolId predicate, which allows unscoped cross-school counts when schoolId is missing. Make schoolId required (or fail early) and always apply where(eq(students.schoolId, schoolId)).

As per coding guidelines: Every query on school-scoped tables MUST include where(eq(table.schoolId, schoolId)) for multi-tenant isolation.

Also applies to: 413-423
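
A fail-early guard along the lines suggested above could look like the following sketch. It is an assumption-laden illustration, not the project's code: `MissingTenantError`, `requireSchoolId`, `Condition`, and `buildStudentConditions` are hypothetical names, `schoolId` is assumed to be a string, and conditions are modelled as plain objects rather than Drizzle expressions.

```typescript
// Hypothetical error type; the project's real DatabaseError is not shown here.
class MissingTenantError extends Error {
  constructor(queryName: string) {
    super(`${queryName}: schoolId is required for school-scoped queries`);
    this.name = "MissingTenantError";
  }
}

// Narrows an optional schoolId to a definite string, or throws before any query runs.
function requireSchoolId(
  schoolId: string | null | undefined,
  queryName: string,
): string {
  if (schoolId === undefined || schoolId === null || schoolId.trim() === "") {
    throw new MissingTenantError(queryName);
  }
  return schoolId;
}

// Plain-object stand-in for a query condition such as eq(students.schoolId, schoolId).
interface Condition {
  column: string;
  op: "=" | ">=";
  value: string;
}

// The tenant condition is always present and always first; extras are appended.
function buildStudentConditions(
  schoolId: string | null | undefined,
  extra: readonly Condition[] = [],
): Condition[] {
  const tenant = requireSchoolId(schoolId, "getEnrollmentStats");
  return [{ column: "schoolId", op: "=", value: tenant }, ...extra];
}
```

Because the tenant condition is built unconditionally, the `baseConditions.length > 0 ? and(...) : undefined` fallback mentioned elsewhere in this review would no longer be reachable for these queries.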

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/data-ops/src/queries/analytics.ts` around lines 344 - 357, The
student aggregation allows unscoped queries when schoolId is missing; update the
logic in this function so tenant scoping is mandatory: if schoolId is falsy,
throw/return early with an error, and always include where(eq(students.schoolId,
schoolId)) (instead of the optional baseConditions) in the db.select and the
other student queries (referencing symbols: students, baseConditions, db.select
and the second query block around lines 413-423) so every students table query
enforces multi-tenant isolation.
🧹 Nitpick comments (1)
packages/data-ops/src/queries/analytics.ts (1)

416-423: Remove single-item Promise.all in getEnrollmentGrowth.

This now wraps only one DB call; replacing it with a direct await makes intent clearer.
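
The two shapes can be compared side by side in a small sketch. `fetchGrowthStats` is a hypothetical stand-in for the Drizzle query and is assumed to resolve to an array containing one row; the numbers are made up.

```typescript
interface GrowthRow {
  current: number;
  previous: number;
}

// Hypothetical stand-in for `db.select({...}).from(students).where(...)`.
async function fetchGrowthStats(): Promise<GrowthRow[]> {
  return [{ current: 12, previous: 9 }];
}

// Current pattern: the outer array comes from Promise.all, the inner one is the row list.
async function viaPromiseAll(): Promise<GrowthRow | undefined> {
  const [[growthStats]] = await Promise.all([fetchGrowthStats()]);
  return growthStats;
}

// Suggested pattern: await the query directly and destructure one level.
async function viaDirectAwait(): Promise<GrowthRow | undefined> {
  const [growthStats] = await fetchGrowthStats();
  return growthStats;
}
```

Both functions resolve to the same row; the second simply drops the wrapper array and one level of destructuring.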

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/data-ops/src/queries/analytics.ts` around lines 416 - 423, The
Promise.all wrapper around a single DB call in getEnrollmentGrowth is
unnecessary; replace awaiting Promise.all([...]) with a direct await of
db.select(...) and adjust the destructuring accordingly. Specifically, call
db.select({...}).from(students).where(baseConditions.length > 0 ?
and(...baseConditions) : undefined) with await, and assign the result to
growthStats using a single-level destructure (replace the current const
[[growthStats]] = await Promise.all([...]) pattern). Keep the same selected
fields (current using currentStartDate and previous using previousStartDate) and
the same baseConditions/db/select/students symbols.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cf2ff56b-3737-4ae0-a661-ba22e78c061d

📥 Commits

Reviewing files that changed from the base of the PR and between 81815b8 and 4d08318.

📒 Files selected for processing (3)
  • .jules/bolt.md
  • packages/data-ops/src/drizzle/auth-schema.ts
  • packages/data-ops/src/queries/analytics.ts

Comment thread .jules/bolt.md
## 2025-02-23 - [Parallelize Independent Queries with Promise.all()]
**Learning:** Sequential `for...of` loops that `await` independent database queries (like iterating over active school years to fetch report card discovery units) create severe N+1 bottlenecks.
**Action:** Always replace sequential `await` loops for read-only queries with `Promise.all()` mapped over the array to allow the database driver to process the independent queries concurrently, effectively eliminating the N+1 I/O wait.
## $(date +%Y-%m-%d) - Optimize concurrent COUNT queries in analytics

⚠️ Potential issue | 🟡 Minor

Use a concrete date in the journal heading.

$(date +%Y-%m-%d) will remain literal text in markdown unless a preprocessing step replaces it. Consider committing an explicit date.
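
If the heading is produced by a script rather than typed by hand, the date has to be resolved before the line is written. A minimal sketch, assuming a hypothetical helper named `journalHeading`:

```typescript
// Formats a journal heading with a concrete YYYY-MM-DD date.
// toISOString() reports the UTC calendar date, which can differ from the
// local date that a shell `date +%Y-%m-%d` call would print.
function journalHeading(date: Date, title: string): string {
  const isoDate = date.toISOString().slice(0, 10);
  return `## ${isoDate} - ${title}`;
}

// Month index 3 is April; Date.UTC keeps the example independent of the local time zone.
const exampleHeading = journalHeading(
  new Date(Date.UTC(2026, 3, 12)),
  "Optimize concurrent COUNT queries in analytics",
);
// exampleHeading === "## 2026-04-12 - Optimize concurrent COUNT queries in analytics"
```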

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.jules/bolt.md at line 1, The journal heading currently uses the literal
shell expression "$(date +%Y-%m-%d) - Optimize concurrent COUNT queries in
analytics"; replace the $(date +%Y-%m-%d) token with an explicit ISO date (e.g.,
2026-04-12) so the markdown contains a concrete date string in the heading;
update the heading line that begins with "$(date +%Y-%m-%d)" to the chosen fixed
date followed by " - Optimize concurrent COUNT queries in analytics".

Comment on lines +47 to 57
  db.select({
    total: sql<number>`count(*)`,
    currentPeriod: sql<number>`count(*) filter (where ${schools.createdAt} >= ${startDate})`,
    previousPeriod: sql<number>`count(*) filter (where ${schools.createdAt} >= ${previousStartDate} and ${schools.createdAt} < ${startDate})`,
  }).from(schools),
  getDailyActiveUsers(startDate),
  getUserActivityCount(startDate, now),
  getUserActivityCount(previousStartDate, startDate),
  getMonthlyActiveUsers(new Date(now.getTime() - 365 * 24 * 60 * 60 * 1000)),
  getAverageResponseTime(startDate),
])

⚠️ Potential issue | 🟠 Major

Wrap updated DB operations in ResultAsync with tapLogErr.

The updated query paths still use raw await/Promise chains instead of the required ResultAsync<T, DatabaseError> flow with .mapErr(tapLogErr(logger, context)).

As per coding guidelines: Every query function MUST return ResultAsync<T, DatabaseError> from '@praha/byethrow' and use tapLogErr for error logging.

Also applies to: 124-141, 351-358, 419-423
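
The general shape of "return a result value and log the error on the way out" can be sketched without the library. The types below are minimal stand-ins and are NOT the `@praha/byethrow` API. `Result`, `toResult`, and this `DatabaseError` are hypothetical, and `tapLogErr` here only mirrors the name of the project's helper, whose real signature is not shown on this page.

```typescript
// Minimal discriminated union standing in for a result type.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Stand-in error class; the project's real DatabaseError may look different.
class DatabaseError extends Error {
  readonly original: unknown;
  constructor(message: string, original: unknown) {
    super(message);
    this.name = "DatabaseError";
    this.original = original;
  }
}

type Logger = { error: (context: string, error: unknown) => void };

// Hypothetical equivalent of tapLogErr: logs the error and returns it unchanged.
function tapLogErr<E>(logger: Logger, context: string): (error: E) => E {
  return (error: E): E => {
    logger.error(context, error);
    return error;
  };
}

// Runs a promise-returning query and converts a rejection into an error result.
async function toResult<T>(
  run: () => Promise<T>,
  onError: (error: DatabaseError) => DatabaseError,
): Promise<Result<T, DatabaseError>> {
  try {
    return { ok: true, value: await run() };
  } catch (cause) {
    return { ok: false, error: onError(new DatabaseError("query failed", cause)) };
  }
}

// Usage with a query that always rejects, to show the error path.
const loggedContexts: string[] = [];
const exampleLogger: Logger = {
  error: (context: string) => {
    loggedContexts.push(context);
  },
};

function failingOverview(): Promise<Result<number, DatabaseError>> {
  return toResult<number>(
    async () => {
      throw new Error("connection reset");
    },
    tapLogErr<DatabaseError>(exampleLogger, "analytics:getAnalyticsOverview"),
  );
}
```

Callers then branch on `ok` instead of wrapping the call in try/catch. The real refactor would use whatever combinators `@praha/byethrow` actually provides, which should be checked against its documentation rather than inferred from this sketch.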

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/data-ops/src/queries/analytics.ts` around lines 47 - 57, The DB
calls and Promise.all in analytics.ts must be converted to return ResultAsync<T,
DatabaseError> and have errors logged via tapLogErr; replace the raw
await/Promise usage around db.select(...) and the parallel calls to
getDailyActiveUsers, getUserActivityCount, getMonthlyActiveUsers,
getAverageResponseTime with ResultAsync.fromPromise(...) (or
ResultAsync.all/sequence) and chain .mapErr(tapLogErr(logger,
'analytics:<describe operation>')) so the function returns ResultAsync rather
than a raw Promise; ensure each call (e.g., the db.select block referencing
schools.createdAt, getDailyActiveUsers(startDate),
getUserActivityCount(startDate, now), getUserActivityCount(previousStartDate,
startDate), getMonthlyActiveUsers(...), getAverageResponseTime(startDate)) is
wrapped and combined via ResultAsync utilities and that the final returned value
uses .mapErr(tapLogErr(...)) to satisfy the coding guideline.
