feat(db): optimize queries to reduce data transfer #33
Merged
Optimized database queries to fetch only necessary fields, significantly reducing egress data transfer to stay within free tier limits.

Changes:
- getUnembeddedArticles: only fetch id, title, content, snippet, url
- getUnclusteredArticles: only fetch id, embedding, createdAt
- Updated tests to match optimized query structures

Impact: ~90-95% reduction in data transfer per query, which should help avoid exceeding the 5GB/month egress limit on Neon/Supabase free tiers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
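The field lists above can be sketched as a reusable projection. This is a minimal illustration, not the PR's actual code: the PR does not show which ORM or query builder it uses, so `toSelect` and `project` are hypothetical helpers, and the `{ field: true }` selection shape is an assumption about what the data layer accepts.

```typescript
// Field lists copied from the PR description.
const UNEMBEDDED_FIELDS = ["id", "title", "content", "snippet", "url"] as const;
const UNCLUSTERED_FIELDS = ["id", "embedding", "createdAt"] as const;

// Hypothetical helper: build a { field: true } selection object,
// a shape that several ORMs accept for column selection (assumption).
function toSelect(fields: readonly string[]): Record<string, true> {
  const select: Record<string, true> = {};
  for (const field of fields) select[field] = true;
  return select;
}

// Hypothetical in-memory stand-in for the database projection:
// keep only the requested keys of a row and drop everything else.
function project(
  row: Record<string, unknown>,
  fields: readonly string[],
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const field of fields) {
    if (field in row) out[field] = row[field];
  }
  return out;
}
```

The savings come from the columns that are left out: a query that only needs `id`, `embedding`, and `createdAt` never transfers the article body, and one that needs the text never transfers the embedding vector.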
…bles

Make all pipeline processing limits configurable via environment variables instead of hardcoded constants. This allows flexible configuration for different use cases (local development, production, bulk processing).

Changes:
- embedArticles.ts: MAX_EMBEDDINGS env var (0 = unlimited)
- getArticlesMissingContent.ts: MAX_CONTENT_EXTRACTION env var (0 = unlimited)
- summarizeClusters.ts: Fix TOKEN_LIMIT to properly handle 0 as unlimited
- .env.example: Document all pipeline limits with cost estimates
- run-pipeline.yml: Set limits to 0 (unlimited) for GitHub Actions

For hobby use, unlimited processing costs ~$0.02 per run (~$0.10/month if run weekly). The configurable limits allow adding safety caps if needed for production use.

Rationale: The previous hardcoded limits (200 embeddings, 100 content extractions) were causing incomplete processing when there was a backlog, requiring multiple runs to process all articles. For a hobby project with infrequent runs, it's simpler to process everything in one go.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
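The "0 = unlimited" convention described above could be handled along these lines. This is a sketch under stated assumptions, not the code from the commit: `parseLimit` and `applyLimit` are hypothetical names, and falling back to a default when the variable is unset or invalid is an assumption (the commit message does not say what happens in that case).

```typescript
// Hypothetical helper: turn an env var value into a numeric limit.
// "0" means unlimited (per the commit message); missing or invalid
// values fall back to the caller's default (assumption).
function parseLimit(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) return fallback;
  return n === 0 ? Number.POSITIVE_INFINITY : n;
}

// Hypothetical helper: cap a work list at the limit, or return a
// full copy when the limit is unlimited.
function applyLimit<T>(items: readonly T[], limit: number): T[] {
  return Number.isFinite(limit) ? items.slice(0, limit) : items.slice();
}

// Possible usage in a Node.js pipeline step (not executed here):
//   const limit = parseLimit(process.env.MAX_EMBEDDINGS, 200);
//   const batch = applyLimit(unembeddedArticles, limit);
```

Mapping 0 to infinity in one place keeps the "unlimited" case out of every call site, which is the kind of bug the TOKEN_LIMIT fix in this commit addresses.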
cobacious added a commit that referenced this pull request on Jan 2, 2026
* feat(db): optimize queries to reduce data transfer

Optimized database queries to fetch only necessary fields, significantly reducing egress data transfer to stay within free tier limits.

Changes:
- getUnembeddedArticles: only fetch id, title, content, snippet, url
- getUnclusteredArticles: only fetch id, embedding, createdAt
- Updated tests to match optimized query structures

Impact: ~90-95% reduction in data transfer per query, which should help avoid exceeding the 5GB/month egress limit on Neon/Supabase free tiers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat(engine): make pipeline limits configurable via environment variables

Make all pipeline processing limits configurable via environment variables instead of hardcoded constants. This allows flexible configuration for different use cases (local development, production, bulk processing).

Changes:
- embedArticles.ts: MAX_EMBEDDINGS env var (0 = unlimited)
- getArticlesMissingContent.ts: MAX_CONTENT_EXTRACTION env var (0 = unlimited)
- summarizeClusters.ts: Fix TOKEN_LIMIT to properly handle 0 as unlimited
- .env.example: Document all pipeline limits with cost estimates
- run-pipeline.yml: Set limits to 0 (unlimited) for GitHub Actions

For hobby use, unlimited processing costs ~$0.02 per run (~$0.10/month if run weekly). The configurable limits allow adding safety caps if needed for production use.

Rationale: The previous hardcoded limits (200 embeddings, 100 content extractions) were causing incomplete processing when there was a backlog, requiring multiple runs to process all articles. For a hobby project with infrequent runs, it's simpler to process everything in one go.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Jacob Walton <jacob.walton@dnata.com>
Co-authored-by: Claude <noreply@anthropic.com>