
feat(db): optimize queries to reduce data transfer #33

Merged
cobacious merged 2 commits into main from optimize-data-transfer
Jan 2, 2026

Conversation

@cobacious
Owner

Optimized database queries to fetch only necessary fields, significantly reducing egress data transfer to stay within free tier limits.

Changes:

- getUnembeddedArticles: only fetch id, title, content, snippet, url
- getUnclusteredArticles: only fetch id, embedding, createdAt
  • Updated tests to match optimized query structures

Impact: ~90-95% reduction in data transfer per query, which should help avoid exceeding the 5GB/month egress limit on Neon/Supabase free tiers.
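The change described above amounts to projecting explicit columns instead of `SELECT *`. A minimal sketch, assuming a plain SQL layer: the `buildSelect` helper, the `articles` table name, and the `embedding IS NULL` / `cluster_id IS NULL` predicates are all illustrative assumptions, not the repo's actual code; only the column lists come from the bullets above.

```typescript
// Hypothetical helper: build a SELECT that fetches only the listed
// columns instead of `SELECT *`, cutting per-row egress.
function buildSelect(table: string, columns: string[], where?: string): string {
  const projection = columns.join(", ");
  const filter = where ? ` WHERE ${where}` : "";
  return `SELECT ${projection} FROM ${table}${filter}`;
}

// Columns mirror the bullet list above; the table name and the
// WHERE predicates are assumptions for illustration.
const getUnembeddedArticlesSql = buildSelect(
  "articles",
  ["id", "title", "content", "snippet", "url"],
  "embedding IS NULL",
);

const getUnclusteredArticlesSql = buildSelect(
  "articles",
  ["id", "embedding", "created_at"],
  "cluster_id IS NULL",
);
```

Dropping wide columns (full article content, embedding vectors) from queries that don't need them is where the bulk of the egress saving comes from.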

🤖 Generated with Claude Code


feat(engine): make pipeline limits configurable via environment variables

Make all pipeline processing limits configurable via environment variables
instead of hardcoded constants. This allows flexible configuration for
different use cases (local development, production, bulk processing).

Changes:
- embedArticles.ts: MAX_EMBEDDINGS env var (0 = unlimited)
- getArticlesMissingContent.ts: MAX_CONTENT_EXTRACTION env var (0 = unlimited)
- summarizeClusters.ts: Fix TOKEN_LIMIT to properly handle 0 as unlimited
- .env.example: Document all pipeline limits with cost estimates
- run-pipeline.yml: Set limits to 0 (unlimited) for GitHub Actions

For hobby use, unlimited processing costs ~$0.02 per run (~$0.10/month
if run weekly). The configurable limits allow adding safety caps if needed
for production use.

Rationale: The previous hardcoded limits (200 embeddings, 100 content
extractions) were causing incomplete processing when there was a backlog,
requiring multiple runs to process all articles. For a hobby project with
infrequent runs, it's simpler to process everything in one go.
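The "0 = unlimited" convention the bullets describe can be sketched as follows. `MAX_EMBEDDINGS` and its previous default of 200 are named above; the `readLimit` and `capBatch` helpers are hypothetical illustrations of the pattern, not the repo's actual code.

```typescript
// Hedged sketch of the env-var limit pattern: unset or invalid values
// fall back to a default, and an explicit 0 means "no limit".
function readLimit(name: string, fallback: number): number {
  const raw = process.env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number.parseInt(raw, 10);
  if (Number.isNaN(parsed) || parsed < 0) return fallback;
  return parsed === 0 ? Number.POSITIVE_INFINITY : parsed;
}

// Apply the limit to a batch; Infinity leaves the batch untouched.
function capBatch<T>(items: T[], limit: number): T[] {
  return Number.isFinite(limit) ? items.slice(0, limit) : items;
}

process.env.MAX_EMBEDDINGS = "0"; // as set in run-pipeline.yml
const maxEmbeddings = readLimit("MAX_EMBEDDINGS", 200);
```

Mapping 0 to `Infinity` keeps the call sites uniform: every stage caps its batch the same way, whether or not a limit is configured.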

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@cobacious cobacious merged commit a96d938 into main Jan 2, 2026
3 checks passed
@cobacious cobacious deleted the optimize-data-transfer branch January 2, 2026 10:12
cobacious added a commit that referenced this pull request Jan 2, 2026
* feat(db): optimize queries to reduce data transfer

* feat(engine): make pipeline limits configurable via environment variables

---------

Co-authored-by: Jacob Walton <jacob.walton@dnata.com>
Co-authored-by: Claude <noreply@anthropic.com>
