fix: fix benchmarks configuration#101
Conversation
📝 WalkthroughWalkthroughBenchmarks switched from .json to .log inputs; k6 scripts now assert HTTP 200 responses; seq-db Docker Compose bumped to v0.61.0 and switched to config-file startup with a new benchmarks config; Elastic compose healthcheck and port mapping adjusted; README links updated; .dockerignore consolidated. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Assessment against linked issues
Possibly related PRs
Suggested labels
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🧰 Additional context used🧬 Code graph analysis (1)benchmarks/k6/seq-db-fetch-5k-fulltext-2.js (6)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (2)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
benchmarks/dataset/distribute.py (2)
4-16: rawcount undercounts files missing a trailing newline.If the last line doesn’t end with “\n”, it’s dropped from the count, cascading into distribution errors. Safer chunked counting below.
-def rawcount(filename): - lines = 0 - - with open(filename, 'rb') as f: - buf_size = 1024 * 1024 - read_f = f.raw.read - - buf = read_f(buf_size) - while buf: - lines += buf.count(b'\n') - buf = read_f(buf_size) - - return lines +def rawcount(filename): + lines = 0 + last = None + buf_size = 1024 * 1024 + with open(filename, 'rb') as f: + while True: + buf = f.read(buf_size) + if not buf: + break + lines += buf.count(b'\n') + last = buf[-1:] + if last not in (b'\n', None): + lines += 1 + return lines
33-41: Remainder lines are dropped; potential IndexError and file descriptor leaks.
- base_chunk_size ignores input_count % num_files, so extra lines never get written.
- When undercounted, advancing past the last input file can raise IndexError.
- Input/output files aren’t managed via context managers (ruff SIM115).
Refactor to stream lines once, distribute evenly (including remainder), and manage FDs safely.
- input = [] - input_count = 0 - - for filename in os.listdir(args.directory): - if filename.endswith('.log'): - full_path = os.path.join(args.directory, filename) - input_count += rawcount(full_path) - input.append(open(full_path, 'r')) + paths = [] + input_count = 0 + for filename in os.listdir(args.directory): + if filename.endswith('.log'): + full_path = os.path.join(args.directory, filename) + input_count += rawcount(full_path) + paths.append(full_path) @@ - base_chunk_size = input_count // args.num_files + base_chunk_size = input_count // args.num_files + remainder = input_count % args.num_files @@ - index = 0 - for i in range(args.num_files): - written = 0 - - output = open(f"docs-{i}.log", "w+") - file = input[index] - - while written < base_chunk_size: - line = file.readline() - - if line == "": - file.close() - - index += 1 - file = input[index] - - continue - - output.write(line) - written += 1 - - output.close() + def iter_lines(paths): + for p in paths: + with open(p, 'r') as infile: + for line in infile: + yield line + + line_iter = iter_lines(paths) + + for i in range(args.num_files): + target = base_chunk_size + (1 if i < remainder else 0) + with open(f"docs-{i}.log", "w") as output: + for _ in range(target): + try: + output.write(next(line_iter)) + except StopIteration: + breakAlso applies to: 48-71
♻️ Duplicate comments (4)
benchmarks/k6/seq-db-fetch-5k.js (1)
35-35: Use strict equality and avoid param shadowing (same nit as other k6 scripts).
See seq-db-aggs-min-by-status.js Line 37 suggestion.Apply:
- check(res, { "200-ok": (res) => res.status == 200}); + check(res, { "200-ok": (r) => r.status === 200 });benchmarks/k6/es-aggs-min-by-status.js (1)
38-38: Use strict equality and avoid param shadowing in the check.
Same nit as other scripts.- check(res, { "200-ok": (res) => res.status == 200}); + check(res, { "200-ok": (r) => r.status === 200 });benchmarks/k6/es-fetch-5k.js (1)
30-30: Use strict equality and avoid param shadowing in the check.
Same nit as above.- check(res, { "200-ok": (res) => res.status == 200}); + check(res, { "200-ok": (r) => r.status === 200 });benchmarks/k6/es-fetch-5k-fulltext.js (1)
30-30: Use strict equality and avoid param shadowing in the check.
Same nit as other files.- check(res, { "200-ok": (res) => res.status == 200}); + check(res, { "200-ok": (r) => r.status === 200 });
🧹 Nitpick comments (8)
benchmarks/k6/seq-db-aggs.js (1)
35-36: Nit: use strict equality in the k6 check- check(res, { "200-ok": (res) => res.status == 200}); + check(res, { "200-ok": (res) => res.status === 200 });Also consider adding a
checksthreshold inoptionsso check failures fail the run (see similar note in es-aggs.js).benchmarks/k6/es-fetch-5k-range.js (1)
30-30: Nit: use strict equality- check(res, { "200-ok": (res) => res.status == 200}); + check(res, { "200-ok": (res) => res.status === 200 });benchmarks/k6/seq-db-fetch-5k-range.js (1)
35-35: Nit: use strict equality- check(res, { "200-ok": (res) => res.status == 200}); + check(res, { "200-ok": (res) => res.status === 200 });benchmarks/k6/es-fetch-5k-fulltext-2.js (1)
30-30: Nit: use strict equality- check(res, { "200-ok": (res) => res.status == 200}); + check(res, { "200-ok": (res) => res.status === 200 });benchmarks/k6/seq-db-aggs-min-by-status.js (1)
37-37: Use strict equality and avoid param shadowing in the check.
Prefer===and rename the lambda arg to avoid shadowingres.Apply:
- check(res, { "200-ok": (res) => res.status == 200}); + check(res, { "200-ok": (r) => r.status === 200 });Additionally (optional, outside these lines), add thresholds so benchmarks fail fast when checks fail:
export let options = { vus: 1, iterations: 5, thresholds: { checks: ['rate==1.0'], http_req_failed: ['rate==0'], }, };benchmarks/docker-compose-elastic.yml (1)
21-22: Avoid host port clash; make 9200 mapping configurable.Multiple stacks commonly use 9200. Map to a high host port by default and allow override via env.
- ports: - - '9200:9200' + ports: + - '${ELASTIC_PORT:-19200}:9200'benchmarks/configs/seqdb/config.yaml (1)
12-16: Confirm parity with previously used CLI flagsEarlier compose passed flags like --hot-stores, --max-token-size, and --requests-limit. The config now sets query_rate/search_requests/bulk_requests/inflight_bulks, but not the others. Verify these semantics cover prior runtime tuning or add missing fields.
Would you like me to draft a config matching the prior CLI options (1:1) if those knobs still exist in v0.61.0?
benchmarks/docker-compose-seqdb.yml (1)
20-33: Optional: add healthcheck and wait-for-healthy dependencydepends_on doesn’t wait for readiness. A healthcheck on seq-db plus condition makes file.d start after seq-db is ready, reducing early POST failures in k6/file.d.
seq-db: + healthcheck: + test: ["CMD", "wget", "-qO-", "http://localhost:9002/healthz"] + interval: 10s + timeout: 3s + retries: 10 filed: depends_on: - - seq-db + seq-db: + condition: service_healthy
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (24)
.dockerignore(1 hunks)benchmarks/README.md(2 hunks)benchmarks/configs/file.d/elastic.yaml(1 hunks)benchmarks/configs/file.d/seq-db.yaml(1 hunks)benchmarks/configs/seqdb/config.yaml(1 hunks)benchmarks/configs/vmsingle/vmsingle.yml(1 hunks)benchmarks/dataset/distribute.py(3 hunks)benchmarks/dataset/logs/download.sh(1 hunks)benchmarks/docker-compose-elastic.yml(1 hunks)benchmarks/docker-compose-seqdb.yml(1 hunks)benchmarks/k6/es-aggs-min-by-status.js(2 hunks)benchmarks/k6/es-aggs.js(2 hunks)benchmarks/k6/es-fetch-5k-fulltext-2.js(2 hunks)benchmarks/k6/es-fetch-5k-fulltext.js(2 hunks)benchmarks/k6/es-fetch-5k-range.js(2 hunks)benchmarks/k6/es-fetch-5k.js(2 hunks)benchmarks/k6/es-paging.js(2 hunks)benchmarks/k6/seq-db-aggs-min-by-status.js(2 hunks)benchmarks/k6/seq-db-aggs.js(2 hunks)benchmarks/k6/seq-db-fetch-5k-fulltext-2.js(3 hunks)benchmarks/k6/seq-db-fetch-5k-fulltext.js(2 hunks)benchmarks/k6/seq-db-fetch-5k-range.js(2 hunks)benchmarks/k6/seq-db-fetch-5k.js(2 hunks)benchmarks/k6/seq-db-paging.js(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
benchmarks/k6/seq-db-aggs.js (3)
benchmarks/k6/es-aggs.js (1)
res(23-27)benchmarks/k6/seq-db-aggs-min-by-status.js (1)
res(31-35)benchmarks/k6/seq-db-fetch-5k.js (1)
res(29-33)
benchmarks/k6/seq-db-paging.js (2)
benchmarks/k6/seq-db-aggs-min-by-status.js (3)
res(31-35)BASE_URL(4-4)query(13-29)benchmarks/k6/seq-db-fetch-5k.js (3)
res(29-33)BASE_URL(4-4)query(17-27)
benchmarks/k6/seq-db-fetch-5k-fulltext.js (4)
benchmarks/k6/seq-db-aggs-min-by-status.js (3)
res(31-35)BASE_URL(4-4)query(13-29)benchmarks/k6/seq-db-fetch-5k-fulltext-2.js (3)
res(29-33)BASE_URL(4-4)query(17-27)benchmarks/k6/seq-db-fetch-5k.js (3)
res(29-33)BASE_URL(4-4)query(17-27)benchmarks/k6/seq-db-paging.js (3)
res(29-33)BASE_URL(4-4)query(17-27)
benchmarks/k6/seq-db-fetch-5k-range.js (2)
benchmarks/k6/seq-db-aggs-min-by-status.js (3)
res(31-35)BASE_URL(4-4)query(13-29)benchmarks/k6/seq-db-fetch-5k.js (3)
res(29-33)BASE_URL(4-4)query(17-27)
benchmarks/k6/seq-db-fetch-5k.js (3)
benchmarks/k6/seq-db-aggs-min-by-status.js (3)
res(31-35)BASE_URL(4-4)query(13-29)benchmarks/k6/seq-db-aggs.js (3)
res(29-33)BASE_URL(4-4)query(13-27)benchmarks/k6/seq-db-fetch-5k-fulltext-2.js (3)
res(29-33)BASE_URL(4-4)query(17-27)
benchmarks/k6/seq-db-fetch-5k-fulltext-2.js (1)
benchmarks/k6/seq-db-fetch-5k-fulltext.js (3)
res(29-33)BASE_URL(4-4)query(17-27)
🪛 Ruff (0.12.2)
benchmarks/dataset/distribute.py
54-54: Use a context manager for opening files
(SIM115)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (17)
benchmarks/k6/seq-db-aggs-min-by-status.js (1)
2-2: Good addition: import check from k6.
This aligns the script with the new 200-status assertions.benchmarks/k6/seq-db-fetch-5k.js (1)
2-2: Good addition: import check from k6.
Consistent with other scripts.benchmarks/k6/es-aggs-min-by-status.js (1)
2-2: Good addition: import check from k6.benchmarks/k6/es-fetch-5k.js (1)
2-2: Good addition: import check from k6.benchmarks/k6/es-fetch-5k-fulltext.js (1)
2-2: Good addition: import check from k6.benchmarks/k6/seq-db-fetch-5k-fulltext-2.js (1)
2-2: Import looks goodImporting
checkalongsidesleepaligns this benchmark with the rest.benchmarks/k6/seq-db-paging.js (1)
2-2: Import update is correctbenchmarks/k6/seq-db-fetch-5k-fulltext.js (1)
2-2: Import change LGTMbenchmarks/k6/es-paging.js (1)
2-2: Import addition is appropriatebenchmarks/docker-compose-elastic.yml (1)
27-27: Relax Elasticsearch healthcheck to accept yellow status
Single-node clusters default to yellow; query withwait_for_status=yellowand match either status. The official elasticsearch:8.17.4 image already includes a statically builtcurlbinary (github.com)- test: [ "CMD-SHELL", "curl -f -s http://localhost:9200/_cluster/health | grep 'green'" ] + test: [ "CMD-SHELL", "curl -fsS 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=0s' | grep -E '\"status\":\"(green|yellow)\"'" ]benchmarks/configs/file.d/elastic.yaml (1)
14-14: .log glob switch — LGTM.Matches the rest of the PR (.log everywhere). No concerns.
benchmarks/configs/file.d/seq-db.yaml (1)
14-14: .log glob switch — LGTM.Consistent with dataset and pipeline updates.
benchmarks/dataset/distribute.py (1)
21-21: Help text update — LGTM.Accurately reflects the .log input change.
benchmarks/README.md (1)
64-64: Good: updated results linkThe new docs URL looks correct and non-dead.
.dockerignore (1)
14-15: Approve ignoring/benchmarksin.dockerignore
NoCOPY/ADDofbenchmarksin any Dockerfile or build context in Compose files—excluding/benchmarkswon’t impact Docker builds.benchmarks/configs/seqdb/config.yaml (1)
6-8: mapping.yaml file exists
mapping.yaml is present at benchmarks/configs/seqdb/mapping.yaml; no further action needed.benchmarks/docker-compose-seqdb.yml (1)
10-10: LGTM: upgraded seq-db imageBumping to v0.61.0 aligns with the new config-driven startup.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #101 +/- ##
==========================================
+ Coverage 71.83% 71.86% +0.03%
==========================================
Files 200 200
Lines 18076 18076
==========================================
+ Hits 12984 12990 +6
+ Misses 4371 4367 -4
+ Partials 721 719 -2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Co-authored-by: Timofey Sedov <42837378+moflotas@users.noreply.github.com>
Fixes #69
Summary by CodeRabbit