
Implement the batch request feature #11

Open
chakravarthik27 wants to merge 5 commits into main from implement-the-batch-request-feature

Conversation

@chakravarthik27

This pull request adds support for batch processing of requests in the benchmarking system, enabling more efficient execution when the underlying model client supports batch APIs (notably for OpenAI endpoints). The changes introduce a batch_size parameter throughout the execution and CLI layers, implement batch request logic in the executor, and add batch request support to relevant model clients. This allows multiple requests to be grouped and sent together, reducing overhead and improving performance.

Batch execution support in benchmarking:

  • Added a batch_size parameter to ExecutionSpec, the executor, and the CLI (--batch-size), allowing users to specify how many requests to process together when supported. The executor now processes requests in batches if batch_size is set, as in the sketch below. (src/helm/benchmark/executor.py [1] [2] [3] [4]; src/helm/benchmark/run.py [5] [6] [7] [8])
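A minimal sketch of the batching path this adds, assuming illustrative names: ExecutionSpec here carries only the new field, and process_one/process_batch stand in for the executor's per-request and batch code paths.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


@dataclass(frozen=True)
class ExecutionSpec:
    # None means the executor sends requests one at a time, as before.
    batch_size: Optional[int] = None


def execute(
    spec: ExecutionSpec,
    requests: List[T],
    process_one: Callable[[T], R],
    process_batch: Callable[[List[T]], List[R]],
) -> List[R]:
    if spec.batch_size is None:
        return [process_one(request) for request in requests]
    results: List[R] = []
    # Chunk the request list and hand each chunk to the batch code path.
    for start in range(0, len(requests), spec.batch_size):
        results.extend(process_batch(requests[start : start + spec.batch_size]))
    return results
```

From the command line this would be driven by the new flag, e.g. `helm-run --batch-size 32 ...` (the flag name comes from this PR; the other arguments are elided).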

Batch request API in model clients:

  • Introduced a make_batch_request method on the base Client class, with a default implementation that raises NotImplementedError. (src/helm/clients/client.py, R23-R26)
  • Implemented make_batch_request in AutoClient, which delegates batch requests to the appropriate underlying client and includes retry logic. (src/helm/clients/auto_client.py, R135-R160)
  • Added batch request support to OpenAIClient and OpenAIResponsesClient, including logic to prepare JSONL files, upload them, poll for completion, and parse batch results. (src/helm/clients/openai_client.py [1]; src/helm/clients/openai_responses_client.py [2] [3]) A sketch of this API surface follows the list.
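A hedged sketch of that client-side surface, assuming simplified Request/RequestResult types and a hypothetical retry policy; the PR's actual types and retry helper will differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Request:
    model_deployment: str
    prompt: str


@dataclass(frozen=True)
class RequestResult:
    success: bool
    completions: List[str] = field(default_factory=list)


class Client:
    def make_request(self, request: Request) -> RequestResult:
        raise NotImplementedError

    def make_batch_request(self, requests: List[Request]) -> List[RequestResult]:
        # Default: batching is unsupported unless a subclass overrides this.
        raise NotImplementedError(f"{type(self).__name__} does not support batch requests")


class AutoClient(Client):
    """Routes a batch to the client registered for its model deployment."""

    def __init__(self, clients: Dict[str, Client], max_retries: int = 3) -> None:
        self._clients = clients
        self._max_retries = max_retries

    def make_batch_request(self, requests: List[Request]) -> List[RequestResult]:
        client = self._clients[requests[0].model_deployment]
        for attempt in range(self._max_retries):
            try:
                return client.make_batch_request(requests)
            except Exception:
                # Hypothetical retry policy: retry transient failures and
                # re-raise once the budget is exhausted.
                if attempt == self._max_retries - 1:
                    raise
```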

OpenAI batch API integration:

  • For both the chat and responses endpoints, implemented logic to serialize requests as JSONL, upload to OpenAI, create and poll batch jobs, and parse output files to reconstruct individual RequestResult objects, along the lines of the sketch below. (src/helm/clients/openai_client.py [1]; src/helm/clients/openai_responses_client.py [2])
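A minimal sketch of that flow against the public OpenAI Python SDK (Files and Batches endpoints); run_chat_batch is an illustrative name, and mapping the parsed bodies back to HELM RequestResult objects, plus handling of failed lines, is elided.

```python
import json
import tempfile
import time
from typing import Dict, List

from openai import OpenAI

client = OpenAI()


def run_chat_batch(bodies: List[dict]) -> Dict[str, dict]:
    """Send chat-completions request bodies through the Batch API;
    returns a mapping from custom_id to response body."""
    # 1. Serialize requests as JSONL, one batch envelope per request.
    with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
        for i, body in enumerate(bodies):
            f.write(json.dumps({
                "custom_id": f"request-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": body,
            }) + "\n")
        path = f.name

    # 2. Upload the JSONL file and create the batch job.
    with open(path, "rb") as fh:
        batch_file = client.files.create(file=fh, purpose="batch")
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )

    # 3. Poll until the job reaches a terminal state.
    while batch.status in ("validating", "in_progress", "finalizing"):
        time.sleep(30)
        batch = client.batches.retrieve(batch.id)
    if batch.status != "completed":
        raise RuntimeError(f"Batch ended with status: {batch.status}")

    # 4. Download the output file and parse each line back into a result.
    output_text = client.files.content(batch.output_file_id).text
    results: Dict[str, dict] = {}
    for line in output_text.splitlines():
        record = json.loads(line)
        results[record["custom_id"]] = record["response"]["body"]
    return results
```

For the responses endpoint, the same flow would apply with `/v1/responses` as the URL and endpoint, per the PR description.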

Minor improvements:

  • Updated imports to support new batch-related types. (src/helm/benchmark/executor.py [1]; src/helm/clients/auto_client.py [2])
  • Added a prompt_cache_retention field to requests in OpenAIResponsesClient for batch compatibility. (src/helm/clients/openai_responses_client.py, R115)

These changes collectively enable efficient batch processing throughout the benchmarking system, especially for OpenAI models, reducing request overhead and improving throughput.


@blidiselalin left a comment


Looks good

…max retries

Co-authored-by: Copilot <copilot@github.com>