Guidellm fails for max-req 60 with streaming #516

@toslali-ibm

Description

Describe the bug
When I use the following profile with the command python -m guidellm benchmark --scenario profile.yaml --output-path here.json --request-type text_completions, I keep getting the error below. When I reduce max-requests to, say, 25, the errors go away.

profile

target: "url"
rate-type: sweep
max-requests: 60
rate: 5
random-seed: 42
data:
  prefix_tokens: 256
  prompt_tokens: 256
  prompt_tokens_stdev: 100
  prompt_tokens_min: 2
  prompt_tokens_max: 800
  output_tokens: 256
  output_tokens_stdev: 100
  output_tokens_min: 1
  output_tokens_max: 1024

error:

[run-workload]   File "/workspace/data/guidellm/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
[run-workload]     yield
[run-workload]   File "/workspace/data/guidellm/httpx/_transports/default.py", line 394, in handle_async_request
[run-workload]     resp = await self._pool.handle_async_request(req)
[run-workload]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[run-workload]   File "/workspace/data/guidellm/httpcore/_async/connection_pool.py", line 256, in handle_async_request
[run-workload]     raise exc from None
[run-workload]   File "/workspace/data/guidellm/httpcore/_async/connection_pool.py", line 236, in handle_async_request
[run-workload]     response = await connection.handle_async_request(
[run-workload]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[run-workload]   File "/workspace/data/guidellm/httpcore/_async/connection.py", line 101, in handle_async_request
[run-workload]     raise exc
[run-workload]   File "/workspace/data/guidellm/httpcore/_async/connection.py", line 78, in handle_async_request
[run-workload]     stream = await self._connect(request)
[run-workload]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[run-workload]   File "/workspace/data/guidellm/httpcore/_async/connection.py", line 124, in _connect
[run-workload]     stream = await self._network_backend.connect_tcp(**kwargs)
[run-workload]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[run-workload]   File "/workspace/data/guidellm/httpcore/_backends/auto.py", line 31, in connect_tcp
[run-workload]     return await self._backend.connect_tcp(
[run-workload]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[run-workload]   File "/workspace/data/guidellm/httpcore/_backends/anyio.py", line 113, in connect_tcp
[run-workload]     with map_exceptions(exc_map):
[run-workload]          ^^^^^^^^^^^^^^^^^^^^^^^
[run-workload]   File "/workspace/data/guidellm/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
[run-workload]     yield
[run-workload]   File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
[run-workload]     self.gen.throw(value)
[run-workload]   File "/workspace/data/guidellm/httpcore/_exceptions.py", line 14, in map_exceptions
[run-workload]     raise to_exc(exc) from exc
[run-workload]   File "/workspace/data/guidellm/httpx/_transports/default.py", line 394, in handle_async_request
[run-workload]     resp = await self._pool.handle_async_request(req)
[run-workload]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[run-workload]   File "/workspace/data/guidellm/httpcore/_async/connection_pool.py", line 256, in handle_async_request
[run-workload]     raise exc from None
[run-workload]   File "/workspace/data/guidellm/httpcore/_async/connection_pool.py", line 236, in handle_async_request
[run-workload]     response = await connection.handle_async_request(
[run-workload]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[run-workload]   File "/workspace/data/guidellm/httpcore/_async/connection.py", line 101, in handle_async_request
[run-workload]     raise exc
[run-workload]   File "/workspace/data/guidellm/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
[run-workload]     yield
[run-workload]   File "/workspace/data/guidellm/httpcore/_async/connection.py", line 78, in handle_async_request
[run-workload]     stream = await self._connect(request)
[run-workload]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[run-workload]   File "/workspace/data/guidellm/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
[run-workload]     yield
[run-workload]   File "/workspace/data/guidellm/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
[run-workload]     yield
[run-workload]   File "/workspace/data/guidellm/httpcore/_async/connection.py", line 124, in _connect
[run-workload]     stream = await self._network_backend.connect_tcp(**kwargs)
[run-workload]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[run-workload] httpcore.ConnectError: All connection attempts failed
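The final httpcore.ConnectError: All connection attempts failed suggests the target server (or an intermediate proxy) stops accepting new TCP connections once the sweep opens too many at once, which would explain why 25 requests succeed where 60 fail. A minimal stdlib sketch of that failure mode (no guidellm or httpx involved; SERVER_LIMIT and the concurrency cap are hypothetical numbers for illustration only, not real settings):

```python
import asyncio

# Hypothetical accept capacity of the target server; not a real
# guidellm or server setting, just a knob for this illustration.
SERVER_LIMIT = 30

async def fake_connect(in_flight: set, i: int) -> bool:
    """Simulate a TCP connect that is refused once the server's
    accept capacity (SERVER_LIMIT) is exceeded."""
    if len(in_flight) >= SERVER_LIMIT:
        return False  # analogous to httpcore.ConnectError
    in_flight.add(i)
    await asyncio.sleep(0)  # hold the slot across a scheduling point
    in_flight.discard(i)
    return True

async def run(total: int, concurrency: int) -> int:
    """Fire `total` requests with at most `concurrency` in flight;
    return how many connected successfully."""
    in_flight: set = set()
    sem = asyncio.Semaphore(concurrency)

    async def one(i: int) -> bool:
        async with sem:
            return await fake_connect(in_flight, i)

    results = await asyncio.gather(*(one(i) for i in range(total)))
    return sum(results)

# 60 unthrottled requests overflow the simulated capacity, while
# capping in-flight requests at 25 keeps every attempt inside it.
print(asyncio.run(run(60, 60)), asyncio.run(run(60, 25)))
```

If this is indeed the cause, capping client-side concurrency (or raising the server's connection limit) should make the errors disappear at max-requests 60 as well.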

Environment
I built guidellm from the main branch.
