chore: added async mysql db driver dependency #243
rohansen856 wants to merge 13 commits into openml:main
Conversation
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough

The PR converts the codebase to async DB usage: adds aiomysql and pytest-asyncio/pytest-xdist to project config, switches engine creation to AsyncEngine using mysql+aiomysql, replaces sqlalchemy Connection with AsyncConnection across database modules and routers, converts numerous functions and FastAPI route handlers to async/await, updates dependency helpers and tests to async fixtures, and adds an async application lifespan that closes databases on shutdown.

Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
Signed-off-by: rohansen856 <rohansen856@gmail.com>
for more information, see https://pre-commit.ci
…nd users Signed-off-by: rohansen856 <rohansen856@gmail.com>
for more information, see https://pre-commit.ci
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Hey - I've found 2 issues, and left some high-level feedback:
- Several dependencies are annotated as `AsyncConnection | None` and then immediately asserted non-`None` (e.g. in `datasets`, `study`, `tasktype`, `mldcat_ap` routers); consider making the dependency providers return a non-optional `AsyncConnection` and updating the type hints so the `assert` statements are unnecessary.
- In places where `user.get_groups()` is awaited multiple times within a single request (e.g. `datasets.update_dataset_status`, `study._get_study_raise_otherwise`, `study.attach_to_study`), consider awaiting once and reusing the result to avoid repeated coroutine calls and keep the code clearer.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Several dependencies are annotated as `AsyncConnection | None` and then immediately asserted non-`None` (e.g. in `datasets`, `study`, `tasktype`, `mldcat_ap` routers); consider making the dependency providers return a non-optional `AsyncConnection` and updating the type hints so the `assert` statements are unnecessary.
- In places where `user.get_groups()` is awaited multiple times within a single request (e.g. `datasets.update_dataset_status`, `study._get_study_raise_otherwise`, `study.attach_to_study`), consider awaiting once and reusing the result to avoid repeated coroutine calls and keep the code clearer.
## Individual Comments
### Comment 1
<location path="src/database/users.py" line_range="34-38" />
<code_context>
-def get_user_groups_for(*, user_id: int, connection: Connection) -> list[UserGroup]:
- row = connection.execute(
+async def get_user_groups_for(*, user_id: int, connection: AsyncConnection) -> list[UserGroup]:
+ row = await connection.execute(
text(
</code_context>
<issue_to_address>
**suggestion:** Avoid re-wrapping UserGroup values and keep return type consistently primitive or enum.
`get_user_groups_for` currently builds `[UserGroup(group) for (group,) in rows]`, and `User.get_groups` then wraps those again via `[UserGroup(group_id) for group_id in groups]`. This double conversion is redundant and hides the actual return type of `get_user_groups_for`. Either return `list[int]` (raw IDs) and convert only in `User.get_groups`, or return `list[UserGroup]` here and remove the extra wrapping in `User.get_groups`.
Suggested implementation:
```python
async def get_user_groups_for(*, user_id: int, connection: AsyncConnection) -> list[int]:
```
```python
rows = result.fetchall()
return [group for (group,) in rows]
```
Changing `get_user_groups_for` to return `list[int]` means any call sites (other than `User.get_groups`, which already wraps raw group IDs) that expect `list[UserGroup]` will need to be updated to either:
1. Adjust their type hints to `list[int]`, or
2. Wrap the returned IDs via `UserGroup(group_id)` where enum instances are required.
Please search for all usages of `get_user_groups_for` and update their expectations and type hints accordingly.
</issue_to_address>
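The single-conversion idea in this comment can be sketched with stdlib stand-ins. Everything below is hypothetical: the `UserGroup` members and their values are invented, and the `rows` parameter stands in for `connection.execute(...).fetchall()` so the sketch stays runnable without a database.

```python
from enum import IntEnum


class UserGroup(IntEnum):
    # Hypothetical group ids; the real enum lives in the project's codebase.
    ADMIN = 1
    READ = 2


def get_user_groups_for(rows: list[tuple[int]]) -> list[int]:
    # Query helper returns raw ids only; no enum wrapping here.
    return [group for (group,) in rows]


def to_groups(raw_ids: list[int]) -> list[UserGroup]:
    # The one and only conversion point, as User.get_groups would do it.
    return [UserGroup(gid) for gid in raw_ids]
```

With this split, the return type of the query helper is honest (`list[int]`), and the enum conversion happens exactly once at the caller.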
### Comment 2
<location path="tests/routers/openml/users_test.py" line_range="21" />
<code_context>
-def test_fetch_user(api_key: str, user: User, user_test: Connection) -> None:
- db_user = fetch_user(api_key, user_data=user_test)
+async def test_fetch_user(api_key: str, user: User, user_test: AsyncConnection) -> None:
+ db_user = await fetch_user(api_key, user_data=user_test)
assert db_user is not None
assert user.user_id == db_user.user_id
- assert user.groups == db_user.groups
+ assert user._groups == db_user._groups # noqa: SLF001
</code_context>
<issue_to_address>
**suggestion (testing):** Prefer exercising the public async `get_groups()` API instead of comparing the private `_groups` attribute.
Because the test now inspects `_groups` directly, it no longer validates the async group-fetching behavior and is more brittle to internal changes. Instead, await `get_groups()` on `db_user` and compare that result against the expected groups, e.g. by also using `get_groups()` on your reference user or on the parametrized `user`. This keeps the test aligned with the public interface and the new async flow.
</issue_to_address>
tests/routers/openml/users_test.py (Outdated)

    db_user = await fetch_user(api_key, user_data=user_test)
    assert db_user is not None
    assert user.user_id == db_user.user_id
    assert user.groups == db_user.groups
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
tests/routers/openml/datasets_test.py (1)
79-91: ⚠️ Potential issue | 🟠 Major

Test may fail with `AssertionError` instead of the expected `HTTPException`.

The test passes `user_db=None`, but looking at `get_dataset` in `src/routers/openml/datasets.py` (from relevant snippets), it has `assert user_db is not None` at line 397. This assertion will fire before any access check occurs, causing an `AssertionError` rather than the expected `HTTPException` with status `FORBIDDEN`.

If the intent is to test the "no access" path, `user_db` should be provided (possibly via the `user_test` fixture), so the access check logic is actually exercised.

Proposed fix

```diff
 async def test_private_dataset_no_access(
     user: User | None,
     expdb_test: AsyncConnection,
+    user_test: AsyncConnection,
 ) -> None:
     with pytest.raises(HTTPException) as e:
         await get_dataset(
             dataset_id=130,
             user=user,
-            user_db=None,
+            user_db=user_test,
             expdb_db=expdb_test,
         )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/datasets_test.py` around lines 79 - 91, The test test_private_dataset_no_access is passing user_db=None which triggers the assert user_db is not None in get_dataset before access logic runs; update the test to supply a valid user_db (e.g., use the user_test fixture or the existing user fixture) so that get_dataset's access checks execute and the HTTPException FORBIDDEN is raised—locate test_private_dataset_no_access and change the call to get_dataset to pass a populated user_db instead of None.

tests/routers/openml/flows_test.py (1)
27-33: ⚠️ Potential issue | 🔴 Critical

All three test functions in this file require `AsyncMock` to properly mock the async `get_by_name` function.

`database.flows.get_by_name` is async, but all three tests mock it with `mocker.patch()`, which creates a regular `MagicMock`. When `flow_exists` awaits the mocked function, it will fail with `TypeError: object ... can't be used in 'await' expression`. Apply this fix to all three test functions:

Proposed fixes

```diff
-    mocked_db = mocker.patch("database.flows.get_by_name")
+    mocked_db = mocker.patch("database.flows.get_by_name", new_callable=mocker.AsyncMock)
     await flow_exists(name, external_version, expdb_test)
```

```diff
     fake_flow = mocker.MagicMock(id=flow_id)
-    mocker.patch(
+    mocked_db = mocker.patch(
         "database.flows.get_by_name",
+        new_callable=mocker.AsyncMock,
         return_value=fake_flow,
     )
```

```diff
-    mocker.patch("database.flows.get_by_name", return_value=None)
+    mocker.patch("database.flows.get_by_name", new_callable=mocker.AsyncMock, return_value=None)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/flows_test.py` around lines 27 - 33, The tests call the async function flow_exists which awaits database.flows.get_by_name, but the tests currently patch get_by_name with a regular MagicMock; change the patches in all three test functions to use an AsyncMock so the awaited call works: replace mocker.patch("database.flows.get_by_name") with a patch that supplies an AsyncMock (e.g., mocker.patch("database.flows.get_by_name", new=AsyncMock()) or mocker.patch("database.flows.get_by_name", new_callable=AsyncMock)) and keep the same assertions against the mocked_db returned by the patch in each test that references mocked_db, flow_exists, and database.flows.get_by_name.

src/routers/mldcat_ap/dataset.py (1)
170-183: ⚠️ Potential issue | 🟠 Major

Validate `feature_no` bounds before indexing.

Line 182 can raise `IndexError`, and negative indices currently allow unintended reverse indexing. Return a controlled client error for out-of-range values.

Proposed fix

```diff
 async def get_distribution_feature(
 @@
-    feature = features[feature_no]
+    if feature_no < 0 or feature_no >= len(features):
+        raise HTTPException(status_code=404, detail="Feature not found.")
+    feature = features[feature_no]
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/routers/mldcat_ap/dataset.py` around lines 170 - 183, In get_distribution_feature, validate feature_no against the features list before indexing: check that feature_no is an integer >= 0 and < len(features) (use the local features variable returned by get_dataset_features) and if it is out of range raise a controlled client error (e.g., raise fastapi.HTTPException with a 400/404 and a clear message) so negative values cannot trigger reverse indexing and IndexError is avoided; perform this bounds check immediately before using features[feature_no] in get_distribution_feature.
🧹 Nitpick comments (7)
pyproject.toml (1)
20-21: Remove the `mysqlclient` dependency from `pyproject.toml`.

The codebase has been fully migrated to async operations using `sqlalchemy.ext.asyncio` and `aiomysql`. The sync MySQL driver `mysqlclient` is not imported or used anywhere. It can be safely removed to reduce unnecessary dependencies.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` around lines 20 - 21, Remove the unused synchronous MySQL driver by deleting the "mysqlclient" entry from the dependencies list in pyproject.toml; since the code uses async DB access via aiomysql and sqlalchemy.ext.asyncio, ensure only "aiomysql" remains, then update the project lock (e.g., poetry lock / poetry update) to reflect the removed dependency.

tests/routers/openml/study_test.py (2)
547-576: Same redundant pattern in other tests.

Lines 551 and 567 also have redundant UPDATE statements since `_attach_tasks_to_study` already performs this update. Consider removing these duplicate calls for consistency.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/study_test.py` around lines 547 - 576, Remove the redundant explicit DB update calls before calling _attach_tasks_to_study in both test_attach_task_to_study_already_linked_raises and test_attach_task_to_study_but_task_not_exist_raises: since _attach_tasks_to_study already performs the "UPDATE study SET status = 'in_preparation' WHERE id = 1" step, delete the await expdb_test.execute(...) lines in those two tests so the setup only happens inside _attach_tasks_to_study and avoid duplicate state changes.
533-544: Redundant database update.

Line 536 executes the same UPDATE statement that `_attach_tasks_to_study` executes on line 514. The helper already sets the study status to 'in_preparation', so this pre-update is unnecessary.

♻️ Suggested fix

```diff
 async def test_attach_task_to_study_needs_owner(
     py_api: TestClient, expdb_test: AsyncConnection
 ) -> None:
-    await expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 1"))
     response = await _attach_tasks_to_study(
         study_id=1,
         task_ids=[2, 3, 4],
         api_key="00000000000000000000000000000000",
         py_api=py_api,
         expdb_test=expdb_test,
     )
     assert response.status_code == HTTPStatus.FORBIDDEN
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/study_test.py` around lines 533 - 544, The test test_attach_task_to_study_needs_owner contains a redundant manual UPDATE via expdb_test.execute that duplicates the state setup already performed by the helper _attach_tasks_to_study; remove the explicit expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 1")) call so the test relies on the helper to set the study status to 'in_preparation' before asserting the forbidden response, leaving the rest of the call to _attach_tasks_to_study and the HTTPStatus.FORBIDDEN assertion unchanged.

src/routers/openml/tasktype.py (1)
29-40: Consider making `expdb` non-optional to match `get_task_type`.

The `expdb` parameter is typed as `AsyncConnection | None` with a runtime assertion, while `get_task_type` (line 46) uses non-optional `AsyncConnection`. Since FastAPI's dependency injection will always provide a connection, the `| None` and assertion pattern adds unnecessary complexity.

This is consistent with other endpoints in this PR, so it may be an intentional pattern, but worth noting for consistency within this file.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/tasktype.py` around lines 29 - 40, The list_task_types endpoint declares expdb as Annotated[AsyncConnection | None, Depends(expdb_connection)] and then asserts it's not None; change the parameter to Annotated[AsyncConnection, Depends(expdb_connection)] (make expdb non-optional to match get_task_type and get_task_types) and remove the runtime assert; confirm _normalize_task_type and await get_task_types(expdb) still use the non-optional AsyncConnection signature and update any related type hints if needed.

src/database/users.py (1)
61-65: Avoid redundant `UserGroup` re-wrapping in `User.get_groups`.

Line 64 re-casts values that are already returned as `UserGroup` by `get_user_groups_for`, which is unnecessary.

Proposed simplification

```diff
 async def get_groups(self) -> list[UserGroup]:
     if self._groups is None:
-        groups = await get_user_groups_for(user_id=self.user_id, connection=self._database)
-        self._groups = [UserGroup(group_id) for group_id in groups]
+        self._groups = await get_user_groups_for(
+            user_id=self.user_id,
+            connection=self._database,
+        )
     return self._groups
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/database/users.py` around lines 61 - 65, The get_groups method is re-wrapping values returned by get_user_groups_for into new UserGroup instances unnecessarily; update User.get_groups to assign the returned list directly to self._groups when get_user_groups_for already returns UserGroup objects (or otherwise only wrap when items are plain IDs). Locate the method get_groups and replace the unconditional list comprehension [UserGroup(group_id) for group_id in groups] with logic that detects/assumes the returned items' type (e.g., if isinstance(items[0], UserGroup) then self._groups = groups else map to UserGroup), preserving the existing caching behavior using self._groups, self.user_id, and self._database.

src/routers/mldcat_ap/dataset.py (1)
43-47: Make DB dependencies required instead of `Optional` + `assert`.

Lines 46-47, Line 149, and Line 176 use `assert` for non-None checks. `assert` can be stripped in optimized runs and is brittle for request-path guarantees; prefer required dependency types directly.

Proposed pattern (apply similarly to all affected handlers)

```diff
 async def get_mldcat_ap_distribution(
     distribution_id: int,
     user: Annotated[User | None, Depends(fetch_user)] = None,
-    user_db: Annotated[AsyncConnection | None, Depends(userdb_connection)] = None,
-    expdb: Annotated[AsyncConnection | None, Depends(expdb_connection)] = None,
+    user_db: Annotated[AsyncConnection, Depends(userdb_connection)],
+    expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
 ) -> JsonLDGraph:
-    assert user_db is not None  # noqa: S101
-    assert expdb is not None  # noqa: S101
```

Also applies to: 147-150, 174-177

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/routers/mldcat_ap/dataset.py` around lines 43 - 47, The handlers currently declare dependencies as optional (Annotated[AsyncConnection | None, Depends(...)] = None) and then use assert to check non-None; change the dependency parameters to be required by removing the "| None" and the "= None" default (e.g. use Annotated[AsyncConnection, Depends(userdb_connection)] and Annotated[AsyncConnection, Depends(expdb_connection)]), and remove the corresponding assert statements (e.g. the assert user_db is not None / assert expdb is not None checks). Apply the same change to the other affected handlers referenced (the ones using userdb_connection and expdb_connection at the other locations) so all DB deps are required and no asserts are used for request validation.

src/routers/openml/datasets.py (1)
37-40: Prefer required DB dependency types over `Optional` + `assert` guards.

Line 39, Line 107, Line 298, Line 399, and Line 400 all rely on `assert` to narrow optional dependencies. For request handlers, required dependency annotations are safer and clearer.

Proposed pattern (apply to all affected handlers)

```diff
 async def tag_dataset(
 @@
-    expdb_db: Annotated[AsyncConnection | None, Depends(expdb_connection)] = None,
+    expdb_db: Annotated[AsyncConnection, Depends(expdb_connection)],
 ) -> dict[str, dict[str, Any]]:
-    assert expdb_db is not None  # noqa: S101
```

Also applies to: 105-108, 296-299, 396-401

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/datasets.py` around lines 37 - 40, The handlers currently annotate the DB dependency as optional (e.g., expdb_db: Annotated[AsyncConnection | None, Depends(expdb_connection)] = None) and then use assert to narrow it; change those parameters to required dependency types by removing the None union and default (e.g., use Annotated[AsyncConnection, Depends(expdb_connection)]) and remove the subsequent assert checks; apply this to all affected handlers that reference expdb_db (and similar params) so callers receive a guaranteed AsyncConnection and the code can call database methods like database.datasets.get_tags_for without runtime asserts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/database/setup.py`:
- Around line 10-22: _create_engine mutates the shared configuration by calling
database_configuration[database_name].pop("echo", False) and setting
database_configuration[database_name]["drivername"], which can break subsequent
calls; fix by copying the per-database config (e.g., cfg =
dict(load_database_configuration()[database_name])) then read echo using
cfg.pop("echo", False) or cfg.get("echo") on the copy, set cfg["drivername"] =
"mysql+aiomysql" on the copy, and pass that copy into URL.create(...) so
load_database_configuration and its returned dict are never mutated.
In `@src/database/studies.py`:
- Around line 171-177: attach_runs currently raises NotImplementedError so
calling database.studies.attach_runs(study_id, run_ids, user, connection) will
crash; implement the function to actually attach runs (or change the caller, but
prefer implementing). In the attach_runs(study_id: int, run_ids: list[int],
user: User, connection: AsyncConnection) implementation: validate the study is
RUN-type and user has permission, begin a transaction on the provided
connection, upsert entries into the study_runs linking table (or equivalent) for
each run_id while avoiding duplicates, update study metadata (e.g., modified_at)
as needed, and commit/rollback on error; remove the NotImplementedError so the
router call for RUN studies succeeds.
In `@src/database/users.py`:
- Around line 56-59: User.fetch currently treats falsy user_id values (e.g., 0)
as missing by using a truthiness check; change the guard to an explicit None
check: call get_user_id_for(api_key=api_key, connection=user_db), assign to
user_id, and return cls(user_id, _database=user_db) only if user_id is not None
(otherwise return None). This affects the async classmethod User.fetch and
references get_user_id_for, APIKey, AsyncConnection and the cls constructor.
In `@src/routers/dependencies.py`:
- Around line 12-23: The current expdb_connection and userdb_connection
generators always call await connection.commit() after yield, which will commit
even if the downstream endpoint raised an exception; change each generator to
use SQLAlchemy's transactional context (e.g., use engine.begin() / async with
engine.begin() as connection) so that the transaction is committed only on
successful exit and automatically rolled back on exceptions, or alternatively
wrap the yield in try/except/finally and call await connection.rollback() when
an exception is caught; update expdb_connection and userdb_connection
accordingly to ensure rollback on errors instead of unconditional commit.
In `@src/routers/mldcat_ap/dataset.py`:
- Around line 143-152: get_distribution_quality currently calls quality = next(q
for q in qualities if q.name == quality_name) which will raise StopIteration for
unknown quality_name and produce a 500; change this to use a default (e.g.
next((q for q in qualities if q.name == quality_name), None)) and if the result
is None raise a controlled API error (e.g. raise HTTPException(status_code=404,
detail=f"Quality '{quality_name}' not found for distribution
{distribution_id}")) so callers get a proper 4xx response; update references to
get_qualities, quality_name, and the quality variable accordingly.
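The `next(..., None)` guard above can be sketched as follows; the `Quality` record is a hypothetical stand-in, and `LookupError` stands in for the `HTTPException(404)` the real router would raise:

```python
from dataclasses import dataclass


@dataclass
class Quality:
    name: str
    value: float


def find_quality(qualities: list[Quality], quality_name: str) -> Quality:
    # Default of None avoids StopIteration (and the resulting 500) on a miss.
    quality = next((q for q in qualities if q.name == quality_name), None)
    if quality is None:
        # In the real router: raise HTTPException(status_code=404, detail=...)
        raise LookupError(f"Quality '{quality_name}' not found")
    return quality
```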
In `@src/routers/openml/flows.py`:
- Around line 61-67: The list comprehension for subflows uses an await which is
invalid; replace it with an explicit async loop that iterates over subflow_rows,
awaits get_flow for each subflow.child_id, constructs a
Subflow(identifier=subflow.identifier, flow=flow) and appends it to subflows (or
builds the list incrementally) so all awaits occur inside the async loop; refer
to the symbols subflows, Subflow, get_flow, and subflow_rows to locate and
update the code.
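The explicit-loop fix above can be sketched like this; `get_flow`, the row shape, and the returned tuples are hypothetical stand-ins for the real `Subflow` construction:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubflowRow:
    identifier: str
    child_id: int


async def get_flow(flow_id: int) -> dict:
    # Stand-in for the real async DB lookup.
    return {"id": flow_id}


async def collect_subflows(subflow_rows: list[SubflowRow]) -> list[tuple[str, dict]]:
    # Each await happens inside the async function body, where it is legal.
    subflows = []
    for subflow in subflow_rows:
        flow = await get_flow(subflow.child_id)
        subflows.append((subflow.identifier, flow))
    return subflows
```

An async comprehension (`[... async for ...]`) would also be legal, but the explicit loop is what the comment asks for and reads more clearly here.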
In `@src/routers/openml/tasks.py`:
- Around line 142-144: The code calling next(rows) on result.mappings() can
raise StopIteration when the query returns no rows (e.g., task_inputs[table] ID
missing); update the block around rows = result.mappings() in the function
handling task inputs to safely handle empty results by checking for a row before
using it (for example use next(rows, None) or iterate and test for None/empty)
and only set fetched_data[f"{table}.{column}"] when a row exists; ensure you
keep the variable names rows, fetched_data, task_inputs and table intact so the
change is localized and clear.
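The empty-result guard can be sketched as below; the row/mapping shapes are invented stand-ins for `result.mappings()` rows:

```python
def fill_fetched_data(fetched_data: dict, table: str, column: str, rows) -> None:
    # next(iterator, None) never raises StopIteration on an empty result.
    row = next(iter(rows), None)
    if row is not None:
        fetched_data[f"{table}.{column}"] = row[column]
```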
In `@tests/conftest.py`:
- Around line 51-64: The py_api fixture should yield a context-managed
TestClient so the FastAPI lifespan handler runs deterministically: instead of
returning TestClient(app) directly, create the app with create_api(), set
app.dependency_overrides for expdb_connection and userdb_connection using the
async override_expdb and override_userdb functions, then instantiate TestClient
inside a with block (with TestClient(app) as client:) and yield client so
startup/shutdown (including create_api()'s lifespan that calls
close_databases()) is executed; update the py_api fixture to use this
context-managed pattern around TestClient.
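The context-managed fixture shape can be sketched with a stand-in client so the enter/exit (lifespan startup/shutdown) ordering is visible; the real fixture would use `fastapi.testclient.TestClient`, `create_api()`, and `app.dependency_overrides`:

```python
class Client:
    # Stand-in for TestClient: lifespan startup runs on __enter__,
    # shutdown (e.g. close_databases()) on __exit__.
    def __init__(self, app) -> None:
        self.app = app
        self.started = False
        self.closed = False

    def __enter__(self) -> "Client":
        self.started = True
        return self

    def __exit__(self, *exc) -> bool:
        self.closed = True
        return False


def py_api_fixture(app):
    # In the real fixture, dependency overrides are set on the app here.
    with Client(app) as client:
        yield client
```

Because the `with` block wraps the `yield`, the shutdown hook runs when pytest finalizes the fixture, which is exactly what returning `TestClient(app)` directly does not guarantee.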
In `@tests/routers/openml/flows_test.py`:
- Around line 45-51: The test patches database.flows.get_by_name with a regular
MagicMock which doesn't support awaiting; update the patch in the test for
flow_exists to use an async-compatible mock (e.g.,
mocker.AsyncMock(return_value=fake_flow)) or otherwise ensure the mocked
get_by_name is awaitable so awaiting inside flow_exists works; keep the
fake_flow = mocker.MagicMock(id=flow_id) and replace the mocker.patch call that
targets "database.flows.get_by_name" with an AsyncMock variant.
- Around line 54-61: The test is patching database.flows.get_by_name
synchronously but flow_exists awaits it; change the patch to provide an async
mock (e.g., use new_callable=AsyncMock or new=AsyncMock(return_value=None)) so
get_by_name is awaitable and returns None; update the mocker.patch call that
targets "database.flows.get_by_name" in test_flow_exists_handles_flow_not_found
to use AsyncMock, then run the test to assert HTTPException with
HTTPStatus.NOT_FOUND and detail "Flow not found." as before.
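The `MagicMock` vs `AsyncMock` difference driving both comments can be shown with the stdlib mocks; `flow_exists` below is a minimal stand-in for the coroutine under test:

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def flow_exists(get_by_name) -> bool:
    # Stand-in: the real function awaits database.flows.get_by_name(...).
    return await get_by_name() is not None


def check(mock):
    try:
        return asyncio.run(flow_exists(mock))
    except TypeError:
        # Awaiting a plain MagicMock's return value raises TypeError.
        return "TypeError"
```

Awaiting the call result of a `MagicMock` fails, while an `AsyncMock` call returns a coroutine that resolves to its `return_value`, so the patched function behaves like the real async one.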
In `@tests/routers/openml/users_test.py`:
- Around line 17-21: The assertion in test_fetch_user is a false positive
because user._groups and db_user._groups can both be None; update the test to
await the async get_groups() on both user and db_user (or compare the awaited
results) before asserting equality so group membership is actually populated and
checked—modify test_fetch_user to call await user.get_groups() and await
db_user.get_groups() (or compare await user.get_groups() to await
db_user.get_groups()) prior to asserting on _groups.
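Why the `_groups` comparison is a false positive can be sketched with a stand-in class (the real `User` fetches groups from the database in `get_groups`):

```python
import asyncio


class User:
    # Hypothetical stand-in mirroring the lazy `_groups` cache.
    def __init__(self, db_groups: list[int]) -> None:
        self._groups: list[int] | None = None
        self._db_groups = db_groups

    async def get_groups(self) -> list[int]:
        if self._groups is None:
            # Stands in for the async DB fetch.
            self._groups = list(self._db_groups)
        return self._groups


async def groups_match(a: User, b: User) -> bool:
    # Compare the awaited public API, not the private cache.
    return await a.get_groups() == await b.get_groups()
```

Two freshly constructed users always satisfy `a._groups == b._groups` (both `None`), so that assertion passes even when their actual group memberships differ.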
---
Outside diff comments:
In `@src/routers/mldcat_ap/dataset.py`:
- Around line 170-183: In get_distribution_feature, validate feature_no against
the features list before indexing: check that feature_no is an integer >= 0 and
< len(features) (use the local features variable returned by
get_dataset_features) and if it is out of range raise a controlled client error
(e.g., raise fastapi.HTTPException with a 400/404 and a clear message) so
negative values cannot trigger reverse indexing and IndexError is avoided;
perform this bounds check immediately before using features[feature_no] in
get_distribution_feature.
In `@tests/routers/openml/datasets_test.py`:
- Around line 79-91: The test test_private_dataset_no_access is passing
user_db=None which triggers the assert user_db is not None in get_dataset before
access logic runs; update the test to supply a valid user_db (e.g., use the
user_test fixture or the existing user fixture) so that get_dataset's access
checks execute and the HTTPException FORBIDDEN is raised—locate
test_private_dataset_no_access and change the call to get_dataset to pass a
populated user_db instead of None.
In `@tests/routers/openml/flows_test.py`:
- Around line 27-33: The tests call the async function flow_exists which awaits
database.flows.get_by_name, but the tests currently patch get_by_name with a
regular MagicMock; change the patches in all three test functions to use an
AsyncMock so the awaited call works: replace
mocker.patch("database.flows.get_by_name") with a patch that supplies an
AsyncMock (e.g., mocker.patch("database.flows.get_by_name", new=AsyncMock()) or
mocker.patch("database.flows.get_by_name", new_callable=AsyncMock)) and keep the
same assertions against the mocked_db returned by the patch in each test that
references mocked_db, flow_exists, and database.flows.get_by_name.
---
Nitpick comments:
In `@pyproject.toml`:
- Around line 20-21: Remove the unused synchronous MySQL driver by deleting the
"mysqlclient" entry from the dependencies list in pyproject.toml; since the code
uses async DB access via aiomysql and sqlalchemy.ext.asyncio, ensure only
"aiomysql" remains, then update the project lock (e.g., poetry lock / poetry
update) to reflect the removed dependency.
In `@src/database/users.py`:
- Around line 61-65: The get_groups method is re-wrapping values returned by
get_user_groups_for into new UserGroup instances unnecessarily; update
User.get_groups to assign the returned list directly to self._groups when
get_user_groups_for already returns UserGroup objects (or otherwise only wrap
when items are plain IDs). Locate the method get_groups and replace the
unconditional list comprehension [UserGroup(group_id) for group_id in groups]
with logic that detects/assumes the returned items' type (e.g., if
isinstance(items[0], UserGroup) then self._groups = groups else map to
UserGroup), preserving the existing caching behavior using self._groups,
self.user_id, and self._database.
In `@src/routers/mldcat_ap/dataset.py`:
- Around line 43-47: The handlers currently declare dependencies as optional
(Annotated[AsyncConnection | None, Depends(...)] = None) and then use assert to
check non-None; change the dependency parameters to be required by removing the
"| None" and the "= None" default (e.g. use Annotated[AsyncConnection,
Depends(userdb_connection)] and Annotated[AsyncConnection,
Depends(expdb_connection)]), and remove the corresponding assert statements
(e.g. the assert user_db is not None / assert expdb is not None checks). Apply
the same change to the other affected handlers referenced (the ones using
userdb_connection and expdb_connection at the other locations) so all DB deps
are required and no asserts are used for request validation.
In `@src/routers/openml/datasets.py`:
- Around line 37-40: The handlers currently annotate the DB dependency as
optional (e.g., expdb_db: Annotated[AsyncConnection | None,
Depends(expdb_connection)] = None) and then use assert to narrow it; change
those parameters to required dependency types by removing the None union and
default (e.g., use Annotated[AsyncConnection, Depends(expdb_connection)]) and
remove the subsequent assert checks; apply this to all affected handlers that
reference expdb_db (and similar params) so callers receive a guaranteed
AsyncConnection and the code can call database methods like
database.datasets.get_tags_for without runtime asserts.
In `@src/routers/openml/tasktype.py`:
- Around line 29-40: The list_task_types endpoint declares expdb as
Annotated[AsyncConnection | None, Depends(expdb_connection)] and then asserts
it's not None; change the parameter to Annotated[AsyncConnection,
Depends(expdb_connection)] (make expdb non-optional to match get_task_type and
get_task_types) and remove the runtime assert; confirm _normalize_task_type and
await get_task_types(expdb) still use the non-optional AsyncConnection signature
and update any related type hints if needed.
In `@tests/routers/openml/study_test.py`:
- Around line 547-576: Remove the redundant explicit DB update calls before
calling _attach_tasks_to_study in both
test_attach_task_to_study_already_linked_raises and
test_attach_task_to_study_but_task_not_exist_raises: since
_attach_tasks_to_study already performs the "UPDATE study SET status =
'in_preparation' WHERE id = 1" step, delete the await expdb_test.execute(...)
lines in those two tests so the setup only happens inside _attach_tasks_to_study
and avoid duplicate state changes.
- Around line 533-544: The test test_attach_task_to_study_needs_owner contains a
redundant manual UPDATE via expdb_test.execute that duplicates the state setup
already performed by the helper _attach_tasks_to_study; remove the explicit
expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id =
1")) call so the test relies on the helper to set the study status to
'in_preparation' before asserting the forbidden response, leaving the rest of
the call to _attach_tasks_to_study and the HTTPStatus.FORBIDDEN assertion
unchanged.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (29)
pyproject.toml
src/core/access.py
src/database/datasets.py
src/database/evaluations.py
src/database/flows.py
src/database/qualities.py
src/database/setup.py
src/database/studies.py
src/database/tasks.py
src/database/users.py
src/main.py
src/routers/dependencies.py
src/routers/mldcat_ap/dataset.py
src/routers/openml/datasets.py
src/routers/openml/estimation_procedure.py
src/routers/openml/evaluations.py
src/routers/openml/flows.py
src/routers/openml/qualities.py
src/routers/openml/study.py
src/routers/openml/tasks.py
src/routers/openml/tasktype.py
tests/conftest.py
tests/database/flows_test.py
tests/routers/openml/dataset_tag_test.py
tests/routers/openml/datasets_test.py
tests/routers/openml/flows_test.py
tests/routers/openml/qualities_test.py
tests/routers/openml/study_test.py
tests/routers/openml/users_test.py
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/routers/mldcat_ap/dataset.py (1)
182-187: ⚠️ Potential issue | 🟡 Minor

Handle out-of-bounds `feature_no` without raising `IndexError`.

Line 187 directly indexes `features[feature_no]`, which will raise `IndexError` and return a 500 if the index is invalid. This should return a controlled 404 response, similar to how `get_distribution_quality` handles missing qualities.

Proposed fix

```diff
 features = await get_dataset_features(
     dataset_id=distribution_id,
     user=user,
     expdb=expdb,
 )
-feature = features[feature_no]
+if feature_no < 0 or feature_no >= len(features):
+    raise HTTPException(
+        status_code=404,
+        detail=f"Feature {feature_no} not found for distribution {distribution_id}.",
+    )
+feature = features[feature_no]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/mldcat_ap/dataset.py` around lines 182-187: The code directly indexes features[feature_no], which can raise IndexError; update the handler after calling get_dataset_features to check that 0 <= feature_no < len(features) and, if not, raise an HTTPException with status_code=404 (or return the same controlled 404 response pattern used by get_distribution_quality) including a clear detail message; otherwise proceed to assign feature = features[feature_no]. Reference get_dataset_features, features, and feature_no, and mirror the error/response shape used in get_distribution_quality.

src/database/studies.py (1)
157-159: ⚠️ Potential issue | 🟠 Major

Harden DB error parsing; single-value unpack of `e.args` is brittle.

Lines 158 and 189 assume exactly one arg via `(msg,) = e.args`. With SQLAlchemy-wrapped DB errors from aiomysql, this assumption breaks: SQLAlchemy's `IntegrityError.args` is not standardized and may not be a 1-tuple, causing a secondary `ValueError` that masks your intended error handling for duplicate entries and foreign key violations.

Catch `IntegrityError` specifically and extract the message from the wrapped driver exception:

Proposed fix

```diff
+from sqlalchemy.exc import IntegrityError
 @@
-    except Exception as e:
-        (msg,) = e.args
+    except IntegrityError as e:
+        msg = str(getattr(e, "orig", e))
 @@
-    except Exception as e:
-        (msg,) = e.args
+    except IntegrityError as e:
+        msg = str(getattr(e, "orig", e))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database/studies.py` around lines 157 - 159, The except block that currently does "except Exception as e" and unpacks "(msg,) = e.args" is brittle; change to catch SQLAlchemy's IntegrityError (from sqlalchemy.exc import IntegrityError), then extract the DB message robustly (e.g., inspect e.orig or fallback to str(e)) instead of assuming a single-element e.args; use that extracted msg when applying the re.search(r"Duplicate entry '(\d+)-(\d+)' for key 'task_study.PRIMARY'") check and for the foreign-key violation handling so the duplicate-entry and FK branches hit reliably without causing a secondary ValueError.
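The extraction approach suggested for `src/database/studies.py` can be sketched in isolation. Note the hedge: `WrappedDBError` below is a hypothetical stand-in for SQLAlchemy's `IntegrityError` (which exposes the driver exception as `.orig`), not the real class, and the sample message is illustrative:

```python
from __future__ import annotations

import re


class WrappedDBError(Exception):
    """Hypothetical stand-in for a SQLAlchemy IntegrityError wrapping a driver error."""

    def __init__(self, message: str, orig: Exception | None = None) -> None:
        super().__init__(message)
        self.orig = orig


def extract_db_message(exc: Exception) -> str:
    # Prefer the wrapped driver exception when present; fall back to str(exc).
    # This avoids assuming exc.args is a 1-tuple, which `(msg,) = exc.args` does.
    orig = getattr(exc, "orig", None)
    return str(orig) if orig is not None else str(exc)


def parse_duplicate_task_study(msg: str) -> tuple[int, int] | None:
    # Same pattern the review suggests matching against the driver message.
    match = re.search(r"Duplicate entry '(\d+)-(\d+)' for key 'task_study\.PRIMARY'", msg)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


driver_error = Exception("Duplicate entry '5-1' for key 'task_study.PRIMARY'")
wrapped = WrappedDBError("(aiomysql) integrity error", orig=driver_error)
print(parse_duplicate_task_study(extract_db_message(wrapped)))  # → (5, 1)
```

The key design point is that the message always comes from the innermost driver exception when available, so the duplicate-entry and foreign-key branches match reliably regardless of how the wrapper populates `args`.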
🧹 Nitpick comments (2)
src/routers/openml/study.py (1)
57-59: Make the `expdb` dependency non-null in the signature and drop the assert guards.

Line 57, Line 99, and Line 142 already type `expdb` as `AsyncConnection`, but default it to `None` and then assert non-null (Line 59, Line 101, Line 144). This is clearer and safer if the dependency is required directly by the function signature.

♻️ Suggested cleanup

```diff
 async def attach_to_study(
 @@
-    expdb: Annotated[AsyncConnection, Depends(expdb_connection)] = None,
+    expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
 ) -> AttachDetachResponse:
-    assert expdb is not None  # noqa: S101
 async def create_study(
 @@
-    expdb: Annotated[AsyncConnection, Depends(expdb_connection)] = None,
+    expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
 ) -> dict[Literal["study_id"], int]:
-    assert expdb is not None  # noqa: S101
 async def get_study(
 @@
-    expdb: Annotated[AsyncConnection, Depends(expdb_connection)] = None,
+    expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
 ) -> Study:
-    assert expdb is not None  # noqa: S101
```

Also applies to: 99-101, 142-144

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/study.py` around lines 57-59: The expdb dependency parameter should be made required in the function signatures instead of defaulting to None and asserting non-null; change the parameter declaration from "expdb: Annotated[AsyncConnection, Depends(expdb_connection)] = None" to "expdb: Annotated[AsyncConnection, Depends(expdb_connection)]" in each affected function and remove the subsequent "assert expdb is not None" guard; apply this change for the three occurrences that declare the expdb parameter and assert it (the parameter name expdb, the type Annotated[AsyncConnection, Depends(expdb_connection)], and the assert lines).

src/routers/openml/datasets.py (1)
37-40: Remove the optional dependency defaults and runtime asserts.

Lines 37–40, 105–108, 296–299, and 396–400 use `= None` defaults with `assert ... is not None` checks for `AsyncConnection` dependencies. Since `assert` statements are stripped in optimized Python runs, this pattern should be replaced by making the dependency non-optional in the signature (remove `= None`), eliminating the runtime assert.

The pattern is already correctly applied elsewhere (e.g., line 336), demonstrating the preferred approach.

Pattern to apply

```diff
 async def tag_dataset(
     data_id: Annotated[int, Body()],
     tag: Annotated[str, SystemString64],
     user: Annotated[User | None, Depends(fetch_user)] = None,
-    expdb_db: Annotated[AsyncConnection, Depends(expdb_connection)] = None,
+    expdb_db: Annotated[AsyncConnection, Depends(expdb_connection)],
 ) -> dict[str, dict[str, Any]]:
-    assert expdb_db is not None  # noqa: S101
     tags = await database.datasets.get_tags_for(data_id, expdb_db)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/datasets.py` around lines 37 - 40, The code uses Optional-style dependency defaults (expdb_db: Annotated[AsyncConnection, Depends(expdb_connection)] = None) paired with runtime asserts (assert expdb_db is not None); remove the "= None" default from the AsyncConnection parameters (e.g., expdb_db) in each affected function signature and delete the corresponding "assert ... is not None" checks so the dependency is required at call time (match the non-optional pattern already used elsewhere in this file); ensure calls like database.datasets.get_tags_for(data_id, expdb_db) continue to use the parameter directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/routers/openml/study.py`:
- Around line 117-123: The pre-check using study.alias and
database.studies.get_by_alias is racy; remove reliance on it and make create()
enforce uniqueness atomically at the persistence layer, then catch
duplicate-alias errors from database.studies.create and raise
HTTPException(status_code=HTTPStatus.CONFLICT, detail="Study alias already
exists.") when the DB reports a unique-key/duplicate constraint violation; you
can still keep an optional pre-check for fast-fail, but ensure
database.studies.create is the authoritative enforcement and map its
unique-constraint error to the 409 response.
---
Outside diff comments:
In `@src/database/studies.py`:
- Around line 157-159: The except block that currently does "except Exception as
e" and unpacks "(msg,) = e.args" is brittle; change to catch SQLAlchemy's
IntegrityError (from sqlalchemy.exc import IntegrityError), then extract the DB
message robustly (e.g., inspect e.orig or fallback to str(e)) instead of
assuming a single-element e.args; use that extracted msg when applying the
re.search(r"Duplicate entry '(\d+)-(\d+)' for key 'task_study.PRIMARY'") check
and for the foreign-key violation handling so the duplicate-entry and FK
branches hit reliably without causing a secondary ValueError.
In `@src/routers/mldcat_ap/dataset.py`:
- Around line 182-187: The code directly indexes features[feature_no] which can
raise IndexError; update the handler after calling get_dataset_features to check
that 0 <= feature_no < len(features) and if not raise an HTTPException with
status_code=404 (or return the same controlled 404 response pattern used by
get_distribution_quality) including a clear detail message, otherwise proceed to
assign feature = features[feature_no]; reference get_dataset_features, features,
feature_no, and mirror the error/response shape used in
get_distribution_quality.
---
Nitpick comments:
In `@src/routers/openml/datasets.py`:
- Around line 37-40: The code uses Optional-style dependency defaults (expdb_db:
Annotated[AsyncConnection, Depends(expdb_connection)] = None) paired with
runtime asserts (assert expdb_db is not None); remove the "= None" default from
the AsyncConnection parameters (e.g., expdb_db) in each affected function
signature and delete the corresponding "assert ... is not None" checks so the
dependency is required at call time (match the non-optional pattern already used
elsewhere in this file); ensure calls like
database.datasets.get_tags_for(data_id, expdb_db) continue to use the parameter
directly.
In `@src/routers/openml/study.py`:
- Around line 57-59: The expdb dependency parameter should be made required in
the function signatures instead of defaulting to None and asserting non-null;
change the parameter declaration from "expdb: Annotated[AsyncConnection,
Depends(expdb_connection)] = None" to "expdb: Annotated[AsyncConnection,
Depends(expdb_connection)]" in each affected function and remove the subsequent
"assert expdb is not None" guard; apply this change for the three occurrences
that declare the expdb parameter and assert it (the parameter name expdb, the
type Annotated[AsyncConnection, Depends(expdb_connection)], and the assert
lines).
📒 Files selected for processing (13)
src/database/setup.py
src/database/studies.py
src/database/users.py
src/routers/dependencies.py
src/routers/mldcat_ap/dataset.py
src/routers/openml/datasets.py
src/routers/openml/flows.py
src/routers/openml/study.py
src/routers/openml/tasks.py
src/routers/openml/tasktype.py
tests/conftest.py
tests/routers/openml/flows_test.py
tests/routers/openml/users_test.py
```diff
     if study.alias and await database.studies.get_by_alias(study.alias, expdb):
         raise HTTPException(
             status_code=HTTPStatus.CONFLICT,
             detail="Study alias already exists.",
         )
-    study_id = database.studies.create(study, user, expdb)
+    study_id = await database.studies.create(study, user, expdb)
     if study.main_entity_type == StudyType.TASK:
```
Alias uniqueness check is vulnerable to a race condition.
At Line 117 you do a pre-check, then create at Line 122. Two concurrent requests can pass the pre-check and race. Enforce uniqueness atomically at create time and map duplicate-alias failures to 409 CONFLICT.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routers/openml/study.py` around lines 117 - 123, The pre-check using
study.alias and database.studies.get_by_alias is racy; remove reliance on it and
make create() enforce uniqueness atomically at the persistence layer, then catch
duplicate-alias errors from database.studies.create and raise
HTTPException(status_code=HTTPStatus.CONFLICT, detail="Study alias already
exists.") when the DB reports a unique-key/duplicate constraint violation; you
can still keep an optional pre-check for fast-fail, but ensure
database.studies.create is the authoritative enforcement and map its
unique-constraint error to the 409 response.
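The atomic enforcement the comment asks for can be illustrated with a minimal, self-contained sketch. The in-memory dict and `DuplicateAliasError` are hypothetical stand-ins for the `study` table's UNIQUE key and the driver's duplicate-key error; the real route would catch the DB exception instead:

```python
from http import HTTPStatus


class DuplicateAliasError(Exception):
    """Hypothetical stand-in for the DB's unique-key violation."""


# In-memory stand-in for the study table; a real MySQL UNIQUE key on `alias`
# would make the check-and-insert atomic across concurrent requests.
_studies: dict[str, int] = {}


def create_study(alias: str) -> int:
    if alias in _studies:
        raise DuplicateAliasError(alias)
    study_id = len(_studies) + 1
    _studies[alias] = study_id
    return study_id


def create_study_endpoint(alias: str) -> tuple[HTTPStatus, dict[str, object]]:
    # No racy pre-check: attempt the insert and map the constraint
    # violation to 409 CONFLICT, as the review suggests.
    try:
        return HTTPStatus.CREATED, {"study_id": create_study(alias)}
    except DuplicateAliasError:
        return HTTPStatus.CONFLICT, {"detail": "Study alias already exists."}


print(create_study_endpoint("my-study")[0] == HTTPStatus.CREATED)   # → True
print(create_study_endpoint("my-study")[0] == HTTPStatus.CONFLICT)  # → True
```

The design point: the insert itself is the uniqueness check, so two concurrent requests cannot both succeed; at most one wins and the other receives the mapped 409.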
Actionable comments posted: 1
⚠️ Outside diff range comments (1)
tests/routers/openml/datasets_test.py (1)
79-89: ⚠️ Potential issue | 🟠 Major

`test_private_dataset_no_access` passes `user_db=None`, which violates `get_dataset` preconditions.

`get_dataset` asserts `user_db is not None` (src/routers/openml/datasets.py, Line 396), so this test can fail with `AssertionError` before the expected `HTTPException`.

💡 Proposed fix

```diff
 async def test_private_dataset_no_access(
     user: User | None,
     expdb_test: AsyncConnection,
+    user_test: AsyncConnection,
 ) -> None:
     with pytest.raises(HTTPException) as e:
         await get_dataset(
             dataset_id=130,
             user=user,
-            user_db=None,
+            user_db=user_test,
             expdb_db=expdb_test,
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/datasets_test.py` around lines 79 - 89, The test test_private_dataset_no_access currently passes user_db=None which violates the precondition in get_dataset (it asserts user_db is not None), so update the test to supply a valid/mocked AsyncConnection for user_db (or a fixture) instead of None so the function reaches its HTTPException path; locate the call to get_dataset in the test and replace user_db=None with a proper user_db fixture or mock AsyncConnection that simulates no access.
🧹 Nitpick comments (2)
src/database/users.py (1)
30-42: Consider renaming `row` to `result` for clarity.

The variable `row` holds a `CursorResult` from `execute()`, not an actual row. This naming pattern is repeated in `get_user_groups_for` as well. Using `result` would better communicate the type.

♻️ Proposed refactor for clarity

```diff
 async def get_user_id_for(*, api_key: APIKey, connection: AsyncConnection) -> int | None:
-    row = await connection.execute(
+    result = await connection.execute(
         text(
             """
             SELECT *
             FROM users
             WHERE session_hash = :api_key
             """,
         ),
         parameters={"api_key": api_key},
     )
-    user = row.one_or_none()
+    user = result.one_or_none()
     return user.id if user else None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database/users.py` around lines 30-42: The variable named `row` in get_user_id_for (and the same pattern in get_user_groups_for) is misleading because it holds a CursorResult from connection.execute(); rename `row` to `result` in both functions and update any subsequent uses (e.g., change `row.one_or_none()` to `result.one_or_none()` and `row.fetchall()` to `result.fetchall()` as appropriate) so the name reflects the type and avoids confusion when reading or modifying get_user_id_for and get_user_groups_for.

tests/routers/openml/study_test.py (1)
534-569: Remove duplicated study-status setup in attach tests.

Lines 537, 552, and 568 repeat the same DB update already done in `_attach_tasks_to_study` (Line 515). Keep this setup in one place to reduce noise.

♻️ Proposed simplification

```diff
 async def test_attach_task_to_study_needs_owner(
     py_api: TestClient, expdb_test: AsyncConnection
 ) -> None:
-    await expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 1"))
     response = await _attach_tasks_to_study(
         study_id=1,
         task_ids=[2, 3, 4],
         api_key=ApiKey.OWNER_USER,
         py_api=py_api,
         expdb_test=expdb_test,
 @@
 async def test_attach_task_to_study_already_linked_raises(
     py_api: TestClient,
     expdb_test: AsyncConnection,
 ) -> None:
-    await expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 1"))
     response = await _attach_tasks_to_study(
         study_id=1,
         task_ids=[1, 3, 4],
         api_key=ApiKey.ADMIN,
         py_api=py_api,
         expdb_test=expdb_test,
 @@
 async def test_attach_task_to_study_but_task_not_exist_raises(
     py_api: TestClient,
     expdb_test: AsyncConnection,
 ) -> None:
-    await expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 1"))
     response = await _attach_tasks_to_study(
         study_id=1,
         task_ids=[80123, 78914],
         api_key=ApiKey.ADMIN,
         py_api=py_api,
         expdb_test=expdb_test,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/study_test.py` around lines 534 - 569, The tests test_attach_task_to_study_needs_owner, test_attach_task_to_study_already_linked_raises, and test_attach_task_to_study_but_task_not_exist_raises each repeat the UPDATE study SET status = 'in_preparation' setup that is already performed inside the helper _attach_tasks_to_study; remove those duplicate expdb_test.execute(...) calls from those test functions so the DB setup remains only in _attach_tasks_to_study, leaving the tests to call _attach_tasks_to_study with the same parameters and assertions unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/routers/openml/study_test.py`:
- Around line 506-519: The helper _attach_tasks_to_study currently hardcodes the
DB update to WHERE id = 1; change the UPDATE to use the study_id argument by
executing a parameterized query (e.g. text("UPDATE study SET status =
'in_preparation' WHERE id = :study_id")) and pass {"study_id": study_id} to
expdb_test.execute so the setup updates the correct study id; update the call
site in _attach_tasks_to_study that invokes expdb_test.execute (and keep using
the existing study_id parameter) to avoid SQL injection and ensure the helper
works for any input.
---
Outside diff comments:
In `@tests/routers/openml/datasets_test.py`:
- Around line 79-89: The test test_private_dataset_no_access currently passes
user_db=None which violates the precondition in get_dataset (it asserts user_db
is not None), so update the test to supply a valid/mocked AsyncConnection for
user_db (or a fixture) instead of None so the function reaches its HTTPException
path; locate the call to get_dataset in the test and replace user_db=None with a
proper user_db fixture or mock AsyncConnection that simulates no access.
---
Nitpick comments:
In `@src/database/users.py`:
- Around line 30-42: The variable named `row` in get_user_id_for (and the same
pattern in get_user_groups_for) is misleading because it holds a CursorResult
from connection.execute(); rename `row` to `result` in both functions, update
any subsequent uses (e.g., change `row.one_or_none()` to `result.one_or_none()`
and `row.fetchall()` to `result.fetchall()` as appropriate) so the name reflects
the type and avoids confusion when reading or modifying get_user_id_for and
get_user_groups_for.
In `@tests/routers/openml/study_test.py`:
- Around line 534-569: The tests test_attach_task_to_study_needs_owner,
test_attach_task_to_study_already_linked_raises, and
test_attach_task_to_study_but_task_not_exist_raises each repeat the UPDATE study
SET status = 'in_preparation' setup that is already performed inside the helper
_attach_tasks_to_study; remove those duplicate expdb_test.execute(...) calls
from those test functions so the DB setup remains only in
_attach_tasks_to_study, leaving the tests to call _attach_tasks_to_study with
the same parameters and assertions unchanged.
📒 Files selected for processing (3)
src/database/users.py
tests/routers/openml/datasets_test.py
tests/routers/openml/study_test.py
```diff
 async def _attach_tasks_to_study(
     study_id: int,
     task_ids: list[int],
     api_key: str,
     py_api: TestClient,
-    expdb_test: Connection,
+    expdb_test: AsyncConnection,
 ) -> httpx.Response:
     # Adding requires the study to be in preparation,
     # but the current snapshot has no in-preparation studies.
-    expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 1"))
+    await expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 1"))
     return py_api.post(
         f"/studies/attach?api_key={api_key}",
         json={"study_id": study_id, "entity_ids": task_ids},
     )
```
Use study_id in the setup update instead of hardcoded id = 1.
Line [515] ignores the helper argument and always updates study 1, which makes the helper incorrect for any other input.
💡 Proposed fix
async def _attach_tasks_to_study(
study_id: int,
task_ids: list[int],
api_key: str,
py_api: TestClient,
expdb_test: AsyncConnection,
) -> httpx.Response:
# Adding requires the study to be in preparation,
# but the current snapshot has no in-preparation studies.
- await expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 1"))
+ await expdb_test.execute(
+ text("UPDATE study SET status = 'in_preparation' WHERE id = :study_id"),
+ {"study_id": study_id},
+ )
return py_api.post(
f"/studies/attach?api_key={api_key}",
json={"study_id": study_id, "entity_ids": task_ids},
)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/routers/openml/study_test.py` around lines 506 - 519, The helper
_attach_tasks_to_study currently hardcodes the DB update to WHERE id = 1; change
the UPDATE to use the study_id argument by executing a parameterized query (e.g.
text("UPDATE study SET status = 'in_preparation' WHERE id = :study_id")) and
pass {"study_id": study_id} to expdb_test.execute so the setup updates the
correct study id; update the call site in _attach_tasks_to_study that invokes
expdb_test.execute (and keep using the existing study_id parameter) to avoid SQL
injection and ensure the helper works for any input.
for more information, see https://pre-commit.ci
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/routers/openml/datasets_test.py`:
- Around line 5-6: The test references the undefined type Connection and is
synchronous while calling async fixtures: replace the type annotation Connection
with the imported AsyncConnection, change the test function signature to async
def (making it an async test), and await any asynchronous fixture calls inside
the test (e.g., calls to async client/DB fixtures or methods) so all async
operations are properly awaited; update any function/method annotations and
usages that mention Connection to AsyncConnection to avoid the NameError.
Signed-off-by: rohansen856 <rohansen856@gmail.com>
⚠️ Outside diff range comments (1)
tests/routers/openml/datasets_test.py (1)
278-293: ⚠️ Potential issue | 🟡 Minor

Add an explicit commit before the py_api request to ensure inserted rows are visible.

Both `py_api` and `expdb_test` use the same `AsyncConnection` object (the `py_api` fixture depends on `expdb_test` and yields it via `dependency_overrides`). However, the connection is wrapped in `automatic_rollback`, which starts a transaction but does not commit. While uncommitted changes within a single transaction are visible to subsequent queries on the same connection, the test should explicitly call `await expdb_test.commit()` after the inserts and before `py_api.get(...)` to match the pattern used elsewhere in the codebase (e.g., the `persisted_flow` fixture) and ensure the regression fix is robustly tested:

Suggested fix

```python
async def test_dataset_no_500_with_multiple_processing_entries(
    py_api: TestClient,
    expdb_test: AsyncConnection,
) -> None:
    """Regression test for issue #145: multiple processing entries caused 500."""
    await expdb_test.execute(
        text("INSERT INTO evaluation_engine(id, name, description) VALUES (99, 'test_engine', '')"),
    )
    await expdb_test.execute(
        text(
            "INSERT INTO data_processed(did, evaluation_engine_id, user_id, processing_date) "
            "VALUES (1, 99, 2, '2020-01-01 00:00:00')",
        ),
    )
    await expdb_test.commit()
    response = py_api.get("/datasets/1")
    assert response.status_code == HTTPStatus.OK
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/datasets_test.py` around lines 278 - 293, The test function test_dataset_no_500_with_multiple_processing_entries inserts rows using the AsyncConnection fixture expdb_test but does not commit them before calling py_api.get("/datasets/1"); add an explicit await expdb_test.commit() after the two await expdb_test.execute(...) calls and before the py_api.get(...) call (mirroring the persisted_flow pattern) so the inserted rows are durably visible to the request.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tests/routers/openml/datasets_test.py`:
- Around line 278-293: The test function
test_dataset_no_500_with_multiple_processing_entries inserts rows using the
AsyncConnection fixture expdb_test but does not commit them before calling
py_api.get("/datasets/1"); add an explicit await expdb_test.commit() after the
two await expdb_test.execute(...) calls and before the py_api.get(...) call
(mirroring the persisted_flow pattern) so the inserted rows are durably visible
to the request.
Solves: #229
Replacing the current implementation of `mysqlclient` with `aiomysql` for async DB connection support. We would need to replace the DB drivers under `database/` and `routes/` with async implementations and convert the functions to `async def` for async support.
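The sync-to-async conversion pattern this PR applies can be sketched with a hypothetical `fake_execute` standing in for the aiomysql-backed driver (the function names and result rows below are illustrative, not from the PR):

```python
import asyncio


# Hypothetical stand-in for the driver: the real code awaits
# SQLAlchemy's AsyncConnection.execute backed by aiomysql instead.
async def fake_execute(query: str) -> list[tuple[int, str]]:
    await asyncio.sleep(0)  # yield to the event loop, as a real async driver would
    return [(1, "anneal"), (2, "kr-vs-kp")]


# Before the migration a handler would call conn.execute(...) synchronously;
# after it, the function becomes a coroutine and awaits the driver.
async def get_datasets() -> list[tuple[int, str]]:
    return await fake_execute("SELECT id, name FROM dataset")


print(asyncio.run(get_datasets()))  # → [(1, 'anneal'), (2, 'kr-vs-kp')]
```

Per the walkthrough above, the actual change additionally switches engine creation to an `AsyncEngine` using a `mysql+aiomysql` URL, so every caller up the stack (database modules, routers, tests) must become `async def` and `await` its DB calls.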