**docs/contributing/tests.md** (119 additions, 93 deletions)
# Writing Tests

tl;dr:
- Setting up the `py_api` fixture to test directly against a REST API endpoint is very slow; use it only for migration/integration tests.
- Getting a database fixture and making a database call is slow; consider mocking where appropriate.

## Overhead from Fixtures
Sometimes, you want to interact with the REST API through the `py_api` fixture,
or want access to a database with `user_test` or `expdb_test` fixtures.
Be warned that these come with considerable relative overhead, which adds up when running thousands of tests.

```python
@pytest.mark.parametrize('execution_number', range(5000))
def test_private_dataset_owner_access(
    execution_number,
    expdb_test: Connection,
    user_test: Connection,
    py_api: TestClient,
) -> None:
    fetch_user(ApiKey.REGULAR_USER, user_test)  # accesses only the user db
    get_estimation_procedures(expdb_test)  # accesses only the experiment db
    py_api.get("/does/not/exist")  # only queries the api
```
The rest of this page documents the current testing strategy in this repository.
It is intentionally descriptive: it explains which test layers exist today and when each layer is used.

## Quick summary

- Use the lightest test layer that verifies the behavior you are changing.
- `py_api` (`fastapi.testclient.TestClient`) is intentionally used for integration and migration checks.
- Direct database tests verify SQL/database behavior.
- Direct function tests verify application logic with minimal fixture overhead.
- Mocking is used selectively to keep tests fast, while still validating real database behavior in dedicated tests.

## Test infrastructure in this repository

The core fixtures are defined in `tests/conftest.py`:

- `expdb_test` and `user_test` provide transactional database connections.
- `py_api` creates a FastAPI `TestClient` and overrides dependencies to use those transactional connections.
- `php_api` provides an HTTP client to the legacy PHP API for migration comparisons.

The transactional fixtures use rollback semantics, so most tests can mutate data without persisting changes.
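
The rollback idea can be sketched as follows, using an in-memory SQLite database and illustrative names; the real fixtures in `tests/conftest.py` connect to the actual test databases:

```python
import sqlite3

def run_with_rollback(test_body):
    """Run test_body against a fresh connection, then roll its changes back.

    Minimal stand-in for the transactional fixtures described above.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT)")
    conn.commit()  # the committed baseline survives the rollback
    try:
        test_body(conn)
    finally:
        conn.rollback()  # mutations made by the test are discarded
    return conn.execute("SELECT COUNT(*) FROM dataset").fetchone()[0]

rows = run_with_rollback(
    lambda conn: conn.execute("INSERT INTO dataset (name) VALUES ('iris')")
)
print(rows)  # 0: the inserted row was rolled back
```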

## Test categories

### 1) Migration tests

Migration tests compare Python API responses against the legacy PHP API for equivalent endpoints.
These tests live under `tests/routers/openml/migration/`.

Characteristics:

- Use both `py_api` and `php_api` fixtures.
- Compare response status and response body (with explicit normalization where old/new formats differ).
- Focus on compatibility guarantees during migration.

Typical examples include dataset, flow, task, study, and evaluation migration checks.

### 2) Integration tests (FastAPI TestClient)

Integration tests call Python API endpoints through `py_api` and assert end-to-end behavior from routing to serialization.
Most endpoint-focused tests under `tests/routers/openml/` use this style.

Characteristics:

- Exercise request/response handling via HTTP calls to the in-process FastAPI app.
- Use real dependency wiring (with test database connections injected via fixture overrides).
- Validate returned status codes and payloads as clients see them.

This layer is broader than direct function/database tests, but also has higher execution cost.

### 3) Direct database tests

Direct database tests call functions in `src/database/*` with `expdb_test`/`user_test` connections.
Examples are in `tests/database/`.

Characteristics:

- Focus on query behavior and returned records.
- Avoid HTTP/TestClient overhead.
- Validate persistence-layer behavior directly against the test database.

Use this layer when the change is primarily in SQL access or data retrieval logic.

### 4) Direct function tests

Direct function tests call router or dependency functions directly (without HTTP requests), often with lightweight fixtures and selective mocks.
Examples include tests that call functions such as `flow_exists(...)` or `get_dataset(...)` directly.

Characteristics:

- Validate function-level control flow and error handling.
- Can mock lower-level calls where appropriate.
- Keep runtime low compared with full TestClient tests.

These tests are useful for fast feedback on logic that does not require full HTTP-level verification.
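
A sketch of a direct function test that mocks the persistence layer; `flow_exists` here is a simplified stand-in, not the repository's actual implementation:

```python
from unittest.mock import MagicMock

def flow_exists(name: str, expdb) -> bool:
    # Simplified stand-in for a router-level helper.
    row = expdb.execute("SELECT id FROM flow WHERE name = ?", (name,)).fetchone()
    return row is not None

def test_flow_exists_without_database() -> None:
    expdb = MagicMock()
    expdb.execute.return_value.fetchone.return_value = (42,)  # pretend a row exists
    assert flow_exists("weka.J48", expdb) is True
    expdb.execute.return_value.fetchone.return_value = None  # pretend no row
    assert flow_exists("missing", expdb) is False
```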

## Performance tradeoffs

Fixture setup has measurable cost.
In existing measurements, creating `py_api` is significantly more expensive than direct function/database-level testing, and database fixtures also add overhead.

Practical implications:

- Prefer direct function or direct database tests when they can validate the behavior sufficiently.
- Reserve `py_api` usage for cases where endpoint-level integration behavior is the target.
- Keep migration tests focused, because they combine multiple expensive dependencies.

This keeps local feedback cycles fast while preserving endpoint and compatibility coverage where required.

## Design philosophy: limited mocking

Mocking is used to reduce runtime and isolate logic when full database interaction is not required.
At the same time, this repository keeps mocking limited by pairing it with real database coverage for the same entities/paths.

Why this balance is used:

- Mock-based tests are fast and targeted.
- Database-backed tests verify actual query/schema behavior.
- Together they reduce risk that mocked behavior diverges from real database behavior.

In short: mock for speed and focus, but keep real database tests for behavioral truth.

## Running tests

Run all tests (from the Python API container):

```bash
python -m pytest tests
```

## Fixture overhead measurements

When individually adding or removing components, we measure (for 5000 repeats, n=1):

| expdb | user | api | exp call | user call | api get | time (s) |
|-------|------|-----|----------|-----------|---------|----------:|
| ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1.78 |
| ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | 3.45 |
| ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 3.22 |
| ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | 298.48 |
| ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | 4.44 |
| ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | 285.69 |
| ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | 4.91 |
| ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | 5.81 |
| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 307.91 |

Adding a fixture that just returns some value adds only minimal overhead (1.91s),
so the burden comes from establishing the database connection itself.

We make the following observations:

- Adding a database fixture adds about the same overhead as instantiating an entirely new test.
- The overhead of multiple database fixtures is less than additive, but still not free.
- The `py_api` fixture adds two orders of magnitude more overhead.

We want our tests to be fast, so we avoid these fixtures when we reasonably can.
We restrict the `py_api` fixture to integration/migration tests, since it is very slow; these tests only run on CI before merges.
For database fixtures, we will write lightweight alternatives that, for example, provide a `User` without accessing the database.
The validity of these users is then verified against the database in only a single test.

### Mocking
Mocking can help us reduce the reliance on database connections in tests.
A mocked function can prevent accessing the database and return a predefined value instead.

It has a few upsides:
- It's faster than using a database fixture (see below).
- The test is not dependent on the database: you can run the test without a database.

But it also has downsides:
- Behavior changes in the database, such as schema changes, are not automatically reflected in the tests.
- The database layer (e.g., the queries) is not actually tested.

In short, mocked behavior may drift from the real behavior when executed against a database.
For this reason, each mocked entity should be paired with a test that invokes the real database layer and verifies it returns the output the mock assumes.
This is additional overhead during development, but it should pay off in more granular test feedback and faster tests.
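
One way to keep a mock honest is to define the expected record once and assert, in a single database-backed test, that the real query produces it. A sketch (where `Dataset` and `get` are simplified stand-ins for accessors like `database.datasets.get`):

```python
import sqlite3
from typing import NamedTuple

class Dataset(NamedTuple):
    uploader: int
    visibility: str

# The value fast tests would use as mock.return_value.
MOCKED_DATASET = Dataset(uploader=1, visibility="private")

def get(dataset_id: int, expdb) -> Dataset:
    # Simplified stand-in for the real database-layer accessor.
    row = expdb.execute(
        "SELECT uploader, visibility FROM dataset WHERE id = ?", (dataset_id,)
    ).fetchone()
    return Dataset(*row)

def test_mock_matches_database() -> None:
    expdb = sqlite3.connect(":memory:")
    expdb.execute(
        "CREATE TABLE dataset (id INTEGER PRIMARY KEY, uploader INTEGER, visibility TEXT)"
    )
    expdb.execute("INSERT INTO dataset VALUES (1, 1, 'private')")
    # The one database-backed check that keeps the mocked value honest.
    assert get(1, expdb) == MOCKED_DATASET
```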

On the speed of mocks, consider these two tests:

```diff
 @pytest.mark.parametrize('execution_number', range(5000))
 def test_private_dataset_owner_access(
     execution_number,
     admin,
+    mocker,
-    expdb_test: Connection,
 ) -> None:
+    mock = mocker.patch('database.datasets.get')
+
+    class Dataset(NamedTuple):
+        uploader: int
+        visibility: Visibility
+
+    mock.return_value = Dataset(uploader=1, visibility=Visibility.PRIVATE)

     _get_dataset_raise_otherwise(
         dataset_id=1,
         user=admin,
-        expdb=expdb_test,
+        expdb=None,
     )
```

There is only a single database call in this test: it fetches a record on an indexed field and requires no joins.
Despite that call being very light, the database-backed test is roughly 45% slower than the mocked version (5.04s vs 3.50s).

Run a focused test module:

```bash
python -m pytest tests/routers/openml/datasets_test.py
```

Run by marker expression (example):

```bash
python -m pytest -m "not slow"
```

See `pyproject.toml` for current marker definitions (including `slow` and `mut`).
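
The marker registration has roughly this shape (illustrative only; check `pyproject.toml` for the authoritative definitions and descriptions):

```toml
[tool.pytest.ini_options]
markers = [
    "slow: expensive tests, deselect with -m 'not slow'",
    "mut: mutation tests",
]
```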
**mkdocs.yml** (3 additions, 1 deletion)

The accompanying `mkdocs.yml` change updates the navigation:

```yaml
nav:
  - OpenML Server: index.md
  - Getting Started: installation.md

  - Contributing:
      - contributing/index.md
      - Development: contributing/contributing.md
      - Tests: contributing/tests.md
      - Documentation: contributing/documentation.md
      - Project Overview: contributing/project_overview.md

  - Changes: migration.md
```