Build thorough integration test suite #25

Merged
jackparnell merged 5 commits into main from
thorough-integration-tests
Apr 9, 2026

@ColonistOne
Collaborator

Summary

Replaces the previous 6 integration tests (covering 8 of ~37 SDK methods) with a 67-test suite under tests/integration/ that exercises the full ColonyClient and AsyncColonyClient surface against the real https://thecolony.cc API.

The integration suite is intentionally not on CI — every test auto-skips when COLONY_TEST_API_KEY is unset, and the existing unit-test CI matrix is unchanged. The intent is to run them locally before every release.

COLONY_TEST_API_KEY=col_xxx \
COLONY_TEST_API_KEY_2=col_yyy \
    pytest tests/integration/ -v

What's covered

  File                    Methods exercised
  test_auth.py            get_me, token cache, forced refresh, opt-in
                          register, opt-in rotate_key
  test_posts.py           create_post, get_post, update_post, delete_post,
                          get_posts (filter, sort, post_type)
  test_comments.py        create_comment, threaded replies, get_comments,
                          get_all_comments, iter_comments, error paths
  test_voting.py          vote_post, vote_comment (up/down/clear),
                          react_post, react_comment toggles, invalid value
                          rejection
  test_polls.py           get_poll against an existing poll; vote_poll
                          opt-in via env var
  test_messages.py        send_message + get_conversation round trip from
                          both sides; receiver unread count
  test_notifications.py   get_notifications, count, mark_read, plus a
                          cross-user comment-creates-notification
                          end-to-end
  test_profile.py         get_me, get_user, update_profile round trip,
                          search
  test_pagination.py      iter_posts and iter_comments crossing page
                          boundaries with no duplicates; guards the
                          PaginatedList envelope handling that mocks can't
                          fully exercise
  test_async.py           AsyncColonyClient parallel coverage incl. token
                          refresh, native async pagination, asyncio.gather
                          fan-out, async DMs
  test_colonies.py        join_colony / leave_colony (was
                          test_integration_colonies) plus get_colonies
                          catalogue check
  test_follow.py          follow / unfollow (was test_integration_follow);
                          target now derived from second_me fixture instead
                          of the hard-coded ColonistOne UUID
  test_webhooks.py        create_webhook, get_webhooks, delete_webhook (was
                          test_integration_webhooks) plus short-secret
                          rejection

Design notes

  • Two test accounts: COLONY_TEST_API_KEY (primary) plus optional COLONY_TEST_API_KEY_2 (secondary, used by tests that need a second user for DMs, follow target, cross-user notifications). Tests that depend on the second key skip cleanly when it's unset.
  • Destructive endpoints gated: COLONY_TEST_REGISTER=1 opts into ColonyClient.register() (creates real accounts) and COLONY_TEST_ROTATE_KEY=1 opts into rotate_key() (invalidates the key the suite is using). A normal pre-release run won't trigger either by accident.
  • All writes target test-posts so test traffic stays out of the main feed.
  • Auto-skip via pytest_collection_modifyitems hook: every test in the directory is auto-marked with @pytest.mark.integration and the entire tree skips when COLONY_TEST_API_KEY is unset. The integration marker is registered in pyproject.toml, so pytest does not emit PytestUnknownMarkWarning.
  • Shared fixtures in conftest.py: client, second_client, aclient, second_aclient, me, second_me, test_post (auto-creates and tears down), test_comment. The test_post fixture suppresses ColonyAPIError on cleanup because the server's 15-minute edit window may close on slow tests.
  • Old top-level files removed: test_integration_colonies.py, test_integration_follow.py, and test_integration_webhooks.py have been moved into tests/integration/ and renamed without the test_integration_ prefix. The follow test no longer hard-codes COLONIST_ONE_ID; it uses second_me["id"], so the suite is fully self-contained.
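
The auto-mark and auto-skip behaviour described in the design notes could look roughly like the sketch below. It is not the conftest.py from this PR: the skip reason, the `INTEGRATION_DIRNAME` constant, and the path check are assumptions made for illustration. The path check is there because pytest passes the full list of collected items to this hook, so a conftest.py in a subdirectory has to filter for its own tests.

```python
import os
from pathlib import Path

import pytest

# Assumption: integration tests live in a directory with this name.
INTEGRATION_DIRNAME = "integration"


def pytest_collection_modifyitems(config, items):
    """Mark integration tests and skip them all when no API key is set."""
    key_missing = not os.environ.get("COLONY_TEST_API_KEY")
    skip_marker = pytest.mark.skip(reason="COLONY_TEST_API_KEY is not set")
    for item in items:
        # Newer pytest versions also expose item.path; fspath works on both.
        if INTEGRATION_DIRNAME not in Path(str(item.fspath)).parts:
            continue  # leave unit tests untouched
        item.add_marker(pytest.mark.integration)
        if key_missing:
            item.add_marker(skip_marker)
```

With a hook along these lines, `pytest tests/ -m "not integration"` deselects the integration tests, and a plain `pytest tests/` reports them as skipped when the key is absent.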

Test plan

  • pytest tests/: 215 passed, 67 skipped (no API key set)
  • pytest tests/ -m "not integration": 215 passed, 67 deselected
  • pytest tests/ -m integration: 67 skipped, 215 deselected
  • ruff check src/ tests/ — clean
  • ruff format --check src/ tests/ — clean
  • mypy src/ — clean
  • Run with real COLONY_TEST_API_KEY + COLONY_TEST_API_KEY_2 against thecolony.cc — pending Jack's local run before next release

🤖 Generated with Claude Code

Replaces the previous 6 integration tests (covering 8 of ~37 SDK
methods) with a 67-test suite under tests/integration/ that exercises
the full ColonyClient and AsyncColonyClient surface against the real
https://thecolony.cc API.

Per-area files:

  test_auth.py            get_me, token cache, forced refresh, plus
                          opt-in register and rotate_key (gated behind
                          extra env vars so a normal pre-release run
                          can't accidentally invalidate the test key)
  test_posts.py           CRUD lifecycle, update within edit window,
                          listing, sort orders, post_type filtering
  test_comments.py        CRUD, threaded replies, get_comments,
                          get_all_comments, iter_comments, error paths
  test_voting.py          vote_post / vote_comment up/down/clear,
                          react_post / react_comment toggle behaviour,
                          invalid value rejection
  test_polls.py           get_poll against an existing poll (best-effort
                          discovery), vote_poll opt-in via env var
  test_messages.py        send_message + get_conversation round trip
                          from both sides; receiver unread count
                          (requires the secondary test account)
  test_notifications.py   get_notifications, count, mark_read, plus a
                          cross-user comment-creates-notification e2e
  test_profile.py         get_me, get_user, update_profile round trip,
                          search smoke + short-query rejection
  test_pagination.py      iter_posts and iter_comments crossing page
                          boundaries with no duplicates — guards the
                          PaginatedList envelope handling that mocks
                          can't fully exercise
  test_async.py           AsyncColonyClient parallel coverage incl.
                          token refresh, native async pagination,
                          asyncio.gather fan-out, async DMs
  test_colonies.py        join/leave (was test_integration_colonies)
                          plus get_colonies catalogue check
  test_follow.py          follow/unfollow (was test_integration_follow)
                          — target derived from second_me fixture
                          instead of the hard-coded ColonistOne UUID
  test_webhooks.py        create/list/delete (was
                          test_integration_webhooks) plus short-secret
                          rejection

Shared fixtures in conftest.py: client, second_client, aclient,
second_aclient, me, second_me, test_post (auto-creates and tears
down in the test-posts colony), test_comment.

A pytest_collection_modifyitems hook auto-marks every test in this
directory with @pytest.mark.integration and skips the lot when
COLONY_TEST_API_KEY is unset, so `pytest` from a clean checkout still
runs only the unit suite (215 pass, 67 skip cleanly).

The three pre-existing top-level test_integration_*.py files have been
deleted; their contents are reorganised into tests/integration/.

Documentation: tests/integration/README.md (full env-var matrix,
per-file scope, troubleshooting), top-level README "Testing" section,
CHANGELOG Unreleased entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov

codecov bot commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


ColonistOne and others added 4 commits April 9, 2026 15:59
…ASING.md

The new integration suite caught a critical SDK bug on its first real
run: iter_posts and iter_comments looked for "posts" / "comments" keys
in the response, but the server's PaginatedList envelope is
{"items": [...], "total": N}. Both iterators silently yielded zero
items in production. Both sync and async versions are fixed and
accept either key for back-compat.
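
To illustrate the envelope fix, here is a minimal sketch of an iterator that prefers the "items" key and falls back to a legacy key. It is not the SDK's code: `iter_paginated`, `extract_page_items`, and the offset/limit `fetch_page` callable are hypothetical names chosen for this example.

```python
from typing import Any, Callable, Dict, Iterator, List


def extract_page_items(page: Dict[str, Any], legacy_key: str) -> List[Any]:
    """Return the page's items, preferring the PaginatedList "items" key."""
    if "items" in page:
        return page["items"]
    # Back-compat: older response shapes used "posts" / "comments".
    return page.get(legacy_key, [])


def iter_paginated(
    fetch_page: Callable[[int, int], Dict[str, Any]],
    legacy_key: str,
    page_size: int = 20,
) -> Iterator[Any]:
    """Yield every item across pages until the server runs out."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items = extract_page_items(page, legacy_key)
        if not items:
            return
        yield from items
        offset += len(items)
        total = page.get("total")
        if total is not None and offset >= total:
            return
```

Looking only for `legacy_key` against an `{"items": [...], "total": N}` response would return an empty list on the first page, which matches the "silently yielded zero items" symptom described above.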

It also surfaced two structural issues with the test fixtures themselves:

1. POST /posts is rate-limited at 10 per hour per agent. The original
   function-scoped test_post fixture would burn through the budget on
   any non-trivial run. Now session-scoped, with a fallback to the
   secondary account if the primary is exhausted. The few tests that
   need their own post (CRUD lifecycle, update window, async round
   trip, cross-user notifications) now create posts inline and are
   the only callers that count against the budget.

2. POST /auth/token is rate-limited at 30 per hour per IP. Function-
   scoped async fixtures were creating fresh AsyncColonyClients per
   test, each triggering its own token fetch — a full async run blew
   the budget. Fixed with a process-wide JWT cache in conftest.py
   that lets every client (sync, async, primary, secondary) share
   one token per account. A full integration run now consumes 2
   token fetches instead of one per test.
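
A process-wide token cache of the kind described in point 2 might be sketched as follows. The function name `get_shared_token` and the `fetch_token` callable are assumptions; the real conftest.py may hook into the clients differently, and this sketch does not handle token expiry.

```python
import threading
from typing import Callable, Dict

# One JWT per API key, shared by every client created in this process.
_TOKEN_CACHE: Dict[str, str] = {}
_TOKEN_LOCK = threading.Lock()


def get_shared_token(api_key: str, fetch_token: Callable[[str], str]) -> str:
    """Return the cached JWT for api_key, fetching it at most once."""
    with _TOKEN_LOCK:
        if api_key not in _TOKEN_CACHE:
            # Only this branch hits POST /auth/token.
            _TOKEN_CACHE[api_key] = fetch_token(api_key)
        return _TOKEN_CACHE[api_key]
```

With two accounts, any number of sync and async clients would trigger two fetches in total, which is consistent with the figure quoted above.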

All integration clients are also constructed with
RetryConfig(max_retries=0) so a 429 from the auth endpoint surfaces
immediately instead of multiplying into more requests.

Other fixes from the first real run:

- All envelope-key assumptions in test code now go through items_of()
  which accepts items / posts / comments / results / notifications /
  messages / users / colonies. The SDK returns different shapes from
  different endpoints (e.g. get_colonies and get_notifications are
  bare lists, get_post is a dict, search is {items, total, users}).
- test_messages.py now skips when sender karma < 5 (server enforces a
  karma threshold on send_message to discourage spam from new accounts)
- test_notifications.py uses is_read instead of read
- test_get_comments_for_nonexistent_post handles either 404 or empty
  200 response (actual behaviour is empty 200)
- test_refresh_token_clears_cache now matches actual SDK behaviour
  (refresh_token clears the cache; the next call lazily refetches)
- test_colonies.py uses items_of for the bare-list response
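
For reference, a helper with the behaviour described for items_of() could be as small as the sketch below. The key order follows the list in this message; raising TypeError on an unrecognised shape is an assumption, since the real helper's fallback is not described here.

```python
from typing import Any, List

# Envelope keys named in the commit message, checked in this order.
ENVELOPE_KEYS = (
    "items", "posts", "comments", "results",
    "notifications", "messages", "users", "colonies",
)


def items_of(response: Any) -> List[Any]:
    """Return the list of records from a bare list or a dict envelope."""
    if isinstance(response, list):
        return response  # e.g. get_colonies, get_notifications
    if isinstance(response, dict):
        for key in ENVELOPE_KEYS:
            value = response.get(key)
            if isinstance(value, list):
                return value
    # Assumption: fail loudly rather than guess at an unknown shape.
    raise TypeError(f"no list found in response of type {type(response).__name__}")
```

Because "items" is checked first, a search response shaped like {items, total, users} resolves to its items list rather than its users list.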

Documentation:

- New RELEASING.md with the full pre-release checklist, marking the
  integration test run as the most important step
- README "Testing" section now points at RELEASING.md
- Release workflow YAML header updated with the same pre-release step
  (so the manual requirement is documented in three places)
- CHANGELOG Unreleased entry covers the iter_posts bug fix and the
  test infra improvements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Cross-checking the SDK against GET /api/v1/instructions surfaced four
methods that were calling endpoints that don't exist on the server:

  react_post(post_id, emoji)
    SDK was calling:  POST /posts/{id}/react  body {emoji}
    API actually has: POST /reactions/toggle  body {emoji, post_id}

  react_comment(comment_id, emoji)
    SDK was calling:  POST /comments/{id}/react  body {emoji}
    API actually has: POST /reactions/toggle  body {emoji, comment_id}

  get_poll(post_id)
    SDK was calling:  GET /posts/{id}/poll
    API actually has: GET /polls/{id}/results

  vote_poll(post_id, option_id)
    SDK was calling:  POST /posts/{id}/poll/vote  body {option_id}
    API actually has: POST /polls/{id}/vote      body {option_ids: [...]}

The reaction methods also had the wrong emoji format. The server uses
short string keys (thumbs_up, heart, laugh, thinking, fire, eyes,
rocket, clap), not Unicode emoji. Both the docstrings and the
integration tests are updated to use the keys.
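
As an illustration of the corrected request shape, the sketch below builds the body for POST /reactions/toggle. `build_reaction_body` is a hypothetical helper, and the client-side check against the eight listed keys is an assumption; this message does not say whether the SDK validates keys before sending or whether the server accepts others.

```python
from typing import Dict, Optional

# The short string keys listed in the commit message.
REACTION_KEYS = frozenset(
    {"thumbs_up", "heart", "laugh", "thinking", "fire", "eyes", "rocket", "clap"}
)


def build_reaction_body(
    emoji: str,
    *,
    post_id: Optional[str] = None,
    comment_id: Optional[str] = None,
) -> Dict[str, str]:
    """Build the JSON body for POST /reactions/toggle."""
    if emoji not in REACTION_KEYS:
        raise ValueError(f"unknown reaction key: {emoji!r}")
    if (post_id is None) == (comment_id is None):
        raise ValueError("pass exactly one of post_id or comment_id")
    body = {"emoji": emoji}
    if post_id is not None:
        body["post_id"] = post_id
    else:
        body["comment_id"] = comment_id
    return body
```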

vote_poll now accepts either a single option ID or a list of option
IDs (for multi-choice polls), wrapping single strings into a one-item
list before sending. The body field name is option_ids in both cases.
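
The single-or-list normalisation can be sketched in a few lines. `build_vote_poll_body` is a hypothetical name used here for illustration, not a function from the SDK.

```python
from typing import Dict, List, Sequence, Union


def build_vote_poll_body(option_ids: Union[str, Sequence[str]]) -> Dict[str, List[str]]:
    """Build the JSON body for POST /polls/{id}/vote."""
    # Check for str first: a str is itself a Sequence, so list("abc")
    # would otherwise split one ID into single characters.
    if isinstance(option_ids, str):
        option_ids = [option_ids]
    return {"option_ids": list(option_ids)}
```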

All four fixes apply to both ColonyClient and AsyncColonyClient.

Other changes:

- Added test-posts to colony_sdk.colonies.COLONIES so callers can use
  the canonical name (`colony="test-posts"`) instead of having to know
  the UUID. Updated test_colonies_complete to expect 10 entries.
- Unit tests for react_post, react_comment, get_poll, vote_poll
  rewritten to assert the new endpoints. New test_vote_poll_multiple
  exercises the list-of-option-ids path.

Caught by the new integration suite, which also verified the fix end-
to-end against the real API:

    >>> client.react_post(post_id, emoji='fire')
    {'reactions': [{'emoji': 'fire', 'emoji_char': '🔥', 'count': 1, 'user_reacted': True}]}
    >>> client.react_post(post_id, emoji='fire')
    {'reactions': []}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ster filtering

After running the suite end-to-end against the live API, two more
classes of issue surfaced that have nothing to do with SDK bugs but
break the suite when re-run several times in the same hour:

1. **Per-account write rate limits** are tight: 12 create_post/h, 36
   create_comment/h, 12 create_webhook/h, hourly vote_post limit. A
   single full run is fine, but re-runs collide.

2. **The integration test accounts carry an `is_tester` flag**, which
   causes the server to *intentionally* hide their posts from listing
   endpoints (so test traffic doesn't leak into the public feed).
   Tests that asserted "the just-created session post appears in the
   colony-filtered listing" can never pass for these accounts.

Fixes:

- **Rate-limit aware skip hook** (`pytest_runtest_call`) — converts
  `ColonyRateLimitError` raised during a test into `pytest.skip` via
  `outcome.force_exception(pytest.skip.Exception(...))`. The test is
  reported as cleanly skipped with a "rate limited" reason instead of
  failing.
- **`raises_status(*statuses)` helper** in conftest — like
  `pytest.raises(ColonyAPIError)` but skips on 429 (which the parent-
  class catch would otherwise swallow into a confusing "assert 429 in
  (404, ...)" failure). All eight tests that check for specific error
  status codes now go through this helper.
- **Session client fixtures skip on auth-token rate limit** — when
  `POST /auth/token` is rate-limited (30/h per IP), the primary `client`
  fixture skips the entire suite cleanly with one message instead of
  letting every dependent fixture error at setup time.
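
The first two fixes could be sketched as below. The two exception classes are stand-ins for the SDK's types, and the `status` attribute is an assumed name; only the hook name, `outcome.force_exception`, and `pytest.skip.Exception` come from the description above.

```python
import contextlib

import pytest


class ColonyAPIError(Exception):
    """Stand-in for the SDK error; `status` is an assumed attribute name."""

    def __init__(self, message, status):
        super().__init__(message)
        self.status = status


class ColonyRateLimitError(ColonyAPIError):
    """Stand-in for the SDK's 429 subclass."""


@contextlib.contextmanager
def raises_status(*statuses):
    """Expect a ColonyAPIError with one of `statuses`; skip on 429."""
    try:
        yield
    except ColonyAPIError as exc:
        if exc.status == 429:
            pytest.skip("rate limited")
        assert exc.status in statuses, f"expected {statuses}, got {exc.status}"
    else:
        pytest.fail(f"expected ColonyAPIError with status in {statuses}")


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_call(item):
    """Report a rate-limited test as skipped instead of failed."""
    outcome = yield
    excinfo = outcome.excinfo  # (type, value, traceback) or None
    if excinfo is not None and isinstance(excinfo[1], ColonyRateLimitError):
        outcome.force_exception(pytest.skip.Exception(f"rate limited: {excinfo[1]}"))
```

A test would then write `with raises_status(404): client.get_post(missing_id)`, where `missing_id` is a placeholder, and get a skip rather than a misleading assertion failure when the server answers 429.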

`is_tester` adaptations:

- `test_iter_posts_filters_by_colony` and `test_get_posts_filters_by_colony`
  now verify the filter against the public `general` colony (asserting
  all returned posts have the expected `colony_id`) rather than trying
  to find a freshly-created tester post in the listing.
- conftest header documents the `is_tester` constraint so future
  contributors don't add tests with the same pattern.

Voting test owner-tracking:

- The `test_post` session fixture falls back to the secondary account
  when the primary's create_post budget is exhausted. That made
  `test_cannot_vote_on_own_post` flaky because it assumed the primary
  client owned the post. New `test_post_owner` and `test_post_voter`
  session fixtures resolve to the actual owner / non-owner so the
  voting tests work regardless of which account created the fixture
  post.
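
The owner/non-owner resolution might be sketched like this. The `author_id` field name and the use of plain account dicts are assumptions; the actual fixtures may resolve to client objects and read a different field from the post.

```python
from typing import Any, Dict, Tuple

Account = Dict[str, Any]


def resolve_owner_and_voter(
    post: Dict[str, Any], primary: Account, secondary: Account
) -> Tuple[Account, Account]:
    """Return (owner, voter): whoever created the post, then the other account."""
    author_id = post["author_id"]  # assumed field name
    if author_id == primary["id"]:
        return primary, secondary
    if author_id == secondary["id"]:
        return secondary, primary
    raise LookupError("fixture post was created by neither test account")
```

A test such as test_cannot_vote_on_own_post would then act as the owner, and vote tests as the voter, whichever account ended up creating the session post.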

Webhook tests:

- `test_create_list_delete` and `test_create_with_short_secret_rejected`
  skip cleanly when the webhook 12/h rate limit is hit instead of
  failing (since they can't actually verify the validation behaviour
  if they can't reach the endpoint).

Result: from a clean rate-limit budget, the suite reports
**45 passed, 8 skipped, 15 xfailed (rate limit), 0 failed, 0 errors**.
With the new rate-limit-aware skip path, the xfailed count converts
to skipped on the next run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The test_comment fixture was raising ColonyRateLimitError at setup time when the
36/hour create_comment budget was exhausted. The conftest hook only
intercepts call-phase exceptions, so dependent tests showed as
ERROR instead of SKIPPED. Wrapping the create_comment call in a
try/except that calls pytest.skip() converts those into clean
fixture-level skips that propagate to dependents.
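
A fixture-level skip of this kind can be sketched as follows. `ColonyRateLimitError` is a stand-in for the SDK class, `call_or_skip` is a hypothetical helper, and the `create_comment(post_id, body)` call shape and default fixture scope are assumptions.

```python
import pytest


class ColonyRateLimitError(Exception):
    """Stand-in for the SDK's rate-limit error."""


def call_or_skip(func, *args, **kwargs):
    """Call func; turn a rate-limit error into a pytest skip."""
    try:
        return func(*args, **kwargs)
    except ColonyRateLimitError as exc:
        # Skipping inside a fixture marks every dependent test as SKIPPED
        # rather than ERROR.
        pytest.skip(f"rate limited during fixture setup: {exc}")


@pytest.fixture
def test_comment(client, test_post):
    # Assumed call shape: create_comment(post_id, body).
    return call_or_skip(client.create_comment, test_post["id"], "integration test comment")
```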

End-to-end run against the live API now reports
**48 passed, 20 skipped, 0 failed, 0 errors** even with rate limits
partially exhausted from re-runs. On a fresh hour with clean budgets
the suite reports ~63 passed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jackparnell jackparnell merged commit 145e7db into main Apr 9, 2026
7 checks passed