feat(int-evolution-go): exponential backoff + jitter on api_request (PR-2)#79

Open
mt-alarcon wants to merge 3 commits into evolution-foundation:main from mt-alarcon:feat/wpp-retry-pr2-clean

@mt-alarcon mt-alarcon commented May 13, 2026

Context

Second of a 3-PR series implementing the WhatsApp retry pattern. Builds on #78 (PR-1: idempotency_key migration + silent dedup) by adding resilient retry logic to the Evolution Go API client.

Problem

api_request() in evolution_go_client.py issues a single HTTP call and calls sys.exit(1) on any error. This means:

  • Transient errors (5xx, network blips, timeouts) cause immediate failure with no retry
  • Library callers (e.g. ADW routines, dashboard handlers) can't catch the error — the process just exits
  • WhatsApp message delivery fails on transient infrastructure hiccups

Solution

New _retry_http_call_client(do_call, max_attempts=3, base_delay=2.0, max_delay=8.0) helper:

  • Retries on HTTP 5xx, urllib.error.URLError, socket.timeout (transient categories)
  • Raises immediately on HTTP 4xx (deterministic client errors — retrying would mask bugs)
  • Exponential backoff with jitter: min(2^attempt + uniform(0, 0.5), max_delay) per attempt
  • Structured JSON logs at retry/failed events (evt: api_request_retry / api_request_failed)
  • Returns the result of do_call() on success; raises the last exception after max_attempts
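
The behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `retry_call`, the injectable `sleep` parameter, and the log fields other than `evt` are hypothetical, and the `max_attempts` guard is added only so the sketch never reaches `raise None`.

```python
import json
import random
import socket
import time
import urllib.error


def retry_call(do_call, max_attempts=3, max_delay=8.0, sleep=time.sleep):
    """Retry do_call() on transient errors (hypothetical stand-in for the helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    last_exc = None
    for attempt in range(max_attempts):
        try:
            return do_call()
        except urllib.error.HTTPError as exc:
            # HTTPError subclasses URLError, so it must be handled first.
            if exc.code < 500:
                raise  # 4xx: deterministic client error, never retried
            last_exc = exc
        except (urllib.error.URLError, socket.timeout) as exc:
            last_exc = exc
        if attempt == max_attempts - 1:
            print(json.dumps({"evt": "api_request_failed", "attempts": max_attempts}))
            break
        # Formula as stated in the list above: 2^attempt plus jitter, capped.
        delay = min(2 ** attempt + random.uniform(0, 0.5), max_delay)
        print(json.dumps({"evt": "api_request_retry", "attempt": attempt + 1,
                          "delay": round(delay, 2)}))
        sleep(delay)
    raise last_exc


# Usage: a call that fails twice with a network error, then succeeds.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise urllib.error.URLError("connection reset")
    return {"ok": True}


slept = []  # records the delays instead of actually sleeping
result = retry_call(flaky, sleep=slept.append)
```

Passing `sleep` as a parameter is one way to keep such a helper testable without patching `time.sleep`; the PR's tests patch the module attribute instead.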

api_request() refactored to wrap the actual HTTP call in _do_call and delegate to the retry helper. Removes inline sys.exit(1) so library callers can handle errors. main() wraps handler invocation in try/except to preserve the exact CLI behavior (exit 1 on failure) while letting library users propagate.
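
A rough sketch of the split described above, where the client raises and only the CLI entry point decides to exit. The names `run_cli` and `failing_handler` and the shape of the JSON error are assumptions for illustration, not the PR's actual `main()`.

```python
import json
import socket
import sys
import urllib.error


def run_cli(handler, args):
    """Hypothetical CLI wrapper: maps client exceptions to an exit code."""
    try:
        handler(args)
    except urllib.error.HTTPError as exc:
        # Checked before URLError because HTTPError is a subclass of it.
        print(json.dumps({"error": f"HTTP {exc.code}"}), file=sys.stderr)
        return 1
    except (urllib.error.URLError, socket.timeout) as exc:
        print(json.dumps({"error": str(exc)}), file=sys.stderr)
        return 1
    return 0


def failing_handler(args):
    # Stands in for a handler whose api_request() exhausted its retries.
    raise urllib.error.URLError("connection refused")


exit_code = run_cli(failing_handler, args=None)
# A real entry point would finish with: sys.exit(run_cli(handler, args))
```

Library callers can call the handler directly and catch the same exceptions themselves, which is the point of removing `sys.exit(1)` from the client function.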

Tests

tests/whatsapp/test_retry_backoff.py (153 lines, 4 tests):

  1. HTTP 500 x3 → retries then raises — verifies max_attempts honored
  2. HTTP 400 → raises immediately, no retry — verifies 4xx not retried
  3. URLError x3 → retries then raises — verifies network errors retried
  4. Success on 3rd attempt → returns result — verifies happy path through retries

All 4 pass:

Ran 4 tests in 0.026s
OK

Worst-case latency

3 attempts, all 5xx: two sleeps, sleep(1.0–1.5) + sleep(2.0–2.5) ≈ 3.0–4.0s total, since the jitter uniform(0, 0.5) is only ever added. Each sleep is capped at max_delay=8s. Acceptable for an API call retry on a user-initiated command.
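
The sleep bounds for this schedule can be checked with a few lines of arithmetic. The sketch assumes the formula given in the Solution section, `min(2^attempt + uniform(0, 0.5), max_delay)`, and counts only time spent sleeping, not the HTTP calls themselves; `delay_bounds` is a hypothetical helper name.

```python
def delay_bounds(attempt, max_delay=8.0, jitter_max=0.5):
    """(min, max) sleep after a failed attempt for min(2**attempt + U(0, jitter_max), max_delay)."""
    low = min(2 ** attempt, max_delay)
    high = min(2 ** attempt + jitter_max, max_delay)
    return low, high


# Three attempts means two sleeps: after attempt 0 and after attempt 1.
bounds = [delay_bounds(a) for a in range(2)]
total_low = sum(low for low, _ in bounds)
total_high = sum(high for _, high in bounds)
print(bounds, total_low, total_high)
# total_low == 3, total_high == 4.0
```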

Verification

  • python tests/whatsapp/test_retry_backoff.py → 4/4 pass
  • ast.parse(evolution_go_client.py) → OK
  • Backward compatibility: existing CLI behavior preserved (main() exits 1 on HTTPError/URLError as before)

Series

  • PR-1 #78: idempotency_key migration + silent dedup
  • PR-2 (this): exponential backoff + jitter
  • ⏳ PR-3 (next): DLQ classification + UI Replay + instrumentation

Summary by Sourcery

Introduce resilient, retryable Evolution Go API client behavior with exponential backoff, while wiring idempotent trigger execution support and database schema needed for WhatsApp-style retries.

New Features:

  • Add exponential backoff with jitter helper for Evolution Go HTTP calls and apply it in api_request so transient failures are retried instead of exiting immediately.
  • Preserve CLI behavior by handling HTTP and network errors in main() while allowing library callers to catch exceptions.
  • Add idempotency-key based deduplication for trigger webhook executions, including silent handling of duplicate or racing requests.
  • Expose new trigger execution metadata fields (idempotency_key, error_category, last_replay_at) through the data model for future retry and classification logic.

Bug Fixes:

  • Prevent duplicate trigger executions under concurrent identical webhook deliveries by guarding with both application-level checks and a database uniqueness constraint.

Enhancements:

  • Auto-migrate the trigger_executions table to add idempotency and retry-related columns and indexes in a backward compatible way.
  • Improve logging around idempotent webhook replays and HTTP retry behavior for better observability.

Tests:

  • Add unit tests covering Evolution Go api_request retry behavior across 5xx responses, 4xx responses, network errors, and eventual success.

Marcello Alarcon and others added 2 commits May 13, 2026 17:22
…ps 1+2)

Step 1 - Migration (models.py + app.py):
- TriggerExecution gains 3 nullable columns: idempotency_key, error_category, last_replay_at
- to_dict() exposes the 3 new fields
- Idempotent auto-migrate at startup: ALTER TABLE + IF NOT EXISTS in each block
- Partial unique index uq_trigger_idem (trigger_id, idempotency_key) WHERE NOT NULL
- Basic index ix_trigger_executions_idem_key for lookups by key
- SQLite 3.51 confirmed: partial indexes are supported natively; EXPLAIN QUERY PLAN confirms the index is used
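
As an illustration of the partial unique index, the following sketch uses Python's `sqlite3` module against an in-memory database. The table is a simplified assumption (only the columns needed for the index), not the project's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real trigger_executions table.
conn.execute(
    "CREATE TABLE trigger_executions ("
    "id INTEGER PRIMARY KEY, trigger_id INTEGER NOT NULL, idempotency_key TEXT)"
)
# Uniqueness is enforced only for rows that carry a key.
conn.execute(
    "CREATE UNIQUE INDEX IF NOT EXISTS uq_trigger_idem "
    "ON trigger_executions (trigger_id, idempotency_key) "
    "WHERE idempotency_key IS NOT NULL"
)

insert = "INSERT INTO trigger_executions (trigger_id, idempotency_key) VALUES (?, ?)"
conn.execute(insert, (1, None))   # rows without a key are unconstrained
conn.execute(insert, (1, None))
conn.execute(insert, (1, "msg-1"))

try:
    conn.execute(insert, (1, "msg-1"))  # same trigger and same key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM trigger_executions").fetchone()[0]
```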

Step 2 - Silent dedup (triggers.py):
- webhook_receiver extracts idem_key from idempotency_key / messageId / data.messageId
- If the key already exists: log idempotent_replay + silent 200 OK (pattern F6)
- Race condition: IntegrityError on db.commit() → rollback + silent 200 OK
- Legacy (GitHub, Stripe, Linear): no key → idem_key=None → normal flow unchanged
- Cleanup: current_app moved to a top-level import; inline imports removed
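
The key extraction could look something like the sketch below. The function name `extract_idem_key` and the precedence order (the order in which the fields are listed above) are assumptions; the actual code in triggers.py may differ.

```python
def extract_idem_key(payload):
    """Return the first non-empty candidate key, or None for legacy payloads."""
    if not isinstance(payload, dict):
        return None
    data = payload.get("data")
    candidates = (
        payload.get("idempotency_key"),
        payload.get("messageId"),
        data.get("messageId") if isinstance(data, dict) else None,
    )
    for value in candidates:
        if value:
            return str(value)
    return None


nested_key = extract_idem_key({"data": {"messageId": "abc"}})
legacy_key = extract_idem_key({"action": "opened"})  # e.g. a GitHub-style payload
```

Returning `None` for payloads without any of these fields is what keeps the legacy flow unchanged: no key means no dedup lookup.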

Tests passed: idempotent migration up/down, partial index uniqueness, NULLs unconstrained,
key extraction (6 cases), race condition via IntegrityError, EXPLAIN QUERY PLAN.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…PR-2)

Adds resilient retry logic to the Evolution Go API client. Transient HTTP
errors (5xx, URLError, socket.timeout) now retry up to 3 times with
exponential backoff + jitter; HTTP 4xx errors raise immediately (no retry,
deterministic client errors).

## Changes

**`.claude/skills/int-evolution-go/scripts/evolution_go_client.py`:**
- New helper `_retry_http_call_client(do_call, max_attempts=3, base_delay=2.0,
  max_delay=8.0)` — generic retry wrapper with exponential backoff + jitter.
  Logs structured JSON events (`api_request_retry` / `api_request_failed`).
- Refactor `api_request` to wrap the actual HTTP call in `_do_call` and
  delegate to the retry helper. Removes inline try/except/sys.exit so library
  callers can handle errors; the CLI `main()` catches and exits as before.
- Refactor `main()` to wrap handler invocation in try/except for
  `urllib.error.HTTPError` / `URLError` / `socket.timeout` — preserves the
  exact CLI exit-1-on-failure behavior while letting library users propagate.

**`tests/whatsapp/test_retry_backoff.py` (new):**
4 synthetic tests covering:
  1. HTTP 500 x3 → retries then raises (TestApiRequestRetry)
  2. HTTP 400 → raises immediately, no retry
  3. URLError x3 → retries then raises
  4. Success on third attempt → returns result, call_count=3

All 4 tests pass against the modified `api_request`.

## Worst-case latency

3 attempts with all 5xx: sleep ≤ 2^0 + 0.5 + 2^1 + 0.5 = 4.0s total
(each sleep capped at max_delay=8s). Acceptable for an API call retry.

## Series

- PR-1 (evolution-foundation#78 — merged or pending): idempotency_key migration + silent dedup
- **PR-2 (this):** exponential backoff + jitter
- PR-3 (next): DLQ classification + UI Replay + instrumentation

sourcery-ai Bot commented May 13, 2026

Reviewer's Guide

Adds resilient exponential backoff + jitter retry logic around Evolution Go API HTTP calls, refactors CLI error handling to raise instead of exiting for library callers while preserving CLI behavior, and wires in idempotent trigger execution support and schema/index migration plus tests for the new retry behavior.

Sequence diagram for Evolution Go API request retry and CLI error handling

sequenceDiagram
    actor User
    participant CLI as main
    participant Handler
    participant Client as api_request
    participant Retry as _retry_http_call_client
    participant HTTP as urllib_request_urlopen

    User->>CLI: invoke command
    CLI->>Handler: handler(args)
    Handler->>Client: api_request(method, path, data)
    Client->>Retry: _retry_http_call_client(_do_call)

    loop attempts
        Retry->>HTTP: _do_call() / urlopen(req)
        alt transient_error
            HTTP-->>Retry: raise HTTPError 5xx or URLError or timeout
            Retry->>Retry: time.sleep(delay)
        else success
            HTTP-->>Retry: response
            Retry-->>Client: parsed JSON
            Client-->>Handler: result
            Handler-->>CLI: success
            CLI-->>User: success output
        end
    end

    alt persistent_failure
        Retry-->>Client: raise last exception
        Client-->>Handler: propagate exception
        Handler-->>CLI: propagate exception
        CLI->>CLI: catch HTTPError/URLError/timeout
        CLI-->>User: print JSON error
        CLI->>CLI: sys.exit(1)
    end

Entity relationship diagram for Trigger and TriggerExecution idempotent execution support

erDiagram
    Trigger {
        int id PK
    }

    TriggerExecution {
        int id PK
        int trigger_id FK
        text event_data
        string status
        string idempotency_key
        string error_category
        datetime last_replay_at
    }

    Trigger ||--o{ TriggerExecution : has_executions

File-Level Changes

Change: Introduce reusable exponential backoff + jitter retry helper for Evolution Go API HTTP calls and refactor api_request to use it instead of exiting the process.
Details:
  • Add _retry_http_call_client helper that retries on HTTP 5xx, URLError, and socket.timeout with exponential backoff + jitter, and logs structured JSON events for retries and failures.
  • Refactor api_request to wrap the urllib.request.urlopen call in an inner _do_call and delegate execution to the retry helper, returning results or raising the final exception instead of calling sys.exit.
  • Update the main entrypoint to wrap command handlers in try/except, converting raised HTTPError/URLError into the same JSON error output and exit(1) that the previous inline sys.exit behavior provided.
Files: .claude/skills/int-evolution-go/scripts/evolution_go_client.py

Change: Complete idempotent trigger execution behavior in the webhook receiver using the new idempotency_key schema and handle race conditions against the DB-level unique index.
Details:
  • Extract an idempotency key from incoming webhook payloads using several WhatsApp-related fields and skip re-execution when a matching TriggerExecution already exists, logging an idempotent replay event and returning 200.
  • Persist the derived idempotency key on new TriggerExecution rows and catch IntegrityError on commit to handle unique index races, rolling back and returning 200 with a dedicated log event when a duplicate insert occurs.
  • Clean up redundant local imports of current_app now that it is imported at the module level.
Files: dashboard/backend/routes/triggers.py

Change: Apply database auto-migration to support idempotency and retry metadata on trigger executions, including partial unique index for deduplication.
Details:
  • Extend the auto-migrate routine to add idempotency_key, error_category, and last_replay_at columns to trigger_executions if missing, committing after each ALTER TABLE.
  • Create a non-unique index on idempotency_key and a partial unique index on (trigger_id, idempotency_key) where idempotency_key IS NOT NULL, with defensive try/except around index creation for robustness.
  • Document rollback instructions and relationship to the WhatsApp retry pattern in comments around the migration block.
Files: dashboard/backend/app.py

Change: Expose new idempotency and retry metadata fields on the TriggerExecution ORM model and its serialized representation.
Details:
  • Augment TriggerExecution with nullable idempotency_key, error_category, and last_replay_at columns and annotate status comment to include failed_retryable as a possible state.
  • Include the new fields in TriggerExecution.to_dict, formatting last_replay_at in the same ISO8601 style used for other datetime fields.
Files: dashboard/backend/models.py

Change: Add unit tests to validate retry/backoff behavior of api_request and the underlying helper, including retry counts and no-retry semantics for 4xx errors.
Details:
  • Introduce test_retry_backoff.py, setting up import paths to evolution_go_client and helper functions to construct mock HTTP responses and HTTPError instances.
  • Write tests that verify api_request retries and then raises on repeated HTTP 500 and URLError, does not retry on HTTP 400, and successfully returns JSON on success after transient failures, while patching time.sleep to avoid real delays.
Files: tests/whatsapp/test_retry_backoff.py

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • In _retry_http_call_client, consider validating max_attempts >= 1 (and/or initializing a sensible default exception) so you never reach raise last_exc with last_exc still None if the helper is called with max_attempts=0.
  • The retry branches for 5xx HTTPError and (URLError, socket.timeout) duplicate the delay calculation and logging; factoring this into a small internal helper would reduce repetition and make it easier to tweak the retry behavior or log schema in one place.

## Individual Comments

### Comment 1
<location path=".claude/skills/int-evolution-go/scripts/evolution_go_client.py" line_range="50-59" />
<code_context>
+def _retry_http_call_client(do_call, max_attempts=3, base_delay=2.0, max_delay=8.0):
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Guard against max_attempts <= 0 and make base_delay influence the first retry delay

Two robustness points in `_retry_http_call_client`:

1) For `max_attempts <= 0`, the loop never executes and `last_exc` stays `None`, so `raise last_exc` will fail with a `TypeError`. A guard like `if max_attempts < 1: raise ValueError("max_attempts must be >= 1")` makes this case explicit.

2) The current backoff uses `base_delay ** attempt`, so the first retry delay is always `1` (`attempt == 0`). To let `base_delay` control the first delay and follow common exponential backoff patterns, consider `delay = min(base_delay * (2 ** attempt) + jitter, max_delay)` (or start the exponent at 1).

Suggested implementation:

```python
def _retry_http_call_client(do_call, max_attempts=3, base_delay=2.0, max_delay=8.0):
    """Exponential backoff + jitter for Evolution Go API calls.

    Retries on HTTP 5xx, urllib.error.URLError, and socket.timeout (transient).
    NEVER retries on HTTP 4xx (deterministic client errors).

    Returns the result of do_call() on success.
    Raises the last exception after max_attempts are exhausted.
    Raises immediately on HTTP 4xx (no retry).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")

    last_exc = None

```

I can’t see the body of the retry loop, but to make `base_delay` influence the first retry delay, you should update the backoff calculation wherever it’s computed. For example, if you currently have something like:

```python
delay = min(base_delay ** attempt + jitter, max_delay)
```

or:

```python
delay = min(base_delay ** attempt, max_delay)
```

change it to a standard exponential-backoff-with-jitter pattern:

```python
delay = min(base_delay * (2 ** attempt) + jitter, max_delay)
```

This ensures:
- On the first retry (`attempt == 0`), the delay is approximately `base_delay` (plus jitter).
- Subsequent retries double the delay (capped by `max_delay`), maintaining the intended exponential backoff while making `base_delay` meaningful from the first retry.
Make sure this change is applied consistently wherever the retry delay is calculated in `_retry_http_call_client`.
</issue_to_address>
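
To make the difference between the two formulas concrete, here is a small comparison with jitter left out so the values are deterministic. `delay_pow` and `delay_scaled` are hypothetical names used only for this sketch.

```python
def delay_pow(attempt, base_delay, max_delay=8.0):
    """Original form: base_delay ** attempt (jitter omitted)."""
    return min(base_delay ** attempt, max_delay)


def delay_scaled(attempt, base_delay, max_delay=8.0):
    """Suggested form: base_delay * 2 ** attempt (jitter omitted)."""
    return min(base_delay * (2 ** attempt), max_delay)


for base in (0.5, 2.0):
    print(base,
          [delay_pow(a, base) for a in range(3)],
          [delay_scaled(a, base) for a in range(3)])
```

With `base_delay=2.0` the original form yields 1.0, 2.0, 4.0 and the suggested form 2.0, 4.0, 8.0. With `base_delay=0.5` the original form starts at 1.0 and then shrinks (0.5, 0.25), while the suggested form grows from 0.5.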

### Comment 2
<location path="tests/whatsapp/test_retry_backoff.py" line_range="125-85" />
<code_context>
+    def test_success_on_third_attempt_returns_result(self):
</code_context>
<issue_to_address>
**suggestion (testing):** Consider asserting sleep is called exactly twice on the failing attempts

Since time.sleep is patched, you can assert it’s called exactly twice in this success-on-third-attempt case (after the first two failures, but not after the third). This more directly verifies the backoff behavior when the call eventually succeeds.

Suggested implementation:

```python
            with patch("urllib.request.urlopen", side_effect=_mock_urlopen):
                with patch.object(_client.time, "sleep") as mock_sleep:
                    with self.assertRaises(urllib.error.URLError):
                        _client.api_request("GET", "/instance/status")

        self.assertEqual(call_count, 3)
        self.assertEqual(mock_sleep.call_count, 2)

```

From the provided snippet I can’t see the full body of `test_success_on_third_attempt_returns_result`. To fully implement your suggestion for that specific test, you should:

1. Locate the `with patch.object(_client.time, "sleep"):` context inside `test_success_on_third_attempt_returns_result`.
2. Change it to `with patch.object(_client.time, "sleep") as mock_sleep:`.
3. After the call to `_client.api_request(...)` and any assertions on the returned result / `call_count`, add:
   ```python
   self.assertEqual(mock_sleep.call_count, 2)
   ```
   This ensures you assert `time.sleep` is called exactly twice in the “success on third attempt” case (after the first two failures, but not after the third).
</issue_to_address>
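
For the success-on-third-attempt case specifically, the assertion might be sketched as below. This is a self-contained illustration: `call_with_retry` is a tiny hypothetical stand-in for the real client, which is not reproduced here, and the test class name is made up.

```python
import time
import unittest
import urllib.error
from unittest.mock import patch


def call_with_retry(do_call, max_attempts=3):
    """Tiny stand-in retry loop (hypothetical); sleeps only between attempts."""
    for attempt in range(max_attempts):
        try:
            return do_call()
        except urllib.error.URLError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)


class SuccessOnThirdAttempt(unittest.TestCase):
    def test_sleep_called_only_between_failures(self):
        calls = []

        def flaky():
            calls.append(1)
            if len(calls) < 3:
                raise urllib.error.URLError("temporary failure")
            return {"status": "ok"}

        # Patch the module attribute so no real delay happens.
        with patch("time.sleep") as mock_sleep:
            result = call_with_retry(flaky)

        self.assertEqual(result, {"status": "ok"})
        self.assertEqual(len(calls), 3)
        # Two failures, so two sleeps; none after the successful call.
        self.assertEqual(mock_sleep.call_count, 2)
```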


Comment thread .claude/skills/int-evolution-go/scripts/evolution_go_client.py
_client.api_request("GET", "/instance/status")

self.assertEqual(ctx.exception.code, 500)
self.assertEqual(call_count, 3)

mt-alarcon pushed a commit to mt-alarcon/evo-nexus that referenced this pull request May 13, 2026
Addresses the Sourcery review on PR evolution-foundation#79:

- Adds a `ValueError` if `max_attempts < 1`. Previously, with max_attempts=0 the
  loop never executed and `raise last_exc` failed with a TypeError because
  last_exc remained None.
- Replaces `base_delay ** attempt` with `base_delay * (2 ** attempt)`. Previously,
  with base_delay=2.0, the first retry always had a delay of ~1s (2^0=1)
  regardless of the configured base_delay, so the parameter was ignored on the
  first retry. It now follows the classic exponential backoff pattern:
  first retry ~= base_delay, doubling on each attempt.

Worst-case latency (3 attempts, all 5xx) with base_delay=2.0, max_delay=8.0:
  attempt 0: ~2-2.5s, attempt 1: ~4-4.5s → total ~6.0-7.0s, within the 8s
  budget defined in the acceptance criteria.

Existing tests in tests/whatsapp/test_retry_backoff.py pass unchanged
(they assert call_count and exception type; sleep is mocked and no
specific delays were asserted).
mt-alarcon pushed a commit to mt-alarcon/evo-nexus that referenced this pull request May 13, 2026
…e sleep

Polish from the Sourcery review on PR evolution-foundation#79 (code-quality suggestions):

1. Extracts `_backoff_retry_or_log_final` to eliminate duplication between
   the 5xx (HTTPError) and URLError/socket.timeout branches of the retry loop.
   Previously, 18 identical lines (compute delay + print retry log + sleep
   OR print failed log) appeared twice, so any change to the log
   schema or formula had to be made in both places. There is now a
   single point; the log keys that vary are passed as an `extras` dict.

   Identical behavior: existing tests pass unchanged and the
   emitted JSON logs match the previous version (verified via
   the tests' stdout).

2. In `test_success_on_third_attempt_returns_result`, captures
   `mock_sleep` and asserts `call_count == 2`. Previously sleep was
   mocked but not verified, so there was no guarantee that
   backoff was not also being invoked after the success. The
   test now verifies that sleep only runs on retried failures, never
   after success.

No functional change, only a refactor and stricter test coverage.
mt-alarcon pushed a commit to mt-alarcon/evo-nexus that referenced this pull request May 13, 2026
… index + logs

Consolidated fixes requested by Sourcery in the review of PR evolution-foundation#78:

1. `except IntegrityError` restricted to the idempotency violation
   (dashboard/backend/routes/triggers.py)

   Previously, any IntegrityError in `webhook_receiver` became a silent
   200 OK, treated as an "idempotent replay". Other problems
   (NOT NULL, broken FK, a new unique constraint added later)
   would be masked as success and the client would never know that the
   event was not processed.

   The catch now absorbs the error only if:
     a) `idem_key` is set (without an idempotency key it cannot
        be a uq_trigger_idem violation)
     b) The driver's error message mentions `uq_trigger_idem` or
        `idempotency_key` (the specific constraint)

   For any other IntegrityError, it logs full context
   (trigger_id, key, original message) and re-raises: the webhook returns
   500 and the error is visible in the logs instead of becoming a silent 200.
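
A rough sketch of the filtering rule in item 1, exercised against an in-memory SQLite database. `is_idempotency_violation` is a hypothetical name, the table is a simplified assumption, and the real handler works with the ORM's IntegrityError rather than `sqlite3.IntegrityError` directly.

```python
import sqlite3


def is_idempotency_violation(idem_key, exc):
    """True only when a key was supplied and the driver message names the constraint."""
    if idem_key is None:
        return False
    message = str(exc)
    return "uq_trigger_idem" in message or "idempotency_key" in message


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real table.
conn.execute(
    "CREATE TABLE trigger_executions (trigger_id INTEGER NOT NULL, idempotency_key TEXT)"
)
conn.execute(
    "CREATE UNIQUE INDEX uq_trigger_idem "
    "ON trigger_executions (trigger_id, idempotency_key) "
    "WHERE idempotency_key IS NOT NULL"
)
insert = "INSERT INTO trigger_executions (trigger_id, idempotency_key) VALUES (?, ?)"
conn.execute(insert, (1, "msg-1"))

outcomes = {}
cases = (
    ("duplicate", (1, "msg-1"), "msg-1"),    # violates the unique index
    ("not_null", (None, "msg-2"), "msg-2"),  # violates NOT NULL on trigger_id
)
for label, row, key in cases:
    try:
        conn.execute(insert, row)
    except sqlite3.IntegrityError as exc:
        outcomes[label] = is_idempotency_violation(key, exc)
```

Substring matching on the driver message is coarse: a different constraint involving the same column name would also match, so the check is only as precise as the message text.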

2. Removes redundant index (models.py)

   `TriggerExecution.idempotency_key` was creating 3 indexes for the
   same column:
     - `ix_trigger_executions_idempotency_key` (auto-generated via
       index=True on the model)
     - `ix_trigger_executions_idem_key` (raw SQL at startup)
     - `uq_trigger_idem` (partial unique, raw SQL)

   Only the last two are kept; they are created explicitly by the startup
   migration, with names versioned in the rollback plan. `index=True`
   removed from the model definition.

3. Logs CREATE INDEX failures instead of silencing them (app.py)

   Replaces `except Exception: pass` with a structured log on both
   `CREATE INDEX` statements (basic and partial unique). Previously, any
   failure (e.g. a SQLite version without partial index support) was
   swallowed silently and the operator would only find out at the first race.
   It now logs `sqlite_lib`, `sqlite_runtime` and the original exception.

Syntax validated with `python3 -m py_compile`. No dedicated test for this
path on this branch (WPP tests only land in PRs evolution-foundation#79/evolution-foundation#80).
… refactor

Consolidated fixes requested by Sourcery in the review of PR evolution-foundation#79:

1. Guard `max_attempts >= 1`
   (`.claude/skills/int-evolution-go/scripts/evolution_go_client.py`)

   Previously, with `max_attempts=0` the loop never executed and
   `raise last_exc` failed with a TypeError because last_exc remained
   None. Added `ValueError("max_attempts must be >= 1")` at the
   start to fail explicitly.

2. Fixes the backoff formula

   Before: `min(base_delay ** attempt + jitter, max_delay)`
   With `base_delay=2.0`, the first retry always had a delay of ~1s
   (2^0=1) regardless of the configured `base_delay`, so the parameter
   was ignored on the first retry.

   Now: `min(base_delay * (2 ** attempt) + jitter, max_delay)`
   Follows the classic exponential backoff pattern: first retry
   ≈ base_delay, doubling on each attempt.

   Worst-case latency (3 attempts, all 5xx) with base_delay=2.0,
   max_delay=8.0: attempt 0 ~2-2.5s, attempt 1 ~4-4.5s → total ~6.0-7.0s,
   within the 8s budget defined in the acceptance criteria.

3. Refactor: extracts `_backoff_retry_or_log_final`

   The 5xx (HTTPError) and URLError/socket.timeout branches had 18
   identical lines (compute delay + print retry log + sleep OR
   print failed log). Extracted into a single helper; the log keys that
   vary (`http_status` vs `error`) are passed as an `extras` dict.
   Identical behavior: the JSON logs match the previous version.

4. Strengthens asserts in `test_success_on_third_attempt_returns_result`
   (tests/whatsapp/test_retry_backoff.py)

   Captures `mock_sleep` and asserts `call_count == 2`. Previously, sleep
   was mocked but not verified, so there was no guarantee that
   backoff was not also being invoked after the success. It now
   guarantees that sleep only runs on retried failures, never after
   success.

All 4 tests in tests/whatsapp/test_retry_backoff.py pass.
@mt-alarcon mt-alarcon force-pushed the feat/wpp-retry-pr2-clean branch from 829aea6 to b93aa65 Compare May 13, 2026 23:24