Skip to content

Assertion failure in set_cache_prepare_write_action_for_new_request on TSHttpTxnServerAddrSet retry #12971

@bryancall

Description

@bryancall

Description

ATS crashes with a failed assertion when a plugin uses TSHttpTxnServerAddrSet() to set a server address, the connection fails, and ATS retries with caching enabled.

Fatal: src/proxy/http/HttpTransact.cc:3503: failed assertion `s->redirect_info.redirect_in_process`

The crash occurs in set_cache_prepare_write_action_for_new_request() which is called from HandleCacheOpenReadMiss().

Root Cause

The TSHttpTxnServerAddrSet retry logic added in #12733 (for issue #12611) does not account for the cache write lock state. The flow is:

  1. Cache MISS → OSDNSLookup routes to HandleCacheOpenReadMiss
  2. First pass: set_cache_prepare_write_action_for_new_request sets action = PREPARE_TO_WRITE
  3. SM acquires the cache write lock → write_lock_state = SUCCESS, action = WRITE
  4. Origin connection fails
  5. Retry logic fires (line 3936): detects os_addr_style == USE_API, resets DNS state to TRY_DEFAULT, calls CallOSDNSLookup()
  6. DNS resolves → OSDNSLookupcache_lookup_result == MISS → routes back to HandleCacheOpenReadMiss
  7. Second pass: write_lock_state is already SUCCESSset_cache_prepare_write_action_for_new_request hits the assertion at line 3503 because redirect_in_process == false

The assertion only expects write_lock_state == SUCCESS during redirects, but the connection retry is a second legitimate scenario where the write lock is already held.

Why Tests Didn't Catch This

The autest added in #12733 uses enable_cache=False. With caching disabled, OSDNSLookup routes to LookupSkipOpenServer instead of HandleCacheOpenReadMiss, so the cache write lock path is never exercised.

Fix

The assertion in set_cache_prepare_write_action_for_new_request needs to be relaxed to also accept the retry case. When api_server_addr_set_retried is true, the cache key is the same (same URL, different server address), so reusing the write lock is safe:

ink_release_assert(s->redirect_info.redirect_in_process || s->api_server_addr_set_retried);

The existing autest should also add a test run with caching enabled.

Stack Trace

HttpTransact::set_cache_prepare_write_action_for_new_request(HttpTransact::State*)
HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)
HttpSM::call_transact_and_set_next_state(void (*)(HttpTransact::State*))
HttpSM::state_api_callout(int, void*)
HttpSM::state_api_callback(int, void*)
TSHttpTxnReenable(tsapi_httptxn*, TSEvent)
<plugin>.so
INKContInternal::handle_event(int, void*)
APIHook::invoke(int, void*) const
HttpSM::state_api_callout(int, void*)
HttpSM::state_hostdb_lookup(int, void*)

Key State at Crash

s->redirect_info.redirect_in_process = false
s->cache_info.write_lock_state = SUCCESS
s->cache_info.action = WRITE
s->api_server_addr_set_retried = true
s->response_error = CONNECTION_OPEN_FAILED
s->transact_return_point = HandleCacheOpenReadMiss

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions