Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d1a9c46601
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
0627f69 to
1caca8f
Compare
d1a9c46 to
97533d1
Compare
1caca8f to
4c5ee88
Compare
97533d1 to
adf1dc1
Compare
4c5ee88 to
11d2006
Compare
adf1dc1 to
baabb85
Compare
11d2006 to
903e8e6
Compare
baabb85 to
727e1a4
Compare
903e8e6 to
a222483
Compare
727e1a4 to
c9751cf
Compare
a222483 to
c4665c5
Compare
c9751cf to
621e760
Compare
Collaborator
|
Checklist:
=> ok to merge |
c4665c5 to
8dc6a50
Compare
621e760 to
00ef8f7
Compare
8dc6a50 to
fcf6bff
Compare
Base automatically changed from
duburcqa/split_llvm_adstack_header_size
to
main
April 22, 2026 12:56
00ef8f7 to
cca0a2a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Surface LLVM adstack push/pop overflow as a Python exception
TL;DR
The prior behaviour on LLVM:
stack_pushwrapped pastmax_num_elementsby a silentn++with no check, so overflow produced out-of-bounds primal slots (if the alloca ever extended past its capacity) or plain overwrites of the last slot (if the runtime'sstack_top_primalindexing stopped pointing at valid storage). Either way: silently wrong gradients with no error.New behaviour:
stack_pushbounds-checksn + 1 > max_num_elementsand, on overflow, setsruntime->adstack_overflow_flag = 1(via__atomic_store_n(..., __ATOMIC_RELAXED)for multi-threaded CPU dispatch) and skips the increment instead of wrapping or trapping.stack_popandstack_top_primalclamp atn == 0so post-overflow code continues to execute without reading out of bounds. The flag surfaces the failure at the nextqd.sync()via a newcheck_adstack_overflow()poll that throwsQuadrantsAssertionError.Why
Silent-wrong-gradient is the worst failure mode: no error, no crash, just a wrong answer that propagates through downstream numerical code and may take days of optimizer misbehaviour to diagnose. The fix has to be loud at sync time, but it can't throw during process teardown (where
std::terminate()replaces the Python exception with an abort). Both constraints shape the design.Mechanism
Runtime-side flag
A second
i64on the runtime struct, intentionally disjoint fromerror_code:error_codeis tied toassert_failedand only polled whencompile_config.debugis on.adstack_overflow_flagis polled unconditionally at everysynchronize()because silent wrong gradients are too dangerous to gate on a debug flag.stack_push/stack_pop/stack_top_primalThe
__ATOMIC_RELAXEDstore is required because multiple CPU worker threads may race on the same flag (thread pool dispatch over a multi-element field). Plain unsynchronized stores from multiple threads to the same object are a data race even when all writers store the same value — the compiler is free to tear the write. Relaxed atomic store compiles to a regular naturally-aligned store on x86_64 / arm64 but satisfies the C++11 memory model. The host only reads the flag after the thread pool has joined, so no ordering beyond "happens eventually" is required.Host-side poll
LlvmRuntimeExecutor::check_adstack_overflow():runtime_retrieve_and_reset_adstack_overflow(runtime)— a one-shot JIT call that__atomic_exchange_ns the flag to 0 and writes the old value intoresult_buffer[quadrants_result_buffer_error_id].QuadrantsAssertionErrorwith a message pointing users atdefault_ad_stack_size.materialize_runtime(): no-op whenllvm_runtime_orresult_buffer_cache_is null.Teardown safety
LlvmProgramImpl::synchronize()callscheck_adstack_overflow()unconditionally — except during teardown, where a re-raise wouldstd::terminate()out of~Program(). The teardown guard:Program::finalize()invokesprogram_impl_->pre_finalize()before the two teardown syncs, so the flag is already true when those syncs run. The defensive re-assignment infinalize()is belt-and-suspenders: moving the assignment back intofinalize()alone would silently re-introduce thestd::terminate()teardown bug (the header's field-level comment spells this out).Codegen-side plumbing
TaskCodeGenLLVM::visit(AdStackPushStmt)now threadsget_runtime()as the first arg tostack_push, matching the new signature. Everything else in the codegen is unchanged — push/pop/top still take the opaquePtrand don't need to know about the flag.Program-level hook
Program::finalize()invokes a newprogram_impl_->pre_finalize()hook before its two teardown syncs. A matching no-opvirtual void pre_finalize() {}is added toProgramImplso non-LLVM backends are unaffected.Internal sanity test
internal_functions.h::test_stack()is extended to exercise the new contract: push past capacity, verifynstays atmax_num_elementsand the flag flips; over-pop pastn == 0, verifynstays at 0 andstack_top_primalclamps at slot 0; reset the flag so subsequent tests in the same fixture are not poisoned. Also plugs the pre-existingnew u8[132]leak with a matchingdelete[] stack;beforereturn 0;.Tests
Five new tests in
tests/python/test_adstack.py, all using a shared_overflowing_compute(n_elements, n_iter=64)helper (64 iterations + 2 setup pushes = 66 pushes, comfortably above every test'sad_stack_size=32):test_adstack_overflow_raises— overflow surfaces as(AssertionError, RuntimeError)atqd.sync(), not silently.test_adstack_overflow_flag_resets_after_catch— after the first raise, a subsequentqd.sync()with no new grad launch returns normally.test_adstack_large_capacity_resolves_overflow— raisingad_stack_sizeto 1024 lets the same kernel run to completion with the correct gradient (pins the remediation path the error message points users at).test_adstack_overflow_multithreaded— 16-element field so several threads overflow in parallel. Asserts the raise is still clean (no deadlock, no crash, one exception regardless of how many threads flipped the bit).test_adstack_overflow_during_teardown_does_not_abort— runs the overflowing grad in a child process and intentionally never callsqd.sync(). Child must exit with returncode 0 (thepre_finalizeguard suppresses the poll during teardown), not SIGABRT.Side-effect audit
compile_config.debug__atomic_store_n(..., __ATOMIC_RELAXED)pre_finalize()flips flag before teardown syncstest_adstack_overflow_during_teardown_does_not_abort.popclamps at 0,top_primalclamps at slot 0Program::result_buffer(no trailing underscore)result_buffer_cache_comment after claude bot flagged it.test_stackmemory leakdelete[] stack;addedStack
Autodiff 8 of 13. Second commit of the "LLVM adstack safety" triplet split. Based on #534 (header size). Followed by #495 (codegen budget guard).