
Fix logging, tracing and tracy #217

Merged
Bisht13 merged 3 commits into main from remco/tracing on Oct 16, 2025
Conversation

@recmo (Contributor) commented Oct 14, 2025

Before, the default features disabled all logging. To enable even basic logging you'd have to enable profiling, which also enabled a very expensive Tracy stack trace for each allocation, blowing up runtime by 4x. This PR changes:

  • Logging is always enabled.
  • Use the re-exported crates from tracing_tracy to avoid version mismatches.
  • Make allocator counters an optional feature, enabled by default since it is low overhead.
  • Make Tracy an optional dependency.
  • Make Tracy allocator tracing a runtime option, with configurable depth.
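
A minimal sketch of the last point, Tracy allocator tracing as a runtime option with configurable depth; the identifiers here (TRACY_DEPTH, enable_tracy_allocations, the usize::MAX sentinel) are illustrative assumptions, not the PR's actual code:

use std::sync::atomic::{AtomicUsize, Ordering};

// Sentinel meaning "Tracy allocation tracing is off".
const TRACY_OFF: usize = usize::MAX;

// Runtime-settable stack trace depth, defaulting to off.
static TRACY_DEPTH: AtomicUsize = AtomicUsize::new(TRACY_OFF);

// Enable Tracy allocation tracing with the given stack depth.
// A depth of 0 records allocation events without stack traces.
pub fn enable_tracy_allocations(depth: usize) {
    TRACY_DEPTH.store(depth, Ordering::Relaxed);
}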

ptr
}

unsafe fn realloc(&self, ptr: *mut u8, old_layout: Layout, new_size: usize) -> *mut u8 {
Collaborator:

We should be reporting the deallocation as well.

Suggested change:

unsafe fn realloc(&self, ptr: *mut u8, old_layout: Layout, new_size: usize) -> *mut u8 {
    self.tracy_dealloc(ptr);

Contributor Author:

I'm following the example of tracy-client here:

https://github.com/nagisa/rust_tracy_client/blob/45cd2681f3083f0bef6fdfb117e575c0905bdf8f/tracy-client/src/lib.rs#L292

It made sense to me since the ptr uniquely identifies the allocation until it gets freed. So if the client sees multiple alloc events on the same ptr, it must mean a realloc. (Whether the Tracy client actually does this, I don't know.)

Contributor Author:

Ok, the Tracy doc has a different opinion than what the tracy-client crate does:

[Two screenshots of the Tracy documentation, taken 2025-10-15]
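
For reference, a self-contained sketch of the free-then-alloc reporting the Tracy doc implies. tracy_dealloc matches the helper named in the suggestion above; tracy_alloc, the empty event stubs, and the System inner allocator are assumptions for illustration:

use std::alloc::{GlobalAlloc, Layout, System};

struct Profiled;

impl Profiled {
    fn tracy_alloc(&self, _ptr: *mut u8, _size: usize) { /* emit alloc event */ }
    fn tracy_dealloc(&self, _ptr: *mut u8) { /* emit free event */ }
}

unsafe impl GlobalAlloc for Profiled {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        self.tracy_alloc(ptr, layout.size());
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.tracy_dealloc(ptr);
        unsafe { System.dealloc(ptr, layout) };
    }

    unsafe fn realloc(&self, ptr: *mut u8, old_layout: Layout, new_size: usize) -> *mut u8 {
        // Free-then-alloc, per the Tracy doc: the old block logically
        // ends here, and the (possibly moved) block begins at new_ptr.
        self.tracy_dealloc(ptr);
        let new_ptr = unsafe { System.realloc(ptr, old_layout, new_size) };
        self.tracy_alloc(new_ptr, new_size);
        new_ptr
    }
}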


let current = self
    .current
    .fetch_add(size, Ordering::SeqCst)
    .wrapping_add(size);
self.max.fetch_max(current, Ordering::SeqCst);
Collaborator:

Nit: between fetch_add and fetch_max, another thread could update max, causing the local current calculation to be stale. We can avoid this by using fetch_update.

Suggested change:

self.max.fetch_update(Ordering::SeqCst, Ordering::SeqCst, |max_val| {
    let new_current = self.current.load(Ordering::SeqCst);
    (new_current > max_val).then_some(new_current)
}).ok();

Contributor Author @recmo, Oct 15, 2025:

Thought about this; it should be fine! The current value computed here is always accurate at that sequence point. Another thread might compute an earlier or a later value of current, but the max operation is a join (commutative and associative), so it doesn't matter whether it gets sequenced as

self.max.fetch_max(current_old, Ordering::SeqCst);
self.max.fetch_max(current_new, Ordering::SeqCst);
self.max.fetch_max(current_new, Ordering::SeqCst);
self.max.fetch_max(current_old, Ordering::SeqCst);

The end result is going to be that max contains max(old_max, current_old, current_new). Similarly if there are more parallel threads.

The one exception to this may be if a concurrent thread resets the maximum.
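
To make the argument concrete, a minimal sketch of the pattern under discussion (the struct and method names are illustrative):

use std::sync::atomic::{AtomicUsize, Ordering};

// Since `max` is only ever raised via fetch_max, and max() is a join,
// interleaved stale readings of `current` cannot lose a true peak.
struct Counters {
    current: AtomicUsize,
    max: AtomicUsize,
}

impl Counters {
    fn add(&self, size: usize) {
        // fetch_add returns the pre-add value; add `size` back to get
        // the value of `current` at this sequence point.
        let current = self
            .current
            .fetch_add(size, Ordering::SeqCst)
            .wrapping_add(size);
        // Whichever order concurrent fetch_max calls land in, the
        // result is max(old_max, current_old, current_new).
        self.max.fetch_max(current, Ordering::SeqCst);
    }
}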

/// enable tracy allocation tracking with provided stack depth
#[cfg(feature = "tracy")]
#[argh(option)]
pub tracy_allocations: Option<usize>,
Collaborator:

UX improvement

Suggested change:

#[argh(option, default = "0")]
pub tracy_allocations: usize,

and in main.rs

if args.tracy {
    if args.tracy_allocations > 0 {
        info!("Tracy enabled with allocation tracking (depth {}).", args.tracy_allocations);
        ALLOCATOR.enable_tracy(args.tracy_allocations);
    } else {
        info!("Tracy enabled (no allocation tracking).");
    }
}

Contributor Author @recmo, Oct 15, 2025:

There's actually a difference between 'off' and 'zero depth'. With zero depth we still collect allocation events, but we do not include stack traces (in fact, there is special code in the Allocator to handle this). With 'off' we do not collect allocation events in Tracy at all.
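
A sketch of the three states Option<usize> encodes, reusing the enable_tracy hook from the suggestion above (the ALLOCATOR static and the wrapper function are assumptions):

fn configure_tracy_allocations(tracy_allocations: Option<usize>) {
    match tracy_allocations {
        // Off: no allocation events are sent to Tracy at all.
        None => {}
        // On, zero depth: allocation events are recorded, but without
        // stack traces (the Allocator special-cases this).
        Some(0) => ALLOCATOR.enable_tracy(0),
        // On: stack traces captured up to `depth` frames (expensive).
        Some(depth) => ALLOCATOR.enable_tracy(depth),
    }
}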

Comment thread on tooling/cli/src/main.rs (Outdated)
unsafe {
    tracy_client_sys::___tracy_shutdown_profiler();
    //
    // @recmo: This is not safe. tracing_tracy may still
Collaborator:

I think to handle this we can do one of two things:

  1. Remove it
    Pro: No crashes. Con: Lose last ~100ms of trace data.
  2. Grace period (~100 ms)
    Pro: Usually works. Con: Still has race condition, just smaller window.
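
A sketch of option 2 under those assumptions (the 100 ms figure comes from the list above; the race window is narrowed, not removed):

// Give the Tracy client a grace period to drain its send queue
// before shutting down the profiler.
std::thread::sleep(std::time::Duration::from_millis(100));
unsafe {
    tracy_client_sys::___tracy_shutdown_profiler();
}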

Contributor Author @recmo, Oct 15, 2025:

> Con: Lose last ~100ms of trace data.

It's worse, unfortunately. There could be an unbounded backlog of data still to be transmitted. This actually happened all the time before this PR, when we did 100-deep stack traces on each allocation (times 90M allocations is gigabytes of data to transfer over what seems to be a very poorly implemented IPC bus).

I checked, but there doesn't seem to be a 'flush' function that keeps the Tracy client alive but clears the backlog.

Contributor Author:

[Two screenshots, taken 2025-10-15]

Contributor Author:

I guess the best solution here is to manually implement TRACY_NO_EXIT with something like "Keeping alive for Tracy connections. Press any key to exit."
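
A sketch of that approach; note that reading a byte from stdin blocks on Enter rather than literally any key (without raw-mode terminal handling), and the function name is illustrative:

use std::io::Read;

// Hand-rolled TRACY_NO_EXIT: keep the process (and the Tracy client's
// worker thread) alive until the user confirms, so the backlog drains.
fn wait_for_tracy() {
    eprintln!("Keeping alive for Tracy connections. Press Enter to exit.");
    let _ = std::io::stdin().read(&mut [0u8; 1]);
}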

@recmo requested a review from Bisht13 on October 15, 2025, 15:23
@Bisht13 (Collaborator) commented Oct 16, 2025

LGTM

@Bisht13 Bisht13 merged commit cbcdd07 into main Oct 16, 2025
4 of 5 checks passed
@Bisht13 Bisht13 deleted the remco/tracing branch October 16, 2025 02:36
Bisht13 added a commit that referenced this pull request Feb 14, 2026
Bisht13 added a commit that referenced this pull request Feb 16, 2026
