Nested timers#2486
Conversation
|
Hi @marvinKaster, you are running into a CI issue that has been fixed in main. Please rebase and it should be gone. |
…hical_timer # Conflicts: # arbor/communication/communicator.cpp # arbor/simulation.cpp
|
Hi @marvinKaster, #2490 has a fix for the last failing test. Could you give it a review? Best, |
thorstenhater
left a comment
There was a problem hiding this comment.
Hi @marvinKaster,
thanks, that will make life a bit easier. I've left a few suggestions.
| @@ -12,12 +12,16 @@ | |||
| } | |||
|
|
|||
| // leave a profling region | |||
There was a problem hiding this comment.
Possibly a note for later updates. We could also offer a RAII macro here now, in the style of:
struct AutoProf {
AutoProf(const auto& label): lbl(label) { enter(lbl); }
~AutoProf() { leave(lbl); }
std::string lbl;
};Even some macro magick could be used to generate unique names based on __FILE__ and __LINE__.
There was a problem hiding this comment.
See other comment
There was a problem hiding this comment.
Possibly the allocation of strings is problematic. Anyhow, not needed. I played around with tracy, which is extremely nice to work with, but I recon not really playing well with MPI.
| #define PL arb::profile::profiler_leave | ||
| #define PL(name) \ | ||
| { \ | ||
| static std::size_t region_id_ = arb::profile::profiler_region_id(#name); \ |
There was a problem hiding this comment.
Another possible extension: we could record FILE LINE FUNCTION when we set up the region.
There was a problem hiding this comment.
I gave it and AutoProf a try today, and somehow made everything slow. Maybe that is an extension for another PR
There was a problem hiding this comment.
Yes, that was not a request to add it here.
| void recorder::leave(region_id_type index, const std::vector<std::string>& names) { | ||
| if(current_timer_stack[current_timer_stack.size()-1] != index) { | ||
| throw std::runtime_error("recorder::leave without matching recorder::enter Trying to leave "+names[index] + " but currently in "+names[current_timer_stack[current_timer_stack.size()-1]]); | ||
| throw std::runtime_error("recorder::leave without matching recorder::enter Trying to leave "+names[index] + " but currently in "+names[current_timer_stack[current_timer_stack.size()-1]] + + " TimerStack: " + timer_stack_to_string(current_timer_stack, names)); |
There was a problem hiding this comment.
+ + looks off to me?
There was a problem hiding this comment.
Thanks for catching that. Interesting that it compiled with it
There was a problem hiding this comment.
It compiles and work like you wanted it to
https://godbolt.org/z/n8jhnoj7h
There was a problem hiding this comment.
Reason is that C/C++ have unary plus and minus:
int x = 42;
-x; // negate
+x; // same as x
-x + +x; // 0see https://godbolt.org/z/6Thd4PWMv
That it works for strings might tell us something about C++ and operator overloading ;)
There was a problem hiding this comment.
Hmm. I would guess then the string literal is interpreted as char*, which has an unary +.
There was a problem hiding this comment.
Specifically that, yes.
|
Now CI is running into an error on UTF-8 decode? |
|
@thorstenhater I have no idea how removing the '+' in |
|
Possibly an interaction with MPI? |
|
I tried reproducing the error locally, but it works every time for me. A rerun of the job was also successful. It seems to be hard to reproduce. I don't think MPI is involved. As far as I understand, the methods don't call MPI at any point. |
|
The suspicion was that the output gets passed through github's interface and two interleaved MPI ranks produced something weird. |
# Preliminary release notes for v0.12 ## What's Changed * V0.11.0 rc by @thorstenhater in #2454 * Release finalisation by @thorstenhater in #2455 * CSCS CI by @thorstenhater in #2448 * Re-factor connection setup and event dispatch. by @thorstenhater in #2449 * Add tiled busyring by @thorstenhater in #2456 * Fill in a missing part of the docs by @thorstenhater in #2462 * Major Optimisations in Label Resolution by @thorstenhater in #2458 * Refactor domain decomposition by @thorstenhater in #2457 * Add an escape hatch to label resolution by @thorstenhater in #2459 * Fix crash on probding elided values by @thorstenhater in #2466 * Only store PRNG data if needed by @thorstenhater in #2467 * 🐍 Polish `label_dict` by @thorstenhater in #2451 * use better builtins for double max/min by @thorstenhater in #2469 * Remove typed map and refactor cable cell by @thorstenhater in #2468 * Many small refactoring / clean-ups / memory savings by @thorstenhater in #2465 * Bump pb11 to latest major version by @thorstenhater in #2475 * Fix GPU test issues by @ibanezbm in #2479 * Align versions by @thorstenhater in #2481 * Add probing of eX. by @thorstenhater in #2478 * Add trivial compiler test by @thorstenhater in #2488 * Replace MPI_Allgatherv with MPI_Alltoallv for communication pattern by @ibanezbm in #2460 * Use CBRNG in schedules. by @thorstenhater in #2386 * Update year 🎉 by @thorstenhater in #2491 * Don't use deprecated numpy functions by @thorstenhater in #2490 * Co-simulation tutorial by @thorstenhater in #2492 * Nested timers by @marvinKaster in #2486 * Clean-up iclamp interface by @thorstenhater in #2494 * Fix override by @thorstenhater in #2493 * Better use of __restrict__ by @thorstenhater in #2476 * Add AdEx cells. by @thorstenhater in #2230 * units in cv policy by @thorstenhater in #2495 * Flat map in mechanism description for 1.5x speed-up in synapse creation by @thorstenhater in #2501 * Perf/compress contiguous by @thorstenhater in #2470 * Allow more SWC files in the parser by @thorstenhater in #2471 * STDP case for busyring benchmark by @jlubo in #2489 * round robin load balance by @thorstenhater in #2500 ## New Contributors * @ibanezbm made their first contribution in #2479 * @marvinKaster made their first contribution in #2486 **Full Changelog**: v0.11.0...v0.12.0
# Release notes This release constitutes a major improvement in terms of parallel performance by replacing the underlying MPI algorithm (see our paper at IPDPS'26) and setup time by improving label resolution and adding the access to raw labels. On the functional side, we introduced AdEx cells, added a new load balancing method, and probes for the reversal potential. ## What's Changed * V0.11.0 rc by @thorstenhater in #2454 * Release finalisation by @thorstenhater in #2455 * CSCS CI by @thorstenhater in #2448 * Re-factor connection setup and event dispatch. by @thorstenhater in #2449 * Add tiled busyring by @thorstenhater in #2456 * Fill in a missing part of the docs by @thorstenhater in #2462 * Major Optimisations in Label Resolution by @thorstenhater in #2458 * Refactor domain decomposition by @thorstenhater in #2457 * Add an escape hatch to label resolution by @thorstenhater in #2459 * Fix crash on probding elided values by @thorstenhater in #2466 * Only store PRNG data if needed by @thorstenhater in #2467 * 🐍 Polish `label_dict` by @thorstenhater in #2451 * use better builtins for double max/min by @thorstenhater in #2469 * Remove typed map and refactor cable cell by @thorstenhater in #2468 * Many small refactoring / clean-ups / memory savings by @thorstenhater in #2465 * Bump pb11 to latest major version by @thorstenhater in #2475 * Fix GPU test issues by @ibanezbm in #2479 * Align versions by @thorstenhater in #2481 * Add probing of eX. by @thorstenhater in #2478 * Add trivial compiler test by @thorstenhater in #2488 * Replace MPI_Allgatherv with MPI_Alltoallv for communication pattern by @ibanezbm in #2460 * Use CBRNG in schedules. by @thorstenhater in #2386 * Update year 🎉 by @thorstenhater in #2491 * Don't use deprecated numpy functions by @thorstenhater in #2490 * Co-simulation tutorial by @thorstenhater in #2492 * Nested timers by @marvinKaster in #2486 * Clean-up iclamp interface by @thorstenhater in #2494 * Fix override by @thorstenhater in #2493 * Better use of __restrict__ by @thorstenhater in #2476 * Add AdEx cells. by @thorstenhater in #2230 * units in cv policy by @thorstenhater in #2495 * Flat map in mechanism description for 1.5x speed-up in synapse creation by @thorstenhater in #2501 * Perf/compress contiguous by @thorstenhater in #2470 * Allow more SWC files in the parser by @thorstenhater in #2471 * STDP case for busyring benchmark by @jlubo in #2489 * round robin load balance by @thorstenhater in #2500 ## New Contributors * @ibanezbm made their first contribution in #2479 * @marvinKaster made their first contribution in #2486 **Full Changelog**: v0.11.0...v0.12.0
Introducing nested timers. You can now create timed sections within other timed sections. The spent time will be outputted in a hierarchical format.
Code example
Example output of the ring benchmark