Stricter checks on network construction / simulation parameters. by thorstenhater · Pull Request #2264 · arbor-sim/arbor

thorstenhater · 2024-04-02T13:21:21Z

Clean-up semantic checks on parameters, ensure connections have finite delay.

bcumming · 2024-04-03T04:56:38Z

In this case, dt > min_delay/2 will lead to invalid output because spikes won't be delivered on time. I see that the desire is to "simulate" a zero-delay connection by picking a very small connection delay, so why not just require that they user selects delay = dt/2?

According to Arbor's model, shouldn't dt > min_delay/2 be a hard error, because the following should always be true:

dt <= min_delay/2
dt > 0
min_delay > 0

thorstenhater · 2024-04-03T08:07:58Z

Hi @bcumming, currently I am looking at alternatives, but so far I haven't anything I can get behind:

Implicitly changing all delays to at least dt. Altering user input without consent
Allocating a minimum delay like EPSILON. Makes for really slow simulations.
The variant above. Incorrect, as you noted.
Your suggestion of throwing on zero delay. The model needs to know about execution parameters.

Out of those, 4. is probably the least worst option.

bcumming · 2024-04-03T08:53:05Z

Do you need the model to know about execution parameters?

Instead, the simulation object could throw during model initialisation, or when simulation::run is called, based on all runtime information. This reflects that while the model might be "correctly" specified, the runtime configuration is not correct.

halfflat · 2024-04-03T09:26:29Z

My feelings about this:

The dt provided to simulation::run should be regarded as firstly a maximum dt and secondly as being up to cell group implementations to interpret in their context of finding a solution (including being ignored).
Maximum dt should be greater than zero (we can throw if not a positive number).
Because of the above, a cell group should be free to internally set its dt in integration to less than the epoch duration.

halfflat · 2024-04-03T09:40:54Z

-        if (std::isnan(delay) || delay < 0)  throw std::out_of_range("Connection delay must be non-negative and infinite in units of [ms].");
+        if (std::isnan(weight)) throw std::domain_error("Connection weight must be finite.");
+        if (std::isnan(delay) || delay <= 0)  throw std::domain_error("Connection delay must be positive, finite, and given in units of [ms].");
    }


We have quite specific exceptions such as arb::bad_connection_source_gid defined in <arbexcept.hpp>; it would be consistent to define some exceptions to throw here that derive from arb::arbor_exception.

I am a bit torn here, std::domain_error fits the basic checks for isnan etc well and has the advantage that it plays nice with pybind11 (it'll automatically convert to ValueError).

Here, one might provide additional information for users who want to have a minimum delay. For example:

if (!std::isfinite(delay) || delay <= 0) throw std::domain_error("Connection delay must be positive, finite, and given in units of [ms]. The minimum possible value is one timestep.");

And another thing: shouldn't there be a machine epsilon for comparisons with float values like delay? To avoid that some value very close to 0, which still meets >0, is used?

if (!std::isfinite(delay) || delay <= 0) throw std::domain_error("Connection delay must be positive, finite, and given in units of [ms]. The minimum possible value is one timestep.");

I decided to follow Sam's argument and practicality here. Knowing dt isn't feasible at connection construction, since that is only declared with sim.run(dt=...). I see the point though and compromise a compromise:
Let sim.run fail if dt > min_delay. For now. If we want to get cutesy later, we can revert that error state and add whatever smarts to the cell groups we desired. @halfflat @jlubo ? (Note that this is the last PR needed for 0.11)

And another thing: shouldn't there be a machine epsilon for comparisons with float values like delay? To avoid that some value very close to 0, which still meets >0, is used?

Comparing against 0 is fine, for this at least, since users likely type 0 by accident and not 1e-306. Also this would be absorbed by the check during run above.

I see, that makes sense, thanks for the clarification.

halfflat · 2024-04-03T09:41:35Z

        source(std::move(src)), target(std::move(dst)), weight(w), delay(d.value_as(U::ms)) {
-        if (std::isnan(weight)) throw std::out_of_range("Connection weight must be finite.");
-        if (std::isnan(delay) || delay < 0)  throw std::out_of_range("Connection delay must be non-negative and infinite in units of [ms].");
+        if (std::isnan(weight)) throw std::domain_error("Connection weight must be finite.");


Message doesn't match test: should this test be !std::isfinite(weight)?

That's better, yes.

halfflat · 2024-04-03T09:56:56Z

-        if (std::isnan(weight)) throw std::out_of_range("Connection weight must be finite.");
-        if (std::isnan(delay) || delay < 0)  throw std::out_of_range("Connection delay must be non-negative and infinite in units of [ms].");
+        if (std::isnan(weight)) throw std::domain_error("Connection weight must be finite.");
+        if (std::isnan(delay) || delay <= 0)  throw std::domain_error("Connection delay must be positive, finite, and given in units of [ms].");


Inasmuch as we're presuming NaNs work as expected, we could just have the test !(delay > 0) instead of std::isnan(delay) || delay <= 0.

Do we really need to enforce that delay is finite? If so, then the test should include that.

Also, not being familiar (yet) with how LLNL units works, why do we need to specify that the quantity is in milliseconds? Can't we just convert as required or else assert in the type?

Well, the two issues go hand in hand: (42 * U.ms).value_as(U.mV) == NaN. So, receiving a nan can mean either we got nan * U.ms or an erroneous unit. This is why the message allows for both.

halfflat · 2024-04-03T09:57:31Z

    gap_junction_connection(cell_global_label_type peer, cell_local_label_type local, double g):
        peer(std::move(peer)), local(std::move(local)), weight(g) {
-        if (std::isnan(weight)) throw std::out_of_range("Gap junction weight must be finite.");
+        if (std::isnan(weight)) throw std::domain_error("Gap junction weight must be finite.");


Similar comments as above: testing finite or testing Nan? We should use an arbor exception.

halfflat · 2024-04-03T10:11:28Z

+    if (!std::isfinite(dt) || dt < eps) throw std::domain_error("simulation: dt must be finite, positive, and in [ms]");
+    if (dt - t_interval_ > eps) throw std::domain_error(util::pprintf("simulation: dt={}ms is larger than epoch length by {}, chose at most half the minimal connection delay {}ms.", dt, dt - t_interval_, t_interval_));
+    if (!std::isfinite(tfinal) || tfinal < eps) throw std::domain_error("simulation: tfinal must be finite, positive, and in [ms]");
+    if (tfinal - epoch_.t1 < dt) throw std::domain_error(util::pprintf("simulation: tfinal={}ms doesn't make progress of least one dt; current time of simulation is {}ms and dt {}ms", tfinal, epoch_.t1, dt));


This seems all a bit fiddly; as mentioned in the general comments, I think we should just leave dt interpretation up to the integrators and they can make a sensible choice, e.g. clip it by epoch duration or interpret very small dt values as being larger, based on whatever they have to do. Having eps here is untidy because it splits the responsibility for sane dt interpretation between the main loop and the integrators.

Also, if someone wants to run arbor for ever, do we really need to stop them? We already have a check for tfinal being less than the end time of the preceding epoch, where we return the correct 'simulated up to' time, so we don't really need tests for it being zero, or negative; in fact a tfinal of zero should be a valid no-op in my opinion.

In short, a isnan test for tfinal remains sufficient, I believe.

In endeffect you are suggesting that instead of throwing an error, the cell groups' individual advance methods should decide. Do enable that, we'd have to pass all possible options, currently min_delay and dt, down the stack. I'd rather make the choice uniformly, especially since #2053.

No we don't have to pass it down the stack: we know epochs are at most min_delay/2 long. The epoch already has all the info they need, and if there is more info it will be in the cell kind global data, which they also have.

#2053 is just cable cells, and I still want to partially revert it so that we can use flexible time steps with a different cable cell integrator, even if it's just to make them line up correctly with epoch intervals.

As an example, if we have the last epoch (ending on tfinal) having duration 3.1 dt, wouldn't it be best to set the fixed dt for that epoch to be e.g. the duration/4 and then we would know that all cell states across cell groups were actually at the same integration time?

Also, if someone wants to run arbor for ever, do we really need to stop them? We already have a check for tfinal being less than the end time of the preceding epoch, where we return the correct 'simulated up to' time, so we don't really need tests for it being zero, or negative; in fact a tfinal of zero should be a valid no-op in my opinion.

Technically I agree. Practically I know that I'd spent way too much time looking for the reason why my simulation did nothing in that situation. Especially since our time_type doesn't distinguish time points from durations, so run(5 *ms) really could mean run until t=5ms or run for 5ms. Thus I think it's friendlier towards the user to tell them 'this is a no-op and you likely didn't mean that'. Especially, as I cannot currently imagine situations where this is a semantically meaningful request.

As a general principle, I think we should keep things general unless there is a good reason not to. I mean, the parameter is literally called tfinal, and we return the simulation state time - it's a simple interface with simple semantics (from the outside) and there's nothing stopping us adding checks on the outside of that if we want to provide more hand holds in the Python interface (though I don't think we should there either).

We can always change the name to run_to.

The semantics of running a simulation from t = 0 for 0 seconds should be that the state reflects the initial conditions. It's not what we'd expect someone to do in normal circumstances, but it's the consistent result, and may arise in circumstances where the simulator is being driven by another process or co-simulation.

Damn you and your arguments :D

This reverts commit 73fb6f5.

jlubo

@thorstenhater please see my comments above. Apart from that, looks good to me.

This pull request fixes issues in the GPU tests introduced in PR #2465 and PR #2264. Changes include: - Fixed the event stream GPU test implementation. - Fixed the handling of the "dt" parameter in the serdes test. These fixes ensure that the GPU tests run reliably again.

Tiny fix for |epoch| < dt.

6a5bdc4

Tighten the checks on simulation::run.

4fb5476

Stricter checks, adjust tests.

12261a2

thorstenhater changed the title ~~Tiny fix for |epoch| < dt.~~ Stricter checks on network construction / simulation parameters. Apr 3, 2024

halfflat reviewed Apr 3, 2024

View reviewed changes

thorstenhater added 17 commits April 3, 2024 13:31

Loosen checks a tiny bit, ensure min_delay is >0.

2739392

Merge, relax check on t_final, and use isfinite.

0893669

Fix tests.

4797b9f

Merge remote-tracking branch 'origin/master' into bug/epoch-lt-dt

b85e8e6

Format.

73acec6

Merge remote-tracking branch 'origin/master' into bug/epoch-lt-dt

1ba6b0c

Clean-up the monkey patching mess

79fe5aa

Resolve an old TODO

73fb6f5

simplify test_spikes

9b33a4c

Need to reset profiler :/

4a20634

This reverts commit 73fb6f5.

lint

f11f0d3

Guard against profiler issues.

57ad45a

Reset after every exception!

0409c41

Maybe?

dc1f7a9

MPI?

3921364

Now?

7f1918b

Proper namespace

84f6fd2

thorstenhater added 3 commits February 3, 2025 19:08

More pro-active clears

071ca95

Even more?

49bd6ed

Now?

25511b3

thorstenhater requested a review from jlubo February 5, 2025 09:32

thorstenhater marked this pull request as ready for review February 5, 2025 09:33

thorstenhater requested a review from halfflat March 1, 2025 11:14

jlubo reviewed Mar 13, 2025

View reviewed changes

thorstenhater added 2 commits March 21, 2025 09:11

Merge remote-tracking branch 'origin/master' into bug/epoch-lt-dt

f4bec3f

Add check dt ./. tau with some fudging.

5964ce6

thorstenhater requested a review from jlubo March 21, 2025 08:40

thorstenhater added 3 commits March 21, 2025 09:49

Adjust brunel

f436fff

Adjust brunel.cpp defaults.

0dde392

Root out last issues

f5abb4f

jlubo approved these changes Apr 1, 2025

View reviewed changes

thorstenhater merged commit b2f1851 into arbor-sim:master Apr 2, 2025

ibanezbm mentioned this pull request Sep 3, 2025

Fix GPU test issues #2479

Merged

Conversation

thorstenhater commented Apr 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bcumming commented Apr 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thorstenhater commented Apr 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bcumming commented Apr 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

halfflat commented Apr 3, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

halfflat Apr 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jlubo left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

thorstenhater commented Apr 2, 2024 •

edited

Loading

bcumming commented Apr 3, 2024 •

edited

Loading

thorstenhater commented Apr 3, 2024 •

edited

Loading

bcumming commented Apr 3, 2024 •

edited

Loading

halfflat Apr 3, 2024 •

edited

Loading