Skip to content

Split leaf and root dynamics optimization notebooks#69

Merged
SarahAlidoost merged 12 commits into
mainfrom
split_nbs
Jan 9, 2026
Merged

Split leaf and root dynamics optimization notebooks#69
SarahAlidoost merged 12 commits into
mainfrom
split_nbs

Conversation

@SarahAlidoost
Copy link
Copy Markdown
Collaborator

closes #66
closes #68

🔴 After merging #61, I will update the docs.

@SarahAlidoost SarahAlidoost marked this pull request as ready for review December 17, 2025 15:26
@SarahAlidoost SarahAlidoost changed the title Split leaf and root dynamics Split leaf and root dynamics optimization notebooks Dec 17, 2025
@SarahAlidoost
Copy link
Copy Markdown
Collaborator Author

@michielkallenberg @ronvree @SCiarella @fnattino in this pull request, STE method is used for the sigmoid approximation for the parameter SPAN in leaf_dynamics, see the issue #68. Please let me know what you think about this approach, or any other ideas/suggestions. Thanks!

Copy link
Copy Markdown
Collaborator

@SCiarella SCiarella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @SarahAlidoost, I really prefer this new approach to the sigmoid because the gradient now is much more controlled.

Later, we could even allow the user to set a lower sharpness to have a better gradient for parameter optimization, but right now the value of 1000 that you have chosen looks ok.

I approve this PR for merge 👍

@ronvree
Copy link
Copy Markdown

ronvree commented Dec 18, 2025

I also think this is a really good idea! During the development it's much better to remove the ambiguity of whether some sigmoid approximates the original model well enough. Great suggestion!

Although the soft threshold might in some case have a more realistic biophysical interpretation (as also mentioned by Francesco in issue #68), I feel these discussions should be separated from the initial model development.

If I understand correctly the sharpness should ideally still be set based on the magnitude of the input if we want to guarantee there are no issues during optimization?

Would it make sense to implement the thresholds as nn.Module instances so it's easier to inspect their behavior and see the impact of any potential adjustments?

Thanks @SarahAlidoost! I also approve the PR for merge

@SarahAlidoost
Copy link
Copy Markdown
Collaborator Author

I also think this is a really good idea! During the development it's much better to remove the ambiguity of whether some sigmoid approximates the original model well enough. Great suggestion!

Although the soft threshold might in some case have a more realistic biophysical interpretation (as also mentioned by Francesco in issue #68), I feel these discussions should be separated from the initial model development.

If I understand correctly the sharpness should ideally still be set based on the magnitude of the input if we want to guarantee there are no issues during optimization?

yes, smaller sharpness means larger gradients (DALV is more sensitive to changes in SPAN), and vice versa. But there is a practical challenge. if training is unstable, or gradient vanish, the sharpness should be adjusted accordingly.

Would it make sense to implement the thresholds as nn.Module instances so it's easier to inspect their behavior and see the impact of any potential adjustments?

If I understood your comment correctly, you mean to wrap the threshold in a nn.Module. But this is just a wrapper, still we need a quantization operation; sigmoid or something else. A good way to handle this, I think, would be to expose sharpness as a learnable parameter in leaf_dynamic model. Please see my reply here.

Thanks @SarahAlidoost! I also approve the PR for merge

Thanks for reviewing it 👍

Copy link
Copy Markdown
Collaborator

@fnattino fnattino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @SarahAlidoost! I have commented to this as part of #68. Looks good to me, my main concern was the large value of the sharpness parameter, but as also mentioned in the issue, we can probably get back to what is a reasonable default value for it later on!

@SarahAlidoost
Copy link
Copy Markdown
Collaborator Author

@michielkallenberg when you have time, can you please have a look at this PR and discussion in #68? Thanks!

@michielkallenberg
Copy link
Copy Markdown
Collaborator

Thanks @SarahAlidoost. I did not know this STE trick. Nice.
Just a double check: the zero-gradient that was found here does not result from the external-state-fixing? (An issue we identified before)
In any case I figure STE looks like a good approach to me, also after having read the discussion.

@SarahAlidoost
Copy link
Copy Markdown
Collaborator Author

Thanks @SarahAlidoost. I did not know this STE trick. Nice. Just a double check: the zero-gradient that was found here does not result from the external-state-fixing? (An issue we identified before)

No, that was not the issue here. LAI should have gradient wrt SPAN. The reason was a sharp sigmoid that acts like a step function in the previous changes. This is now fixed with STE method.

In any case I figure STE looks like a good approach to me, also after having read the discussion.

Thanks for reviewing.

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jan 9, 2026

@SarahAlidoost SarahAlidoost merged commit 4a25b90 into main Jan 9, 2026
11 checks passed
@SarahAlidoost SarahAlidoost deleted the split_nbs branch January 9, 2026 12:38
@ronvree
Copy link
Copy Markdown

ronvree commented Jan 12, 2026

If I understood your comment correctly, you mean to wrap the threshold in a nn.Module. But this is just a wrapper, still we need a quantization operation; sigmoid or something else. A good way to handle this, I think, would be to expose sharpness as a learnable parameter in leaf_dynamic model. Please see my reply here.

Thanks for your response! What I meant was that this soft threshold pattern will reoccur a lot in diffWofost and understanding its behavior will matter a lot for understanding the model. I agree a nn.Module is in a way just a wrapper and doesn't provide any functional behavior but it makes things more modular and makes refactoring easier. For me it would make sense to implement it that way but of course it's just a suggestion

@SarahAlidoost
Copy link
Copy Markdown
Collaborator Author

If I understood your comment correctly, you mean to wrap the threshold in a nn.Module. But this is just a wrapper, still we need a quantization operation; sigmoid or something else. A good way to handle this, I think, would be to expose sharpness as a learnable parameter in leaf_dynamic model. Please see my reply here.

Thanks for your response! What I meant was that this soft threshold pattern will reoccur a lot in diffWofost and understanding its behavior will matter a lot for understanding the model. I agree a nn.Module is in a way just a wrapper and doesn't provide any functional behavior but it makes things more modular and makes refactoring easier. For me it would make sense to implement it that way but of course it's just a suggestion

thanks for the clarification. I submitted issue #72 for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sigmoid approximation and Straight-Through Estimator: SPAN in leaf_dynamics [Task] split and rerun optimization nb for leaf and root dynamics.

5 participants