[MNG-7429] The classloader containing build extension should be used throughout the build #690
gnodet wants to merge 5 commits into apache:maven-3.9.x
|
This does not supersede e327be3, but complements it? |
|
@laeubi can you check also? |
Yes, I actually don't think the first commit is really relevant or useful with the current fix. I think in most cases, only the top level project has a classloader, which means the previous commit won't actually do anything and the classloader will remain the same. The problem is that I don’t think the behavior is defined for the “scope” of such extensions for reactor builds. My assumption is that they are defined on the top level project and should be available for the whole execution, hence the fix that registers the top level project classloader… a bit earlier in the maven execution. So, I guess the answer should be yes ! |
|
What is a build extension is declared in a submodule only? |
You mean "what if" ? |
Yes, that was my question. Just a typo. So you don't expect any problems with that? |
I do expect problems, but I don't think it really matters because I don't think there's an actual use case for that. |
|
This looks like another issue that is only similar to the other fix, but why should anything be reverted? |
It's not another issue. The e327be3 commit actually breaks build extension. This PR fixes the problem, but the older commit is meaningless once this one is applied, as the classloader is already reverted by this fix (which was the original problem iiuc). |
|
But why/how does it break them? As described in the PR, this currently leaks the CCL of the last project in the chain, and even if it makes extensions work somehow it is clearly broken, so I think both are probably needed... |
See apache/maven-build-cache-extension#8 where the build extension test is broken on 3.9.x. I bisected the problem to e327be3. The reason is that the extension defines a component with a session scope. This component is defined in the project's classloader (because it's a build extension and not a core extension). This means that when the component is loaded, the project's classloader has to be used, else the component will not be seen by plexus. The previous commit reverts the TCCL to the core maven one before actually building the project. This PR aims at broadening the window where the project's classloader is used, while still reverting it at the end. Also, about the last project point, I think in most cases, only the top level project has a specific classloader. The behavior before the previous commit was then that the top level project was used to set the TCCL, and the other projects, not having any specific classloader defined (if not defining any build extension), were simply ignored. |
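For readers following along, the "set the project's classloader for a window, then revert" idea can be sketched in plain Java. This is only an illustration under assumptions: `ScopedContextClassLoader` and `callWith` are hypothetical names, a bare `URLClassLoader` stands in for a project realm, and none of this is the actual Maven code touched by the PR.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

// Hypothetical helper, not part of Maven: runs an action with a given
// thread context classloader (TCCL) and always restores the previous one.
final class ScopedContextClassLoader {

    static <T> T callWith(ClassLoader loader, Supplier<T> action) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        // A null loader models a project without build extensions:
        // the TCCL is left untouched in that case.
        if (loader != null) {
            thread.setContextClassLoader(loader);
        }
        try {
            return action.get();
        } finally {
            // Revert at the end of the window, even if the action throws.
            thread.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        ClassLoader core = Thread.currentThread().getContextClassLoader();
        // Stand-in for a project realm that contains a build extension.
        ClassLoader projectRealm = new URLClassLoader(new URL[0], core);

        ClassLoader seen = callWith(projectRealm,
                () -> Thread.currentThread().getContextClassLoader());

        System.out.println("realm visible during build: " + (seen == projectRealm));
        System.out.println("restored after build: "
                + (Thread.currentThread().getContextClassLoader() == core));
    }
}
```

How wide the window should be (a single lookup, one project, or the whole build) is exactly the point under discussion here; the sketch only shows the mechanics of setting and restoring.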
|
@gnodet Does this apply to 3.8.x? I guess we need a new JIRA issue to clearly document this problem/regression. |
|
But this actually shows that a project classloader is leaked. I don't see how reverting this makes the situation better (besides that 'it has worked before'), as obviously here we have a gap where a random project classloader is used. |
If it is not the last, then it is "the last one that has a specific classloader", and the working of such extensions was then just a side effect. And if what you wrote is true, why then all these "attach to thread" calls for each project? As far as I understand, the "top level" is the pom actually executed, and assuming that it is the only valid one to have extensions seems a bit strange. |
By the way I already noticed that SessionScoped components are actually hard to implement, I used the |
|
Do you refer to https://github.com/apache/maven-build-cache-extension/blob/master/src/main/java/org/apache/maven/buildcache/CacheLifecycleParticipant.java ? Why do you make this a session scoped |
@michael-o It does apply to 3.8.x, and yes, I'll create a JIRA.
It was leaking : the current PR still reverts at the end of the build. And yes, I agree that specifying build extensions at various levels of a reactor may lead to unpredictable results. However,
Defining extensions not at the top level was broken, as you said, the last project defining extensions was used for the whole build. So I assume that's actually not a use case and we don't need to fix it. It would be easier to forbid them at the moment if we really want to fix the possible problem.
Actually, the one that cause problems is the BuildCacheMojosExecutionStrategy, and yes, it's session scoped because I'm aiming at integrating this extension in |
I don't think this is enough... as mentioned in the change, this classloader is even used later on and further leaked as "the original" one, as it is passed to another method call.
I think this must be supported without issues. Just think about an aggregator build pom that includes different other projects, where some might define an extension and others don't. That will lead to failures in this build but not in the individual builds. That's really a nightmare to debug, and that's why I'm a bit strict about those rules to not leak classloaders. If at some places it is necessary that the project classloader is used, it has to be set but reset instantly afterwards!
But it seems you are relying on undefined behavior here, and what you want to achieve could actually only work with a core-extension. |
This reverts commit d29af90.
My assumption so far was that build extensions defined in the root pom should be available to all modules built within the reactor. This implies that the classloader will not be destroyed before the session ends anyway and beans are available if needed. This is broken, so I consider the previous commit a regression. That said, I'm all for improving the way extensions work. I've raised #616 a while ago, though this is about core extensions, but I think both have valid use cases. But the use case you mention with aggregators for various projects is valid and would have to be covered, but I think it's broken now (especially with concurrent builds), so I'd rather address such new use cases in a different JIRA/PR. |
|
I've added a commit to revert e327be3 because it's useless. This restores the behavior from 3.8.4 when extensions are defined in the root pom and allows supporting the maven-build-cache-extension use case at apache/maven-build-cache-extension#8. This would break builds that would define extensions in the parent and in a child and expect them to be available later in the build. If we want to completely restore the 3.8.4 behavior, I can remove line https://github.com/apache/maven/pull/690/files#diff-7698873d65eece16fdf2ba67293827f623994a3affa509b814eaf33abb4537daR84 so that the last project with extensions wins. I agree this is not perfect and we should better support defining build extensions throughout the various projects in the reactor. Such use cases are clearly not well covered, especially inheritance between parent/child projects, ordering, concurrency, etc... @michael-o @laeubi I'm willing to write an IT, but I can't write until we agree on the behavior... |
It is not useless but prevents a classloader leak as you have proven.
From the code it is completely irrelevant where you define your extension as long as you only define exactly one and this is simply a wrong assumption.
I don't think it is good to make such behavior part of the covered assumptions. Whether or not an extension is defined in the root should be completely opaque, and all code that works with extensions seems not to assume that, but rather that project-defined extensions are project scoped (taking the usual merging rules into account).
I think instead of reverting / restoring wrong behavior it would be much more profitable to find out what the root cause for this is. For example a stack trace where a CNF exception is triggered and maybe then we see that there is actually a
Just from reading the code and debugging the maven internals, a session-scoped component was never meant to be provided by a project-scoped extension, and that it worked was just a side effect, but I could be wrong. At least the Maven CLI has distinct classloader setups for the cases where a project-scoped extension has to be accessed. |
|
Just one thing I'm curious about: Why is the mavencache not used as a core-extension? I would suspect that this will give much more powerful and stable integration than just integrate it on the project level. |
Does this actually have to be session scoped at all? As the session is passed in, is it really session scoped? Is the problem in the class itself or is the class not discovered? Because |
|
Just another observation, it seems |
|
Just for reference: I recently added support for project-scoped WorkspaceReaders, and if you would like to support such components, one needs to add explicit support for them similar to: maven/maven-core/src/main/java/org/apache/maven/DefaultMaven.java Lines 371 to 379 in 97c1e4b and maven/maven-core/src/main/java/org/apache/maven/DefaultMaven.java Lines 507 to 542 in 97c1e4b. As you see, the calling code should also be aware that there are multiple items and has to handle this (combine, call in a row, throw exception ...). |
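The embedded snippets did not survive in this thread, so here is a rough, self-contained sketch of the general pattern being described: visit each distinct project realm, make it the context classloader for the duration of the lookup, collect whatever is found, and restore the original loader at the end. `PerRealmLookup`, `collect` and the `lookup` callback are hypothetical names for illustration; this is not the referenced `DefaultMaven` code.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration of a per-realm component lookup.
final class PerRealmLookup {

    static <T> List<T> collect(List<ClassLoader> projectRealms,
                               Function<ClassLoader, List<T>> lookup) {
        List<T> found = new ArrayList<>();
        // Identity-based set: each realm is scanned once, even if shared.
        Set<ClassLoader> scanned = Collections.newSetFromMap(new IdentityHashMap<>());
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        try {
            for (ClassLoader realm : projectRealms) {
                // Skip projects without a realm and realms already visited.
                if (realm == null || !scanned.add(realm)) {
                    continue;
                }
                thread.setContextClassLoader(realm);
                found.addAll(lookup.apply(realm));
            }
        } finally {
            // Never leave a project realm behind as the context classloader.
            thread.setContextClassLoader(original);
        }
        return found;
    }

    public static void main(String[] args) {
        ClassLoader core = Thread.currentThread().getContextClassLoader();
        ClassLoader a = new URLClassLoader(new URL[0], core);
        ClassLoader b = new URLClassLoader(new URL[0], core);

        // The fake lookup just reports which loader was active when it ran.
        List<ClassLoader> active = collect(Arrays.asList(a, null, b, a),
                realm -> Collections.singletonList(
                        Thread.currentThread().getContextClassLoader()));

        System.out.println("lookups performed: " + active.size());
        System.out.println("each ran under its own realm: "
                + active.equals(Arrays.asList(a, b)));
    }
}
```

The caller receives a list and still has to decide what to do with several results (combine them, call them in a row, or fail), which is the point made above.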
I tend to think that the current design is somewhat broken if whenever you need to lookup a component, you need to go through a complex setup in order to find where to load the component from.
Thx, I did set up the tests for the |
You don't need it "whenever you lookup" you only need it for the specific cases of where you have a project-scoped extension and that is required to isolate the class-realms to not leak classes from one project into another.
Sure, why not, but then your extension would still not work in a session-scoped component nor in a singleton created in "root scope"... So your usage has only worked due to a bug (as far as I can tell from the information provided here), and I don't see how it could work with other ways of class-loader isolation.
So I'm really confused what is the issue then? As from the code, your problematic component is to be used in a singleton injected and called, how is it (and why) supposed to work if it is defined in a project scope? |
|
For the record, the lookup of the The problem here is a visibility problem across realms. Even with the lookup fixed, the code without the current PR will still have no visibility on the required bean which is registered as an extension on the project which is being built. When the simple lookup is done (i.e. a Looking at the code leads me to think that the fix from e327be3 is wrong, because it removes the visibility to the build extensions. I back this point by the fact that the code now between the One thing I don't really get in what you say : why do you consider that this PR does not fix the classloader leak ? The classloader is set to the extension class loader if one is provided and back to its original value after the build, so that looks ok to me... |
Alright.
If you want to have "project-visibility" when fetching the
I agree about the first two but not the last one see
I also noticed that, but please also note the TODO (which was there before) indicating that the author of that code (I haven't dug in here) was not sure whether it is actually necessary, so it would probably be better to not call
The purpose is to set the Thread loader to the one of the project while calling
It is set back after the build, but during the build it leaks a (random) classloader and makes the project realm accessible to a number of (random) other items, as you have proven with your extension. Please also take a look at where it is called: in line 105 we have the "real" context classloader (also note that the context classloader might be different from the classloader used to actually load the current class!); in line 122 the current CCL (which is now a random one leaked from the previous call) is fetched and further passed to another component as the just assume your extension is there and works, now I add another extension (e.g. a wagon-provider) and your extension suddenly fails with CNF or CCE (that's what my fix reveals without another extension being used), would you really desire/expect this? |
Also just for the record: you wouldn't have noticed this bug without the fix applied in e327be3, and that's why I'm so eager to fix bad usage of CCL/leakage, as these hide hard-to-track problems that you can spend hours debugging while banging your head against the wall, even though most of the time people don't feel it's a problem or don't even think it's worth a change; see https://issues.apache.org/jira/browse/CAMEL-10456 for another nasty problem in that category. |
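To make the "leaked, and then picked up as the original" scenario concrete, here is a small plain-Java sketch. It is an assumption-laden illustration: `LeakedContextClassLoader` and `attachWithoutRestore` are hypothetical names, and empty `URLClassLoader` instances stand in for project realms with a cache extension and a wagon provider.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a context classloader leak.
final class LeakedContextClassLoader {

    // Sets the TCCL for every project that has a realm and never restores it.
    // Returns what later code would capture as "the original" loader.
    static ClassLoader attachWithoutRestore(List<ClassLoader> projectRealms) {
        for (ClassLoader realm : projectRealms) {
            if (realm != null) {
                Thread.currentThread().setContextClassLoader(realm);
            }
        }
        // Code running after the loop sees the last realm that was set.
        return Thread.currentThread().getContextClassLoader();
    }

    public static void main(String[] args) {
        ClassLoader core = Thread.currentThread().getContextClassLoader();
        ClassLoader cacheExt = new URLClassLoader(new URL[0], core);
        ClassLoader wagonExt = new URLClassLoader(new URL[0], core);
        try {
            // Only one project declares an extension: its realm stays active.
            ClassLoader first = attachWithoutRestore(Arrays.asList(cacheExt, null));
            System.out.println("cache realm leaked: " + (first == cacheExt));

            // A later project adds another extension: a different realm wins.
            ClassLoader second = attachWithoutRestore(
                    Arrays.asList(cacheExt, null, wagonExt));
            System.out.println("wagon realm now leaked instead: " + (second == wagonExt));
        } finally {
            Thread.currentThread().setContextClassLoader(core);
        }
    }
}
```

In this sketch, anything that relied on the first realm being the context classloader stops seeing it once a later project sets its own, which is the kind of order-dependent failure described above.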
|
@laeubi I've pushed another commit to discuss / experiment with. There are only two remaining calls to
|
Looks like a good idea.
I'm not sure why this should be the case... The "top level project" is not meant to be something special when it comes to classloading. Just assume the top-level is just an aggregator that includes all projects as modules (with the modules using individual parents) why should I want it to be used as the CCL root? |
I would assume some kind of implicit inheritance wrt to extensions. Another way would be to create a realm that would have visibility on all extensions of the build and use that one for the whole build duration. This would get rid of the |
…o the session scope
I don't think all extensions should share the same class space. Project-scoped extensions (in contrast to core extensions) only become active after a while: there is a bootstrap phase in Maven when there is actually no such thing as a "project". Also, as explained before, you most probably don't want an extension to magically become active just because another module defines it. If you want an extension to apply to all projects, then it should be defined as a core-extension in That's also the reason why only core-extensions can participate in session-start, because at that point there are no projects and thus any project-scoped extension is simply not accessible. |
Isn't that exactly the opposite of what is done for lifecycle participants and workspace readers at boot time ? Those are loaded from all registered extensions for all projects and are active for the whole build. |
But each of those is loaded from its own classloader (there is no shared one) if they are defined at the project level. If they are defined as core-extensions, they are all loaded from the "maven-core-realm"; that's why I said if you want to be on the "global-scope" (thus all projects are sharing the same instance) you should use a maven-core extension. The whole caching story sounds to me as if it is more suitable as a core-extension. For example, Tycho is defined on a per-project level, but we check as the very first step that all projects in the reactor are using the same Tycho version and fail the build if not. Nevertheless, my goal is to make Tycho a pure core-extension, because project-scoped is much more limited and requires special care. |
But I'm not talking about changing the class loaders, just the visibility for beans with plexus lookup. I don't see why
Core extensions are not loaded if you're aggregating multiple projects as you hinted earlier, so I guess it really depends on the use case. I don't really see any good reason to not try to support both. I've added a commit which explains my thoughts: a single ClassRealm is created with all extensions (and not only the ones from top level or the last project). This realm is not used to load classes from, it's only used for lookups. It's created as early as possible after the projects are created and used as the context classloader. It removes the need for explicit lookup in those realms for workspace readers and lifecycle participants. |
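As a rough analogy only, the "lookup-only" aggregate could be pictured in plain Java as a classloader that defines nothing itself and merely delegates to the extension loaders. This sketch does not use the Plexus `ClassRealm` API and is not the commit mentioned above; `AggregateLookupLoader` and `lookupOrNull` are hypothetical names.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical illustration: an aggregate loader that only delegates.
// Classes stay defined by the extension loader that owns them.
final class AggregateLookupLoader extends ClassLoader {

    private final List<ClassLoader> extensionLoaders;

    AggregateLookupLoader(ClassLoader parent, List<ClassLoader> extensionLoaders) {
        super(parent);
        this.extensionLoaders = new ArrayList<>(extensionLoaders);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        for (ClassLoader loader : extensionLoaders) {
            try {
                return loader.loadClass(name);
            } catch (ClassNotFoundException e) {
                // Not in this extension, try the next one.
            }
        }
        throw new ClassNotFoundException(name);
    }

    @Override
    protected Enumeration<URL> findResources(String name) throws IOException {
        // Descriptor-style lookups see the resources of every extension.
        List<URL> urls = new ArrayList<>();
        for (ClassLoader loader : extensionLoaders) {
            urls.addAll(Collections.list(loader.getResources(name)));
        }
        return Collections.enumeration(urls);
    }

    // Convenience wrapper that returns null instead of throwing.
    Class<?> lookupOrNull(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ClassLoader extension = AggregateLookupLoader.class.getClassLoader();
        // Null parent: only bootstrap classes are visible without delegation.
        AggregateLookupLoader aggregate = new AggregateLookupLoader(
                null, Collections.singletonList(extension));

        Class<?> found = aggregate.lookupOrNull(AggregateLookupLoader.class.getName());
        System.out.println("found through aggregate: " + (found != null));
        System.out.println("still defined by its own loader: "
                + (found != null && found.getClassLoader() == extension));
    }
}
```

Note that such an aggregate makes every extension visible to every lookup that goes through it, so the isolation concern raised in this thread still applies to the sketch.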
|
I'm not sure if I can give more input on this, but only quote this
So for your case, if you really think your extension should be defined on the project level, then for me it makes total sense that it is only available in the projects where it is defined (either explicitly or implicitly, e.g. through a parent reference), and then your extension would not be looked up globally. If you want it to act globally, independent of where it is defined in a project, and you want to have much more control, I think it should be a core-extension, even if others not so involved in the details think "it might be useful" to define it on the project level. @michael-o maybe you want to take over here or suggest others more familiar with how maven is supposed to work here. |
Well, I'd like to be sure about that. All the use cases I've seen so far rather indicate extensions are global to the build. The primary use case was for wagon providers afaik, and those are global. Also, given that they were until now mostly global (i.e. the extensions classloader is set until another project also defines extensions), I'd rather assume users expect them to be global. I'd rather go for a behavior that suits most use cases so far (lifecycle, workspace readers, wagon providers, build cache) than change the behavior and have to rely on specific hacks for each kind of extension, which kind of defeats the purpose. Do you have any actual use case of build extensions that should have a limited scope (which again, is not how they were working)... ?
|
Honestly, you both are much deeper in this issue; I have very little clue about CL hierarchies. I will leave the decision up to you both. |
Which ones? Just for the sake of leaving the field of theoretical discussions, I have created a practical example that demonstrates that this is not the case. It contains:
There are three combinations in which one can build those (as demonstrated here):
This clearly demonstrates that extensions are not global to the build, and I can't imagine that "most users" would expect that the The same applies to the caching extension as well: if I enable it for one project in an aggregator, I do not expect it to be active for all.
I don't think users expect that we are leaking any item from one project to another if not explicitly declared.
As mentioned before, it depends on the case, and there is a fundamental difference between a caller of an extension deciding that all project lifecycle listeners should be called, and all project lifecycle listeners being globally available (which they are not! You can't look them all up in a Mojo even though all are called in the setup phase). Funny enough, you have proven by your change that this instantly causes the dreaded CCE:
in integration tests, and just as of today I'm again facing a similar issue here for Tycho plugin, so I still think that classloader separation is more important than any vague user assumption. But I don't see how we can proceed here, as a matter of fact neither lifecycle, workspace readers nor wagon providers seem to have any issue with that but only cache-extension and now you try to propose that all of those are actually wrong and cache-extension is the one to rule that out... |
|
I've tried a few things with your example and that led me to understand something I had missed. If you define the following in the root pom: then the build works correctly, which is exactly what I assumed would be working, but could not understand when reading the code and your comments. However I wrongly assumed the extension was using some kind of global scope mechanism, similar to the way workspace readers and lifecycle participants are scanned and registered. The mechanism is completely different. In short, once I fix the bad lookup in maven, it seems the problem I had with the build cache extension does not occur anymore. Btw, if you could have a look at #692 and #693, that would be nice ! I'll rework my PR to not use the overall classloader I've introduced as a global one. And given it won't be a bug fix anymore, I'll target it for |
This works because the aggregator is an implicit parent. |
|
Resolve #9031 |
|
Resolve #9167 |
|
Resolve #9031 |
This fixes problems when using build extensions.