[MRESOLVER-7] Download dependency POMs in parallel#2
[MRESOLVER-7] Download dependency POMs in parallel#2hwellmann wants to merge 78 commits intoapache:masterfrom
Conversation
… a case-insensitive manner Fixed implementation to use case-insensitive matching for non-proxy hosts
…an already splitted collection of non-proxy hosts Introduced add(Proxy, Collection<String>) to DefaultProxySelector
… already splitted ids/types Introduced add(String, String, String, boolean, Collection<String>, Collection<String>) to DefaultMirrorSelector
Added configuration property aether.connector.http.webDav[.repoId] to forcefully disable WebDAV mode during uploads
Automatically disabled WebDAV mode when detecting a repository manager
…roxy authentication Stored server and proxy credentials in separate providers and used new DemuxCredentialsProvider to delegate to one of those providers depending on host in auth request
…ers as valid values
Dependencies are now processed asynchronously by an Exexutor
|
This commit is massive. I have serious problems understand what you have actually changed because I have never worked on Aether before. There are some noise changes like reformatting, externalizing. etc. Can you move them into a separate commit? It would make it easier for me to understand your real changes. Can you also provide some explanation why so much change is necessary? Did you already run your changes through Core IT Suite? |
|
Yes, I would have liked it to be smaller, but it's difficult to go back now. The original patch is almost a year old, and I just tried to rework everything to include the changes from master. This was a manual process and not a merge. The basic idea of my change is quite straightforward:
This is implemented here: Everything else is just auxiliary refactoring to make this possible.
I replaced long parameter lists by context objects and copied these objects rather than overwriting them to make the parallel recursion work. I used the official Eclipse formatter, but it seems it wasn't consistently applied in the past, so this explains some extra whitespace changes. Regarding the integration test suite, I did run it successfully with my original patch last year. At the moment, it seems to be broken even without my changes, or maybe I didn't invoke it in the correct way. I couldn't find any instructions except https://github.com/apache/maven-integration-testing/blob/master/run-its.sh which doesn't really look up to date. Maybe we can find more reviewers on the maven-dev list? When I posted the original patch, I had a private message from Jason saying the PR looked ok and he'd like to have it included after the aether-to-apache move, which has now taken so long, unfortunately... |
|
Virtually everything you wrote sounds reasonable to me. I still would like someone else to review it additionally as I am not really firm with the code. Please raise on the dev mailing list. |
|
Don't forget about the mailing list review if you want this to be merged. |
| final CollectRequest request; | ||
|
|
||
|
|
||
| public Args( RepositorySystemSession session, RequestTrace trace, DataPool pool, NodeStack nodes, |
There was a problem hiding this comment.
For consistency, that constructor should be package private as well, IMHO. The other package private classes, as well.
|
|
||
| private VersionRangeResolver versionRangeResolver; | ||
|
|
||
| private ExecutorService executor = Executors.newFixedThreadPool( 5 ); |
There was a problem hiding this comment.
You want this to scale so limiting this to 5 should not be done. In Java 8 there is a commonPool sized by the number of available processors. Is there a way to auto size that based on some criteria? Number of available processor? Bandwidth available?
ChristianSchulte
left a comment
There was a problem hiding this comment.
Have you been using this yourself locally for some time? Does it still speed things up if the local repository would implement some kind of locking to protect against concurrent modifications?
| DependencyTraverser depTraverser, VersionFilter verFilter, Dependency dependency ) | ||
| private Future<DependencyContext> asyncProcessDependency( final DependencyContext dc ) | ||
| { | ||
| return executor.submit( new Callable<DependencyContext>() |
There was a problem hiding this comment.
Are you sure this can be done? When resolving a child node, that node can change the repositories in use. If all child nodes will be processed asynchronously, that may not work anymore.
|
|
||
| rangeResult = cachedResolveRangeResult( rangeRequest, args.pool, args.session ); | ||
| VersionRangeRequest rangeRequest = createVersionRangeRequest( args, context.getRepositories(), dependency ); | ||
| VersionRangeResult rangeResult = cachedResolveRangeResult( rangeRequest, args.pool, args.session ); |
There was a problem hiding this comment.
If this will download files to the local repository, this cannot be done asynchronously as the local repository is not thread-safe.
| dc.originalArtifact = originalArtifact; | ||
| dc.managedDependency = d; | ||
| dc.futureDescriptorResult = | ||
| getArtifactDescriptorResult( args, results, noDescriptor, d, descriptorRequest ); |
There was a problem hiding this comment.
Here as well. It will modify the local repository asynchronously.
| args.pool.toKey( d.getArtifact(), childRepos, childSelector, childManager, childTraverser, childFilter ); | ||
| Object key = args.pool.toKey( d.getArtifact(), childContext ); | ||
|
|
||
| List<DependencyNode> children = args.pool.getChildren( key ); |
There was a problem hiding this comment.
This will not work when fetched asynchronously. It's an implementation side effect I do not like myself. Child node lists are cached and re-used - no matter what parent node has lead to the resolution of the child node list. It's not easy to remove that caching. It is used for processing cycles as well.
|
Would it be possible to split this into multiple smaller commits so that reviewing becomes easier? Changing behaviour of the 'DefaultDepencencyCollector' by accident is a recipe for disaster, for example. You have extracted code to various methods. Maybe a commit for each new method? So that the last commit is about parallelization only and not about refactoring code? I would not mind having to review 20+ commits for this when the first 19 commits do not change any behaviour but only are about preparing/refactoring code for parallelization. I am referring to what you wrote starting with:
Getting all those smaller changes merged without changing any behaviour would make things so much easier. |
|
does this have chance to be completed? maven still works as piece of shit with downloading dependencies |
If someone rebases on top of master, yes. This won't speed up artifact download, POMs only. |
|
While this is a nice feature to have I do not think we can reasonably merge this PR as it stands and I suggest to close it. I dont think anyone can tell what actually changes and hence the risk of applying this is way too high. |
|
I have prepared new version - please check. |
@mosabua That sounds odd. Shouldn't you always quickly know whether a change works or not by applying the software's unit and integration tests on it? |
|
@krichter722 while it sounds odd it is the reality. Test coverage within the library is just not big enough to absorb such a large changeset and even beyond that nobody can read it and understand what was changed. To really test resolver you have to also build it and then use it in a custom build of Maven itself with the new resolver and run the Maven IT test. Thats a largish undertaking and it is just easier when changes are more contained. So yes.. in an ideal world you are right, in reality things are a bit more complicated. |
This is actually what I do...with PRs |
|
I propose to close this PR in favor of #10 . |
|
@gdubicki I will close this implicitly with the merge into master to maintain the linked information. |
|
Closing this, ti has been superseded. |
|
Resolve #817 |
|
Resolve #892 |
|
Resolve #817 |
|
Resolve #892 |
No description provided.