Skip to content

[MRESOLVER-7] Download dependency POMs in parallel#2

Closed
hwellmann wants to merge 78 commits intoapache:masterfrom
hwellmann:MRESOLVER-7
Closed

[MRESOLVER-7] Download dependency POMs in parallel#2
hwellmann wants to merge 78 commits intoapache:masterfrom
hwellmann:MRESOLVER-7

Conversation

@hwellmann
Copy link
Copy Markdown
Contributor

No description provided.

bentmann and others added 30 commits May 18, 2014 19:54
… a case-insensitive manner

Fixed implementation to use case-insensitive matching for non-proxy hosts
…an already splitted collection of non-proxy hosts

Introduced add(Proxy, Collection<String>) to DefaultProxySelector
… already splitted ids/types

Introduced add(String, String, String, boolean, Collection<String>, Collection<String>) to DefaultMirrorSelector
Added configuration property aether.connector.http.webDav[.repoId] to forcefully disable WebDAV mode during uploads
Automatically disabled WebDAV mode when detecting a repository manager
…roxy authentication

Stored server and proxy credentials in separate providers and used new DemuxCredentialsProvider to delegate to one of those providers depending on host in auth request
@michael-o
Copy link
Copy Markdown
Member

This commit is massive. I have serious problems understand what you have actually changed because I have never worked on Aether before. There are some noise changes like reformatting, externalizing. etc. Can you move them into a separate commit? It would make it easier for me to understand your real changes. Can you also provide some explanation why so much change is necessary? Did you already run your changes through Core IT Suite?

@hwellmann
Copy link
Copy Markdown
Contributor Author

Yes, I would have liked it to be smaller, but it's difficult to go back now. The original patch is almost a year old, and I just tried to rework everything to include the changes from master. This was a manual process and not a merge.

The basic idea of my change is quite straightforward:

  • Split loading and processing dependency POMs into two separate steps.
  • Load the POMs of the direct dependencies asynchronously and in parallel.
  • Wait for these POMs to be loaded and then process them.

This is implemented here:
53ef352#diff-e8cd0dcf95330cc5029d1cc020c4fc27R378

Everything else is just auxiliary refactoring to make this possible.

DefaultDependencyCollector was just so huge and with so many code smells that I had to move things out of the way to really understand what was going on in the class.

I replaced long parameter lists by context objects and copied these objects rather than overwriting them to make the parallel recursion work.

I used the official Eclipse formatter, but it seems it wasn't consistently applied in the past, so this explains some extra whitespace changes.

Regarding the integration test suite, I did run it successfully with my original patch last year. At the moment, it seems to be broken even without my changes, or maybe I didn't invoke it in the correct way. I couldn't find any instructions except https://github.com/apache/maven-integration-testing/blob/master/run-its.sh which doesn't really look up to date.

Maybe we can find more reviewers on the maven-dev list? When I posted the original patch, I had a private message from Jason saying the PR looked ok and he'd like to have it included after the aether-to-apache move, which has now taken so long, unfortunately...

@michael-o
Copy link
Copy Markdown
Member

michael-o commented Nov 7, 2016

Virtually everything you wrote sounds reasonable to me. I still would like someone else to review it additionally as I am not really firm with the code. Please raise on the dev mailing list.

@michael-o
Copy link
Copy Markdown
Member

Don't forget about the mailing list review if you want this to be merged.

final CollectRequest request;


public Args( RepositorySystemSession session, RequestTrace trace, DataPool pool, NodeStack nodes,
Copy link
Copy Markdown
Contributor

@ChristianSchulte ChristianSchulte Dec 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, that constructor should be package private as well, IMHO. The other package private classes, as well.


private VersionRangeResolver versionRangeResolver;

private ExecutorService executor = Executors.newFixedThreadPool( 5 );
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want this to scale so limiting this to 5 should not be done. In Java 8 there is a commonPool sized by the number of available processors. Is there a way to auto size that based on some criteria? Number of available processor? Bandwidth available?

Copy link
Copy Markdown
Contributor

@ChristianSchulte ChristianSchulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you been using this yourself locally for some time? Does it still speed things up if the local repository would implement some kind of locking to protect against concurrent modifications?

DependencyTraverser depTraverser, VersionFilter verFilter, Dependency dependency )
private Future<DependencyContext> asyncProcessDependency( final DependencyContext dc )
{
return executor.submit( new Callable<DependencyContext>()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this can be done? When resolving a child node, that node can change the repositories in use. If all child nodes will be processed asynchronously, that may not work anymore.


rangeResult = cachedResolveRangeResult( rangeRequest, args.pool, args.session );
VersionRangeRequest rangeRequest = createVersionRangeRequest( args, context.getRepositories(), dependency );
VersionRangeResult rangeResult = cachedResolveRangeResult( rangeRequest, args.pool, args.session );
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this will download files to the local repository, this cannot be done asynchronously as the local repository is not thread-safe.

dc.originalArtifact = originalArtifact;
dc.managedDependency = d;
dc.futureDescriptorResult =
getArtifactDescriptorResult( args, results, noDescriptor, d, descriptorRequest );
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here as well. It will modify the local repository asynchronously.

args.pool.toKey( d.getArtifact(), childRepos, childSelector, childManager, childTraverser, childFilter );
Object key = args.pool.toKey( d.getArtifact(), childContext );

List<DependencyNode> children = args.pool.getChildren( key );
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not work when fetched asynchronously. It's an implementation side effect I do not like myself. Child node lists are cached and re-used - no matter what parent node has lead to the resolution of the child node list. It's not easy to remove that caching. It is used for processing cycles as well.

@ChristianSchulte
Copy link
Copy Markdown
Contributor

ChristianSchulte commented Dec 12, 2016

Would it be possible to split this into multiple smaller commits so that reviewing becomes easier? Changing behaviour of the 'DefaultDepencencyCollector' by accident is a recipe for disaster, for example. You have extracted code to various methods. Maybe a commit for each new method? So that the last commit is about parallelization only and not about refactoring code? I would not mind having to review 20+ commits for this when the first 19 commits do not change any behaviour but only are about preparing/refactoring code for parallelization.

I am referring to what you wrote starting with:

DefaultDependencyCollector was just so huge and with so many code smells that I had to move things out of the way to really understand what was going on in the class.

Getting all those smaller changes merged without changing any behaviour would make things so much easier.

@hapylestat
Copy link
Copy Markdown

does this have chance to be completed? maven still works as piece of shit with downloading dependencies

@michael-o
Copy link
Copy Markdown
Member

does this have chance to be completed? maven still works as piece of shit with downloading dependencies

If someone rebases on top of master, yes. This won't speed up artifact download, POMs only.

@mosabua
Copy link
Copy Markdown
Member

mosabua commented Nov 24, 2017

While this is a nice feature to have I do not think we can reasonably merge this PR as it stands and I suggest to close it. I dont think anyone can tell what actually changes and hence the risk of applying this is way too high.

@slachiewicz
Copy link
Copy Markdown
Member

I have prepared new version - please check.

@krichter722
Copy link
Copy Markdown

While this is a nice feature to have I do not think we can reasonably merge this PR as it stands and I suggest to close it. I dont think anyone can tell what actually changes and hence the risk of applying this is way too high.

@mosabua That sounds odd. Shouldn't you always quickly know whether a change works or not by applying the software's unit and integration tests on it?

@mosabua
Copy link
Copy Markdown
Member

mosabua commented Jan 14, 2018

@krichter722 while it sounds odd it is the reality. Test coverage within the library is just not big enough to absorb such a large changeset and even beyond that nobody can read it and understand what was changed. To really test resolver you have to also build it and then use it in a custom build of Maven itself with the new resolver and run the Maven IT test. Thats a largish undertaking and it is just easier when changes are more contained. So yes.. in an ideal world you are right, in reality things are a bit more complicated.

@michael-o
Copy link
Copy Markdown
Member

To really test resolver you have to also build it and then use it in a custom build of Maven itself with the new resolver and run the Maven IT test. Thats a largish undertaking and it is just easier when changes are more contained.

This is actually what I do...with PRs

@gdubicki
Copy link
Copy Markdown

I propose to close this PR in favor of #10 .

@michael-o
Copy link
Copy Markdown
Member

@gdubicki I will close this implicitly with the merge into master to maintain the linked information.

@michael-o
Copy link
Copy Markdown
Member

Closing this, ti has been superseded.

@jira-importer
Copy link
Copy Markdown

Resolve #817

@jira-importer
Copy link
Copy Markdown

Resolve #892

@jira-importer
Copy link
Copy Markdown

Resolve #817

@jira-importer
Copy link
Copy Markdown

Resolve #892

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.