Skip to content

Refine performance (2nd)#1492

Closed
XenoAmess wants to merge 6 commits intoapache:masterfrom
xenoamess-fork:refine-performance
Closed

Refine performance (2nd)#1492
XenoAmess wants to merge 6 commits intoapache:masterfrom
xenoamess-fork:refine-performance

Conversation

@XenoAmess
Copy link
Copy Markdown

@XenoAmess XenoAmess commented Jun 14, 2025

refine performance for demo at apache/maven#2481

including 3 parts:

  1. fix the bug about using MMap when value be collection, but the old codes didn't copy the collection, thus may cause appending to a list value in a DoneMMap, thus makes infinity long value, thus makes the run slow in super multi module project.
  2. BfDependencyCollector can share executorService between resolvers, to reduce the create/close/create/close of the executorService, which is expensive.
  3. intern some strings to make use of the new hash cache in String class in jdk.

Following this checklist to help us incorporate your
contribution quickly and easily:

  • Your pull request should address just one issue, without pulling in other changes.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body.
    Note that commits might be squashed by a maintainer on merge.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied.
    This may not always be possible but is a best-practice.
  • Run mvn verify to make sure basic checks pass.
    A more thorough check will be performed on your pull request automatically.
  • You have run the integration tests successfully (mvn -Prun-its verify).

If your pull request is about ~20 lines of code you don't need to sign an
Individual Contributor License Agreement if you are unsure
please ask on the developers list.

To make clear that you license your contribution under
the Apache License Version 2.0, January 2004
you have to acknowledge this by using the following check-box.

@XenoAmess XenoAmess changed the title Refine performance Refine performance (2nd) Jun 14, 2025
@XenoAmess XenoAmess force-pushed the refine-performance branch from 51bcd9b to 7328df7 Compare June 14, 2025 19:58
fix the bug about using MMap when value be collection, but the old codes didn't copy the collection, thus may cause appending to a list value in a DoneMMap, thus makes infinity long value, thus makes the run slow in super multi module project.
@XenoAmess XenoAmess force-pushed the refine-performance branch from 7328df7 to f295e15 Compare June 14, 2025 19:59
@XenoAmess
Copy link
Copy Markdown
Author

@cstamas please have a look.
the most severe part is the first part.(MMap part)

* @since 1.9.0
*/
public static final int DEFAULT_THREADS = 5;
public static final int DEFAULT_THREADS = Runtime.getRuntime().availableProcessors();
Copy link
Copy Markdown
Member

@cstamas cstamas Jun 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1: we can run on monsters as well. This is not an "improvement" on alg, just "throwing more" on it. In other words, doing this may hit "remote end" as well, think of https://en.wikipedia.org/wiki/Tragedy_of_the_commons like Central is.

Copy link
Copy Markdown
Author

@XenoAmess XenoAmess Jun 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cstamas we increase the number because we try to merge the executers. see changes in several lines under that in the same file.
before, each instance have one seperated executor,and thus have 2 problems

  1. when running with like T2C, I afraid there might be very many of instance in parallel, thus many small executors in parallel
  2. the start and stop of the small executors be time costing.(especially adding worker threads)

Still,if you don't like mixing idea for merging the executors of every instance, we can remove this part. I just don't think it good to start many small thread executors, and I still think a bigger single executor is better in this case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think we should use forkjoin pool by default, it is alsmot certainly created by a build these days and will avoid the overhead of custom pool with almost no benefit

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think we should use forkjoin pool by default, it is alsmot certainly created by a build these days and will avoid the overhead of custom pool with almost no benefit

yes it would be far better than using many small pools.
But as for forkjoin pool, I have no idea why we should use forkjoin pool but not executer here. IMO fork join pool is use for works who have relationship/dependency, and the tasks here seems not quite dependent on each other..?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://issues.apache.org/jira/browse/MRESOLVER-283

Oh, I see #213 and the related comments...
Interesting.
Good, now I kind of getting interested in the fork-join-pool solution now.

@cstamas Is there any news since the original executor pr closed? should I start another pr(and split this from this pr) for that?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, nothing new since. It was just a faint idea, and did not move anywhere yet.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, nothing new since. It was just a faint idea, and did not move anywhere yet.

Um. Seems it is good enough a thing that worth my time. Would have a try when next weekends.
I agree on split it out of this pr, it is a bigger thing than my original thought.
Though I do agree it seems a better way than I provided in this pr.
Please focus on reviewing other part of this pr then.

@gnodet
Copy link
Copy Markdown
Contributor

gnodet commented Jun 18, 2025

I think the intern logic (apart from the context in trace requests) would be better placed inside Maven when parsing the POM files.
It should be fairly simple, as it's just a matter of passing a ContentTransformer as an argument to this call.
We should intern all enums (scopes, packaging) and at least groupId, artifactId, version. The models are kept in memory throughout the whole build anyway, so it makes sense to intern quite a few values.

@gnodet
Copy link
Copy Markdown
Contributor

gnodet commented Jun 18, 2025

I think the intern logic (apart from the context in trace requests) would be better placed inside Maven when parsing the POM files. It should be fairly simple, as it's just a matter of passing a ContentTransformer as an argument to this call. We should intern all enums (scopes, packaging) and at least groupId, artifactId, version. The models are kept in memory throughout the whole build anyway, so it makes sense to intern quite a few values.

I've raised apache/maven#2495. I was quite sure the work was done already, but after investigation, it had not been plugged in.

@XenoAmess
Copy link
Copy Markdown
Author

XenoAmess commented Jun 18, 2025

I think the intern logic (apart from the context in trace requests) would be better placed inside Maven when parsing the POM files. It should be fairly simple, as it's just a matter of passing a ContentTransformer as an argument to this call. We should intern all enums (scopes, packaging) and at least groupId, artifactId, version. The models are kept in memory throughout the whole build anyway, so it makes sense to intern quite a few values.

yes, I agree.
interning is more important in mordern jdk versions because they add a cache for the hash of the string, thus a same string object calling hash methods for several time would be far faster than different objects.
(and we use hashmap based maps quite a lot)

I've raised apache/maven#2495. I was quite sure the work was done already, but after investigation, it had not been plugged in.

glad to see it. so I will remove this part from this pr too.

@cstamas
Copy link
Copy Markdown
Member

cstamas commented Jun 25, 2025

@XenoAmess refined it in #1508 please take a look

cstamas added a commit that referenced this pull request Jun 25, 2025
Reusing #1492 but simplifying it: copy only the collection that is about to be modified, leave other unchanged (less copy less mem, less cycles).
@cstamas
Copy link
Copy Markdown
Member

cstamas commented Jun 25, 2025

Superseded by #1508

@cstamas cstamas closed this Jun 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants