Renamed 'Generic Column' -> 'Numeric Column'; Fixed a few resource leaks in processing; misc refinements #5957
Conversation
|
@nishantmonu51 FYI I've tried to fix #5956 opportunistically in this PR, but a lot of tests started to fail, so I reverted.
```java
private Object[] arrayToObjectArray(Object array)
{
  final Object[] objectArray = new Object[Array.getLength(array)];
  for (int j = 0; j < Array.getLength(array); j++) {
```
Maybe this code can be rewritten this way?

```java
int len = Array.getLength(array);
final Object[] objectArray = new Object[len];
for (int j = 0; j < len; j++) {...}
```

Because `for (int j = 0; j < Array.getLength(array); j++) {...}` calls the `Array.getLength` method on every loop iteration.
Thanks, extracted a variable.
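For reference, the hoisting suggested above could look like this as a standalone sketch. `ArrayUtil` is a hypothetical class name for illustration, not code from this PR; the length is computed once instead of in the loop condition:

```java
import java.lang.reflect.Array;

public class ArrayUtil {
  // Sketch of the helper discussed above, with Array.getLength
  // extracted into a local variable outside the loop condition.
  static Object[] arrayToObjectArray(Object array) {
    final int len = Array.getLength(array);
    final Object[] objectArray = new Object[len];
    for (int j = 0; j < len; j++) {
      objectArray[j] = Array.get(array, j);
    }
    return objectArray;
  }

  public static void main(String[] args) {
    Object[] out = arrayToObjectArray(new int[]{1, 2, 3});
    System.out.println(out.length); // 3
    System.out.println(out[0]);     // 1
  }
}
```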
```java
final ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length).put(bytes);
buf.rewind();
// before:
return new ImmutableConciseSet(buf);
// after:
return new ImmutableConciseSet(buf.asIntBuffer());
```
Is asIntBuffer necessary? If the ImmutableConciseSet(ByteBuffer byteBuffer) constructor is called, it will also call the asIntBuffer method internally.
I removed one of the constructors, don't remember for what reason, maybe to reduce ambiguity. I don't see that one version is better than another.
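As context for the exchange above: `asIntBuffer()` creates an int-typed view over the same underlying memory, so whether the caller or the constructor performs the conversion is purely an API-shape question. A minimal self-contained sketch of the view behavior (not Druid code):

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

public class IntViewDemo {
  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(8);
    buf.putInt(42).putInt(7);
    buf.rewind();
    // The view starts at the buffer's current position and shares its
    // memory; no data is copied by asIntBuffer().
    IntBuffer ints = buf.asIntBuffer();
    System.out.println(ints.get(0)); // 42
    System.out.println(ints.get(1)); // 7
  }
}
```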
```java
// before:
ByteBuffer buffer = ByteBuffer.wrap(new byte[calcNumBytes(rTree)]);
// after:
ByteBuffer buffer = ByteBuffer.allocate(calcNumBytes(rTree));
```
Maybe using ByteBuffer.allocateDirect would give better performance:

```java
ByteBuffer buffer = ByteBuffer.allocateDirect(calcNumBytes(rTree));
```
Direct ByteBuffers are not a thing that "magically makes everything faster". There are two main reasons to use direct ByteBuffers:

1. To reduce the size of the Java heap (ordinary ByteBuffers are backed by `byte[]` arrays, which live in the heap) and thereby reduce GC pause times. Yes, even a heap populated with arrays of primitives can affect GC pause times negatively. However, this comes with many downsides:
   - Native allocations are slower.
   - Native allocations contribute to potentially unrecoverable fragmentation. Java heap fragmentation, on the contrary, is cleaned up sooner or later.
   - You need to ensure timely freeing of direct ByteBuffers, preferably with try-with-resources, rather than relying on reference cleaning.
2. To avoid extra data copies when going native, particularly when a ByteBuffer participates in network or disk I/O calls (ByteChannel.read() / write()).

Neither of these directly applies to ImmutableRTree.
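The heap-vs-direct distinction discussed above can be observed directly through the buffer API; a minimal sketch showing that a heap buffer exposes its backing array while a direct buffer does not:

```java
import java.nio.ByteBuffer;

public class BufferDemo {
  public static void main(String[] args) {
    // Heap buffer: backed by a byte[] on the Java heap.
    ByteBuffer heap = ByteBuffer.allocate(64);
    // Direct buffer: backed by native memory outside the heap.
    ByteBuffer direct = ByteBuffer.allocateDirect(64);

    System.out.println(heap.isDirect());    // false
    System.out.println(heap.hasArray());    // true
    System.out.println(direct.isDirect());  // true
    System.out.println(direct.hasArray());  // false
  }
}
```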
|
@nishantmonu51 we can (however, it will take several more weeks), but could you review this PR in parallel?
It is extensible for when someone writes a custom aggregator and the extension wants to take control of full column serialization, instead of the default behavior where the extension writer only controls serialization of individual row values by providing an ObjectStrategy.
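The per-value serialization hook mentioned above boils down to a pair of to/from-bytes methods. The following is a simplified, self-contained stand-in for the idea (the interface and class names here are invented for illustration; Druid's real ObjectStrategy has a similar shape but is not reproduced):

```java
import java.nio.charset.StandardCharsets;

public class StrategyDemo {
  // Hypothetical simplified version of a per-value serialization strategy:
  // the extension writer supplies these two conversions, and the engine
  // controls the rest of the column layout.
  interface SimpleObjectStrategy<T> {
    byte[] toBytes(T value);
    T fromBytes(byte[] bytes);
  }

  static final SimpleObjectStrategy<String> STRING_STRATEGY = new SimpleObjectStrategy<String>() {
    @Override public byte[] toBytes(String value) {
      return value.getBytes(StandardCharsets.UTF_8);
    }
    @Override public String fromBytes(byte[] bytes) {
      return new String(bytes, StandardCharsets.UTF_8);
    }
  };

  public static void main(String[] args) {
    byte[] serialized = STRING_STRATEGY.toBytes("druid");
    System.out.println(STRING_STRATEGY.fromBytes(serialized)); // druid
  }
}
```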
|
@himanshug thanks. It should be said in |
|
@nishantmonu51 @gianm could we finish off this PR? It has effectively had three approvals for three weeks and is still not merged.
|
I haven't had a chance to review the entire thing yet, beyond the partial review I did in #5957 (review). Please don't consider me as blocking this PR, just not yet having the opportunity to review it in full. By way of explanation as to why: I can't speak for everyone, but personally I am prioritizing my time towards reviewing PRs that make functional changes (which we are getting a lot of!) vs. ones that are refactorings/cleanups. Doesn't mean I'll never get to it, but hopefully that explains why I haven't yet. I understand this is a dev blocker for you, so I guess all I can say to that is, I would suggest avoiding making functional changes dependent on large nonfunctional ones.
|
I think this PR should be looked at in 0.13.1
Force-pushed from c0ce675 to d02b6b6
|
I think the PR should be broken into multiple PRs to make it much easier to review. The impact of breaking changes in such a significant PR, touching so many files, is high. Furthermore, I'd like to see each separate PR comprehensively describe what was changed and what the benefit was. I'm moving the milestone to 0.13.1 for now to give us enough time to properly review this PR.
|
I'm not going to break up this PR - it's an unreasonably high price, and not really needed for a PR that already has three approvals. I was asking @nishantmonu51 to approve the extra changes (which are quite big, but consist exclusively of mechanical import reordering, so they don't change any behaviour) because this is "nice", but I think not formally required. So since it turns out that he doesn't have time for this, I think this PR could be merged already.
|
Is this still "Incompatible"?
|
@gianm no, since the reversion of the rename of |
gianm
left a comment
The changes here look fine to me. I think we can merge as soon as the conflicts are resolved.
But in general I think this PR would have been better as a few separate ones; the issue I experienced was not the number of lines, but the variety of unrelated changes. One specific thing that would have helped is moving the trivial changes into their own PRs: one for import reordering and one for the cleanup of various comparators and lambdas. When trivial changes are mixed into a larger PR, it is time-consuming to identify the substantive changes (like the renaming of Column to ColumnHolder, and the collapsing of that interface's methods).
I also think it's best to think twice before making large amounts of trivial changes. There is a cost to making any change (time for reviewers to read the changes, time for authors to resolve conflicts with PRs that generate large amounts of conflicts, and the fact that even trivial changes can introduce bugs or unintentional behavior changes). I think it would have been better to do the imports, comparator, and lambda cleanups gradually, as those files were edited for other reasons.
|
The test failures look legit (although they are not a big deal) and are related to the renaming of HLLCV1. I don't think it's important that the value stays the same, so I think updating the tests is fine.
|
@gianm thanks a lot for the review and the suggestions - I'll take them into account in future PRs.
This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.
Some of the changes:
ComplexMetricExtractor