
Renamed 'Generic Column' -> 'Numeric Column'; Fixed a few resource leaks in processing; misc refinements #5957

Merged
leventov merged 46 commits into apache:master from metamx:various-changes
Oct 2, 2018

Conversation

@leventov
Member

@leventov leventov commented Jul 6, 2018

This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.

Some of the changes:

  • Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names.
  • Generified ComplexMetricExtractor

@leventov
Member Author

leventov commented Jul 8, 2018

@nishantmonu51 FYI I've tried to fix #5956 opportunistically in this PR, but a lot of tests started to fail. So I reverted.

private Object[] arrayToObjectArray(Object array)
{
final Object[] objectArray = new Object[Array.getLength(array)];
for (int j = 0; j < Array.getLength(array); j++) {
Member

Maybe this code could be written this way?

int len = Array.getLength(array);
final Object[] objectArray = new Object[len];
for (int j = 0; j < len; j++) {...}

Because the style for (int j = 0; j < Array.getLength(array); j++) {...} calls the Array.getLength method on every loop iteration.

Member Author

Thanks, extracted a variable.
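For illustration, the resulting method might look like the sketch below. The class name, the loop body (Array.get), and the surrounding context are assumptions made to keep the example self-contained; they are not copied from the PR.

```java
import java.lang.reflect.Array;

public class ArrayToObjectArrayExample
{
  static Object[] arrayToObjectArray(Object array)
  {
    // The length is computed once, instead of calling Array.getLength
    // on every loop iteration.
    final int arrayLength = Array.getLength(array);
    final Object[] objectArray = new Object[arrayLength];
    for (int j = 0; j < arrayLength; j++) {
      objectArray[j] = Array.get(array, j); // boxes primitives automatically
    }
    return objectArray;
  }

  public static void main(String[] args)
  {
    Object[] out = arrayToObjectArray(new int[]{1, 2, 3});
    System.out.println(out.length); // prints 3
  }
}
```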

  final ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length).put(bytes);
  buf.rewind();
- return new ImmutableConciseSet(buf);
+ return new ImmutableConciseSet(buf.asIntBuffer());
Member

Is asIntBuffer necessary? If the ImmutableConciseSet(ByteBuffer byteBuffer) constructor is called, it also calls the asIntBuffer method internally.

Member Author

I removed one of the constructors, don't remember for what reason, maybe to reduce ambiguity. I don't see that one version is better than another.
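For reference, a minimal self-contained sketch (not Druid code) of what the rewind-then-asIntBuffer pattern in the diff does, assuming the default big-endian byte order of java.nio:

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

public class IntBufferViewExample
{
  public static void main(String[] args)
  {
    // Writing bytes advances the buffer position, so rewind() is needed
    // before creating a view that starts at the beginning.
    byte[] bytes = {0, 0, 0, 42};
    ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length).put(bytes);
    buf.rewind();

    // asIntBuffer() is a view over the remaining bytes of buf, not a copy.
    // With the default big-endian order, {0, 0, 0, 42} reads back as 42.
    IntBuffer ints = buf.asIntBuffer();
    System.out.println(ints.get(0)); // prints 42
  }
}
```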

}

- ByteBuffer buffer = ByteBuffer.wrap(new byte[calcNumBytes(rTree)]);
+ ByteBuffer buffer = ByteBuffer.allocate(calcNumBytes(rTree));
Member

Maybe using ByteBuffer.allocateDirect would give better performance.

ByteBuffer buffer = ByteBuffer.allocateDirect(calcNumBytes(rTree));

Member Author

Direct ByteBuffers are not something that "magically makes everything faster". There are two main reasons to use direct ByteBuffers:

  • Reducing the size of the Java heap (ordinary ByteBuffers are backed by byte[] arrays, which live in the heap) in order to reduce GC pause times. Yes, even a heap populated with arrays of primitives can affect GC pause times negatively. However, this comes with many downsides:

    • Native allocations are slower
    • Native allocations contribute to potentially unrecoverable fragmentation. By contrast, Java heap fragmentation is cleaned up sooner or later.
    • You need to ensure timely freeing of direct ByteBuffers, preferably with try-with-resources, rather than relying on reference cleaning
  • Avoiding extra data copies when going native, particularly when a ByteBuffer participates in network or disk I/O calls (ByteChannel.read() / write())

Neither of these applies directly to ImmutableRTree.
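The difference between the two kinds of buffers can be seen in a minimal sketch (independent of Druid):

```java
import java.nio.ByteBuffer;

public class BufferKindsExample
{
  public static void main(String[] args)
  {
    // Heap buffer: backed by a byte[] on the Java heap, reclaimed by the GC.
    ByteBuffer heap = ByteBuffer.allocate(16);
    System.out.println(heap.isDirect());   // prints false
    System.out.println(heap.hasArray());   // prints true

    // Direct buffer: backed by native memory outside the heap. Allocation is
    // slower, and the memory is released by a cleaner rather than ordinary GC,
    // which is why timely freeing matters.
    ByteBuffer direct = ByteBuffer.allocateDirect(16);
    System.out.println(direct.isDirect()); // prints true
    System.out.println(direct.hasArray()); // prints false
  }
}
```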

@asdf2014
Member

asdf2014 commented Jul 9, 2018

Hi, @leventov. I've already created a small PR #5980 to supplement the current PR. If you think it's necessary, I will create a series of PRs to make it more complete.

@nishantmonu51
Member

@leventov: can we finish up the review and merge #5958 before making any more major refactorings?

@leventov
Member Author

@nishantmonu51 we can (although it will take several more weeks), but could you review this PR in parallel?

@himanshug
Contributor

Renamed GenericColumnSerializer to just ColumnSerializer and changed the signature of its serialize() method. This class was annotated @ExtensionPoint; however, it doesn't look like one to me.

It is extensible: when someone writes a custom aggregator, the extension may want to take control of full column serialization, instead of the default behavior where the extension writer only controls serialization of individual row values by providing an ObjectStrategy.
We do have a few aggregators (not in the open source) that take advantage of it.

@leventov
Member Author

@himanshug thanks. This should be noted in the ColumnSerializer Javadoc; could you please add it?

@leventov
Member Author

@nishantmonu51 @gianm could we finish off this PR? It has effectively had three approvals for three weeks and still isn't merged.

@gianm
Contributor

gianm commented Sep 20, 2018

I haven't had a chance to review the entire thing yet, beyond the partial review I did in #5957 (review). Please don't consider me as blocking this PR, just not yet having the opportunity to review it in full.

By way of explanation as to why: I can't speak for everyone, but personally I am prioritizing my time towards reviewing PRs that make functional changes (which we are getting a lot of!) vs. ones that are refactorings/cleanups. Doesn't mean I'll never get to it, but hopefully that explains why I haven't yet. I understand this is a dev blocker for you, so I guess all I can say to that is, I would suggest avoiding making functional changes dependent on large nonfunctional ones.

@fjy
Contributor

fjy commented Sep 25, 2018

I think this PR should be looked at in 0.13.1

@fjy
Contributor

fjy commented Sep 26, 2018

I think the PR should be broken into multiple PRs to make it much easier to review. The impact of breaking changes in such a significant PR touching so many files is high. Furthermore, I'd like to see each separate PR comprehensively describe what was changed and what the benefit was. I'm moving the milestone to 0.13.1 for now to give us enough time to properly review this PR.

@fjy fjy modified the milestones: 0.13.0, 0.13.1 Sep 26, 2018
@leventov
Member Author

I'm not going to break up this PR - the cost is disproportionately high, and it's not really needed for a PR that already has three approvals.

I was asking @nishantmonu51 to approve the extra changes (which are quite big, but consist exclusively of mechanical import reordering, so they don't change any behaviour) because this is "nice", but I think it is not formally required. Since it turns out that he doesn't have time for this, I think this PR could be merged already.

@gianm
Contributor

gianm commented Sep 28, 2018

Is this still "Incompatible"?

@gianm gianm self-requested a review September 28, 2018 23:01
@leventov
Member Author

@gianm no, since the reversion of the rename of GenericColumnSerializer.

Contributor

@gianm gianm left a comment


The changes here look fine to me. I think we can merge as soon as the conflicts are resolved.

But in general I think this PR would have been better as a few separate ones; the issue I experienced was not in terms of number of lines, but in terms of the variety of unrelated changes. One specific thing that would have helped is moving the trivial changes into their own PRs: one for import reordering and one for the cleanup of various comparators and lambdas. When trivial changes are mixed into a larger PR, it is time-consuming to identify the substantive changes (like the renaming of Column to ColumnHolder, and the collapsing of that interface's methods).

I also think it's best to think twice before making large amounts of trivial changes. There is a cost to making any change (time for reviewers to read the changes, time for authors to resolve conflicts with PRs that generate large amounts of conflicts, and the fact that even trivial changes can introduce bugs or unintentional behavior changes). I think it would have been better to do the imports, comparator, and lambda cleanups gradually, as those files were edited for other reasons.

@gianm
Contributor

gianm commented Oct 2, 2018

The test failures look legit (although are not a big deal) and are related to the renaming of HLLCV1. I don't think it's important that the value stay the same, so I think updating the tests is fine.

Failed tests: 
org.apache.druid.sql.calcite.http.SqlResourceTest.testCsvResultFormat(org.apache.druid.sql.calcite.http.SqlResourceTest)
  Run 1: SqlResourceTest.testCsvResultFormat:500 expected:<[2000-01-01T00:00:00.000Z,1,,a,1.0,1.0,org.apache.druid.hll.HLLCV1,, 2000-01-02T00:00:00.000Z,1,10.1,,2.0,2.0,org.apache.druid.hll.HLLCV1,, , ]> but was:<[2000-01-01T00:00:00.000Z,1,,a,1.0,1.0,org.apache.druid.hll.VersionOneHyperLogLogCollector,, 2000-01-02T00:00:00.000Z,1,10.1,,2.0,2.0,org.apache.druid.hll.VersionOneHyperLogLogCollector,, , ]>
  Runs 2-4: identical to Run 1

@leventov leventov merged commit 3ae5632 into apache:master Oct 2, 2018
@leventov leventov deleted the various-changes branch October 2, 2018 17:50
@leventov
Member Author

leventov commented Oct 2, 2018

@gianm thanks a lot for the review and the suggestions - I'll account for them in future PRs.

@dclim dclim modified the milestones: 0.13.1, 0.13.0 Oct 9, 2018