
Renamed 'Generic Column' -> 'Numeric Column'; Fixed a few resource leaks in processing; misc refinements #5957

Merged
leventov merged 46 commits into apache:master from metamx:various-changes
Oct 2, 2018

Conversation

@leventov
Member

@leventov leventov commented Jul 6, 2018

This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.

Some of the changes:

  • Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names.
  • Generified ComplexMetricExtractor

@leventov
Member Author

leventov commented Jul 8, 2018

@nishantmonu51 FYI I've tried to fix #5956 opportunistically in this PR, but a lot of tests started to fail. So I reverted.

private Object[] arrayToObjectArray(Object array)
{
final Object[] objectArray = new Object[Array.getLength(array)];
for (int j = 0; j < Array.getLength(array); j++) {
Member

Maybe this code could be written this way?

int len = Array.getLength(array);
final Object[] objectArray = new Object[len];
for (int j = 0; j < len; j++) {...}

Because the style for (int j = 0; j < Array.getLength(array); j++) {...} calls the Array.getLength method on every loop iteration.

Member Author

Thanks, extracted a variable.
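For illustration, the resulting method might look like the sketch below. The class name, the loop body (Array.get), and the surrounding context are assumptions made to keep the example self-contained; they are not copied from the PR.

```java
import java.lang.reflect.Array;

public class ArrayToObjectArrayExample
{
  static Object[] arrayToObjectArray(Object array)
  {
    // The length is computed once, instead of calling Array.getLength
    // on every loop iteration.
    final int arrayLength = Array.getLength(array);
    final Object[] objectArray = new Object[arrayLength];
    for (int j = 0; j < arrayLength; j++) {
      objectArray[j] = Array.get(array, j); // boxes primitives automatically
    }
    return objectArray;
  }

  public static void main(String[] args)
  {
    Object[] out = arrayToObjectArray(new int[]{1, 2, 3});
    System.out.println(out.length); // prints 3
  }
}
```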

  final ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length).put(bytes);
  buf.rewind();
- return new ImmutableConciseSet(buf);
+ return new ImmutableConciseSet(buf.asIntBuffer());
Member

Is asIntBuffer necessary? If the ImmutableConciseSet(ByteBuffer byteBuffer) constructor is called, it also calls the asIntBuffer method internally.

Member Author

I removed one of the constructors, don't remember for what reason, maybe to reduce ambiguity. I don't see that one version is better than another.
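For reference, a minimal self-contained sketch (not Druid code) of what the rewind-then-asIntBuffer pattern in the diff does, assuming the default big-endian byte order of java.nio:

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

public class IntBufferViewExample
{
  public static void main(String[] args)
  {
    // Writing bytes advances the buffer position, so rewind() is needed
    // before creating a view that starts at the beginning.
    byte[] bytes = {0, 0, 0, 42};
    ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length).put(bytes);
    buf.rewind();

    // asIntBuffer() is a view over the remaining bytes of buf, not a copy.
    // With the default big-endian order, {0, 0, 0, 42} reads back as 42.
    IntBuffer ints = buf.asIntBuffer();
    System.out.println(ints.get(0)); // prints 42
  }
}
```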

}

- ByteBuffer buffer = ByteBuffer.wrap(new byte[calcNumBytes(rTree)]);
+ ByteBuffer buffer = ByteBuffer.allocate(calcNumBytes(rTree));
Member

Maybe using ByteBuffer.allocateDirect would give better performance.

ByteBuffer buffer = ByteBuffer.allocateDirect(calcNumBytes(rTree));

Member Author

Direct ByteBuffers are not something that "magically makes everything faster". There are two main reasons to use direct ByteBuffers:

  • Reducing the size of the Java heap (ordinary ByteBuffers are backed by byte[] arrays, which live in the heap) in order to reduce GC pause times. Yes, even a heap populated with arrays of primitives can affect GC pause times negatively. However, this comes with many downsides:

    • Native allocations are slower
    • Native allocations contribute to potentially unrecoverable fragmentation. By contrast, Java heap fragmentation is cleaned up sooner or later.
    • You need to ensure timely freeing of direct ByteBuffers, preferably with try-with-resources, rather than relying on reference cleaning
  • Avoiding extra data copies when going native, particularly when a ByteBuffer participates in network or disk I/O calls (ByteChannel.read() / write())

Neither of these applies directly to ImmutableRTree.
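The difference between the two kinds of buffers can be seen in a minimal sketch (independent of Druid):

```java
import java.nio.ByteBuffer;

public class BufferKindsExample
{
  public static void main(String[] args)
  {
    // Heap buffer: backed by a byte[] on the Java heap, reclaimed by the GC.
    ByteBuffer heap = ByteBuffer.allocate(16);
    System.out.println(heap.isDirect());   // prints false
    System.out.println(heap.hasArray());   // prints true

    // Direct buffer: backed by native memory outside the heap. Allocation is
    // slower, and the memory is released by a cleaner rather than ordinary GC,
    // which is why timely freeing matters.
    ByteBuffer direct = ByteBuffer.allocateDirect(16);
    System.out.println(direct.isDirect()); // prints true
    System.out.println(direct.hasArray()); // prints false
  }
}
```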

@asdf2014
Member

asdf2014 commented Jul 9, 2018

Hi, @leventov. I've already created a small PR #5980 to supplement the current PR. If you think it's necessary, I will create a series of PRs to make it more complete.

@nishantmonu51
Member

@leventov: can we finish up the review and merge #5958 before making any more major refactorings?

@leventov
Member Author

@nishantmonu51 we can (although it will take several more weeks), but could you review this PR in parallel?

@himanshug
Contributor

Renamed GenericColumnSerializer to just ColumnSerializer and changed the signature of its serialize() method. This class was annotated @ExtensionPoint; however, it doesn't look like one to me.

It is extensible: when someone writes a custom aggregator, the extension may want to take control of full column serialization, instead of the default behavior where the extension writer only controls serialization of individual row values by providing an ObjectStrategy.
We do have a few aggregators (not in the open source) that take advantage of it.

@leventov
Member Author

@himanshug thanks. This should be noted in the ColumnSerializer Javadoc; could you please add it?

@leventov
Member Author

@nishantmonu51 @gianm could we finish off this PR? It has effectively had three approvals for three weeks and still isn't merged.

@gianm
Contributor

gianm commented Sep 20, 2018

I haven't had a chance to review the entire thing yet, beyond the partial review I did in #5957 (review). Please don't consider me as blocking this PR, just not yet having the opportunity to review it in full.

By way of explanation as to why: I can't speak for everyone, but personally I am prioritizing my time towards reviewing PRs that make functional changes (which we are getting a lot of!) vs. ones that are refactorings/cleanups. Doesn't mean I'll never get to it, but hopefully that explains why I haven't yet. I understand this is a dev blocker for you, so I guess all I can say to that is, I would suggest avoiding making functional changes dependent on large nonfunctional ones.

@fjy
Contributor

fjy commented Sep 25, 2018

I think this PR should be looked at in 0.13.1

@fjy
Contributor

fjy commented Sep 26, 2018

I think the PR should be broken into multiple PRs to make it much easier to review. The impact of breaking changes in such a significant PR touching so many files is high. Furthermore, I'd like to see each separate PR comprehensively describe what was changed and what the benefit was. I'm moving the milestone to 0.13.1 for now to give us enough time to properly review this PR.

@fjy fjy modified the milestones: 0.13.0, 0.13.1 Sep 26, 2018
@leventov
Member Author

I'm not going to break up this PR - the cost is disproportionately high, and it's not really needed for a PR that already has three approvals.

I was asking @nishantmonu51 to approve the extra changes (which are quite big, but consist exclusively of mechanical import reordering, so they don't change any behaviour) because this is "nice", but I think it is not formally required. Since it turns out that he doesn't have time for this, I think this PR could be merged already.

@gianm
Contributor

gianm commented Sep 28, 2018

Is this still "Incompatible"?

@gianm gianm self-requested a review September 28, 2018 23:01
@leventov
Member Author

@gianm no, since the reversion of the rename of GenericColumnSerializer.

Contributor

@gianm gianm left a comment


The changes here look fine to me. I think we can merge as soon as the conflicts are resolved.

But in general I think this PR would have been better as a few separate ones; the issue I experienced was not in terms of number of lines, but in terms of the variety of unrelated changes. One specific thing that would have helped is moving the trivial changes into their own PRs: one for import reordering and one for the cleanup of various comparators and lambdas. When trivial changes are mixed into a larger PR, it is time-consuming to identify the substantive changes (like the renaming of Column to ColumnHolder, and the collapsing of that interface's methods).

I also think it's best to think twice before making large amounts of trivial changes. There is a cost to making any change (time for reviewers to read the changes, time for authors to resolve conflicts with PRs that generate large amounts of conflicts, and the fact that even trivial changes can introduce bugs or unintentional behavior changes). I think it would have been better to do the imports, comparator, and lambda cleanups gradually, as those files were edited for other reasons.

@gianm
Contributor

gianm commented Oct 2, 2018

The test failures look legit (although are not a big deal) and are related to the renaming of HLLCV1. I don't think it's important that the value stay the same, so I think updating the tests is fine.

Failed tests: 
org.apache.druid.sql.calcite.http.SqlResourceTest.testCsvResultFormat(org.apache.druid.sql.calcite.http.SqlResourceTest)
  Run 1: SqlResourceTest.testCsvResultFormat:500 expected:<[2000-01-01T00:00:00.000Z,1,,a,1.0,1.0,org.apache.druid.hll.HLLCV1,, 2000-01-02T00:00:00.000Z,1,10.1,,2.0,2.0,org.apache.druid.hll.HLLCV1,, , ]> but was:<[2000-01-01T00:00:00.000Z,1,,a,1.0,1.0,org.apache.druid.hll.VersionOneHyperLogLogCollector,, 2000-01-02T00:00:00.000Z,1,10.1,,2.0,2.0,org.apache.druid.hll.VersionOneHyperLogLogCollector,, , ]>
  Runs 2-4: identical to Run 1

@leventov leventov merged commit 3ae5632 into apache:master Oct 2, 2018
@leventov leventov deleted the various-changes branch October 2, 2018 17:50
@leventov
Member Author

leventov commented Oct 2, 2018

@gianm thanks a lot for the review and the suggestions - I'll account for them in future PRs.

@dclim dclim modified the milestones: 0.13.1, 0.13.0 Oct 9, 2018