Skip to content

Simplifying dimension merging#2094

Merged
himanshug merged 1 commit intoapache:masterfrom
navis:simplify-index-merge
Jan 29, 2016
Merged

Simplifying dimension merging#2094
himanshug merged 1 commit intoapache:masterfrom
navis:simplify-index-merge

Conversation

@navis
Copy link
Copy Markdown
Contributor

@navis navis commented Dec 15, 2015

Currently, dimension merging is processed by two stages. One for dictionary, one for index. If this can be processed by single stage, total processing time could be deceased.

@navis navis force-pushed the simplify-index-merge branch from 80a3377 to 72d4987 Compare December 16, 2015 02:30
@navis
Copy link
Copy Markdown
Contributor Author

navis commented Dec 16, 2015

Reduced dim conversion time from 11.6 sec to 7.1 sec, with 12 index with 500K rows each.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the performance improvement when use IntBuffer instead of int[]?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think it will affect performace (I'll check that tomorrow). Is IntBuffer is better than int[]? Less intuitive for me.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just do not know why use IntBuffer,and other people can tell why.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously the IntBuffer was allocated off-heap, with this change it will be on-heap. Whether this is better or worse can be debated. Currently we rely on garbage collection to hopefully cleanup our buffers before we run out of memory, but allocating it on-heap may create lots of heap pressure if those arrays are big and long-lived. It might be worth some benchmarks to get a sense of how things will be affected.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With direct IntBuffer, it took 8 seconds, not making any notable differences in performance. I prefer int[] but I'm ok with direct buffer. Opinion?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heap pressure is already really bad during the making of index files, and making sure we know how this change impacts heap pressure during that time is important. During the merge and persist phase of realtime tasks, we already have very high CPU usage, enough to where you have to be aware how it impacts query performance. Adding in more heap pressure during that phase should be done with great care.

There are also issues with heap size during the reduce portion of hadoop tasks (or spark batch tasks). So I'm curious if adding more objects (int[]) messes the limit of the number of rows per segment (or if it impacts high cardinality dimensions).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted to direct IntBuffer.

@binlijin
Copy link
Copy Markdown
Contributor

👍

@navis navis force-pushed the simplify-index-merge branch 2 times, most recently from 98e5a91 to 82bab15 Compare December 17, 2015 02:59
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a more descriptive name than counter?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numDimensions or something

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to numMergeIndex

@navis navis force-pushed the simplify-index-merge branch from 82bab15 to a278fe5 Compare December 21, 2015 01:06
@fjy
Copy link
Copy Markdown
Contributor

fjy commented Dec 29, 2015

👍 This looks good to me, but I think someone else familiar with this code needs to do a review

@binlijin
Copy link
Copy Markdown
Contributor

binlijin commented Jan 6, 2016

+1

@binlijin
Copy link
Copy Markdown
Contributor

binlijin commented Jan 7, 2016

can you squash the commits ?
If no one have any question i will merge.

@navis navis force-pushed the simplify-index-merge branch from a278fe5 to 35bc224 Compare January 7, 2016 01:32
@navis
Copy link
Copy Markdown
Contributor Author

navis commented Jan 7, 2016

@binlijin squashed. thanks.

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Jan 7, 2016

@binlijin do you mind holding off on merging? I just want 1 more pair of eyes to review this

@xvrl can you take a look?

@binlijin
Copy link
Copy Markdown
Contributor

binlijin commented Jan 7, 2016

@fjy ok

@navis navis force-pushed the simplify-index-merge branch 3 times, most recently from 182a0e1 to c1b0f06 Compare January 18, 2016 02:51
@navis
Copy link
Copy Markdown
Contributor Author

navis commented Jan 19, 2016

@binlijin It's becoming more and more painful to rebase. Do you really have a time to look into this?

@binlijin
Copy link
Copy Markdown
Contributor

@navis, i am ok with the PR,but fj tell need @xrvl to take a look.

1 similar comment
@binlijin
Copy link
Copy Markdown
Contributor

@navis, i am ok with the PR,but fj tell need @xrvl to take a look.

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Jan 20, 2016

@navis yes, there's multiple teams working on the same code, which is why we need proposals to coordinate, and all the changes are important

@fjy fjy added this to the 0.9.0 milestone Jan 20, 2016
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can these be static? also can we define them at the top of the file?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove timer. just used to check the performance.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use Guava's Stopwatch?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this part

@jon-wei
Copy link
Copy Markdown
Contributor

jon-wei commented Jan 22, 2016

@navis

Looks good to me, I don't have any additional comments beyond what @fjy has already noted.

Can you rebase? I'll do another review pass on this after.

@navis navis force-pushed the simplify-index-merge branch from c1b0f06 to 05fc7dc Compare January 24, 2016 23:52
@navis
Copy link
Copy Markdown
Contributor Author

navis commented Jan 24, 2016

Rebased on master, barely. I'll address comments.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename "WithConversion", e.g. ConvertingIndexSeeker or IndexSeekerWithConversion? Currently it sounds a bit like a boolean parameter and it's not immediately clear that it's a seeker

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@jon-wei
Copy link
Copy Markdown
Contributor

jon-wei commented Jan 26, 2016

👍 looks good to me after addressing the Seeker renaming comments

@navis navis force-pushed the simplify-index-merge branch from d55fc5e to d74e526 Compare January 27, 2016 00:42
@fjy
Copy link
Copy Markdown
Contributor

fjy commented Jan 27, 2016

@himanshug @xvrl do you want to take a look? I think this is getting close to ready and will merge unless there's more comments

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming above code is as is moved into method makeRowIterable(..)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, made a method to be used in V9Merger.

@himanshug
Copy link
Copy Markdown
Contributor

👍 pls squash.

@navis navis force-pushed the simplify-index-merge branch from d74e526 to dd774ef Compare January 29, 2016 01:04
@navis
Copy link
Copy Markdown
Contributor Author

navis commented Jan 29, 2016

squashed

himanshug added a commit that referenced this pull request Jan 29, 2016
@himanshug himanshug merged commit 93c50d8 into apache:master Jan 29, 2016
@fjy fjy mentioned this pull request Feb 5, 2016
@navis navis deleted the simplify-index-merge branch February 13, 2016 03:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants