Use Combiner to merge InputRows at the Mapper during Hadoop Batch Ingestion#1472
fjy merged 5 commits into apache:master
Conversation
Do we need to check index size bounds here?
@nishantmonu51 can you point me to some code doing the bound checking? I don't see that happening in `IndexGeneratorReducer.reduce(..)` either.
The reducer checks whether a row can be appended to the index by calling `index.canAppendRow()`. If the index is full, it persists the current one and creates another index.
Thanks, I see that now.
In this case `persist(..)` is not really an option. What we can do is flush rows from that index into `context.write(..)` and create another index if/when `index.canAppendRow()` returns false.
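A minimal sketch of that flush-when-full loop, using a toy in-memory index in place of Druid's `IncrementalIndex` (the `ToyIndex` class, its row cap, and the emit callback are all hypothetical stand-ins for illustration, not the actual Druid/Hadoop APIs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy stand-in for IncrementalIndex: holds rows up to a fixed cap.
class ToyIndex {
    private final int maxRows;
    final List<String> rows = new ArrayList<>();
    ToyIndex(int maxRows) { this.maxRows = maxRows; }
    boolean canAppendRow() { return rows.size() < maxRows; }
    void add(String row) { rows.add(row); }
}

public class CombinerSketch {
    static int flushCount = 0;

    // Mirrors the logic discussed above: when the index reports it is full,
    // flush its (merged) rows downstream and start a fresh index, instead
    // of persisting to disk the way the reducer does.
    static void combine(List<String> values, Consumer<String> emit, int maxRows) {
        ToyIndex index = new ToyIndex(maxRows);
        for (String row : values) {
            if (!index.canAppendRow()) {
                index.rows.forEach(emit);   // context.write(..) in the real job
                flushCount++;
                index = new ToyIndex(maxRows);
            }
            index.add(row);
        }
        index.rows.forEach(emit);           // flush the final partial index
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        combine(List.of("a", "b", "c", "d", "e"), out::add, 2);
        System.out.println(out.size() + " rows emitted, " + flushCount + " early flushes");
    }
}
```

Every input row still comes out exactly once; the only difference from a persisting reducer is where the full index's contents go.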
How much did using the combiner help in improving ingestion time?
@nishantmonu51 I haven't done much perf testing on this because the performance improvement will really depend on how much merging is possible. However, I have ensured that if there is only one row for a key, the combiner is effectively a no-op with almost no overhead.
Are there any changes in this function, or did it just get moved around? It might be easier to leave it in place for review and worry about formatting separately.
Just moved, so that it can be used in the combiner class too.
It looks like the tack taken was to create IncrementalIndex objects in the combiner and then persist those. I agree that this will functionally work, but I fear it will be really difficult to manage and tune from the MR side. There are new configurations to manage about how much memory to give the combiner, etc., which will make it difficult for someone who doesn't know MR to have this "just work". I think we need to move the dimension names and values into the actual Key object and have each combiner work on only a single entry.
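The alternative design suggested here can be sketched without any Hadoop dependency (all names below are hypothetical, for illustration only): if the dimension names and values live in the shuffle key itself, rows that are mergeable land in the same group by definition, so the framework's own grouping does the work an in-combiner IncrementalIndex would otherwise do, with no extra memory knob to tune.

```java
import java.util.Map;
import java.util.TreeMap;

public class KeyedMergeSketch {
    // Hypothetical key: timestamp plus sorted dim=value pairs. Two rows get
    // the same key exactly when they are mergeable, so each combiner/reducer
    // group aggregates a single entry.
    static String rowKey(long timestamp, Map<String, String> dims) {
        StringBuilder sb = new StringBuilder(Long.toString(timestamp));
        // TreeMap sorts by dimension name, so insertion order never matters.
        new TreeMap<>(dims).forEach((d, v) -> sb.append('|').append(d).append('=').append(v));
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> a = Map.of("country", "us", "page", "home");
        Map<String, String> b = Map.of("page", "home", "country", "us");
        // Same timestamp and same dims (in any order) -> same key -> same group.
        System.out.println(rowKey(1000L, a).equals(rowKey(1000L, b)));  // true
    }
}
```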
Is it possible to have `dimList` be the list used in most parts of the function, and `dims` be absent? The `ArrayWritable` can take `dimList.toArray(new String[dimList.size()])` or similar.
Instead of relying on Hadoop serialization here, would it make sense for us to define our own, so we can potentially re-use that serde elsewhere? For instance, it might speed up serializing and passing large groupBy results between nodes, and might be useful for on-disk merging of groupBy results in the future. Just a thought.
That makes sense. However, from the `toBytes(..)`/`fromBytes(..)` contract perspective, there is no dependency on Hadoop libs. When I did it, the Hadoop libs just felt like the most obvious and simple choice since they were already included in this module.
The important thing is that the serde implementation can be changed completely at any time with no user-facing functional impact. We can do it whenever needed.
@drcrallen I have cleaned up the commit history already. The commits in the history now represent key logical stages of the development, which I would like to preserve.
I'd move this to the beginning. It's the only fixed-width portion of what is being serialized and having it in a known location can make it easier to re-use this format for other things (like partitioning, etc.) if we ever want to.
Done, moved the timestamp to be first in the serialized `byte[]`.
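A rough illustration of why a fixed-width timestamp at the front is convenient, using `java.nio` directly (this exact layout is an assumption for illustration, not the bytes the patch actually writes): a reader that only needs the timestamp, such as a partitioner, can stop after the first 8 bytes without touching the variable-length dimension data behind it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class TimestampFirstLayout {
    // Hypothetical layout: [8-byte timestamp][4-byte dim length][dim bytes].
    static byte[] toBytes(long timestamp, String dims) {
        byte[] d = dims.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + 4 + d.length)
                         .putLong(timestamp)   // fixed-width field first
                         .putInt(d.length)
                         .put(d)
                         .array();
    }

    // Readers that only need the timestamp (e.g. for partitioning) read the
    // first 8 bytes and ignore the rest.
    static long readTimestamp(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        byte[] serialized = toBytes(1436000000000L, "country=us|page=home");
        System.out.println(readTimestamp(serialized));  // 1436000000000
    }
}
```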
Aside from the bits in serialization, I'm 👍. @himanshug can you verify those bits and then merge at will?
…tricSerde: this provides the alternative to using `ComplexMetricSerde.getObjectStrategy()` and using the serde methods from `ObjectStrategy`, as that usage pattern is deprecated.
This allows for an arbitrary InputFormat during Hadoop batch ingestion, which can return records of a value type other than Text.
…dd a Hadoop combiner to Hadoop batch ingestion to do merges on the mappers if possible
@himanshug I can merge this since it has 2 +1s now. Do you still want to fix anything else up?
@fjy please go ahead, I'm done.
@himanshug did we ever do some performance testing to get a sense of how much speed improvement we can get for things that aggregate well, and how much impact it may have on things that don't aggregate well? It would be useful to know at which point it makes sense to start using the combiner or not.
@xvrl not yet, but I will do that.
@xvrl I've just moved my old Avro Hadoop indexing module over to this, based on @himanshug's unification of the Hadoop indexing code in 0.8.1; it's much clearer in code compared with our old hacky way. And here are some performance testing results for the combiner. Obviously the combiner is wonderful for my data, which I think is low cardinality: it cut indexing time by 75%. Great job!

Avro indexing w/ map output compressed and w/o combiner: used 310 mappers and 2 reducers, with average times: map=51s, reduce=43m, shuffle=11m, merge=4m.

Avro indexing w/ map output compressed and w/ combiner: used 310 mappers and 2 reducers, with average times: map=1m, reduce=10m, shuffle=3m, merge=16s.
@zhaown thanks for reporting the numbers, glad that it's working well for you. @drcrallen I don't think adding the numbers straight away is an indicator of total job time; you would have to see something like: ignoring overhead, % improvement ~ 75%.
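The back-of-the-envelope behind that figure, simply summing the reported per-phase averages from the comment above (this ignores overlap between phases and any cluster overhead, as noted):

```java
public class ImprovementEstimate {
    public static void main(String[] args) {
        // Reported average phase times from the test above, in minutes.
        double without = 51.0 / 60 + 43 + 11 + 4;        // map + reduce + shuffle + merge
        double with    = 1 + 10 + 3 + 16.0 / 60;
        double improvement = (without - with) / without;  // fraction of time saved
        System.out.printf("~%.0f%% improvement%n", improvement * 100);
    }
}
```

The sum comes out a little over 75% saved, consistent with the ~75% ballpark quoted above.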
@drcrallen @himanshug I think adding the numbers straight away gives the total CPU-clock time instead of the wall-clock time. After all, Druid has to do whatever it needs to do for indexing; the combiner does no magic to reduce work, aside from shuffle-like overhead. It's like moving some work from the reducer to the mapper: while we cannot have many reducers (as we don't want many output files), we can have many mappers, so using the combiner increases parallelism and reduces wall-clock time. Actually, to further increase parallelism, I changed targetPartitionSize from 5,000,000 to 1,000,000; then I had 6 reducers instead of 2, reducing total indexing time by a further ~50%.
@zhaown @himanshug So is it safe to say that this patch helps an individual job running on its own cluster, but the amortized run time of a multi-tenant cluster running at or above capacity might not change (or not change much)?
@drcrallen An individual job finishing faster means slots get freed up quicker; it's also good simply because your ingestion finishes faster. However, since this mainly reduces the time taken by the reducers, which are far fewer in number than the mappers, overall cluster capacity might not change that much.
@himanshug As such, does it also change the memory requirements of the mapper?
@drcrallen Yes, but just enough to hold one "merged" row in an index object, plus the overhead associated with the combiner. I'm not sure how to calculate the per-mapper overhead from the numbers above.
@himanshug Sure, no problem. I just want to make sure the potential impact areas are communicated.
This patch:
(1) takes control of the serialization format of InputRow from mapper to reducer, which in turn allows InputFormat to return records of arbitrary type rather than just Text (the only requirement is that the configured InputRowParser can produce an InputRow from that arbitrary type);
(2) introduces a combiner that merges input rows at the mapper nodes when possible.