Skip to content

Allow use of non-threadsafe ObjectCachingColumnSelectorFactory#4397

Merged
leventov merged 7 commits intoapache:masterfrom
metamx:non-threadsafe-objectcachingcolumnselectorfactory
Jun 16, 2017
Merged

Allow use of non-threadsafe ObjectCachingColumnSelectorFactory#4397
leventov merged 7 commits intoapache:masterfrom
metamx:non-threadsafe-objectcachingcolumnselectorfactory

Conversation

@xanec
Copy link
Copy Markdown
Contributor

@xanec xanec commented Jun 13, 2017

Added am additional boolean parameter in IncrementalIndex constructor (and subsequently the constructors for its subclasses OnheapIncrementalIndex and OffheapIncrementalIndex) and the protected function initAggs to indicate if ConcurrentMap should be used.

@xanec xanec force-pushed the non-threadsafe-objectcachingcolumnselectorfactory branch from 68413b2 to 1085000 Compare June 13, 2017 02:40
@@ -366,14 +390,24 @@ public void close()
// here its made thread safe to support the special case of groupBy where the multiple threads can add concurrently to the IncrementalIndex.
static class ObjectCachingColumnSelectorFactory implements ColumnSelectorFactory
{
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also change code of makeXxxColumnSelector methods to use computeIfAbsent() instead of putIfAbsent()?

@@ -91,6 +92,26 @@ public OffheapIncrementalIndex(
aggBuffers.add(bb);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no compatibility concerns, so suggested to inline all usages of this constructor and remove it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a similar constructor for OnheapIncrementalIndex, should I inline it as well?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Also having 4 boolean parameters in a row on the call sites is dangerous (to accidentally switch them) and not readable, consider replacing one or two of them with enums of two options.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought so too. Actually, I was considering creating Builders for On/OffheapIncrementalIndex, would that be better?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will go with that.

@leventov leventov changed the title [BACKEND-902] Allow use of non-threadsafe ObjectCachingColumnSelectorFactory Allow use of non-threadsafe ObjectCachingColumnSelectorFactory Jun 13, 2017

if (concurrentEventAdd) {
longColumnSelectorMap = Maps.newConcurrentMap();
floatColumnSelectorMap = Maps.newConcurrentMap();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use ConcurrentHashMap, the implementation behind Maps.newConcurrentMap() has higher overhead and is less scalable.

@jihoonson
Copy link
Copy Markdown
Contributor

It would be nice if you add docs to somewhere when IncrementalIndex must be thread-safe or not.

@xanec
Copy link
Copy Markdown
Contributor Author

xanec commented Jun 13, 2017

Hi @jihoonson, it's already stated together with ObjectCachingColumnSelectorFactory:

// Caches references to selector objects for each column instead of creating a new object each time in order to save heap space.
// In general the selectorFactory need not to thread-safe.
// here its made thread safe to support the special case of groupBy where the multiple threads can add concurrently to the IncrementalIndex.

But I will add in additional remarks/docs together with the declaration of the parameter.

- Replace Maps.newXXXMap() with normal instantiation
- Documentations on when is thread-safe required.
- Use Builders for On/OffheapIncrementalIndex
.withRollup(IncrementalIndexSchema.DEFAULT_ROLLUP)
.build()
)
.setMaxRowCount(1000)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is about groupBy, shouldn't it have .setConcurrentEventAdd(true)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed that.

boolean sortFacts,
int maxRowCount
)
public static class Builder
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This builder and OffHeapIncrementalIndex.Builder could be the same class with two building methods: buildOffHeap() and buildOnHeap().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that can be done.

Actually, I have also considered having OffheapIncrementalIndex.Builder to extend the OnheapIncrementalIndex.Builder. Would this be better?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think overriding doesn't fit here, just IncrementalIndexBuilder with two different building methods is better IMO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it 👍

);
return prev != null ? prev : newSelector;
}
return floatColumnSelectorMap.computeIfAbsent(columnName, delegate::makeFloatColumnSelector);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

computeIfAbsent() when majority of calls if expected to find the entry is present already could be slower. So what I meant is code like this:

FloatColumnSelector existing = floatColumnSelectorMap.get(columnName);
if (existing != null) {
  return existing;
} else {
  floatColumnSelectorMap.computeIfAbsent(columnName, delegate::makeFloatColumnSelector);
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But won't this be reflected in the implementation of computeIfAbsent? The source code of the Map interface as follows:

    default V computeIfAbsent(K key,
            Function<? super K, ? extends V> mappingFunction) {
        Objects.requireNonNull(mappingFunction);
        V v;
        if ((v = get(key)) == null) {
            V newValue;
            if ((newValue = mappingFunction.apply(key)) != null) {
                put(key, newValue);
                return newValue;
            }
        }

        return v;
    }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of how code is inlined it might be not identical. And for ConcurrentHashMap the difference is bigger, because computeIfAbsent() always locks, and get() just makes a volatile read.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the explanation.

);
return prev != null ? prev : newSelector;
}
return longColumnSelectorMap.computeIfAbsent(columnName, delegate::makeLongColumnSelector);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

);
return prev != null ? prev : newSelector;
}
return objectColumnSelectorMap.computeIfAbsent(columnName, delegate::makeObjectColumnSelector);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

)
).build()
)
.setMaxRowCount(NUM_POINTS)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setReportParseExceptions(false) should be here

.withMinTimestamp(0)
.withQueryGranularity(Granularities.NONE)
.withMetrics(aggs)
.withRollup(IncrementalIndexSchema.DEFAULT_ROLLUP)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFAULT_ROLLUP (true) is set by default in the builder, don't need to be specified explicitly

return new OnheapIncrementalIndex.Builder()
.setIncrementalIndexSchema(
new IncrementalIndexSchema.Builder()
.withMinTimestamp(0)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the default value, don't need to be specified explicitly.

return new OnheapIncrementalIndex.Builder()
.setIncrementalIndexSchema(
new IncrementalIndexSchema.Builder()
.withQueryGranularity(Granularities.NONE)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the default value, don't need to be specified explicitly.

new IncrementalIndexSchema.Builder()
.withQueryGranularity(Granularities.NONE)
.withMetrics(schemaInfo.getAggsArray())
.withDimensionsSpec(DimensionsSpec.ofEmpty())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the default value, don't need to be specified explicitly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW DimensionsSpec.ofEmpty() is not constant, maybe makes sense to make it so.

- Constant EMPTY DimensionsSpec
- Improvement on IncrementalIndexSchema.Builder
  - Remove setting of default values
  - Use var args for metrics
- Correction on On/OffheapIncrementalIndex Builders
- Combine On/OffheapIncrementalIndex Builders
Copy link
Copy Markdown
Member

@leventov leventov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also fix Checkstyle

return new OnheapIncrementalIndex.Builder()
return new IncrementalIndex.Builder()
.setIncrementalIndexSchema(
new IncrementalIndexSchema.Builder()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very frequent pattern, suggested to add shortcut like IncrementalIndex.Builder. setSimpleIncrementalIndexSchema(AggregatorFactory...)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure but it seems that such usage (i.e., setting only metrics) only occurs in non-productions use (i.e., tests and benchmarks). Should we add in another API only for these uses?

Copy link
Copy Markdown
Member

@leventov leventov Jun 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to remove boilerplate and repetition from tests as well as from production code, because the need to write a lot of boilerplate demotivates people to write tests. Also you can rename method setIncrementalIndexSchema() to just setIndexSchema() because in the context of IncrementalIndex.Builder it's not needed to repeat that the index schema is "incremental".

You could call this method setSimpleTestingIndexSchema and annotate @VisibleForTesting to emphasize that it should be used only for testing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay 👍

return this;
}

// A helper method to set a simple index schema with only metrics and default values for the other parameters. Note
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opps, really sorry about that.

// that this method is normally used for testing and benchmarking; it is unlikely that you would use it in
// production settings.

/** A helper method to set a simple index schema with only metrics and default values for the other parameters. Note
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to Java style text should start on the next line after /**

@leventov leventov merged commit f68a069 into apache:master Jun 16, 2017
@leventov leventov deleted the non-threadsafe-objectcachingcolumnselectorfactory branch June 16, 2017 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants