[QTL] Move kafka-extraction-namespace to the Lookup framework. #2800

Merged: drcrallen merged 12 commits into apache:master from metamx:kafkaLookupOnly on May 2, 2016
Conversation

@drcrallen (Contributor) commented Apr 7, 2016

The major changes in this PR are as follows:

The Kafka extraction namespace now works through the lookup extractor factory framework. It still uses the caching mechanisms in the main lookup extension, but keys the cache with a random UUID instead of trying to maintain a relationship to the lookup name (which is enforced by the lookup framework in core).

This means the kafka-extraction-namespace extension must go through the main lookup framework.
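For reference, here is a minimal sketch of the cache-keying idea described above, assuming a simple map-of-maps stand-in for the lookup extension's cache manager; all names here are illustrative, not Druid's actual API:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class KafkaLookupExtractorFactorySketch
{
  // Each factory instance gets its own cache slot, independent of the
  // user-visible lookup name.
  private final String factoryId = UUID.randomUUID().toString();
  private final Map<String, Map<String, String>> cacheManager; // stand-in for the lookup extension's cache manager

  KafkaLookupExtractorFactorySketch(Map<String, Map<String, String>> cacheManager)
  {
    this.cacheManager = cacheManager;
  }

  Map<String, String> getOrCreateCache()
  {
    // The random UUID, not the lookup name, identifies the backing map.
    return cacheManager.computeIfAbsent(factoryId, k -> new ConcurrentHashMap<>());
  }
}
```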

@drcrallen (Contributor, Author)

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.575 sec <<< FAILURE! - in io.druid.query.lookup.TestKafkaExtractionCluster
testSimpleRename(io.druid.query.lookup.TestKafkaExtractionCluster)  Time elapsed: 4.57 sec  <<< ERROR!
com.fasterxml.jackson.databind.JsonMappingException: Could not resolve type id 'kafka' into a subtype of [simple type, class io.druid.query.lookup.LookupExtractorFactory]
 at [Source: {"type":"kafka","kafkaTopic":"testTopic","kafkaProperties":{"request.required.acks":"1","zookeeper.connect":"127.0.0.1:54849/kafka","zookeeper.sync.time.ms":"200","zookeeper.session.timeout.ms":"10000"}}; line: 1, column: 2]
    at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
    at com.fasterxml.jackson.databind.DeserializationContext.unknownTypeException(DeserializationContext.java:862)
    at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:167)
    at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:99)
    at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:84)
    at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:132)
    at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:41)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3066)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2161)
    at io.druid.query.lookup.TestKafkaExtractionCluster.setUp(TestKafkaExtractionCluster.java:266)

actual problem, fixing
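The error above is Jackson failing to resolve the type id `kafka` to a registered subtype of `LookupExtractorFactory`. For context, here is a minimal, self-contained sketch of the generic Jackson mechanism involved; the class names are illustrative stubs, and whether the PR fixes the test in exactly this way is not shown in this thread:

```java
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.jsontype.NamedType;

// Polymorphic base resolved via the "type" property, like LookupExtractorFactory.
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
interface FactoryBase
{
}

// Stand-in for the Kafka factory; in Druid the subtype would typically be
// registered by the extension's Jackson module.
class KafkaFactoryStub implements FactoryBase
{
}

class SubtypeRegistrationSketch
{
  public static void main(String[] args) throws Exception
  {
    final ObjectMapper mapper = new ObjectMapper();
    // Without this registration, reading {"type":"kafka"} fails with exactly the
    // JsonMappingException above: "Could not resolve type id 'kafka' ...".
    mapper.registerSubtypes(new NamedType(KafkaFactoryStub.class, "kafka"));
    final FactoryBase factory = mapper.readValue("{\"type\":\"kafka\"}", FactoryBase.class);
    System.out.println(factory.getClass().getSimpleName()); // prints KafkaFactoryStub
  }
}
```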


private final Object startStopLock = new Object();
private final ListeningExecutorService executorService;
private final AtomicLong doubleEventCount = new AtomicLong(0L);
Member

Why do we maintain doubleEventCount instead of eventCount?

Contributor Author

I was trying to have a simple way to minimize race conditions and locking. I could do a read/write lock if this isn't good enough.

Basically, the count is increased both before and after the critical section (as opposed to only before or only after).

The crux is answering "what was the state of the map that produced this result?" when computing the cache key, which is a little tricky for a continuously mutable map.
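For illustration, a minimal sketch of this double-increment pattern (essentially a seqlock-style version counter; the names are illustrative, not the PR's exact code). The writer bumps the counter before and after each update, so the counter is odd while an update is in flight; a reader that observes the same even value before and after its read knows the map did not change underneath it:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class DoubleCountSketch
{
  private final AtomicLong doubleEventCount = new AtomicLong(0L);
  private final Map<String, String> map = new ConcurrentHashMap<>();

  void handleEvent(String key, String value)
  {
    doubleEventCount.incrementAndGet(); // count goes odd: an update is in flight
    map.put(key, value);                // the critical section
    doubleEventCount.incrementAndGet(); // count goes even: update complete
  }

  // Returns a version number guaranteed to correspond to a stable view of the
  // map, suitable for building a cache key; retries if a write raced the read.
  long versionForCacheKey()
  {
    while (true) {
      final long before = doubleEventCount.get();
      if ((before & 1L) == 0L) {
        // ... read the map here to build the cache key ...
        if (doubleEventCount.get() == before) {
          return before; // no update overlapped our read
        }
      }
      Thread.yield(); // an update was in flight; try again
    }
  }
}
```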

@nishantmonu51 (Member)

Could you summarize all the changes in this PR in the description?

|`druid.query.rename.kafka.properties`|A JSON map of Kafka consumer properties. See below for special properties.|See below|

The following describes the handling of Kafka consumer properties in `druid.query.rename.kafka.properties`:
The consumer properties `group.id` and `auto.offset.reset` CANNOT be set in `kafkaProperties`, as they are set by the extension to `UUID.randomUUID().toString()` and `smallest` respectively.
Contributor

I am curious: what is the implication of this constraint?

Contributor Author

Setting auto.offset.reset to smallest means "read all the data available in the topic"; otherwise two different servers could replay different changelogs.

Setting group.id to a random UUID means every instance is a unique consumer, so each should be accounted for as a different consumer.
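A minimal sketch of how these two properties could be enforced on top of the user-supplied `kafkaProperties` (illustrative, not necessarily the PR's exact code; the property names are those of the old Kafka 0.8 high-level consumer):

```java
import java.util.Map;
import java.util.Properties;
import java.util.UUID;

class ConsumerPropertiesSketch
{
  static Properties buildConsumerProperties(Map<String, String> kafkaProperties)
  {
    // The docs above say these two keys CANNOT be set by the user.
    if (kafkaProperties.containsKey("group.id") || kafkaProperties.containsKey("auto.offset.reset")) {
      throw new IllegalArgumentException("group.id and auto.offset.reset cannot be set in kafkaProperties");
    }
    final Properties props = new Properties();
    props.putAll(kafkaProperties);
    // Every instance is a distinct consumer...
    props.setProperty("group.id", UUID.randomUUID().toString());
    // ...and every instance replays the topic from the beginning, so all
    // servers see the same changelog.
    props.setProperty("auto.offset.reset", "smallest");
    return props;
  }
}
```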

@drcrallen (Contributor, Author)

Merge problems from introspection PR, fixing

@drcrallen (Contributor, Author)

@b-slim I'm trying to hammer out some tests to prevent raciness, but otherwise this should be done.

private final String factoryId = UUID.randomUUID().toString();
private final AtomicReference<Map<String, String>> mapRef = new AtomicReference<>(null);

private AtomicBoolean started = new AtomicBoolean(false);
Contributor

Can we use this for the startStop lock as well?

Contributor Author

That is possible, yes; changed.
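For reference, a minimal sketch of using the `started` flag itself as the start/stop guard; `compareAndSet` makes each transition happen at most once without a separate lock object (illustrative, not the PR's actual lifecycle code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

class StartStopSketch
{
  private final AtomicBoolean started = new AtomicBoolean(false);

  boolean start()
  {
    if (!started.compareAndSet(false, true)) {
      return false; // someone else already started it
    }
    // ... start consumer threads here ...
    return true;
  }

  boolean stop()
  {
    if (!started.compareAndSet(true, false)) {
      return false; // never started, or already stopped
    }
    // ... shut down consumer threads here ...
    return true;
  }
}
```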

@b-slim (Contributor) commented Apr 28, 2016

@drcrallen can you explain how the caching mechanism works?
I guess the user first needs to know how caching works and where the cache lives.
Plus, I am not sure how eviction is supposed to work here; this impl is going to fill the cache with every entry, though.

@drcrallen (Contributor, Author)

@b-slim this impl fills the cache with all entries without eviction.

@drcrallen (Contributor, Author)

@b-slim this does not change the caching mechanism from the prior impl; I don't understand why caching would be a blocker for this PR.

@b-slim (Contributor) commented Apr 28, 2016

@drcrallen I know that; my comment is meant to help the user understand how things work by adding it to the docs. IMHO it is a serious limitation worth mentioning.

@b-slim (Contributor) commented Apr 28, 2016

@drcrallen I am not blocking it; just update the docs to reflect the limitation.

@b-slim (Contributor) commented Apr 28, 2016

👍 after Docs and squash.

@drcrallen (Contributor, Author)

Will add docs very shortly

@drcrallen (Contributor, Author)

@b-slim added

# Limitations

Currently the Kafka lookup extractor feeds the entire Kafka stream into a local cache. If you are using OnHeap caching, this can easily clobber your Java heap if the Kafka stream spews a lot of unique keys.
OffHeap caching should alleviate these concerns, but there is still a limit to the quantity of data that can be stored.
There is currently no eviction policy.

@drcrallen (Contributor, Author)

https://travis-ci.org/druid-io/druid/jobs/126506259 died in


-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running io.druid.indexing.kafka.KafkaDataSourceMetadataTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.022 sec - in io.druid.indexing.kafka.KafkaDataSourceMetadataTest
Running io.druid.indexing.kafka.KafkaIndexTaskTest

@drcrallen drcrallen closed this Apr 28, 2016
@drcrallen drcrallen reopened this Apr 28, 2016
@nishantmonu51 (Member)

👍, LGTM

@drcrallen drcrallen merged commit 54b717b into apache:master May 2, 2016
@nishantmonu51 nishantmonu51 deleted the kafkaLookupOnly branch May 2, 2016 16:45