
Add Sql InputSource#9449

Merged
suneet-s merged 18 commits into apache:master from a2l007:sqlinputsource
Jun 9, 2020

Conversation

@a2l007
Contributor

@a2l007 a2l007 commented Mar 2, 2020

Add Sql InputSource support for ingesting events from RDBMS using parallel indexing.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added unit tests or modified existing tests to cover new code paths.
  • been tested in a test Druid cluster.

@vogievetsky
Contributor

Does this inputSource not need an inputFormat? How is that handled?

@a2l007
Contributor Author

a2l007 commented Mar 5, 2020

SqlInputSource doesn't require a custom inputFormat, since the input data is read as result sets from an RDBMS source. Each SQL entity is read into a local file, and an iterator is returned based on this file.
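A minimal sketch of that fetch-then-iterate pattern, with plain JDK I/O and illustrative names (the real SqlEntity reads JDBC result sets and returns a Druid CloseableIterator; `rows` here stands in for serialized result-set rows):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: each query's results are spooled to a local temp file
// first; the iterator handed to the reader is backed by that file, so no
// database connection is held open while rows are parsed.
public class SpoolingSketch
{
  static Iterator<String> spoolAndIterate(List<String> rows) throws IOException
  {
    Path tempFile = Files.createTempFile("sql-entity-", ".tmp");
    Files.write(tempFile, rows);
    // The DB connection could be closed at this point; iteration below reads
    // purely from local storage. (The real code streams the file and cleans it
    // up via a CleanableFile; readAllLines keeps this sketch short.)
    return Files.readAllLines(tempFile).iterator();
  }

  public static void main(String[] args) throws IOException
  {
    Iterator<String> it = spoolAndIterate(List.of("row1", "row2"));
    while (it.hasNext()) {
      System.out.println(it.next());
    }
  }
}
```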

Member

@nishantmonu51 nishantmonu51 left a comment

LGTM, 👍

new NamedType(SqlFirehoseFactory.class, "sql"),
new NamedType(InlineFirehoseFactory.class, "inline")
new NamedType(InlineFirehoseFactory.class, "inline"),
new NamedType(SqlInputSource.class, "sql")
Member

Can this be part of a new module, since InputSource seems to be a replacement for the firehose-related interfaces?

Contributor Author

Thanks. Extracted it to a separate module.

Member

Thanks. However, I would have just called it InputSourceModule and made it available to all processes by adding it to makeInjectorWithModules in Initialization.java, similar to FirehoseModule.

Contributor Author

I can refactor the module name, but I believe this module wouldn't be needed in the broker, historical, and router processes, since it's specific to indexing.

Member

ok

@suneet-s
Contributor

suneet-s commented May 6, 2020

@a2l007 have you thought about how to add integration tests for something like this? Can we add innodb or something in a docker container and run it as part of the integration testing framework?

@a2l007
Contributor Author

a2l007 commented May 7, 2020

@suneet-s Yeah, good idea! I'm planning to add an InputSource-based equivalent for CombiningFirehose first, as well as fix this bug: #9389
I'd take a stab at adding integration tests for both after that.

Contributor

@jihoonson jihoonson left a comment

LGTM overall. Left minor comments on docs.

return new CloseableIterator<R>()
{
CloseableIterator<R> iterator = findNextIeteratorIfNecessary();
CloseableIterator<R> iterator = findNextIteratorIfNecessary();
Contributor

👍

Comment thread docs/ingestion/native-batch.md Outdated
```

The spec above will read all events from two separate sqls
within the interval `2013-01-01/2013-01-02`.
Contributor

Maybe worth mentioning one more time that these SQLs are executed in two sub-tasks when you run a Parallel task.

Comment thread docs/ingestion/native-batch.md Outdated
...
```

The spec above will read all events from two separate sqls
Contributor

sqls should be properly capitalized as SQLs.

protected CloseableIterator<Map<String, Object>> intermediateRowIterator() throws IOException
{
final Closer closer = Closer.create();
final InputEntity.CleanableFile resultFile = closer.register(source.fetch(temporaryDirectory, null));
Contributor

Could you add a comment on why we fetch all results into local storage first? I remember this is to avoid holding database connections open for too long. It would help other developers.

final Closer closer = Closer.create();
final InputEntity.CleanableFile resultFile = closer.register(source.fetch(temporaryDirectory, null));
FileInputStream inputStream = new FileInputStream(resultFile.file());
JsonIterator<Map<String, Object>> jsonIterator = new JsonIterator<>(new TypeReference<Map<String, Object>>()
Contributor

nit: this doesn't have to be done in this PR, but how about making JsonIterator a CloseableIterator? It already implements Iterator and Closeable so it would be pretty simple.
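The suggestion amounts to introducing a combined marker interface; a rough sketch of the idea (names are illustrative, not Druid's actual definitions):

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;

// An Iterator that is also Closeable, so callers can release the underlying
// file/stream when iteration ends (e.g. via try-with-resources).
interface CloseableIterator<T> extends Iterator<T>, Closeable
{
}

public class CloseableIteratorSketch
{
  // A class that already implements Iterator and Closeable only needs to
  // change its "implements" clause; no new methods are required.
  static class LineIterator implements CloseableIterator<String>
  {
    private final Iterator<String> delegate;
    private boolean closed;

    LineIterator(Iterator<String> delegate)
    {
      this.delegate = delegate;
    }

    @Override
    public boolean hasNext()
    {
      return !closed && delegate.hasNext();
    }

    @Override
    public String next()
    {
      return delegate.next();
    }

    @Override
    public void close() throws IOException
    {
      closed = true; // real code would close the backing stream here
    }
  }

  public static void main(String[] args) throws IOException
  {
    try (LineIterator it = new LineIterator(java.util.List.of("a", "b").iterator())) {
      while (it.hasNext()) {
        System.out.println(it.next());
      }
    }
  }
}
```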

Comment thread docs/ingestion/native-batch.md Outdated
|property|description|required?|
|--------|-----------|---------|
|type|This should be "sql".|Yes|
|database|Specifies the database connection details.|Yes|
Contributor

Would you add more detailed docs for this parameter? It should probably mention that you have to load some extension to read from a particular type of database.

Comment thread docs/ingestion/native-batch.md Outdated
This spec above will only return the `page`, `user` dimensions and `added` metric.
Only rows where `page` = `Druid` will be returned.

### Sql Input Source
Contributor

@jihoonson jihoonson May 7, 2020

One more thing: I remember that many people from our community have been asking about how to use SqlFirehose. What do you think about adding a section that explains how to use it in a production environment? To be honest, it's not clear to me what the best practices are for making a scalable and efficient pipeline using this input source. For example, how do you parallelize each ingestion task (that is, how do you split queries)? How do you handle data updates in the database, especially after the ingestion job is done? How often do you run ingestion jobs? And so on.

Contributor Author

@jihoonson I have added more details related to this in the docs. Please take a look. Since this is a one-time ingestion from an RDBMS, updates to the source DB data for a specific interval would require a new SQL ingestion task to be spawned for the same interval, which will replace the old segments.

By the way, I would like to add a continuous ingestion from RDBMS feature as well which could use the indexing service with a RDBMS supervisor. Do you think this would be a useful feature?

Contributor

I would definitely vote for having such a supervisor! That will be super useful.

Contributor

@suneet-s suneet-s left a comment

Thanks for the contribution. I have some questions around how the packages are structured: should this be an extension or part of druid-core? If it should be part of core, can we move all of it under a set of smaller packages so it's easier to maintain (i.e., all SQL ingestion code under one module, with potentially sub-packages inside it)?

Also, I think more logging on what to do operationally when something goes wrong would be helpful. If I'm reading the code correctly, the only real issue is running out of disk space in the temp folder, since we stream in the results. Are there any safeguards we can put in place to prevent users from blowing up the machine by issuing a SELECT * query against a very large database?

Comment thread server/src/main/java/org/apache/druid/metadata/input/SqlEntity.java
Comment thread server/src/main/java/org/apache/druid/metadata/input/SqlEntity.java Outdated
Comment thread server/src/main/java/org/apache/druid/metadata/input/SqlEntity.java
Comment thread server/src/main/java/org/apache/druid/metadata/input/SqlEntity.java
Comment thread server/src/main/java/org/apache/druid/metadata/input/SqlEntity.java
@2bethere
Contributor

2bethere commented May 8, 2020

Thanks for the contribution, I have a few questions.

  1. If the SQL table has a timestamp-like column, is there a way for me to specify this as a parameter so that not the entire table gets pulled?
  2. Is there a way for me to specify which column to split this on? The user might already know how the table is sharded/partitioned, which could make parallel ingestion more efficient.
  3. If incremental loads are supported, how are duplicates handled? Do I specify a key, or is this handled downstream?

@a2l007
Contributor Author

a2l007 commented May 29, 2020

@2bethere
Thanks for taking a look.

If the SQL table has a timestamp-like column, is there a way for me to specify this as a parameter so that not the entire table gets pulled?

You could use WHERE clauses within your SQL query to restrict the data based on your requirements. It is recommended to filter SQL queries based on the intervals specified in the granularity spec so as to avoid handling unwanted data.

Is there a way for me to specify which column to split this on? The user might already know how the table is sharded/partitioned, which could make parallel ingestion more efficient.

There isn't a direct way to split the input data based on a column, as this InputSource splits the task into sub-tasks based on the number of SQL queries. One way you could spread the data across sub-tasks would be to introduce pagination within your SQL queries.
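For example (table and column names are purely illustrative), the same table could be spread across two sub-task queries by key range, or by LIMIT/OFFSET pagination:

```sql
-- Illustrative only: two queries, each becoming one sub-task.
SELECT * FROM events WHERE id >= 0      AND id < 500000;
SELECT * FROM events WHERE id >= 500000 AND id < 1000000;

-- Or with pagination (note: large OFFSETs can be slow on many databases):
SELECT * FROM events ORDER BY id LIMIT 500000 OFFSET 0;
SELECT * FROM events ORDER BY id LIMIT 500000 OFFSET 500000;
```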

If incremental loads are supported, how are duplicates handled? Do I specify a key, or is this handled downstream?

This InputSource is no different from any other native batch InputSource type in terms of handling updates. Therefore, any changes in your source DB for a specific interval would require you to ingest the entire data for that interval again, and this will replace the existing segments for the interval.
Hope that helps.

Comment thread docs/ingestion/native-batch.md Outdated
...
```

The spec above will read all events from two separate SQLs within the interval `2013-01-01/2013-01-02`.
Contributor

Where is the interval 2013-01-01/2013-01-02 from?

Comment thread docs/ingestion/native-batch.md Outdated

* During indexing, each sub-task would execute one of the SQL queries and the results are stored locally on disk. The sub-tasks then proceed to read the data from these local input files and generate segments. Presently, there isn’t any restriction on the size of the generated files and this would require the MiddleManagers or Indexers to have sufficient disk capacity based on the volume of data being indexed.

* Filtering the SQL queries based on the intervals specified in the `granularitySpec` can avoid unwanted data being retrieved and stored locally by the indexing sub-tasks.
Contributor

I'm not sure what is meant by "avoid unwanted data being retrieved and stored locally". Does this mean the subtask can modify the SQL to filter out data outside the interval in the granularitySpec? Would you point me to where it is implemented?

Contributor Author

No, I meant to say that the SQL queries should have datetime-range-based WHERE clauses matching the interval specified in the granularitySpec. If the query doesn't have datetime-based filters, it would be pulling data outside the required intervals, which is unwanted. Hope it is clear now. I have added an example as well.

Contributor

That makes sense 🙂

}
);
}
return new CleanableFile()
Contributor

The tempFile will not be deleted if an exception is thrown on any of the lines above. We should catch all exceptions and delete the file properly.
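One common shape for that fix, sketched with plain JDK I/O (illustrative: the real method hands the file to a CleanableFile; `failMidway` just simulates a mid-fetch exception):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// If anything throws after the temp file is created but before it is handed
// to the caller, the partial file must be deleted rather than leaked.
public class TempFileCleanupSketch
{
  static File fetchToTempFile(boolean failMidway) throws IOException
  {
    File tempFile = File.createTempFile("druid-sql-", ".tmp");
    try {
      try (FileOutputStream fos = new FileOutputStream(tempFile)) {
        fos.write("row-1\n".getBytes(StandardCharsets.UTF_8));
        if (failMidway) {
          throw new IOException("simulated failure while writing results");
        }
      }
      return tempFile; // from here on, the caller owns deletion
    }
    catch (Throwable t) {
      Files.deleteIfExists(tempFile.toPath()); // don't leak the partial file
      throw t; // precise rethrow: compiler knows only IOException/unchecked escape
    }
  }

  public static void main(String[] args) throws IOException
  {
    File ok = fetchToTempFile(false);
    System.out.println("kept on success: " + ok.exists());
    ok.delete();
    try {
      fetchToTempFile(true);
    }
    catch (IOException e) {
      System.out.println("cleaned on failure: " + e.getMessage());
    }
  }
}
```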

JsonIterator<Map<String, Object>> jsonIterator = new JsonIterator<>(new TypeReference<Map<String, Object>>()
{
}, inputStream, closer, objectMapper);
return new CloseableIterator<Map<String, Object>>()
Contributor

Thanks for making the JsonIterator a CloseableIterator. Now you can return jsonIterator directly and remove this.


Contributor

@jihoonson jihoonson left a comment

@a2l007 thanks for updating the PR. I left some more comments. Also please check the CI failures.

> mdspell --en-us --ignore-numbers --report '../docs/**/*.md'
    ../docs/ingestion/native-batch.md
     1351 | red to the other native batch InputSources, SQL InputSource behaves diff 
>> 1 spelling error found in 165 files

You may need to add InputSources to the website/.spelling file.

Another CI failure is insufficient test coverage of SqlEntity.

Diff coverage statistics:
------------------------------------------------------------------------------
|     lines      |    branches    |   functions    |   path
------------------------------------------------------------------------------
| 100% (2/2)     | 100% (0/0)     |  80% (4/5)     | org/apache/druid/segment/realtime/firehose/SqlFirehoseFactory.java
|  89% (25/28)   |  92% (13/14)   |  75% (24/32)   | org/apache/druid/metadata/input/SqlInputSource.java
|  75% (3/4)     | 100% (0/0)     |  80% (4/5)     | org/apache/druid/metadata/input/InputSourceModule.java
|  74% (37/50)   |  35% (7/20)    |  73% (28/38)   | org/apache/druid/metadata/input/SqlEntity.java
|  80% (4/5)     | 100% (0/0)     |  83% (5/6)     | org/apache/druid/metadata/input/SqlInputFormat.java
|  95% (21/22)   | 100% (0/0)     |  76% (19/25)   | org/apache/druid/metadata/input/SqlReader.java
------------------------------------------------------------------------------
Total diff coverage:
 - lines: 82% (92/111)
 - branches: 58% (20/34)
 - functions: 75% (84/111)
ERROR: Insufficient branch coverage of 58% (20/34). Required 65%.

You may want to add a unit test that verifies the tempFile is deleted properly even when an exception is thrown before returning CleanableFile.

throws IOException
{
try (FileOutputStream fos = new FileOutputStream(tempFile)) {
final JsonGenerator jg = objectMapper.getFactory().createGenerator(fos);
Contributor

Please move jg to the try-with-resources clause above so that it can be closed safely.
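The requested pattern, sketched with a plain Writer standing in for Jackson's JsonGenerator (illustrative only; the real code writes JSON):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Declaring both resources in one try-with-resources clause guarantees both
// are closed (in reverse declaration order) even if a write throws midway.
public class TryWithResourcesSketch
{
  static void writeRows(File out, String[] rows) throws IOException
  {
    try (FileOutputStream fos = new FileOutputStream(out);
         Writer jg = new OutputStreamWriter(fos, StandardCharsets.UTF_8)) {
      for (String row : rows) {
        jg.write(row);
        jg.write('\n');
      }
    } // jg is closed first, then fos
  }

  public static void main(String[] args) throws IOException
  {
    File f = File.createTempFile("rows-", ".txt");
    writeRows(f, new String[]{"a", "b"});
    System.out.println("bytes written: " + f.length());
    f.delete();
  }
}
```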

@a2l007
Contributor Author

a2l007 commented Jun 1, 2020

@jihoonson Thanks for reviewing. I've addressed the review comments. Could you run Travis again? It seems to have failed for a different reason.

@jihoonson
Contributor

I restarted the timed-out test. Will take another look and finish my review soon.

Contributor

@jihoonson jihoonson left a comment

LGTM. @a2l007 thanks!

Contributor

@suneet-s suneet-s left a comment

Thanks for your patience with this @a2l007

Last set of concerns:

  • Is there a benefit to this approach vs exporting the results of a sql query to temp files and then using another InputSource to ingest the data? I ask this because I wonder if Druid has to take on the extra maintainability burden of correctly downloading and serializing sql query results.
  • There appears to be an NPE bug in CaseFoldedMap
  • Better WARNING message telling users that SqlInputSource is not recommended for production use yet.

The spec above will read all events from two separate SQLs for the interval `2013-01-01/2013-01-02`.
Each of the SQL queries will be run in its own sub-task and thus for the above example, there would be two sub-tasks.

Compared to the other native batch InputSources, SQL InputSource behaves differently in terms of reading the input data and so it would be helpful to consider the following points before using this InputSource in a production environment:
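For reference, a sketch of what the inputSource section of such a two-query spec looks like (connection details, table names, and queries are placeholders; check the Druid native batch docs and your database extension for the exact fields):

```json
"inputSource": {
  "type": "sql",
  "database": {
    "type": "mysql",
    "connectorConfig": {
      "connectURI": "jdbc:mysql://host:port/schema",
      "user": "user",
      "password": "password"
    }
  },
  "sqls": [
    "SELECT * FROM table1 WHERE timestamp BETWEEN '2013-01-01 00:00:00' AND '2013-01-01 11:59:59'",
    "SELECT * FROM table2 WHERE timestamp BETWEEN '2013-01-01 00:00:00' AND '2013-01-01 11:59:59'"
  ]
}
```

In a Parallel task, each entry in `sqls` is executed by its own sub-task.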
Contributor

Can we mark this as a big warning in the docs? I think a similar warning should be made on line 1316 indicating that this functionality is experimental and not yet recommended for production use.

Contributor Author

Is this necessary? This input source has been used in a couple of production-type environments internally, and as long as the indexer is allocated enough disk space by the cluster operator, it shouldn't run into any issues.
I've added some text asking the user to review these points before using the input source.

Contributor

I wouldn't say the SqlInputSource is experimental. An experimental feature in Druid means either 1) the feature hasn't been stabilized yet and there could be potential bugs while using it, or 2) the feature might be stable but could be removed in the future since we know there is a better way. In this case, I believe there will be a better way for ingesting from databases, such as the SqlSupervisor mentioned here, but even then, we can build the SqlSupervisor on top of the SqlInputSource (the supervisor can probably use it). Also, we have been providing the SqlFirehose as a non-experimental feature.

Contributor

Maybe instead of experimental, we should call this feature Beta. It indicates that there is future work coming and we may want to change the behavior or configuration of the InputSource, so users shouldn't rely on the API / behavior being consistent across releases.

Since there are no integration tests, I'm concerned about calling this GA, because someone would have to run through these tests manually before each release to make sure we did not accidentally break the SqlInputSource

Contributor

Maybe instead of experimental, we should call this feature Beta. It indicates that there is future work coming and we may want to change the behavior or configuration of the InputSource, so users shouldn't rely on the API / behavior being consistent across releases.

I agree that a more fine-grained feature lifecycle would be nice, but it should be discussed separately on the dev mailing list. As for APIs/behaviors, all Druid InputSources are UnstableApi, which may change in breaking ways at any time, even between minor Druid release lines.

Since there are no integration tests, I'm concerned about calling this GA, because someone would have to run through these tests manually before each release to make sure we did not accidentally break the SqlInputSource

Since all inputSources are UnstableApis, I wouldn't call any of them GA. Integration tests sound nice; we should add them in the near future.

).iterator();
jg.writeStartArray();
while (resultIterator.hasNext()) {
jg.writeObject(resultIterator.next());
Contributor

Sorry I missed this earlier: why did we choose JSON serialization? What happens if an object returned from the result set cannot be serialized by the JSON generator?

Contributor Author

The design was made to align with one of several use cases, where events from JSON sources were imported into an RDBMS and this input source would be used to index off of that. The data types supported by file formats would work with this InputSource as well. Going forward, I'd like to take a stab at evaluating whether this result file can be hooked up to read from one of the TextReader implementations. The tricky part right now is that the intermediate-row parsing logic for SqlReader is completely different from the TextReader one.

Comment thread server/src/main/java/org/apache/druid/metadata/input/SqlEntity.java Outdated
Comment thread server/src/main/java/org/apache/druid/metadata/input/SqlEntity.java Outdated
Contributor

@suneet-s suneet-s left a comment

Since there are no integration tests, I'd like to advocate that we call this InputSource beta in the docs or something like that. Otherwise LGTM.

If others in the community are ok with the wording in the docs - I'm happy to change my vote.

@nishantmonu51 @pjain1 @jihoonson - any thoughts on this comment since you previously reviewed this change.

And thanks again @a2l007 for the PR - this will definitely make it easier for new users to try out Druid 🎉

@a2l007
Contributor Author

a2l007 commented Jun 8, 2020

Since there are no integration tests, I'd like to advocate that we call this InputSource beta in the docs or something like that. Otherwise LGTM.

If others in the community are ok with the wording in the docs - I'm happy to change my vote.

Tagging previous reviewers @jihoonson @nishantmonu51 @pjain1: could you please provide your feedback regarding this?

@jihoonson
Contributor

I wrote my thoughts here. It may make sense to call all InputSources Beta or something, depending on what Beta means. However, we don't have such feature tags yet, so it should be discussed separately instead of in this PR.

Regarding integration tests, I would say the original author or anyone else can add them later. I think it's better not to add them in this PR, since we don't have a framework to easily add such a test yet, which means the PR size could grow large. We can open an issue for those integration tests and tag the next major release version so that those issues become release blockers.

@pjain1
Member

pjain1 commented Jun 9, 2020

We already have documentation for InputSource, and it's been used in production. We can just add notes there about the potential pitfalls and discuss separately what it should be called: experimental, beta, or anything else.

@suneet-s
Contributor

suneet-s commented Jun 9, 2020

Regarding integration tests, I would say the original author or anyone else can add them later. I think it's better not to add them in this PR, since we don't have a framework to easily add such a test yet, which means the PR size could grow large. We can open an issue for those integration tests and tag the next major release version so that those issues become release blockers.

I've filed #10009 to add the integration tests - @a2l007 will you have time to add them before the 0.19 release?

@pjain1 - I see you and @jon-wei were discussing a security model for Druid in #9380. Does adding an InputSource that allows a Druid user to read any SQL table - like the metadata store, config tables, etc. - break the security model you had in mind for Druid?

@suneet-s
Contributor

suneet-s commented Jun 9, 2020

@pjain1 - I see you and @jon-wei were discussing a security model for Druid in #9380. Does adding an InputSource that allows a Druid user to read any SQL table - like the metadata store, config tables, etc. - break the security model you had in mind for Druid?

Reading the code more closely, it looks like the user will have to provide credentials to the metadata store to read the data. This seems like a reasonable expectation. Initially I thought a user could use the credentials already configured in the Druid cluster to read the metadata store.

Contributor

@suneet-s suneet-s left a comment

Approved since we have a blocking issue for 0.19 to add integration tests to verify this works

@suneet-s suneet-s merged commit 17cf8ea into apache:master Jun 9, 2020
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jun 12, 2020
* Add Sql InputSource

* Add spelling

* Use separate DruidModule

* Change module name

* Fix docs

* Use sqltestutils for tests

* Add additional tests

* Fix inspection

* Add module test

* Fix md in docs

* Remove annotation

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
Tiaaa pushed a commit to adobe/incubator-druid that referenced this pull request Jun 24, 2020
@maytasm
Contributor

maytasm commented Jun 26, 2020

Have a PR up for adding integration tests to SqlInputSource. Please see: #10080

@clintropolis clintropolis added this to the 0.19.0 milestone Jun 26, 2020
tsmethurst added a commit to Klarrio/incubator-druid that referenced this pull request Jul 16, 2020
* web-console clean coverage report on build clean (#9718)

* fixes for inline subqueries when multi-value dimension is present (#9698)

* fixes for inline subqueries when multi-value dimension is present

* fix test

* allow missing capabilities for vectorized group by queries to be treated as single dims since it means that column doesnt exist

* add comment

* Fix numbered list formatting in markdown. (#9664)

* Align library version  (#9636)

* align JUnitParams version 1.1.1,1.0.4 to 1.1.1

* aligin junit version 4.8.1,4.12 to 4.12

* exclude explicitly specified version

* Fixes intermittent failure in ITAutoCompactionTest (#9739)

* fix intermittent failure in ITAutoCompactionTest

* fix typo

* update javadoc

* Add QueryResource to log4j2 template. (#9735)

* Add integration tests for kafka ingestion (#9724)

* add kafka admin and kafka writer

* refactor kinesis IT

* fix typo refactor

* parallel

* parallel

* parallel

* parallel works now

* add kafka it

* add doc to readme

* fix tests

* fix failing test

* test

* test

* test

* test

* address comments

* addressed comments

* Datasource doc structure adjustments. (#9716)

- Reorder both the datasource and query-execution page orderings to
table, lookup, union, inline, query, join. (Roughly increasing order
of conceptual "fanciness".)
- Add more crosslinks from datasource page to query-execution page:
one per datasource type.

* Optimize FileWriteOutBytes to avoid high system cpu usage (#9722)

* optimize FileWriteOutBytes to avoid high sys cpu

* optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException

* optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException in writeOutBytes.size

* Revert "optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException in writeOutBytes.size"

This reverts commit 965f7421

* Revert "optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException"

This reverts commit 149e08c0

* optimize FileWriteOutBytes to avoid high sys cpu -- avoid IOEception never thrown check

* Fix size counting to handle IOE in FileWriteOutBytes + tests

* remove unused throws IOException in WriteOutBytes.size()

* Remove redundant throws IOExcpetion clauses

* Parameterize IndexMergeBenchmark

Co-authored-by: huanghui.bigrey <huanghui.bigrey@bytedance.com>
Co-authored-by: Suneet Saldanha <suneet.saldanha@imply.io>

* Initialize SettableByteEntityReader only when inputFormat is not null (#9734)

* Lazy initialization of SettableByteEntityReader to avoid NPE

* toInputFormat for tsv

* address comments

* common code

* revert datasketches-java version to 1.1.0-incubating until new version is released (#9751)

* revert datasketches-java version to 1.1.0-incubating until fix is in place

* fix tests

* checkstyle

* fix issue where CloseableIterator.flatMap does not close inner CloseableIterator (#9761)

* fix issue where CloseableIterator.flatMap does not close inner CloseableIterator

* more test

* style

* clarify test

* Adjust string comparators used for ingestion (#9742)

* Adjust string comparators used for ingestion

* Small tweak

* Fix inspection, more javadocs

* Address PR comment

* Add rollup comment

* Add ordering test

* Fix IncrementaIndexRowCompTest

* Test reading from empty kafka/kinesis partitions (#9729)

* add test for stream sequence number returns null

* fix checkstyle

* add index test for when stream returns null

* retrigger test

* Adding support for autoscaling in GCE (#8987)

* Adding support for autoscaling in GCE

* adding extra google deps also in gce pom

* fix link in doc

* remove unused deps

* adding terms to spelling file

* version in pom 0.17.0-incubating-SNAPSHOT --> 0.18.0-SNAPSHOT

* GCEXyz -> GceXyz in naming for consistency

* add preconditions

* add VisibleForTesting annotation

* typos in comments

* use StringUtils.format instead of String.format

* use custom exception instead of exit

* factorize interval time between retries

* making literal value a constant

* iterate all network interfaces

* use provided scope on google (non-api) deps

* adding missing dep

* removing unneeded this and using Objects methods instead of 3-way if in hash and comparison

* adding import

* adding retries around getRunningInstances and adding limit for operation end waiting

* refactor GceEnvironmentConfig.hashCode

* 0.18.0-SNAPSHOT -> 0.19.0-SNAPSHOT

* removing unused config

* adding tests to hash and equals

* adding nullable to waitForOperationEnd

* adding testTerminate

* adding unit tests for createComputeService

* increasing retries in unrelated integration-test to prevent sporadic failure (hopefully)

* reverting queryResponseTemplate change

* adding comment for Compute.Builder.build() returning null

* table fix (#9769)

* Fix problem when running single integration test using -Dit.test= (#9778)

* fix running single it

* fix checkstyle

* Improve "waiting for tasks complete" logic in integration tests (#9759)

* improve waiting for tasks complete logic in integration tests

* improve waiting for tasks complete logic in integration tests

* fix forbidden check

* changed Preview to Apply (#9757)

* Fix potential NPEs in joins (#9760)

* Fix potential NPEs in joins

IntelliJ reported issues with potential NPEs. This was first hit in testing
with a filter being pushed down to the left hand table when joining against
an indexed table.

* More null check cleanup

* Optimize filter value rewrite for IndexedTable

* Add unit tests for LookupJoinable

* Add tests for IndexedTableJoinable

* Add non null assert for dimension selector

* Suppress null warning in LookupJoinMatcher

* remove some null checks on hot path

* Integration tests for stream ingestion with various data formats (#9783)

* Integration tests for stream ingestion with various data formats

* fix npe

* better logging; fix tsv

* fix tsv

* exclude kinesis from travis

* some readme

* Druid Quickstart refactor and update (#9766)

* Update data-formats.md

Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }"

* Light text cleanup

* Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here.

* Update index.md

* original quickstart full first pass

* original quickstart full first pass

* first pass all the way through

* straggler

* image touchups and finished old tutorial

* a bit of finishing up

* Review comments

* fixing links

* spell checking gymnastics

* More Hadoop integration tests (#9714)

* More Hadoop integration tests

* Add missing s3 instructions

* Address PR comments

* Address PR comments

* PR comments

* Fix typo

* remove UnionMergeRule rules from SQL planner (#9797)

* Update notice; fix version of druid-query-toolkit (#9799)

* fix npe in IncrementalIndexReadBenchmark (#9754)

Co-authored-by: 黄辉 <huanghui.bigrey@bytedance.com>

* Fixed flaky BlockingPoolTest.testConcurrentTakeBatch() (#9692)

* Update documentation for metricCompression (#9811)

* added number of bins parameter (#9436)

* added number of bins parameter

* addressed review points

* test equals

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>

* Fix filtering on boolean values in transformation (#9812)

* Fix filter on boolean value in Transform

* assert

* more descriptive test

* remove assert

* add assert for cached string; disable tests

* typo

* Remove ParseSpec.toInputFormat() (#9815)

* Remove toInputFormat() from ParseSpec

* fix test

* Ignore druid-processing benchmarks in tests (#9821)

* Clarifying workerThreads and a few other nits (#9804)

* Update data-formats.md

Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }"

* Light text cleanup

* Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here.

* Clarifying accepted values for URI lookup

* Update index.md

* original quickstart full first pass

* original quickstart full first pass

* first pass all the way through

* straggler

* image touchups and finished old tutorial

* a bit of finishing up

* druid-caffeine-cache ext previously removed

* Sample MaxDirectMemorySize value unrealistic

* Review comments

* fixing links

* spell checking gymnastics

* workerThreads desc slightly expanded

* typo

* Typo

* Reversing Kafka config order

* Changing order of configs for Kinesis

* Trying this again: ioConfig then tuningConfig

* Avoid sorting values in InDimFilter if possible (#9800)

* Avoid sorting values in InDimFilter if possible

* tests

* more tests

* fix and and or filters

* fix build

* false and true vector matchers

* fix vector matchers

* checkstyle

* in filter null handling

* remove wrong test

* address comments

* remove unnecessary null check

* redundant separator

* address comments

* typo

* tests

* Add equivalent test coverage for all RHS join impls (#9831)

* Add equivalent test coverage for all RHS join impls

* address comments

* increase druid-histogram postagg test coverage (#9732)

* low hanging fruit - presize hash map for DruidSegmentReader (#9836)

* fill out missing test coverage for druid-stats, druid-momentsketch, druid-tdigestsketch postaggs (#9740)

* postagg test coverage for druid-stats, druid-momentsketch, druid-tdigestsketch and fixes

* style fixes

* fix comparator for TDigestQuantilePostAggregator

* add flag to flattenSpec to keep null columns (#9814)

* add flag to flattenSpec to keep null columns

* remove changes to inputFormat interface

* add comment

* change comment message

* update web console e2e test

* move keepNullColumns to JSONParseSpec

* fix merge conflicts

* fix tests

* set keepNullColumns to false by default

* fix lgtm

* change Boolean to boolean, add keepNullColumns to hash, add tests for keepNullColumns false + true with no null columns

* Add equals verifier tests

* Directly rewrite filters on RHS join columns into LHS equivalents (#9818)

* Directly rewrite filters on RHS join columns into LHS equivalents

* PR comments

* Fix inspection

* Revert unnecessary ExprMacroTable change

* Fix build after merge

* Address PR comments

* Add TaskCountStatsMonitor to config docs (#9447)

* Add javadoc for stream ingestion integration tests (#9795)

* fix license registry for com.nimbusds lang-tag (#9860)

* Fail incorrectly constructed join queries (#9830)

* Fail incorrectly constructed join queries

* wip annotation for equals implementations

* Add equals tests

* fix tests

* Actually fix the tests

* Address review comments

* prohibit Pattern.hashCode()

* Add back FieldMayBeFinal inspection (#9865)

* Console E2E test docs (#9864)

* druid.storage.maxListingLength should default to 1000 for s3 (#9858)

* druid.storage.maxListingLength should default to 1000 for s3

* * Address review comments

* * Address review comments

* * Address comments

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit (#9773)

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* address comments

* address comments

* fix checkstyle

* address comments

* address comments

* Fix potential resource leak in ParquetReader (#9852)

* Fix potential resource leak in ParquetReader

* add test

* never thrown exception

* catch potential exceptions

* Add support for Avro OCF using InputFormat (#9671)

* Add AvroOCFInputFormat

* Support supplying a reader schema in AvroOCFInputFormat

* Add docs for Avro OCF input format

* Address review comments

* Address second round of review

* Datasketches 1 3 0 (#9880)

* use the latest datasketches release

* new sketch debug print

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>

* refactor SeekableStreamSupervisor usage of RecordSupplier (#9819)

* refactor SeekableStreamSupervisor usage of RecordSupplier to reduce contention between background threads and main thread, refactor KinesisRecordSupplier, refactor Kinesis lag metric collection and emitting

* fix style and test

* cleanup, refactor, javadocs, test

* fixes

* keep collecting current offsets and lag if unhealthy in background reporting thread

* review stuffs

* add comment

* Number based columns representing time in custom format cannot be used as timestamp column in Druid. (#9877)

* Number based columns representing time in custom format cannot be used as timestamp column in Druid.

Prior to this fix, if an integer column in Parquet stores a dateint in format yyyyMMdd, it cannot be used as a timestamp column in Druid, as the timestamp parser interprets it as a number storing UTC time instead of treating it as a number representing time in yyyyMMdd format. Data formats like TSV or CSV don't suffer from this problem, as the timestamp is passed in as a string which the timestamp parser is able to parse correctly.
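
The dateint interpretation this fix enables can be sketched with plain `java.time` (class and method names here are invented for illustration, not Druid's actual timestamp-parser API):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateIntParse {
    // Interpret a numeric column value such as 20200317 as a yyyyMMdd date
    // rather than as epoch milliseconds.
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMdd");

    static LocalDate parseDateInt(long dateInt) {
        return LocalDate.parse(Long.toString(dateInt), FMT);
    }

    public static void main(String[] args) {
        System.out.println(parseDateInt(20200317L)); // 2020-03-17
    }
}
```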

* Fails creation of TaskResource if availabilityGroup is null (#9892)

* Fails creation of TaskResource if availabilityGroup is null

* add check for requiredCapacity

* Enforce code coverage (#9863)

* Enforce code coverage

Add an automated way of checking if new code has adequate unit tests,
since merging code coverage reports and check coverage thresholds via
coveralls or codecov is unreliable.

The following minimum unit test code coverage is now enforced:
- 80% functions
- 65% branch
- 65% line

Branch and line coverage thresholds are slightly lower for now as they
are harder to achieve.

After the code coverage check looks reliable, the thresholds can be
increased later if needed.

* Add comments

* fix docs error: google to azure and hdfs to http (#9881)

* Fix deleting a data node tier causes load rules to display incorrectly (#9891)

* Fix Deleting a data node tier causes load rules to malfunction & display incorrectly

* add tests

* fix style

* Fix Hadoop IT Legacy test query json was not parameterized (#9901)

* Fixed the Copyright year of Druid (#9859)

* add some details to the build doc (#9885)

* update initial build command

* add some details for building

* fix spelling check errors

* fix spelling check warnings

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Fix web console query view crashing on simple query (#9897)

* only parse full queries

* upgraded sql parser

* Re-order and document format detection in web console (#9887)

Motivation for this change is to not inadvertently identify binary
formats that contain uncompressed string data as TSV or CSV.

Moving detection of magic byte headers before heuristics should be more
robust in general.
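
The ordering described above (magic bytes first, heuristics second) can be sketched as follows; the class name, method, and the exact set of magic headers checked are hypothetical, not the web console's actual detection code:

```java
import java.util.Arrays;

public class FormatSniffer {
    // Check well-known magic-byte headers before falling back to text
    // heuristics, so binary data is never misidentified as CSV/TSV.
    static String detect(byte[] head) {
        if (head.length >= 2 && (head[0] & 0xFF) == 0x1f && (head[1] & 0xFF) == 0x8b) {
            return "gzip"; // gzip magic bytes 0x1f 0x8b
        }
        if (head.length >= 4 && Arrays.equals(Arrays.copyOf(head, 4), new byte[]{'P', 'A', 'R', '1'})) {
            return "parquet"; // Parquet files begin with "PAR1"
        }
        // ...delimiter-counting text heuristics would run only after this point
        return "text";
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[]{(byte) 0x1f, (byte) 0x8b, 0})); // gzip
    }
}
```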

* Suppress CVEs for openstack-keystone (#9903)

CVE-2020-12689, CVE-2020-12691, and CVE-2020-12690 can be ignored for
openstack-keystone as they are for the python SDK and druid uses the
java SDK.

* Update doc on tmp dir (java.io.tmpdir) best practice (#9910)

* Update doc on tmp dir best practice

* remove local recommendation

* Disable function code coverage check (#9933)

As observed in https://github.com/apache/druid/pull/9905 and
https://github.com/apache/druid/pull/9915, the function code coverage
check flags false positive issues, so it should be disabled for now.

* Make it easier for devs to add CalciteQueryTests (#9922)

* Add ingestion specs for CalciteQueryTests

This PR introduces ingestion specs that can be used for local testing
so that CalciteQueryTests can be built on a druid cluster.

* Add README

* Update sql/src/test/resources/calcite/tests/README.md

* update kafka client version to 2.5.0 (#9902)

- remove dependency on deprecated internal Kafka classes
- keep LZ4 version in line with the version shipped with Kafka

* Add parameterized Calcite tests for join queries (#9923)

* Add parameterized Calcite tests for join queries

* new tests

* review comments

* Fix type restriction for Pattern hashcode inspection (#9947)

* Refactor JoinFilterAnalyzer (#9921)

* Refactor JoinFilterAnalyzer

This patch attempts to make it easier to follow the join filter analysis code
with the hope of making it easier to add rewrite optimizations in the future.

To keep the patch small and easy to review, this is the first of at least 2
patches that are planned.

This patch adds a builder to the Pre-Analysis, so that it is easier to
instantiate the preAnalysis. It also moves some of the filter normalization
code out to Filters with associated tests.

* fix tests

* Modify information schema doc to specify correct value of TABLE_CATALOG (#9950)

* Querying doc refresh tutorial (#9879)

* Update tutorial-query.md

* First full pass complete

* Smoothing over, a bit

* link and spell checking

* Update querying.md

* Review comments; screenshot fixes

* Making ports consistent, pending confirmation 

Switching to the Router port to make this consistent with the tutorial ports, but we can switch back here and there if it should be 8082 instead.

* Resizing screenshot

* Update querying.md

* Review feedback incorporated.

* Refactor JoinFilterAnalyzer - part 2 (#9929)

* Refactor JoinFilterAnalyzer

This patch attempts to make it easier to follow the join filter analysis code
with the hope of making it easier to add rewrite optimizations in the future.

To keep the patch small and easy to review, this is the first of at least 2
patches that are planned.

This patch adds a builder to the Pre-Analysis, so that it is easier to
instantiate the preAnalysis. It also moves some of the filter normalization
code out to Filters with associated tests.

* fix tests

* Refactor JoinFilterAnalyzer - part 2

This change introduces the following components:
 * RhsRewriteCandidates - a wrapper for a list of candidates and associated
     functions to operate on the set of candidates.
 * JoinableClauses - a wrapper for the list of JoinableClause that represent
     a join condition and the associated functions to operate on the clauses.
 * Equiconditions - a wrapper representing the equiconditions that are used
     in the join condition.

And associated test changes.

This refactoring surfaced 2 bugs:
 - Missing equals and hashcode implementation for RhsRewriteCandidate, thus
   allowing potential duplicates in the rhs rewrite candidates
 - Missing Filter#supportsRequiredColumnRewrite check in
   analyzeJoinFilterClause, which could result in UnsupportedOperationException
   being thrown by the filter

* fix compile error

* remove unused class

* Optimize join queries where filter matches nothing (#9931)

* Refactor JoinFilterAnalyzer

This patch attempts to make it easier to follow the join filter analysis code
with the hope of making it easier to add rewrite optimizations in the future.

To keep the patch small and easy to review, this is the first of at least 2
patches that are planned.

This patch adds a builder to the Pre-Analysis, so that it is easier to
instantiate the preAnalysis. It also moves some of the filter normalization
code out to Filters with associated tests.

* fix tests

* Refactor JoinFilterAnalyzer - part 2

This change introduces the following components:
 * RhsRewriteCandidates - a wrapper for a list of candidates and associated
     functions to operate on the set of candidates.
 * JoinableClauses - a wrapper for the list of JoinableClause that represent
     a join condition and the associated functions to operate on the clauses.
 * Equiconditions - a wrapper representing the equiconditions that are used
     in the join condition.

And associated test changes.

This refactoring surfaced 2 bugs:
 - Missing equals and hashcode implementation for RhsRewriteCandidate, thus
   allowing potential duplicates in the rhs rewrite candidates
 - Missing Filter#supportsRequiredColumnRewrite check in
   analyzeJoinFilterClause, which could result in UnsupportedOperationException
   being thrown by the filter

* fix compile error

* remove unused class

* Refactor JoinFilterAnalyzer - Correlations

Move the correlation-related code out into its own class so it's easier
to maintain.
Another patch should follow this one so that the query path uses the
correlation object instead of its underlying maps.

* Optimize join queries where filter matches nothing

Fixes #9787

This PR changes the Joinable interface to return an Optional set of correlated
values for a column.
This allows the JoinFilterAnalyzer to differentiate between the case where the
column has no matching values and when the column could not find matching
values.

This PR chose not to distinguish between cases where correlated values could
not be computed because of a config that has this behavior disabled or because
of user error - like a column that could not be found. The reasoning was that
the latter is likely an error and the non filter pushdown path will surface the
error if it is.
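
The Optional distinction described above can be sketched as below; the class and method names are hypothetical and do not reflect Druid's actual Joinable interface:

```java
import java.util.Optional;
import java.util.Set;

public class CorrelationLookup {
    // Optional.empty()       -> correlated values could not be computed (fall back,
    //                           no filter rewrite)
    // Optional.of(Set.of())  -> computed successfully, and nothing matches
    //                           (the whole filter matches nothing; can be pruned)
    static Optional<Set<String>> correlatedValues(boolean computable, Set<String> matches) {
        return computable ? Optional.of(matches) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(correlatedValues(false, Set.of()).isPresent()); // false: fall back
        System.out.println(correlatedValues(true, Set.of()).get().isEmpty()); // true: prune
    }
}
```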

* only close exec if it exists (#9952)

* fix unsafe concurrent access in StreamAppenderatorDriver (#9943)

during segment publishing we do streaming operations on a collection not
safe for concurrent modification. To guarantee correct results we must
also guard any operations on the stream itself.

This may explain the issue seen in https://github.com/apache/druid/issues/9845
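
The kind of guard this fix adds can be illustrated with a plain synchronized collection (an analogy, not Druid's actual StreamAppenderatorDriver code): individual calls on a `Collections.synchronizedList` are safe, but a stream traversal is a compound operation that must hold the list's monitor for its whole duration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GuardedStream {
    // A stream over a synchronized list iterates it, so the caller must hold
    // the list's monitor for the entire traversal or risk
    // ConcurrentModificationException / incorrect results under concurrent writes.
    static long guardedSum(List<Integer> shared) {
        synchronized (shared) {
            return shared.stream().mapToLong(Integer::longValue).sum();
        }
    }

    public static void main(String[] args) {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>(List.of(1, 2, 3)));
        System.out.println(guardedSum(shared)); // 6
    }
}
```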

* Prevent JOIN reducing to a JOIN with constant in the ON condition (#9941)

* Prevent Join reducing to on constant condition

* Prevent Join reducing to on constant condition

* address comments

* set queryContext in tests

* support customized factory.json via IndexSpec for segment persist (#9957)

* support customized factory.json via IndexSpec for segment persist

* equals verifier

* Integration Tests. (#9854)

* Integration Tests.
Added docker-compose with druid-cluster configuration.
Refactored shell scripts. split code in a few files

* Integration Tests.
Added environment variable: DRUID_INTEGRATION_TEST_GROUP

* Integration Tests. Removed nit

* Integration Tests. Updated if block in docker_run_cluster.sh.

* Integration Tests. Readme. Added Docker-compose section.

* Integration Tests. removed yml files for s3, gcs, azure.
Renamed variables for skip start/stop/build docker.
Updated readme.
Rollback maven profile: int-tests-config-file

* Integration Tests. Removed docker-compose.test-env.yml file.
Added DRUID_INTEGRATION_TEST_GROUP variable to docker-compose.yml

* Integration Tests. Readme. Added details about docker-compose

* Integration Tests. cleanup shell scripts

Co-authored-by: agritsenko <agritsenko@provectus.com>

* fix nullhandling exceptions related to test ordering (#9964)

follow-up to https://github.com/apache/druid/pull/9570

* Adjust code coverage check (#9969)

Since there is not currently a good way to have fine-grain code coverage
check exclusions, lower the coverage thresholds to make the check more
lenient for now. Also, display the code coverage report in the Travis CI
logs to make it easier to understand how to improve coverage.

* Fix various Yielder leaks. (#9934)

* Fix various Yielder leaks.

- CombiningSequence leaked the input yielder from "toYielder" if it ran
  into an exception while accumulating the last value from the input
  yielder.
- MergeSequence leaked input yielders from "toYielder" if it ran into
  an exception while building the initial priority queue.
- ScanQueryRunnerFactory leaked the input yielder in its
  "priorityQueueSortAndLimit" strategy if it ran into an exception
  while scanning and sorting.
- YieldingSequenceBase.accumulate chomped IOExceptions thrown in
  "accumulate" during yielder closing.

* Add tests.

* Fix braces.

* Fix various processing buffer leaks and simplify BlockingPool. (#9928)

* - GroupByQueryEngineV2: Fix leak of intermediate processing buffer when
  exceptions are thrown before result sequence is created.
- PooledTopNAlgorithm: Fix leak of intermediate processing buffer when
  exceptions are thrown before the PooledTopNParams object is created.
- BlockingPool: Remove unused "take" methods.

* Add tests to verify that buffers have been returned.

* remove ListenableFutures and revert to using the Guava implementation (#9944)

This change removes ListenableFutures.transformAsync in favor of the
existing Guava Futures.transform implementation. Our own implementation
had a bug which did not fail the future if the applied function threw an
exception, resulting in the future never completing.

An attempt was made to fix this bug, however when running againts Guava's own
tests, our version failed another half dozen tests, so it was decided to not
continue down that path and scrap our own implementation.

Explanation for how was this bug manifested itself:

An exception thrown in BaseAppenderatorDriver.publishInBackground when
invoked via transformAsync in StreamAppenderatorDriver.publish will
cause the resulting future to never complete.

This explains why when encountering https://github.com/apache/druid/issues/9845
the task will never complete, forever waiting for the publishFuture to
register the handoff. As a result, the corresponding "Error while
publishing segments ..." message only gets logged once the index task
times out and is forcefully shutdown when the future is force-cancelled
by the executor.
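
The contract the removed implementation violated can be illustrated with the analogous standard-library API, `CompletableFuture.thenApply` (a sketch of the expected semantics, not the Guava `Futures.transform` code itself): when the applied function throws, the derived future must complete exceptionally rather than hang forever.

```java
import java.util.concurrent.CompletableFuture;

public class TransformFailure {
    // A transform whose function throws must fail the resulting future;
    // a future that silently never completes leaves callers waiting forever.
    static boolean transformFails() {
        CompletableFuture<Integer> base = CompletableFuture.completedFuture(1);
        CompletableFuture<Integer> transformed = base.thenApply(x -> {
            throw new RuntimeException("publish failed");
        });
        return transformed.isCompletedExceptionally();
    }

    public static void main(String[] args) {
        System.out.println(transformFails()); // true
    }
}
```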

* Document unsupported Join on multi-value column (#9948)

* Document Unsupported Join on multi-value column

* Document Unsupported Join on multi-value column

* address comments

* Add unit tests

* address comments

* add tests

* Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT. (#9893)

* Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT.

- Add REGEXP_LIKE function that returns a boolean, and is useful in
  WHERE clauses.
- Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect
  filter elision).
- Fix REGEXP_EXTRACT behavior for empty patterns: should always match
  (previously, they threw errors).
- Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed
  non-literal patterns.
- Improve documentation of REGEXP_EXTRACT.
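
The empty-pattern fix lines up with Java's own regex semantics, where an empty pattern finds a zero-width match in any input; a minimal illustration:

```java
import java.util.regex.Pattern;

public class EmptyPatternDemo {
    // An empty regex finds a zero-width match at position 0 of any input,
    // so an empty pattern should match rather than throw.
    static boolean emptyPatternFinds(String input) {
        return Pattern.compile("").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(emptyPatternFinds("foo")); // true
        System.out.println(emptyPatternFinds(""));    // true
    }
}
```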

* Changes based on PR review.

* Fix arg check.

* Important fixes!

* Add speller.

* wip

* Additional tests.

* Fix up tests.

* Add validation error tests.

* Additional tests.

* Remove useless call.

* Fix shutdown reason for unknown tasks in taskQueue (#9954)

* Fix shutdown reason for unknown tasks in taskQueue

* unused imports

* Fix Subquery could not be converted to groupBy query (#9959)

* Fix join

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* add tests

* address comments

* fix failing tests

* Fix groupBy with literal in subquery grouping (#9986)

* fix groupBy with literal in subquery grouping

* fix groupBy with literal in subquery grouping

* fix groupBy with literal in subquery grouping

* address comments

* update javadocs

* Integration Tests. Small fixes for CI. (#9988)

Co-authored-by: agritsenko <agritsenko@provectus.com>

* ColumnCapabilities.hasMultipleValues refactor (#9731)

* transition ColumnCapabilities.hasMultipleValues to Capable enum, remove ColumnCapabilities.isComplete

* remove artificial, always multi-value capabilities from IncrementalIndexStorageAdapter and fix up fallout from that, fix ColumnCapabilities merge in index merger

* fix typo

* remove unused method

* review stuffs, revert IncrementalIndexStorageAdapter capabilities change, plumb lame workaround to SegmentAnalyzer

* more comment

* use volatile booleans

* fix line length

* correctly handle missing columns for vector processors

* return ColumnCapabilities.Capable for BitmapIndexSelector.hasMultipleValues, fix vector processor selection for complex

* false on non-existent

* Empty partitionDimension has less rollup compared to when explicitly specified (#9861)

* Empty partitionDimension has less rollup compared to the case when it is explicitly specified

* Adding a unit test for the empty partitionDimension scenario. Fixing another test which was failing

* Fixing CI Build Inspection Issue

* Addressing all review comments

* Updating the javadocs for the hash method in HashBasedNumberedShardSpec

* Add git pre-commit hook to source control (#9554)

* Add git pre-commit hook to source control

* Changed hook to pre-push and simply hook to run all checkstyle

* Clean up setup-hooks

* Add apache header

* Add apache header

* add documentation to intellij-setup.md

* retrigger tests

* update

Co-authored-by: Maytas Monsereenusorn <52679095+maytasm3@users.noreply.github.com>

* Fix compact partially overlapping segments (#9905)

* fix compact overlapping segments

* fix comment

* fix CI failure

* fix NilVectorSelector filter optimization (#9989)

* Load broadcast datasources on broker and tasks (#9971)

* Load broadcast datasources on broker and tasks

* Add javadocs

* Support HTTP segment management

* Fix indexer maxSize

* inspection fix

* Make segment cache optional on non-historicals

* Fix build

* Fix inspections, some coverage, failed tests

* More tests

* Add CliIndexer to MainTest

* Fix inspection

* Rename UnprunedDataSegment to LoadableDataSegment

* Address PR comments

* Fix

* small fixes to configuration documentation (#9975)

* Add Sql InputSource (#9449)

* Add Sql InputSource

* Add spelling

* Use separate DruidModule

* Change module name

* Fix docs

* Use sqltestutils for tests

* Add additional tests

* Fix inspection

* Add module test

* Fix md in docs

* Remove annotation

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>

* add a GeneratorInputSource to fill up a cluster with generated data for testing (#9946)

* move benchmark data generator into druid-processing, add a GeneratorInputSource to fill up a cluster with data

* newlines

* make test coverage not fail maybe

* remove useless test

* Update pom.xml

* Update GeneratorInputSourceTest.java

* less passive aggressive test names

* remove incorrect and unnecessary overrides from BooleanVectorValueMatcher (#9994)

* remove incorrect and unnecessary overrides from BooleanVectorValueMatcher

* add test case

* add unit tests for ... part of VectorValueMatcherColumnProcessorFactory

* Update VectorValueMatcherColumnProcessorFactoryTest.java

* make joinables closeable (#9982)

* make joinables closeable

* tests and adjustments

* refactor to make join stuffs implement ReferenceCountedObject instead of Closeable, more tests

* fixes

* javadocs and stuff

* fix bugs

* more test

* fix lgtm alert

* simplify

* fixup javadoc

* review stuffs

* safeguard against exceptions

* i hate this checkstyle rule

* make IndexedTable extend Closeable

* Fix failed tests in TimestampParserTest when running locally (#9997)

* fix failed tests in TimestampParserTest due to timezone

* remove unneeded -Duser.country=US

Co-authored-by: huagnhui.bigrey <huanghui.bigrey@bytedance.com>

* Simplify CompressedVSizeColumnarIntsSupplierTest (#10003)

The parameters generator uses CompressionStrategy.noNoneValues() instead
of CompressionStrategyTest.compressionStrategies() which wrapped each
strategy in a single element array. This improves readability of the
test.

* Update password-provider.md (#9857)

* ignore brokers in broker views (#10017)

* Add instruction for code coverage checks (#9995)

* Add instruction for code coverage checks

* address comments

* Remove duplicate parameters from test (#10022)

Commit 771870ae2d312d643e6d98f3d0af8a9618af9681 removed constructor
arguments from the rules. Therefore multiple parameters of the test are
now the same and can be removed.

* Remove colocated datasources from web console for broadcast indexed tables (#10018)

* Fix CVE-2020-13602 (#10024)

Upgrade postgres jdbc driver to latest version to address CVE, which was
fixed in 42.2.13.

* Fix broadcast rule drop and docs (#10019)

* Fix broadcast rule drop and docs

* Remove racy test check

* Don't drop non-broadcast segments on tasks, add overshadowing handling

* Don't use realtimes for overshadowing

* Fix dropping for ingestion services

* fix balancer + broadcast segments npe (#10021)

* Set the core partition set size properly for batch ingestion with dynamic partitioning (#10012)

* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* lpad and rpad functions match postgres behavior in SQL compatible mode (#10006)

* lpad and rpad functions deal with empty pad

Return null if the pad string used by the `lpad` and `rpad` functions is
an empty string

* Fix rpad

* Match PostgreSQL behavior in SQL compliant null handling mode

* Match PostgreSQL behavior for negative pad length

* address review comments

* Integration test docker compose readme (#10016)

* Integration Tests. Docker-compose readme part

* Readme updates. PR fixes

Co-authored-by: agritsenko <agritsenko@provectus.com>

* make phaser of ReferenceCountingCloseableObject protected instead of private so subclasses can do stuff with it (#10035)

* Remove LegacyDataSource. (#10037)

* Remove LegacyDataSource.

Its purpose was to enable deserialization of strings into TableDataSources.
But we can do this more straightforwardly with Jackson annotations.

* Slight test improvement.

* ROUND and having comparators correctly handle special double values (#10014)

* ROUND and having comparators correctly handle doubles

Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real
numbers. Because of this, they can not be converted to BigDecimal and instead
throw a NumberFormatException.

This change adds support for calculations that produce these numbers either
for use in the `ROUND` function or the HavingSpecMetricComparator by not
attempting to convert the number to a BigDecimal.

The bug in ROUND was first introduced in #7224 where we added the ability to
round to any decimal place. This PR changes the behavior back to using
`Math.round` if we recognize a number that can not be converted to a
BigDecimal.
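
The failure mode and fallback described above can be sketched as follows; the helper is hypothetical and simplified (it returns non-finite inputs unchanged), not Druid's actual `ROUND` implementation:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundNonFinite {
    // NaN and the infinities have no BigDecimal representation:
    // BigDecimal.valueOf(Double.NaN) throws NumberFormatException.
    // Guard against them before the BigDecimal conversion.
    static double roundToPlaces(double x, int places) {
        if (Double.isNaN(x) || Double.isInfinite(x)) {
            return x; // BigDecimal.valueOf(x) would throw here
        }
        return BigDecimal.valueOf(x).setScale(places, RoundingMode.HALF_UP).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(roundToPlaces(3.14159, 2));    // 3.14
        System.out.println(roundToPlaces(Double.NaN, 2)); // NaN
    }
}
```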

* Add tests and fix spellcheck

* update error message in ExpressionsTest

* Address comments

* fix up round for infinity

* round non numeric doubles returns a double

* fix spotbugs

* Update docs/misc/math-expr.md

* Update docs/querying/sql.md

* global table datasource for broadcast segments (#10020)

* global table datasource for broadcast segments

* tests

* fix

* fix test

* comments and javadocs

* review stuffs

* use generated equals and hashcode

* API to verify a datasource has the latest ingested data (#9965)

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* fix checksyle

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* fix spelling

* address comments

* fix checkstyle

* update docs

* fix tests

* fix doc

* address comments

* fix typo

* fix spelling

* address comments

* address comments

* fix typo in docs

* All aggregators should check if column can be vectorize (#10026)

* All aggregators should use vectorization-aware column processor

* All aggregators should use vectorization-aware column processor

* fix canVectorize

* fix canVectorize

* add tests

* revert back default

* address comment

* address comments

* address comment

* address comment

* Druid Avatica - Handle escaping of search characters correctly (#10040)

Fix Avatica based metadata queries by appending ESCAPE '\' clause to the LIKE expressions

* IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()" (#9690)

* IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()"

* Reverted checkstyle rule

* Added tests to pass CI

* Codestyle

* fix docs (#9114)

Co-authored-by: tomscut <tomscut@gmail.com>

* global table only if joinable (#10041)

* global table if only joinable

* oops

* fix style, add more tests

* Update sql/src/test/java/org/apache/druid/sql/calcite/schema/DruidSchemaTest.java

* better information schema columns, distinguish broadcast from joinable

* fix javadoc

* fix mistake

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Coordinator loadstatus API full format does not consider Broadcast rules (#10048)

* Coordinator loadstatus API full format does not consider Broadcast rules

* address comments

* fix checkstyle

* minor optimization

* address comments

* Remove changes from #9114 (#10050)

* Create packed core partitions for hash/range-partitioned segments in native batch ingestion (#10025)

* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports

* Fix join filter rewrites with nested queries (#10015)

* Fix join filter rewrites with nested queries

* Fix test, inspection, coverage

* Remove clauses from group key

* Fix import order

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>

* fix topn on string columns with non-sorted or non-unique dictionaries (#10053)

* fix topn on string columns with non-sorted or non-unique dictionaries

* fix metadata tests

* refactor, clarify comments and code, fix ci failures

* Add safeguard to make sure new Rules added are aware of Rule usage in loadstatus API (#10054)

* Add safeguard to make sure new Rules added are aware of Rule usage in loadstatus API

* address comments

* address comments

* add tests

* SketchAggregator.updateUnion should handle null inside List update object (#10055)

* fix docs error in hadoop-based part (#9907)

* fix docs error: google to azure and hdfs to http

* fix docs error: indexSpecForIntermediatePersists of tuningConfig in hadoop-based batch part

* fix docs error: logParseExceptions of tuningConfig in hadoop-based batch part

* fix docs error: maxParseExceptions of tuningConfig in hadoop-based batch part

* minor rework of topn algorithm selection for clarity and more javadocs (#10058)

* minor refactor of topn engine algorithm selection for clarity

* adjust

* more javadoc

* change default number of segment loading threads (#9856)

* change default number of segment loading threads

* fix docs

* missed file

* min -> max for segment loading threads

Co-authored-by: Dylan <dwylie@spotx.tv>

* retry 500 and 503 errors against kinesis (#10059)

* retry 500 and 503 errors against kinesis

* add test that exercises retry logic

* more branch coverage

* retry 500 and 503 on getRecords request when fetching sequence number

Co-authored-by: Harshpreet Singh <hrshpr@twitch.tv>

* Druid user permissions (#10047)

* Druid user permissions apply in the console

* Update index.md

* noting user warning in console page; some minor shuffling

* noting user warning in console page; some minor shuffling 1

* touchups

* link checking fixes

* Updated per suggestions

* Fix HyperUniquesAggregatorFactory.estimateCardinality null handling to respect output type (#10063)

* fix return type from HyperUniquesAggregator/HyperUniquesVectorAggregator

* address comments

* address comments

* Enable query vectorization by default (#10065)

* Enable query vectorization by default

* update docs

* Optimize protobuf parsing for flatten data (#9999)

* optimize for protobuf parsing

* fix import error and maven dependency

* add unit test in protobufInputrowParserTest for flatten data

* solve code duplication (remove the log and main())

* rename 'flatten' to 'flat' to make it clearer

Co-authored-by: xionghuilin <xionghuilin@bytedance.com>

* fix dimension names for jvm monitor metrics (#10071)

* update avatica to handle additional character sets over jdbc (#10074)

* update avatica to handle additional character sets over jdbc

* update license yaml, fix test

* oops

* Fix balancer strategy (#10070)

* fix server overassignment

* fix random balancer strategy, add more tests

* comment

* added more tests

* fix forbidden apis

* fix typo

* fix dropwizard emitter jvm bufferpoolName metric (#10075)

* fix dropwizard emitter jvm bufferpoolName metric

* fixes

* Allow append to existing datasources when dynamic partitioning is used (#10033)

* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports

* Allow append to existing datasources when dynamic partitioning is used

* fix test

* checkstyle

* checkstyle

* fix test

* fix test

* fix other tests..

* checkstyle

* handle unknown core partitions size in overlord segment allocation

* fail to append when numCorePartitions is unknown

* log

* fix comment; rename to be more intuitive

* double append test

* cleanup complete(); add tests

* fix build

* add tests

* address comments

* checkstyle

* Fix missing temp dir for native single_dim (#10046)

* Fix missing temp dir for native single_dim

Native single dim indexing throws a file not found exception from
InputEntityIteratingReader.java:81.  This MR creates the required
temporary directory when setting up the
PartialDimensionDistributionTask.  The change was tested on a Druid
cluster.  After installing the change native single_dim indexing
completes successfully.

* Fix indentation

* Use SinglePhaseSubTask as example for creating the temp dir

* Move temporary indexing dir creation in to TaskToolbox

* Remove unused dependency

Co-authored-by: Morri Feldman <morri@appsflyer.com>

* More prominent instructions on code coverage failure (#10060)

* More prominent instructions on code coverage failure

* Update .travis.yml

* Add NonnullPair (#10013)

* Add NonnullPair

* new line

* test

* make it consistent

* Add integration tests for SqlInputSource (#10080)

* Add integration tests for SqlInputSource

* make it faster

* ensure ParallelMergeCombiningSequence closes its closeables (#10076)

* ensure close for all closeables of ParallelMergeCombiningSequence

* revert unneeded change

* consolidate methods

* catch throwable instead of exception

* fix MaterializedView groupBy query to return array result by default (#9936)

* fix bug: MaterializedView groupBy query returns map result by default

* add unit test

* add unit test

* add unit test

* fix bug: MaterializedView groupBy query returns map result by default

* add unit test

* add unit test

* add unit test

* update pr

* update pr

Co-authored-by: xiangqiao <xiangqiao@kuaishou.com>

* Fix NPE when brokers use custom priority list (#9878)

* fix query memory leak (#10027)

* fix query memory leak

* rollup ./idea

* roll up .idea

* clean code

* optimize style

* optimize cancel function

* optimize style

* add concurrentGroupTest test case

* add test case

* add unit test

* fix code style

* optimize cancel method use

* format code

* roll back code

* optimize cancelAll

* clean code

* add comment

* Segment timeline doesn't show results older than 3 months (#9956)

* Segment timeline doesn't show results older than 3 months

* Adoption testing patch for web segment timeline view and also refactoring default time config

* Filter http requests by http method (#10085)

* Filter http requests by http method

Add a config that allows a user to specify which HTTP methods are allowed
against their Druid server.

With this change, Druid only accepts HTTP requests with the methods GET, PUT,
POST, DELETE and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.

If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config

* Exclude OPTIONS from always supported HTTP methods

Add HEAD as an allowed method for web console e2e tests

* fix docs

* fix security IT

* Actually fix the web console e2e tests

* Ignore code coverage for initialization classes

* code review
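Assuming the runtime property paths follow the config classes named above (an illustrative sketch, not verified against a live cluster; the exact property names may differ by Druid version), an operator override might look like:

```properties
# Extend the always-supported methods (GET, PUT, POST, DELETE) with HEAD.
# Property path assumed from ServerConfig#allowedHttpMethods.
druid.server.http.allowedHttpMethods=["HEAD"]
```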

* Move shardSpec tests to core (#10079)

* Move shardSpec tests to core

* checkstyle

* inject object mapper for testing

* unused import

* Fix native batch range partition segment sizing (#10089)

* Fix native batch range partition segment sizing

Fixes #10057.

Native batch range partitioning was only considering the partition
dimension value when grouping rows instead of using all of the row's
partition values. Thus, for schemas with multiple dimensions, the rollup
was overestimated, which would cause too many dimension values to be
packed into the same range partition. The resulting segments would then
be overly large (and not honor the target or max partition sizes).

Main changes:

- PartialDimensionDistributionTask: Consider all dimension values when
  grouping row

- RangePartitionMultiPhaseParallelIndexingTest: Regression test by
  having input with rows that should roll up and rows that should not
  roll up

* Use hadoop & native hash ingestion row group key
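The overestimation described above can be seen with a toy sketch (hypothetical rows and names, not Druid's implementation): grouping on only the partition dimension merges rows that would not actually roll up, so the distinct-group count comes out too low.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RollupEstimate {
    // Buggy estimate: group on the partition dimension value only.
    static int groupsByPartitionDim(List<List<String>> rows) {
        Set<String> keys = new HashSet<>();
        for (List<String> row : rows) {
            keys.add(row.get(0));
        }
        return keys.size();
    }

    // Fixed estimate: group on the full tuple of dimension values.
    static int groupsByAllDims(List<List<String>> rows) {
        return new HashSet<>(rows).size();
    }

    public static void main(String[] args) {
        // Each row is [partitionDim, otherDim].
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("a", "x"),
            Arrays.asList("a", "y"), // same partition dim, different other dim
            Arrays.asList("b", "x")
        );
        System.out.println(groupsByPartitionDim(rows)); // 2 -> overestimated rollup
        System.out.println(groupsByAllDims(rows));      // 3 -> true distinct groups
    }
}
```

Fewer estimated groups means more raw rows are assumed to collapse into each range partition, which is why the resulting segments exceeded the target size.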

* Fix nullhandling exception (#10095)

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>

* Make 0.19 brokers compatible with 0.18 router (#10091)

* Make brokers backwards compatible

In 0.19, Brokers gained the ability to serve segments. To support this change,
a `BROKER` ServerType was added to `druid.server.coordination`.

Druid nodes prior to this change do not know of this new server type and so
they would fail to deserialize this node's announcement.

This change makes it so that the broker only announces itself if the segment
cache is configured on the broker. It is expected that a Druid admin will only
configure the segment cache on the broker once the cluster has been upgraded
to a version that supports a broker using the segment cache.

* make code nicer

* Add tests

* Ignore code coverage for initialization classes

* Revert "Ignore code coverage for initialization classes"

This reverts commit aeec0c2ac2b07c1b9262e32201913c7194167271.

* code review

* Correct the position of the double quotation in distinctcount.md file (#10094)

```
"dimensions": "[sample_dim]"
```
should be
```
"dimensions": ["sample_dim"]
```

* QueryCountStatsMonitor can be injected in the Peon (#10092)

* QueryCountStatsMonitor can be injected in the Peon

This change fixes a dependency injection bug where there is a circular
dependency on getting the MonitorScheduler when a user configures the
QueryCountStatsMonitor to be used.

* fix tests

* Actually fix the tests this time

* Information schema doc update (#10081)

* add docs for IS_JOINABLE and IS_BROADCAST to INFORMATION_SCHEMA docs

* fixes

* oops

* revert noise

* missed one

* spellbot

* Remove payload field from table sys.segment (#9883)

* remove payload field from table sys.segments

* update doc

* fix test

* fix CI failure

* add necessary fields

* fix doc

* fix comment

* Web console: allow link overrides for docs, and more (#10100)

* link overrides

* change doc version

* fix snapshots

* Enabling Static Imports for Unit Testing DSLs (#331) (#9764)

* Enabling Static Imports for Unit Testing DSLs (#331)

Co-authored-by: mohammadshoaib <mohammadshoaib@miqdigital.com>

* Feature 8885 - Enabling Static Imports for Unit Testing DSLs (#435)

* Enabling Static Imports for Unit Testing DSLs

* Using suppressions checkstyle to allow static imports only in the UTs

Co-authored-by: mohammadshoaib <mohammadshoaib@miqdigital.com>

* Removing the changes in the checkstyle because those are not needed

Co-authored-by: mohammadshoaib <mohammadshoaib@miqdigital.com>

* Prevent unknown complex types from breaking DruidSchema refresh (#9422)

* Update web address to datasketches.apache.org (#10096)

* Join filter pre-analysis simplifications and sanity checks. (#10104)

* Join filter pre-analysis simplifications and sanity checks.

- At pre-analysis time, only compute pre-analysis for the innermost
  root query, since this is the one that will run on the join that involves
  the base datasource. Previously, pre-analyses were computed for multiple
  levels of the query, some of which were unnecessary.
- Remove JoinFilterPreAnalysisGroup and join query level gathering code,
  since they existed to support precomputation of multiple pre-analyses.
- Embed JoinFilterPreAnalysisKey into JoinFilterPreAnalysis and use it to
  sanity check at processing time that the correct pre-analysis was done.

Tangentially related changes:

- Remove prioritizeAndLaneQuery functionality from LocalQuerySegmentWalker.
  The computed priority and lanes were not being used.
- Add "getBaseQuery" method to DataSourceAnalysis to support identification
  of the proper subquery for filter pre-analysis.

* Fix compilation errors.

* Adjust tests.

* Filter on metrics doc (#10087)

* add note about filter on metrics to filter docs

* edit doc to include having and filtered aggregator links

* Fix UnknownTypeComplexColumn#makeVectorObjectSelector

* Fix RetryQueryRunner to actually do the job (#10082)

* Fix RetryQueryRunner to actually do the job

* more javadoc

* fix test and checkstyle

* don't combine for testing

* address comments

* fix unit tests

* always initialize response context in cachingClusteredClient

* fix subquery

* address comments

* fix test

* query id for builders

* make queryId optional in the builders and ClusterQueryResult

* fix test

* suppress tests and unused methods

* exclude groupBy builder

* fix jacoco exclusion

* add tests for builders

* address comments

* don't truncate

* Closing yielder from ParallelMergeCombiningSequence should trigger cancellation (#10117)

* cancel parallel merge combine sequence on yielder close

* finish incomplete comment

* Update core/src/test/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequenceTest.java

Fixes checkstyle

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Revert "Fix UnknownTypeComplexColumn#makeVectorObjectSelector" (#10121)

This reverts commit 7bb7489afc7a2cc496be93ae69681b6ab13a7c66.

* update links datasketches.github.io to datasketches.apache.org (#10107)

* update links datasketches.github.io to datasketches.apache.org

* now with more apache

* oops

* oops

* Fix Stack overflow with infinite loop in ReduceExpressionsRule of HepProgram (#10120)

* Fix Stack overflow with SELECT ARRAY ['Hello', NULL]

* address comments

* fixes for ranger docs (#10109)

* Fix UnknownComplexTypeColumn#makeVectorObjectSelector. Add a warning … (#10123)

* Fix UnknownComplexTypeColumn#makeVectorObjectSelector. Add a warning message to indicate failure in deserializing.

* support Aliyun OSS service as deep storage (#9898)

* init commit, all tests passed

* fix format

Signed-off-by: frank chen <frank.chen021@outlook.com>

* data stored successfully

* modify config path

* add doc

* add aliyun-oss extension to project

* remove descriptor deletion code to avoid warning message output by aliyun client

* fix warnings reported by lgtm-com

* fix ci warnings

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix errors reported by intellj inspection check

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix doc spelling check

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix dependency warnings reported by ci

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix warnings reported by CI

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add package configuration to support showing extension info

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add IT test cases and fix bugs

Signed-off-by: frank chen <frank.chen021@outlook.com>

* 1. code review comments adopted
2. change schema from 'aliyun-oss' to 'oss'

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add license info

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix doc

Signed-off-by: frank chen <frank.chen021@outlook.com>

* exclude execution of IT testcases of OSS extension from CI

Signed-off-by: frank chen <frank.chen021@outlook.com>

* put the extensions under contrib group and add to distribution

* fix names in test cases

* add unit test to cover OssInputSource

* fix names in test cases

* fix dependency problem reported by CI

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Clarify change in behavior for druid.server.maxSize (#10105)

* Clarify maxSize docs

* Add info about maxSize

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>

* Add DimFilter.toOptimizedFilter(), ensure that join filter pre-analysis operates on optimized filters (#10056)

* Ensure that join filter pre-analysis operates on optimized filters, add DimFilter.toOptimizedFilter

* Remove aggressive equality check that was used for testing

* Use Suppliers.memoize

* Checkstyle

* Fix CachingClusteredClient when querying specific segments (#10125)

* Fix CachingClusteredClient when querying specific segments

* delete useless test

* roll back timeout

* Remove unsupported task types in doc (#10111)

* VersionedIntervalTimeline: Fix thread-unsafe call to "lookup". (#10130)

* bump version to 0.20.0-SNAPSHOT (#10124)

* AbstractOptimizableDimFilter should be public (#10142)

* mask secrets in MM task command log (#10128)

* mask secrets in MM task command log

* unit test for masked iterator

* checkstyle fix

* Update Jetty to 9.4.30.v20200611. (#10098)

* Update Jetty to 9.4.30.v20200611.

This is the latest version currently available in the 9.4.x line.

* Various adjustments.

* Class name fixes.

* Remove unused HttpClientModule code.

* Add coverage suppressions.

* Another coverage suppression.

* Fix wildcards.

* ui: fix missing columns during Transform step (#10086)

Co-authored-by: egor-ryashin <egor.ryashin@metamarkets.com>

* Add availability and consistency docs. (#10149)

* Add availability and consistency docs.

Describes transactional ingestion and atomic replacement. Also, this patch
deletes some bad advice from the javadocs for SegmentTransactionalInsertAction.

* Fix missing word.

* Update dictionary for spell check (#10152)

* Fix avg sql aggregator (#10135)

* new average aggregator

* method to create count aggregator factory

* test everything

* update other usages

* fix style

* fix more tests

* fix datasketches tests

* Reduce memory footprint of integration test by not starting unneeded containers (#10150)

* Reduce memory footprint of integration test

* fix README

* fix README

* fix error in script

* fix security IT

* Add integration tests for all InputFormat (#10088)

* Add integration tests for Avro OCF InputFormat

* Add integration tests for Avro OCF InputFormat

* add tests

* fix bug

* fix bug

* fix failing tests

* add comments

* address comments

* address comments

* address comments

* fix test data

* reduce resource needed for IT

* remove bug fix

* fix checkstyle

* add bug fix

* Follow-up for RetryQueryRunner fix (#10144)

* address comments; use guice instead of query context

* typo

* QueryResource tests

* address comments

* catch queryException

* fix spell check

* Fix documentation for Kinesis fetchThreads. (#10156)

* Fix documentation for Kinesis fetchThreads

The default was changed in #9819, but the documentation wasn't updated.

* Add 'procs' to spelling.

* renamed authenticationChain to authenticatorChain (#10143)

* Fix flaky tests in DruidCoordinatorTest (#10157)

* Fix flaky tests in DruidCoordinatorTest

* Improve fail msg

* Fix flaky tests in DruidCoordinatorTest

* Update ambari-metrics-common to version 2.6.1.0.0  (#10165)

* Switch to apache version of ambari-metrics-common

* Add test

* Fix intellij inspection

* Fix intellij inspection

* Do not echo back username on auth failure (#10097)

* Do not echo back username on auth failure

* use bad username

* Remove username from exception messages

* fix tests

* fix the tests

* hopefully this time

* this time the tests work

* fixed this time

* fix

* upgrade to Jetty 9.4.30

* Unknown users echo back Unauthorized

* fix

* fix website build (#10172)

* fix mvn website build to use mvn-supplied nodejs, fix broken redirects, move block from custom.css to custom.scss so it will be correctly generated

* sidebar

* fix lol

* split web-console e2e-tests from unit tests (#10173)

* split web-console e2e-test from unit test

* fix stuff

* smaller change

* oops

* Fix formatting in druid-pac4j documentation (#10174)

Superfluous column broke table formatting.

* Add additional properties for Kafka AdminClient and consumer from test config file (#10137)

* Add kafka test configs from file for AdminClient and consumer

* review comment

* Add groupBy limitSpec to queryCache key (#10093)

* Add groupBy limitSpec to queryCache key

* Only add limitSpec to cache key if pushdown is set to true

* review comment

* Add validation for authenticator and authorizer name (#10106)

* Add validation for authorizer name

* fix deps

* add javadocs

* Do not use resource filters

* Fix BasicAuthenticatorResource as well

* Add integration tests

* fix test

* fix

* JettyTest.testNumConnectionsMetricHttp is rarely flaky (#10169)

* Change color of Run button for native queries (#10170)

* Change color of Run button for native queries

When a user tries to run a native query, change the color of the button to
Druid's secondary color to indicate that the user is not running a SQL query.

Before this change, the web-console would indicate this by changing the text
of the button from Run (SQL queries) to Rune (native queries). Rune could be
confusing to users as this appears to be a typo.

* Update web-console/src/views/query-view/run-button/run-button.scss

* Update web-console/src/views/query-view/run-button/run-button.scss

* Update web-console/src/views/query-view/run-button/run-button.scss

* code review

Co-authored-by: Clint Wylie <cwylie@apache.org>
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
Co-authored-by: calvinhkf <calvinkfh@gmail.com>
Co-authored-by: Maytas Monsereenusorn <52679095+maytasm@users.noreply.github.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Co-authored-by: BIGrey <huanghui0143@163.com>
Co-authored-by: huanghui.bigrey <huanghui.bigrey@bytedance.com>
Co-authored-by: Suneet Saldanha <suneet.saldanha@imply.io>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
Co-authored-by: Francesco Nidito <11637948+frnidito@users.noreply.github.com>
Co-authored-by: James Dalton <tarpdalton@users.noreply.github.com>
Co-authored-by: Aleksei Chumagin <a-chumagin@users.noreply.github.com>
Co-authored-by: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>
Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: Aleksey Plekhanov <Plehanov.Alex@gmail.com>
Co-authored-by: Jian Wang <wjhypo@gmail.com>
Co-authored-by: Alexander Saydakov <13126686+AlexanderSaydakov@users.noreply.github.com>
Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
Co-authored-by: mcbrewster <37322608+mcbrewster@users.noreply.github.com>
Co-authored-by: awelsh93 <32643586+awelsh93@users.noreply.github.com>
Co-authored-by: Chi Cao Minh <chi.caominh@imply.io>
Co-authored-by: zachjsh <zachjsh@gmail.com>
Co-authored-by: Joseph Glanville <jpg@jpg.id.au>
Co-authored-by: Samarth Jain <samarth@apache.org>
Co-authored-by: Jianhuan Liu <hemin179@163.com>
Co-authored-by: Furkan KAMACI <furkankamaci@gmail.com>
Co-authored-by: frank chen <frank.chen021@outlook.com>
Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
Co-authored-by: Surekha <surekha.saharan@imply.io>
Co-authored-by: agricenko <alexuncon@gmail.com>
Co-authored-by: agritsenko <agritsenko@provectus.com>
Co-authored-by: Mainak Ghosh <mghosh@twitter.com>
Co-authored-by: Maytas Monsereenusorn <52679095+maytasm3@users.noreply.github.com>
Co-authored-by: Yuanli Han <44718283+yuanlihan@users.noreply.github.com>
Co-authored-by: Lucas Capistrant <capistrant@users.noreply.github.com>
Co-authored-by: Atul Mohan <atulmohan.mec@gmail.com>
Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
Co-authored-by: Stefan Birkner <github@stefan-birkner.de>
Co-authored-by: danc <danc@users.noreply.github.com>
Co-authored-by: Stefan Birkner <mail@stefan-birkner.de>
Co-authored-by: litao <55134131+tomscut@users.noreply.github.com>
Co-authored-by: tomscut <tomscut@gmail.com>
Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org>
Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Dylan Wylie <dylanwylie@gmail.com>
Co-authored-by: Dylan <dwylie@spotx.tv>
Co-authored-by: Harshpreet Singh <singhharshpreet1993@gmail.com>
Co-authored-by: Harshpreet Singh <hrshpr@twitch.tv>
Co-authored-by: xhl0726 <1037989035@qq.com>
Co-authored-by: xionghuilin <xionghuilin@bytedance.com>
Co-au…