Skip to content

DruidInputSource can add new dimensions during re-ingestion#9590

Merged
ccaominh merged 15 commits intoapache:masterfrom
suneet-s:transform-reindex
Apr 3, 2020
Merged

DruidInputSource can add new dimensions during re-ingestion#9590
ccaominh merged 15 commits intoapache:masterfrom
suneet-s:transform-reindex

Conversation

@suneet-s
Copy link
Copy Markdown
Contributor

@suneet-s suneet-s commented Mar 31, 2020

Fixes #9593 #9592

Description

Add integration tests for using transformSpec during ingestion. The tests test transformSpecs that add metrics and dimensions using both a Druid InputSource and the deprecated ingestSegmentFirehose.

Writing these tests surfaced #9589 #9591 When these issues are fixed, those tests can be re-enabled.

@suneet-s suneet-s changed the title Transform reindex Add integration test for ingestion with transformSpec Mar 31, 2020
"version" : "v1",
"timestamp" : "2013-08-31T00:00:00.000Z",
"event" : {
"added_count_times_ten" : 9050.0,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in expected results

Suggested change
"added_count_times_ten" : 9050.0,
"added_count_times_ten" : 27150.0,

@suneet-s suneet-s changed the title Add integration test for ingestion with transformSpec Add integration tests for ingestion with transformSpec Mar 31, 2020
@maytasm
Copy link
Copy Markdown
Contributor

maytasm commented Mar 31, 2020

Can you parameterize all the query and index json files? I think it's much harder to keep track of many many many different query and index json files we have for IT. Especially when they are 80-90% the same.

@suneet-s
Copy link
Copy Markdown
Contributor Author

Can you parameterize all the query and index json files? I think it's much harder to keep track of many many many different query and index json files we have for IT. Especially when they are 80-90% the same.

Across all AbstractITBatchIndexTest or within the new tests I've added in ITTransformTest?

It's hard to parameterize ITTransformTest because it's broken for most combinations. And there are 2 categories of tests. I think properly parameterizing these tests requires a re-think of what the test environment should look like because all the re-indexing tests start by indexing a datasource which slows down the test execution.

@maytasm
Copy link
Copy Markdown
Contributor

maytasm commented Mar 31, 2020

Can you parameterize all the query and index json files? I think it's much harder to keep track of many many many different query and index json files we have for IT. Especially when they are 80-90% the same.

Across all AbstractITBatchIndexTest or within the new tests I've added in ITTransformTest?

It's hard to parameterize ITTransformTest because it's broken for most combinations. And there are 2 categories of tests. I think properly parameterizing these tests requires a re-think of what the test environment should look like because all the re-indexing tests start by indexing a datasource which slows down the test execution.

I am thinking of parameterizing the json files themself across the new json files you added

@suneet-s
Copy link
Copy Markdown
Contributor Author

I am thinking of parameterizing the json files themself across the new json files you added

I think parameterizing the json files is a great idea, but there's a lot of tight coupling between the json files and the base IT classes. I'm not sure where the best place to do this is because of that.

I'd rather do the parameterization in a separate change - when #9591 #9592 or #9589 are fixed

@suneet-s suneet-s changed the title Add integration tests for ingestion with transformSpec DruidInputSource can add new dimensions during re-ingestion Mar 31, 2020
@jihoonson jihoonson added the Bug label Apr 1, 2020
Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

"filter" : "transform-data.json"
},
"inpuFormat" : {
"inputFormat" : {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

* @param explicitDimensions sent as part of the re-ingestion InputSource.
* @param dimensionsSpec from the provided ingestion spec.
* @param timeLineSegments for the datasource that is being read.
* @return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please add description for the return value or remove this.

@ccaominh ccaominh merged commit af3337d into apache:master Apr 3, 2020
suneet-s added a commit to suneet-s/druid that referenced this pull request Apr 3, 2020
)

* WIP integration tests

* Add integration test for ingestion with transformSpec

* WIP almost working tests

* Add ignored tests

* checkstyle stuff

* remove newPage from index task ingestion spec

* more test cleanup

* still not quite working

* Actually disable the tests

* working tests

* fix codestyle

* dont use junit in integration tests

* actually fix the bug

* fix checkstyle

* bring index tests closer to reindex tests
@suneet-s suneet-s deleted the transform-reindex branch April 3, 2020 05:17
@jihoonson jihoonson added this to the 0.18.0 milestone Apr 3, 2020
jihoonson pushed a commit that referenced this pull request Apr 3, 2020
…9590) (#9606)

* DruidInputSource can add new dimensions during re-ingestion (#9590)

* WIP integration tests

* Add integration test for ingestion with transformSpec

* WIP almost working tests

* Add ignored tests

* checkstyle stuff

* remove newPage from index task ingestion spec

* more test cleanup

* still not quite working

* Actually disable the tests

* working tests

* fix codestyle

* dont use junit in integration tests

* actually fix the bug

* fix checkstyle

* bring index tests closer to reindex tests

* Remove QUICKSTART_COMPATIBLE
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jun 12, 2020
)

* WIP integration tests

* Add integration test for ingestion with transformSpec

* WIP almost working tests

* Add ignored tests

* checkstyle stuff

* remove newPage from index task ingestion spec

* more test cleanup

* still not quite working

* Actually disable the tests

* working tests

* fix codestyle

* dont use junit in integration tests

* actually fix the bug

* fix checkstyle

* bring index tests closer to reindex tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Can not add a column when re-indexing a datasource using DruidInputSource

5 participants