Allow Hadoop dataSource inputSpec to be specified multiple times. #5717

Merged: fjy merged 2 commits into apache:master from gianm:multi-datasource-hadoop on May 3, 2018

@gianm gianm commented Apr 30, 2018

It's useful for combining two datasources with similar schemas into one.

for (int i = 0; i < segments.size(); i++) {
  final WindowedDataSegment segment = segments.get(i);
  logger.info(
      "Segment %,d/%,d for dataSource[%s] has identifier[%s], interval[%s]",

Contributor

how about debug?

Contributor Author

This is replacing an older log message that was also at info level so I thought it made sense to keep it that way.

Contributor

I don't find it very useful to log every segment, especially when dealing with more than a dozen, but it is okay.

b-slim commented May 3, 2018

👍

#### `multi`

-This is a composing inputSpec to combine other inputSpecs. This inputSpec is used for delta ingestion. Please note that you can have only one `dataSource` as child of `multi` inputSpec.
+This is a composing inputSpec to combine other inputSpecs. This inputSpec is used for delta ingestion. You can also use a `multi` inputSpec to combine data from multiple dataSources. However, each particular dataSource can only be specified one time.

Contributor

> each particular dataSource can only be specified one time

Out of curiosity, did we have this restriction before this PR? It should not be an issue for production environments, but I wonder why the restriction is needed.

Contributor Author

Before this PR, you could only do one dataSource, so yeah we still had this restriction.

It's not really strictly necessary to have this restriction - I guess we could support adding multiple input specs referring to the same dataSource, but it would make the code more complicated and I didn't see a clear use case.

Contributor

Thanks, sounds good.
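
For readers following the thread, the rule discussed above (a `multi` inputSpec may combine several dataSources, but each dataSource may be named only once) can be illustrated with a minimal sketch. This is not the code from this PR: `MultiInputSpecCheck`, `findDuplicates`, and `validate` are hypothetical names, and the sketch assumes the dataSource names of the child inputSpecs have already been collected into a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Druid: it only illustrates the documented rule
// that each dataSource may appear at most once among the children of a multi inputSpec.
final class MultiInputSpecCheck
{
  // Returns the dataSource names that occur more than once, in order of first repeat.
  static List<String> findDuplicates(List<String> dataSources)
  {
    final Set<String> seen = new LinkedHashSet<>();
    final Set<String> duplicates = new LinkedHashSet<>();
    for (String dataSource : dataSources) {
      // Set.add returns false when the name was already present.
      if (!seen.add(dataSource)) {
        duplicates.add(dataSource);
      }
    }
    return new ArrayList<>(duplicates);
  }

  // Throws IllegalArgumentException if any dataSource is named by more than one child.
  static void validate(List<String> dataSources)
  {
    final List<String> duplicates = findDuplicates(dataSources);
    if (!duplicates.isEmpty()) {
      throw new IllegalArgumentException("dataSource specified more than once: " + duplicates);
    }
  }

  public static void main(String[] args)
  {
    // Two different dataSources: accepted.
    validate(Arrays.asList("wikipedia", "wikipedia_backfill"));
    // The same dataSource twice: reported as a duplicate.
    System.out.println(findDuplicates(Arrays.asList("wikipedia", "other", "wikipedia")));
  }
}
```

Failing fast on a repeated name is only one way to enforce the restriction; as noted in the thread, supporting repeated dataSources would be possible but would complicate the code.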

@fjy fjy merged commit 739e347 into apache:master May 3, 2018
@fjy fjy modified the milestones: 0.12.1, 0.13.0 May 3, 2018
@gianm gianm deleted the multi-datasource-hadoop branch May 4, 2018 20:45
sathishsri88 pushed a commit to sathishs/druid that referenced this pull request May 8, 2018
…ache#5717)

* Allow Hadoop dataSource inputSpec to be specified multiple times.

* Fix test
gianm added a commit to implydata/druid-public that referenced this pull request May 11, 2018
…ache#5717)

* Allow Hadoop dataSource inputSpec to be specified multiple times.

* Fix test
gianm added a commit to gianm/druid that referenced this pull request May 22, 2018
This feature was introduced in apache#5717 but it didn't work in production
because this magical rewriter code wasn't also modified. Now, it is.
nishantmonu51 pushed a commit that referenced this pull request May 22, 2018
…#5790)

This feature was introduced in #5717 but it didn't work in production
because this magical rewriter code wasn't also modified. Now, it is.
gianm added a commit to implydata/druid-public that referenced this pull request May 23, 2018
…apache#5790)

This feature was introduced in apache#5717 but it didn't work in production
because this magical rewriter code wasn't also modified. Now, it is.
gianm added a commit to implydata/druid-public that referenced this pull request Jun 7, 2018
…ache#5717)

* Allow Hadoop dataSource inputSpec to be specified multiple times.

* Fix test
jihoonson pushed a commit to jihoonson/druid that referenced this pull request Jul 5, 2018
…apache#5790)

This feature was introduced in apache#5717 but it didn't work in production
because this magical rewriter code wasn't also modified. Now, it is.
fjy pushed a commit that referenced this pull request Jul 5, 2018
…#5790) (#5942)

This feature was introduced in #5717 but it didn't work in production
because this magical rewriter code wasn't also modified. Now, it is.
leventov pushed a commit to metamx/druid that referenced this pull request Jul 20, 2018
…apache#5790) (apache#5942)

This feature was introduced in apache#5717 but it didn't work in production
because this magical rewriter code wasn't also modified. Now, it is.
4 participants