Allow Hadoop dataSource inputSpec to be specified multiple times. #5717
fjy merged 2 commits into apache:master
for (int i = 0; i < segments.size(); i++) {
  final WindowedDataSegment segment = segments.get(i);
  logger.info(
      "Segment %,d/%,d for dataSource[%s] has identifier[%s], interval[%s]",
This is replacing an older log message that was also at info level so I thought it made sense to keep it that way.
I didn't find it very useful to log all the segments, especially when dealing with more than a dozen, but it is okay.
👍
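To make the volume concern concrete, here is a minimal sketch of what one such per-segment line looks like. It assumes the logger applies `String.format`-style formatting, which the `%,d` specifiers suggest. The class name, method name, and argument values are hypothetical, since the diff above is cut off before the arguments.

```java
import java.util.Locale;

// Hypothetical illustration only; not the code from this PR.
final class SegmentLogFormat
{
  // Builds one log line using the format string quoted in the diff.
  static String describe(int position, int total, String dataSource, String identifier, String interval)
  {
    return String.format(
        Locale.ENGLISH, // fixed locale so %,d always groups digits with commas
        "Segment %,d/%,d for dataSource[%s] has identifier[%s], interval[%s]",
        position,
        total,
        dataSource,
        identifier,
        interval
    );
  }

  public static void main(String[] args)
  {
    // One line is emitted per segment, so 1,500 segments means 1,500 lines like this.
    System.out.println(describe(1, 1500, "wikipedia", "seg-1", "2012-01-01/2012-01-02"));
  }
}
```

With many segments, output of this shape is repeated once per segment, which is the verbosity the comment above refers to.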
#### `multi`

- This is a composing inputSpec to combine other inputSpecs. This inputSpec is used for delta ingestion. Please note that you can have only one `dataSource` as child of `multi` inputSpec.
+ This is a composing inputSpec to combine other inputSpecs. This inputSpec is used for delta ingestion. You can also use a `multi` inputSpec to combine data from multiple dataSources. However, each particular dataSource can only be specified one time.
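The "each particular dataSource can only be specified one time" rule can be pictured as a duplicate check over the dataSource names of the `multi` children. The following is only a sketch of that idea under stated assumptions: the class and method names are hypothetical, and this is not the validation code from this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Druid: it only illustrates the documented rule.
final class MultiInputSpecCheck
{
  // Returns the dataSource names that occur more than once, in order of first repeat.
  static List<String> findDuplicateDataSources(List<String> childDataSources)
  {
    final Set<String> seen = new LinkedHashSet<>();
    final Set<String> duplicates = new LinkedHashSet<>();
    for (String dataSource : childDataSources) {
      // Set.add returns false when the name was already present.
      if (!seen.add(dataSource)) {
        duplicates.add(dataSource);
      }
    }
    return new ArrayList<>(duplicates);
  }

  public static void main(String[] args)
  {
    // Two different dataSources: allowed under the documented rule.
    System.out.println(findDuplicateDataSources(Arrays.asList("wikipedia", "clicks")));
    // The same dataSource twice: reported as a duplicate.
    System.out.println(findDuplicateDataSources(Arrays.asList("wikipedia", "clicks", "wikipedia")));
  }
}
```

An empty result means every child `dataSource` inputSpec refers to a distinct dataSource.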
> each particular dataSource can only be specified one time

Out of curiosity, did we have this restriction before this PR? There should be no issue for production environments, but I wonder why this restriction is needed.
Before this PR, you could only do one dataSource, so yeah we still had this restriction.
It's not really strictly necessary to have this restriction - I guess we could support adding multiple input specs referring to the same dataSource, but it would make the code more complicated and I didn't see a clear use case.
…ache#5717) * Allow Hadoop dataSource inputSpec to be specified multiple times. * Fix test
This feature was introduced in apache#5717 but it didn't work in production because this magical rewriter code wasn't also modified. Now, it is.
…apache#5790) This feature was introduced in apache#5717 but it didn't work in production because this magical rewriter code wasn't also modified. Now, it is.
…apache#5790) (apache#5942) This feature was introduced in apache#5717 but it didn't work in production because this magical rewriter code wasn't also modified. Now, it is.
It's useful for combining two datasources with similar schemas into one.