Skip to content

Remove unnecessary collection #7350

Merged
leventov merged 9 commits intoapache:masterfrom
surekhasaharan:use-list-instead-of-set
Apr 15, 2019
Merged

Remove unnecessary collection #7350
leventov merged 9 commits intoapache:masterfrom
surekhasaharan:use-list-instead-of-set

Conversation

@surekhasaharan
Copy link
Copy Markdown

@surekhasaharan surekhasaharan commented Mar 27, 2019

From the discussion here

Remove the collection and filter datasources from the stream.
Also remove StreamingOutput and JsonFactory constructs.

druidDataSources = druidDataSources.stream()
.filter(src -> datasources.contains(src.getName()))
.collect(Collectors.toSet());
.collect(Collectors.toList());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if we don't collect at all? can you just return a Stream and have jackson handle the lazy array materialization? I think you can avoid the odd output stream and json factory stuff if jackson supports such a thing.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@surekhasaharan please add a comment explaining that getDataSources() returns unique data sources, so expensive collection to Set is not required.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drcrallen this collection happens only in case a non-null datasources query param is passed, else this part is skipped. This api already returns a stream, I think even if I remove the output stream and json factory stuff, this particular collection would remain, as it's used to filter out the datasources. The outputStream was added here, when I was writing the original code, and IIRC, I was getting some json format exceptions in JsonParserIterator.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the collection is needed. The only place it is used is to immediately call .stream() on the next line. IMHO the more functional refactoring would be to produce Stream<DataSegment> metadataSegments by optionally applying a filter rather than trying to manage the Collections.

Not a blocker from my side, but I would like to see a conscious choice around it rather than letting it slide for a small patch

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drcrallen fair point, I didn't see that it can be just filtered while creating Stream<DataSegment> metadataSegments, will update, then we don't need this collection. thanks.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I got rid of outputstream and jsonfactory, tested it in local cluster and it seems to work, will keep it this way unless it breaks something.

@surekhasaharan surekhasaharan changed the title Collect in List instead of Set Remove unnecessary collection Apr 2, 2019
@surekhasaharan
Copy link
Copy Markdown
Author

@leventov @drcrallen do you have any more comments on this ?

.stream()
.filter((datasources != null && !datasources.isEmpty())
? src -> datasources.contains(src.getName())
: src -> true)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is not readable.

Please extract a stream variable and apply a filter only if the condition applies.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, tried to simplify with your suggestions.

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@leventov leventov merged commit 4654e1e into apache:master Apr 15, 2019
@jihoonson jihoonson added this to the 0.15.0 milestone May 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants