Remove unnecessary collection #7350
```diff
 druidDataSources = druidDataSources.stream()
     .filter(src -> datasources.contains(src.getName()))
-    .collect(Collectors.toSet());
+    .collect(Collectors.toList());
```
What happens if we don't collect at all? Can you just return a Stream and have Jackson handle the lazy array materialization? I think you can avoid the odd output stream and JSON factory code if Jackson supports that.
@surekhasaharan please add a comment explaining that getDataSources() returns unique data sources, so the expensive collection to a Set is not required.
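A minimal sketch of the point above, assuming (as the comment states) that the names returned by `getDataSources()` are already unique. `UniqueFilterSketch` and its methods are hypothetical, and plain `String` names stand in for Druid's data source objects. With duplicate-free input, collecting to a `List` yields the same elements as collecting to a `Set`, without the hashing a set-based collector does to deduplicate:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: filters data source names that are assumed to be unique already.
final class UniqueFilterSketch {
  private UniqueFilterSketch() {}

  // Keeps names present in `requested`. Because `allNames` has no duplicates,
  // the resulting List cannot contain duplicates either.
  static List<String> filterToList(List<String> allNames, Set<String> requested) {
    return allNames.stream()
        .filter(requested::contains)
        .collect(Collectors.toList());
  }

  // Same filter collected to a Set: the set additionally hashes every element
  // to deduplicate, which is redundant work for already-unique input.
  static Set<String> filterToSet(List<String> allNames, Set<String> requested) {
    return allNames.stream()
        .filter(requested::contains)
        .collect(Collectors.toSet());
  }
}
```

The List version also preserves the encounter order of the source, which a hash-based set does not guarantee.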
@drcrallen this collection happens only when a non-null datasources query parameter is passed; otherwise this part is skipped. This API already returns a stream. Even if I remove the output stream and JSON factory code, I think this particular collection would remain, since it is used to filter the data sources. I added the outputStream while writing the original code because, IIRC, I was getting some JSON format exceptions in JsonParserIterator.
I don't think the collection is needed. The only place it is used is to immediately call .stream() on the next line. IMHO the more functional refactoring would be to produce Stream<DataSegment> metadataSegments by optionally applying a filter, rather than managing intermediate Collections.
Not a blocker from my side, but I would like to see a conscious choice made here rather than letting it slide because the patch is small.
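To illustrate the review point, here is a hedged sketch comparing the two shapes: filtering into an intermediate collection that is immediately re-streamed, versus a single lazy pipeline. `DataSourceStub` and `FusedPipelineSketch` are hypothetical; the stub stands in for Druid's data source type as a name plus segment ids, whereas the real code deals with `DataSegment` objects. Both methods yield the same elements, but the fused version never materializes the filtered collection:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a data source: a name plus the ids of its segments.
final class DataSourceStub {
  final String name;
  final List<String> segmentIds;

  DataSourceStub(String name, String... segmentIds) {
    this.name = name;
    this.segmentIds = Arrays.asList(segmentIds);
  }
}

final class FusedPipelineSketch {
  private FusedPipelineSketch() {}

  // Shape under review: filter, materialize a List, then immediately re-stream it.
  static Stream<String> viaIntermediateCollection(Collection<DataSourceStub> sources, Set<String> wanted) {
    List<DataSourceStub> filtered = sources.stream()
        .filter(src -> wanted.contains(src.name))
        .collect(Collectors.toList());
    return filtered.stream().flatMap(src -> src.segmentIds.stream());
  }

  // Suggested shape: one lazy pipeline; nothing is evaluated until a terminal
  // operation (for example, a serializer) consumes the returned stream.
  static Stream<String> fused(Collection<DataSourceStub> sources, Set<String> wanted) {
    return sources.stream()
        .filter(src -> wanted.contains(src.name))
        .flatMap(src -> src.segmentIds.stream());
  }
}
```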
@drcrallen fair point. I didn't see that it could just be filtered while creating Stream<DataSegment> metadataSegments. I will update it so that we don't need this collection. Thanks.
I also got rid of the outputStream and JsonFactory and tested it in a local cluster. It seems to work, so I will keep it this way unless it breaks something.
Also get rid of StreamingOutput and JsonFactory
@leventov @drcrallen do you have any more comments on this?
```java
    .stream()
    .filter((datasources != null && !datasources.isEmpty())
            ? src -> datasources.contains(src.getName())
            : src -> true)
```
This code is not readable.
Please extract a stream variable and apply the filter only when the condition holds.
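A minimal sketch of the suggested refactoring, assuming plain `String` names in place of data source objects (`ConditionalFilterSketch` and `dataSourceNames` are hypothetical). The stream is held in a variable, and the filter is attached only when a non-empty `datasources` set was supplied, so the always-true predicate from the ternary version is no longer needed:

```java
import java.util.Collection;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical sketch: String names stand in for data source objects.
final class ConditionalFilterSketch {
  private ConditionalFilterSketch() {}

  // Returns every name when no data sources were requested (null or empty);
  // otherwise returns only the requested ones.
  static Stream<String> dataSourceNames(Collection<String> allNames, Set<String> requested) {
    Stream<String> stream = allNames.stream();
    // Attach the filter only when the query parameter was actually supplied.
    if (requested != null && !requested.isEmpty()) {
      stream = stream.filter(requested::contains);
    }
    return stream;
  }
}
```

The returned stream is still lazy, so it can be handed to the serializer without collecting it first.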
Okay, I tried to simplify it following your suggestions.
From the discussion here
Remove the collection and filter datasources from the stream.
Also remove StreamingOutput and JsonFactory constructs.