Skip to content

Use file channel in smoosher to minimize copy overheads#46

Merged
leventov merged 6 commits intometamx:masterfrom
navis:use-filechannel-directly
Oct 5, 2016
Merged

Use file channel in smoosher to minimize copy overheads#46
leventov merged 6 commits intometamx:masterfrom
navis:use-filechannel-directly

Conversation

@navis
Copy link
Copy Markdown
Contributor

@navis navis commented Apr 7, 2016

Current write(ByteBuffer buffer) implementation of smoother is something like this,

Channels.newChannel(out).write(buffer)

And agagin,

public static ReadableByteChannel newChannel(final InputStream in) {
        if (in instanceof FileInputStream &&
            FileInputStream.class.equals(in.getClass())) {
            return ((FileInputStream)in).getChannel();
        }
        return new ReadableByteChannelImpl(in);
    }

but because out is a BufferedOuputStream, this call always makes wrapper which copies ByteBuffer handed over to byte[], negating good intention of using it. I think it would be better to use FileChannel directly.

@navis navis force-pushed the use-filechannel-directly branch from 9a12346 to 658822b Compare April 7, 2016 06:36
File[] files = baseDir.listFiles();
Assert.assertEquals(1, files.length);
Assert.assertEquals(0, files[0].length());
Assert.assertEquals(4, files[0].length());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navis any reason this is changing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was buffered stream before but is not, making that difference. I've implemented buffered channel and it's restored as before.

@navis navis force-pushed the use-filechannel-directly branch from b33d1c7 to 3470a80 Compare April 8, 2016 07:24
open = false;
out.close();
if (buffer.position() != 0) {
flush();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if flush throws IOException, does close() still need called?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm.. good.

public int write(InputStream in) throws IOException
{
return addToOffset(Ints.checkedCast(ByteStreams.copy(in, out)));
return addToOffset(ByteStreams.copy(Channels.newChannel(in), channel));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double buffering here. Channels.newChannel() returns a buffered channel, and then ByteStreams.copy() uses a buffer internally. Made google/guava#2559 about this but it's faster and easier to fix internally in the meantime

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've neglected this because this is not used in Druid. But seemed good this to be fixed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, could be a separate PR. Also found this idiom (ByteStream.copy(Channels.newChannel(), ...) is used in Druid in several places, so it's more a Druid's issue than java-util's

@leventov
Copy link
Copy Markdown
Contributor

@navis do you have time to address the comments? I can do this myself and make a different PR based on your PR, but if it is merged with squash your authorship will be lost.

@navis navis force-pushed the use-filechannel-directly branch from 5b0684e to f11aaf2 Compare September 26, 2016 00:38
@leventov leventov closed this Sep 27, 2016
@leventov leventov reopened this Sep 27, 2016
@leventov
Copy link
Copy Markdown
Contributor

@navis could you please make two things?

  1. Remove channel buffering: it causes deterministicFileUnmapping() test to fail now, because it 1) allocates a direct buffer 2) duplicates mapped buffers.
  2. Close FOS in Outer as suggested here: Use file channel in smoosher to minimize copy overheads #46 (comment)

@leventov
Copy link
Copy Markdown
Contributor

leventov commented Oct 4, 2016

And after those changes SmooshedFileMapperTest.testBehaviorWhenReportedSizesSmall() will fail. The last assertion in this test, namely Assert.assertEquals(0, files[0].length()); should simply be removed because it seems to rely on FileSmoosher implementation details. Having or not some bytes written to the last smooshed file after ISE is thrown doesn't seem to be an important difference, because metadata is not written anyway and this smooshed dir couldn't be read later.

@leventov
Copy link
Copy Markdown
Contributor

leventov commented Oct 4, 2016

@navis do you have capacity for doing this?

@navis
Copy link
Copy Markdown
Contributor Author

navis commented Oct 5, 2016

@leventov Addressed comments

@leventov leventov merged commit 3c7b49c into metamx:master Oct 5, 2016
@leventov
Copy link
Copy Markdown
Contributor

leventov commented Oct 5, 2016

@navis Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants