Skip to content

Refinements to filesystem and mmap#981

Closed
ting-yuan wants to merge 9 commits intoemscripten-core:incomingfrom
ting-yuan:incoming
Closed

Refinements to filesystem and mmap#981
ting-yuan wants to merge 9 commits intoemscripten-core:incomingfrom
ting-yuan:incoming

Conversation

@ting-yuan
Copy link
Copy Markdown

This is a DRAFT pull request that aims at demonstrating the ideas and making discussions more concrete. There could be lots of refinements to be done. Would you please give me some suggestions?

One known problem is that valloc() is sometimes not included. This breaks the testcase: test_mmap_file. Should we always include dlmalloc or write a simple static valloc()?

@kripken
Copy link
Copy Markdown
Member

kripken commented Mar 25, 2013

A few comments:

  1. valloc aligns to the page size I think? Not sure that is necessary for us, normal malloc should be better.
  2. Can you explain the mmap approach here from a high level?

@ting-yuan
Copy link
Copy Markdown
Author

Using malloc() for aligned address requires the extra space for manual padding. With a good allocator, valloc() could be more space efficient. With the static allocator, the implementation of valloc() is almost the same to malloc().

The mmap() first checks if the underlying file resides in HEAP and if it is mapped for MAP_SHARED.

  • If so, the byteOffset of the file, which is the pointer to the buffer of the file in terms of C,
    can be returned immediately and the subsequent memory operations are done directly on the file.
    This is the behavior specified by POSIX.
  • If it is MAP_PRIVATE, the implementation allocates a block of memory and copies the contents
    so as to meet the behavior of MAP_PRIVATE.
  • In fact, we can't emulate the behavior when it is not in HEAP and the flag is MAP_SHARED.
    Although MAP_SHARED and MAP_PRIVATE are occasionally the same (when there are no memory
    writes), should I place an assertion here?

When it is mapped for MAP_PRIVATE, the implementation checks if the whole file is mapped. If so, Array.slice() can be omitted. If the file is backed by a TypedArray, Array.slice() can be replaced by subarray(), since it is used read-only. Those checks eliminate memory copies introduced Array.slice().

@kripken
Copy link
Copy Markdown
Member

kripken commented Apr 1, 2013

We don't need to actually have things on page boundaries, though, so no extra manual padding is needed. malloc's alignment should be fine I think?

Yes, definitely assert on the unsupported case(s).

Sounds good overall.

@ting-yuan
Copy link
Copy Markdown
Author

Although POSIX allows the address to be unaligned (with some constraints), Linux implements it to be aligned. Should we follow Linux for better compatibility?

http://linux.die.net/man/2/mmap
...
If addr is NULL, then the kernel chooses the address at which to create the mapping; this is the most portable method of creating a new mapping. If addr is not NULL, then the kernel takes it as a hint about where to place the mapping; on Linux, the mapping will be created at a nearby page boundary.
...
MAP_FIXED
Don't interpret addr as a hint: place the mapping at exactly that address. addr must be a multiple of the page size.
...

@kripken
Copy link
Copy Markdown
Member

kripken commented Apr 9, 2013

Well, we can say that our "page boundaries" are 4-byte aligned ;)

Unless in your experience, code may rely on this?

@kripken
Copy link
Copy Markdown
Member

kripken commented Apr 9, 2013

Note that we don't need to follow Linux; we are not identical to a Linux target for unavoidable reasons due to the web platform. So I am not too worried in general about places we are not identical.

@ting-yuan
Copy link
Copy Markdown
Author

OK, I tried but couldn't find an example that requires page alignment either explicitly or implicitly. Maybe we shouldn't bother about this alignment issue that may not actually happen.

I added two more commits that follow your comments:

  1. Replace valloc() with malloc() in mmap() and when putting files into HEAP.
  2. Don't allow the case that MAP_SHARED is specified but the file is not on HEAP.

@kripken
Copy link
Copy Markdown
Member

kripken commented Apr 16, 2013

Ok, sounds good. Did you run this on the test suite?

@ting-yuan
Copy link
Copy Markdown
Author

All tests passed except for test_scons(), test_crunch() and test_mmap() and I assume the former two are irrelevant.

test_mmap() failed because there's an explicit check to the 4k alignment on the returned addresses by mmap(). After commenting it out the test passed.

Comment thread tools/file_packager.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a var here I think. And it looks like ptr is never saved anywhere - so we are allocating memory that can only be accessed indirectly by curr.response - which means it would need to be copied to a new position? I thought the goal was to be able to directly use it as written in memory?

@ting-yuan
Copy link
Copy Markdown
Author

Sure, you are right! It's used only locally.

@kripken
Copy link
Copy Markdown
Member

kripken commented Apr 18, 2013

Hmm, then I think I don't understand what is being done here. It seems like we might as well store file data inside the big typed array, so that mmap can just provide access to it, without doing any copies. In other words, save ptr, and when we are asked to mmap the file, return ptr (assuming it is the right kind of mmap, etc.). Is there a reason not to do that?

@ting-yuan
Copy link
Copy Markdown
Author

A generated snippet looks like this:

        curr = DataRequest.prototype.requests['data/symbols.dat'];
        var data = byteArray.subarray(4840465, 4841432);
        var ptr = _malloc(967);
        HEAPU8.set(data, ptr);
        curr.response = HEAPU8.subarray(ptr, ptr + 967);
        curr.onload();

ptr is a byteOffset into HEAP and is propagated into curr.response. The content of the file is copied into HEAP, starting at ptr:

        HEAPU8.set(data, ptr);

Note that curr.response shares the same buffer/memory with HEAP; It just has it's own byteOffset, which is ptr in this case.

@kripken
Copy link
Copy Markdown
Member

kripken commented Apr 18, 2013

curr.response is another view into the big typed array, that shows the memory at ptr, yes. But, when you mmap, don't you request a pointer to the file in memory? If we saved ptr, we could just pass it directly back? Otherwise, if all we have is curr.response, we must first copy it to a location, and return the address of that location.

@ting-yuan
Copy link
Copy Markdown
Author

Yes, we want to return ptr directly in mmap(), but we don't have to save it explicitly; It's effectively a byteOffset. In other words,

ptr === curr.response.byteOffset === FS.streams[stream].object.contents.byteOffset === ptr

where the first ptr and curr.response.byteOffset are in file_packager.py and the others are in mmap().

Or did you mean ptr itself should reside in HEAP?

@ting-yuan
Copy link
Copy Markdown
Author

Oh, I see. I made an assumption that may be too strong. I assumed that the emulated memory addresses are simply indexes into the large TypedArray, HEAP. Although this is true (right?) in current implementation, will this change in the future?

@kripken
Copy link
Copy Markdown
Member

kripken commented Apr 19, 2013

Oh, I see, you are using byteOffset to remember the location, basically? I didn't know that was possible ;) but now I see that it is.

@kripken
Copy link
Copy Markdown
Member

kripken commented Apr 19, 2013

Memory addresses are always pointers in the large typed array, yes. We experimented with other approaches a long time ago, and I just saw a new compiler called duetto is going that route, but it adds significant overhead. For us, pointers are always just ints, and always represent offsets into a single typed array.

Comment thread src/library.js
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need curly braces if on another line. But, can just move it up to the line with the if.

@kripken
Copy link
Copy Markdown
Member

kripken commented Apr 19, 2013

Ok, looks great, I had just one tiny syntax comment. And sorry for not understanding the byteOffset stuff earlier.

@ting-yuan
Copy link
Copy Markdown
Author

Hi Kripken, I updated the pull request according to your comment. Is there anything else to fix?

@kripken
Copy link
Copy Markdown
Member

kripken commented Apr 29, 2013

No, looks good (github doesn't send emails on commits added, just comments, so I missed this - sorry).

@kripken
Copy link
Copy Markdown
Member

kripken commented Apr 29, 2013

Although now I see this does not merge cleanly, can you please rebase?

@kripken
Copy link
Copy Markdown
Member

kripken commented May 22, 2013

I rebased and pushed this.

@kripken kripken closed this May 22, 2013
@ting-yuan
Copy link
Copy Markdown
Author

I'm really sorry for not noticing your comment 22 days ago and even wondered if you are too busy to response. I'm very happy to be able to contribute to this project. Please feel free to tell me if there's anything else I can help.

@kripken
Copy link
Copy Markdown
Member

kripken commented May 22, 2013

No worries, I was just cleaning up older pull requests when I saw this one,
so I just rebased it myself. Thanks for writing it!

On Tue, May 21, 2013 at 7:18 PM, ting-yuan notifications@github.com wrote:

I'm really sorry for not noticing your comment 22 days ago and even
wondered if you are too busy to response. I'm very happy to be able to
contribute to this project. Please feel free to tell me if there's anything
else I can help.


Reply to this email directly or view it on GitHubhttps://github.com//pull/981#issuecomment-18253703
.

axodox pushed a commit to axodox/emscripten that referenced this pull request Mar 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants