@rymurr
Contributor

@rymurr rymurr commented Jul 2, 2020

This finishes the netty split in arrow-memory and creates 3 new modules

  • memory-core: core memory implementation
  • memory-netty: netty allocation manager
  • memory-unsafe: unsafe allocation manager

The bulk of the changes here are moving files and adding the correct dependencies to poms

@rymurr rymurr force-pushed the ARROW-9300 branch 2 times, most recently from d8a34cc to 6ea28a3 Compare July 2, 2020 16:43
Contributor

Since this is a frequently called method, I think it would be beneficial to reuse the empty ArrowBuf.

Contributor Author

This is only in tests, but I agree. Fixed

Contributor

Maybe we also need the memory-unsafe dependency here?
Otherwise, our integration tests will not be able to pass.

Contributor Author

I don't think the integration tests use UnsafeAllocator do they?

Contributor Author

I ran `mvn clean install -Pintegration-tests` and it was clean, so the integration tests went OK with Netty.

Contributor

Thanks for your feedback.
It is weird. I thought only the unsafe allocator supports allocating a buffer larger than 2GB, because the netty allocator uses int32 internally.

Contributor Author

I believe the netty allocator can allocate 64-bit spaces but netty buffers can't directly reference them?

Member

Maybe we should run the unit tests once for each of the 2 allocators?

@rymurr
Contributor Author

rymurr commented Jul 3, 2020

Thanks for the comments @liyafan82 . I have updated and rebased to pull in your fix from #7628

@liyafan82
Contributor

> Thanks for the comments @liyafan82 . I have updated and rebased to pull in your fix from #7628

Sorry for the trouble, and thank you for the effort.

Contributor

We have 3 classes in the code base named "DefaultAllocationManagerFactory". Do we need all of them?

Contributor Author

Hey @liyafan82, yes, I think we do:

  1. One lives in the netty module.
  2. One lives in the unsafe module.
  3. One lives in the tests portion of the core memory module.

On startup we first check whether a specific memory allocator has been requested. If so, we instantiate it. If no specific implementation has been requested, we search the classpath for a DefaultAllocationManagerFactory. If we find one, we instantiate it; otherwise we throw a RuntimeException (we can't continue without an allocation manager).

So each concrete implementation of AllocationManager requires a factory to create it, and we have a trivial allocation manager in the arrow-memory-core tests to help the core tests finish.
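
The lookup order described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Arrow code: `FactoryLookupSketch`, `Factory`, `resolve` and the `classpathLookup` function are hypothetical stand-ins (the real code would presumably search the classpath reflectively), and only the final error message text is taken from this thread.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of the lookup order; names are illustrative, not the Arrow API.
class FactoryLookupSketch {

  // Stand-in for an allocation manager factory.
  interface Factory {
    String describe();
  }

  // Assumed name used when no specific implementation was requested.
  static final String DEFAULT_FACTORY = "DefaultAllocationManagerFactory";

  /**
   * @param requested name of an explicitly requested implementation, or null if none
   * @param classpathLookup simulates searching the classpath for a factory by name
   */
  static Factory resolve(String requested, Function<String, Optional<Factory>> classpathLookup) {
    if (requested != null) {
      // A specific allocator was requested: instantiate exactly that one.
      return classpathLookup.apply(requested).orElseThrow(
          () -> new RuntimeException("Requested allocation manager not found: " + requested));
    }
    // Otherwise fall back to whatever default factory is on the classpath.
    return classpathLookup.apply(DEFAULT_FACTORY).orElseThrow(
        () -> new RuntimeException(
            "No DefaultAllocationManager found on classpath. Can't allocate Arrow buffers."));
  }

  public static void main(String[] args) {
    Factory netty = () -> "netty";
    // Simulated classpath that only contains the default (netty) factory.
    Function<String, Optional<Factory>> lookup =
        name -> DEFAULT_FACTORY.equals(name) ? Optional.of(netty) : Optional.empty();
    System.out.println(resolve(null, lookup).describe());
  }
}
```

With this shape, adding the netty or unsafe module to the classpath is what makes the second branch succeed; with neither present, the caller gets the hard failure instead of a silently chosen default.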

Member

@BryanCutler BryanCutler left a comment

Looks fine for the most part, but I'm not really sure why we need to separate arrow-memory-core and arrow-memory-unsafe? Couldn't those be combined since it wouldn't add any other dependencies, and that would also simplify things. Plus, it doesn't really make sense to have a module arrow-memory-core without a default allocator that would probably build fine with arrow-vector, but then blow up at runtime. What do you think @rymurr and @liyafan82 ?

Member

was this intentional?

Contributor Author

no, don't know how that snuck through. Fixed

Member

This could be removed now right?

Contributor Author

done

Member

add newline?

Contributor Author

fixed

Member

it still looks like it needs a newline at the end of the file

Member

is this necessary here?

Contributor Author

fixed

@jacques-n
Contributor

> Looks fine for the most part, but I'm not really sure why we need to separate arrow-memory-core and arrow-memory-unsafe? Couldn't those be combined since it wouldn't add any other dependencies, and that would also simplify things. Plus, it doesn't really make sense to have a module arrow-memory-core without a default allocator that would probably build fine with arrow-vector, but then blow up at runtime. What do you think @rymurr and @liyafan82 ?

This is modeled after the slf4j pattern, where the logging implementation is separated from the core APIs. That way the default pattern can be people sourcing the desired allocator via a dependency, without having to configure one. This seems much cleaner than the previous approaches, where people had to manually configure or override via system properties.

Additionally, I'd add that for new users I think we would suggest using the Netty one, not the unsafe one. It is much more complete/comprehensive and intelligent. So having a default implementation that we always tell people to override seems worse than them getting a hard failure if they don't source any. If we want to make things easier, we could also introduce some vector + allocator dependency poms and then use those in documentation, etc.

Contributor

This seems to duplicate the system property defined above

Contributor Author

fixed

@liyafan82
Contributor

> Looks fine for the most part, but I'm not really sure why we need to separate arrow-memory-core and arrow-memory-unsafe? Couldn't those be combined since it wouldn't add any other dependencies, and that would also simplify things. Plus, it doesn't really make sense to have a module arrow-memory-core without a default allocator that would probably build fine with arrow-vector, but then blow up at runtime. What do you think @rymurr and @liyafan82 ?

@BryanCutler I agree with you that it makes things simpler.

Since we may need to continue to use the netty implementation as the default one (as indicated by @jacques-n ), maybe it is beneficial to keep the arrow-memory-unsafe module, at least for now.

@BryanCutler
Member

I agree that the recommended allocator should still be the netty one for now, so I guess it wouldn't be good to bundle the unsafe allocator as a possible default. I'm good with raising an error then if a default allocator is not on the classpath, as long as it's clear to the user what they need to do. Right now it looks like the error is:

java.lang.RuntimeException: No DefaultAllocationManager found on classpath. Can't allocate Arrow buffers.

@rymurr would you mind adding something to the message like "it is recommended to add arrow-memory-netty or arrow-memory-unsafe as a dependency to provide a `DefaultAllocationManager`"?

@BryanCutler
Member

On a related note, it seems like our netty version 4.1.27 is pretty old now, ~2 years, do you all think it would be good to upgrade this before the 1.0.0 release? It looks like there is a security vulnerability pre 4.1.44 https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-20445

@rymurr
Contributor Author

rymurr commented Jul 8, 2020

Hey all,

Thanks @BryanCutler and @liyafan82 for the comments and thanks @jacques-n for the reasoning behind unsafe/netty module split.

I have addressed your comments in the most recent update and have created two new tickets:

  1. bump netty version: https://issues.apache.org/jira/browse/ARROW-9370
  2. run tests twice: https://issues.apache.org/jira/browse/ARROW-9371

I will raise PRs for these soon

rymurr pushed a commit to rymurr/arrow that referenced this pull request Jul 8, 2020
As per apache#7619 (comment) there is a security
vulnerability in the current version of Netty. This upgrades to the latest version.

A compatible upgrade of grpc was also required
Member

@BryanCutler BryanCutler left a comment

Just a minor nit on making sure there are newlines at the end of the new pom.xml files for consistency. Also, would you mind improving the error message for when no DefaultAllocationManagerFactory is found, as mentioned at #7619 (comment)? Otherwise LGTM

Member

newline here

Contributor Author

How come you want a newline? I notice that most/all poms have it but it isn't picked up by the style checker. Not a problem either way for me, just curious

Member

It's usually conventional to have it at the end of text files and some command line tools expect it. Not sure why it's not caught by the style checker, but that's why github marks it with a red icon.

Member

newline here

Contributor Author

I have added a new line in all 3 locations

@rymurr rymurr force-pushed the ARROW-9300 branch 2 times, most recently from 5dff1f8 to 928d971 Compare July 8, 2020 18:32
@rymurr
Contributor Author

rymurr commented Jul 8, 2020

> Just a minor nit on making sure there are newlines at the end of the new pom.xml files for consistency. Also, would you mind improving the error message for when no DefaultAllocationManagerFactory is found, as mentioned at #7619 (comment)? Otherwise LGTM

Both are done now. I mistakenly missed the DefaultAllocationManagerFactory update in the previous change set

Member

Hmm, this one is a little different. It looks like it could come from getUnsafeFactory(), getNettyFactory(), or some custom class. I would leave the original exception with clazzName, then in getUnsafeFactory() and getNettyFactory() catch the RuntimeException and add a message like "Please ensure [arrow-memory-netty, arrow-memory-unsafe] or equivalent class containing the DefaultAllocationManager is on the classpath"
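
A rough sketch of that suggestion, assuming a generic reflective loader shared by the specific getters. Everything here is illustrative: the `Factory` interface, the `example.missing.NettyFactory` class name, the generic error text and the method bodies are hypothetical, and only the method name `getNettyFactory()` and the hint wording come from this comment.

```java
// Hypothetical sketch; not the actual Arrow implementation.
class FactoryErrorHintSketch {

  // Stand-in for an allocation manager factory.
  interface Factory {}

  // Generic loader: keeps the original exception, which names only the class.
  static Factory getFactory(String clazzName) {
    try {
      return (Factory) Class.forName(clazzName).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new RuntimeException("Unable to instantiate Allocation Manager for " + clazzName, e);
    }
  }

  // Specific getter: catches the generic failure and adds the dependency hint.
  static Factory getNettyFactory() {
    try {
      // Assumed class name, chosen so that the lookup fails in this sketch.
      return getFactory("example.missing.NettyFactory");
    } catch (RuntimeException e) {
      throw new RuntimeException(
          "Please ensure arrow-memory-netty or an equivalent class containing the "
              + "DefaultAllocationManager is on the classpath", e);
    }
  }
}
```

Keeping the original exception as the cause means a custom class name still shows up in the stack trace, while the outer message tells the user which module to add.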

Contributor Author

good idea @BryanCutler that is much cleaner. Fixed

Contributor

I think this comment can be removed.

Contributor Author

done

Contributor

It seems this duplicates arrow.vector.max_allocation_bytes

Contributor Author

agreed, removed

Contributor

Do we also need the arrow-memory-netty dependency here?

@liyafan82
Contributor

> Hey all,
>
> Thanks @BryanCutler and @liyafan82 for the comments and thanks @jacques-n for the reasoning behind unsafe/netty module split.
>
> I have addressed your comments in the most recent update and have created two new tickets:
>
>   1. bump netty version: https://issues.apache.org/jira/browse/ARROW-9370
>   2. run tests twice: https://issues.apache.org/jira/browse/ARROW-9371
>
> I will raise PRs for these soon

@rymurr Mostly looks good to me. Thanks for your effort.

BryanCutler pushed a commit that referenced this pull request Jul 9, 2020
As per #7619 (comment) there is a security
vulnerability in the current version of Netty. This upgrades to the latest version.

A compatible upgrade of grpc was also required

Closes #7677 from rymurr/ARROW-9370

Authored-by: Ryan Murray <rymurr@dremio.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
@rymurr
Contributor Author

rymurr commented Jul 9, 2020

Thanks again @liyafan82 and @BryanCutler, I have addressed your comments.

Contributor

@liyafan82 liyafan82 left a comment

LGTM

Member

@BryanCutler BryanCutler left a comment

Thanks for the update, just need to fix the typo in the error message and I think this will be good.

Member

should be arrow-memory-netty and NettyAllocationManager below

Contributor Author

sigh....sorry. Copy paste for the win. Fixed.

Member

@BryanCutler BryanCutler left a comment

LGTM

@BryanCutler
Member

merged to master, thanks @rymurr !

@rymurr rymurr deleted the ARROW-9300 branch July 22, 2020 11:58
rymurr pushed a commit to rymurr/arrow that referenced this pull request Jul 24, 2020
As per apache#7619 (comment) the vector tests should be run for
 both netty and unsafe allocators
BryanCutler pushed a commit that referenced this pull request Jul 25, 2020
As per #7619 (comment) the vector tests should be run for both netty and unsafe allocators. The default tests are run with the Netty allocator, and `run-unsafe` tests are done with the Unsafe Allocator.

Closes #7676 from rymurr/ARROW-9371

Authored-by: Ryan Murray <rymurr@dremio.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
kou pushed a commit to apache/arrow-java that referenced this pull request Nov 25, 2024
As per apache/arrow#7619 (comment) there is a security
vulnerability in the current version of Netty. This upgrades to the latest version.

A compatible upgrade of grpc was also required

Closes #7677 from rymurr/ARROW-9370

Authored-by: Ryan Murray <rymurr@dremio.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
This finishes the netty split in arrow-memory and creates 3 new modules

* memory-core: core memory implementation
* memory-netty: netty allocation manager
* memory-unsafe: unsafe allocation manager

The bulk of the changes here are moving files and adding the correct dependencies to poms

Closes apache#7619 from rymurr/ARROW-9300

Authored-by: Ryan Murray <rymurr@dremio.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
As per apache#7619 (comment) the vector tests should be run for both netty and unsafe allocators. The default tests are run with the Netty allocator, and `run-unsafe` tests are done with the Unsafe Allocator.

Closes apache#7676 from rymurr/ARROW-9371

Authored-by: Ryan Murray <rymurr@dremio.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>