
Duplicated binaries in the python package #23968

@asfimport

Description

Hello,

 

I'm not sure if it is a desired feature or not, but there's no "question" issue type, so I'm opening it as a bug - please correct if necessary.

 

Most of the binary files in the Python "pyarrow" package are present in two versions, e.g.:

libarrow.so
libarrow.so.15

or  

libarrow.dylib
libarrow.15.dylib

(I presume that ".15" corresponds to the version of pyarrow?)

The two files in each pair are actually identical:

$ diff libarrow.15.dylib libarrow.dylib  # prints nothing

So let me ask:

  • Is it necessary to have both of them in the distribution?

  • Which one is actually imported, and is it safe to remove the other one?

Out of the 130 MB of a full pyarrow installation, 105 MB are those binaries, so removing the duplicates would save quite a lot of space (especially important when using pyarrow in AWS Lambda, where the function size is limited).
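For anyone who wants to check their own installation, here is a rough sketch of how the duplicated files and the space they take could be measured. It is only an illustration: the helper names find_duplicate_files and wasted_bytes are made up for this example, and it assumes the duplicates are real byte-for-byte copies rather than symlinks (symlinks are skipped).

```python
import hashlib
import os
from collections import defaultdict


def find_duplicate_files(root):
    """Return lists of paths under `root` whose contents are byte-identical.

    Hypothetical helper for illustration; symlinks and empty files are skipped.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Ignore symlinks, non-regular files, and empty files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) == 0:
                continue
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # Read in 1 MiB chunks so large libraries are not loaded at once.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


def wasted_bytes(groups):
    """Bytes that would be saved by keeping one copy per duplicate group."""
    return sum(os.path.getsize(paths[0]) * (len(paths) - 1) for paths in groups)
```

Calling find_duplicate_files on the directory that contains the installed package (for example os.path.dirname(pyarrow.__file__)) and passing the result to wasted_bytes should give an estimate of the possible savings. It does not tell which copy is actually loaded at import time, so it does not answer whether deleting either file is safe.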

Reporter: Vladimir

Note: This issue was originally created as ARROW-7728. Please see the migration documentation for further details.
