Skip to content

Fix visibility of libprotobuf symbols in protoc_compiler.so on Mac#24992

Merged
gnossen merged 4 commits intogrpc:masterfrom
gnossen:fix_grpc_tools_so_mac_symbols
Dec 15, 2020
Merged

Fix visibility of libprotobuf symbols in protoc_compiler.so on Mac#24992
gnossen merged 4 commits intogrpc:masterfrom
gnossen:fix_grpc_tools_so_mac_symbols

Conversation

@gnossen
Copy link
Copy Markdown
Contributor

@gnossen gnossen commented Dec 15, 2020

Fixes #24897

@yulin-liang This will require a backport (and patch to) to v1.34.x.

Our Mac OS build currently produces many global weak symbols corresponding to static variables in libprotobuf:

000000000041a008 gw    O __DATA,__data guard variable for google::protobuf::FileDescriptorTables::GetEmptyInstance()::file_descriptor_tables
0000000000419708 gw    O __DATA,__data guard variable for google::protobuf::util::converter::ProtoStreamObjectWriter::Options::Defaults()::defaults
000000000041a198 gw    O __DATA,__data guard variable for google::protobuf::internal::ShutdownData::get()::data
0000000000209940 gw    F __TEXT,__text google::protobuf::safe_strtod(google::protobuf::StringPiece, double*)

On Mac OS, clang will produce global weak symbols for function-local static variables when the function is a class method. For example:

class Bar {                                                                                                                    
public:                                                                                                                        
  int bar() {                                                                                                                  
    static int foo;                                                                                                            
    return foo;                                                                                                                
  }                                                                                                                            
};                                                                                                                             
                                                                                                                               
int main() {                                                                                                                   
  Bar bar;                                                                                                                     
  auto foo = bar.bar();                                                                                                        
  return foo;                                                                                                                  
}  

This C++ snippet produces the following symbol table:

$ objdump -t -wide foo.o | c++filt

foo.o:	file format Mach-O 64-bit x86-64

SYMBOL TABLE:
0000000000000030 gw    F __TEXT,__text Bar::bar()
0000000000000044 gw    O __DATA,__data Bar::bar()::foo
0000000000000000 __float128     F __TEXT,__text _main

When given a choice between two weak symbols with the same name in the same process, the Mac OS runtime linker will prefer the first one:

dyld: weak bind: _protoc_compiler.cpython-36m-darwin.so:0x110772158 = _message.cpython-36m-darwin.so:guard variable for google::protobuf::FileDescriptorTables::GetEmptyInstance()::file_descriptor_tables, *0x110772158 = 0x10EE03388
dyld: weak bind: _protoc_compiler.cpython-36m-darwin.so:0x110772168 = _message.cpython-36m-darwin.so:guard variable for google::protobuf::internal::ShutdownData::get()::data, *0x110772168 = 0x10EE03370
dyld: weak bind: _protoc_compiler.cpython-36m-darwin.so:0x110772968 = _message.cpython-36m-darwin.so:google::protobuf::RepeatedField<bool>::SwapElements(int, int), *0x110772968 = 0x10ECF2CB0
dyld: weak bind: _protoc_compiler.cpython-36m-darwin.so:0x110772970 = _message.cpython-36m-darwin.so:google::protobuf::RepeatedField<bool>::Swap(google::protobuf::RepeatedField<bool>*), *0x110772970 = 0x10ECF2B50
dyld: weak bind: _protoc_compiler.cpython-36m-darwin.so:0x110772978 = _message.cpython-36m-darwin.so:google::protobuf::RepeatedField<bool>::Reserve(int), *0x110772978 = 0x10ECF1710

Above, you see references to libprotobuf symbols in _protoc_compiler.so (grpcio-tools) resolving to addresses corresponding to symbols in _message.so (protobuf wheel).

When this happens, as in #24897, the initializer for static variables may come from one version of libprotobuf while, the runtime code may come from another version of it. When data layouts or other pertinent invariants change, this may result in segmentation faults, data corruption, or other "bad behavior."

The fix in this PR is to make all symbols in _protoc_compiler.so besides _PyInit__protoc_compiler (the Python interpreter's entrypoint to the grpcio-tools C extension) local:

00000000003f87a0 lw    O __DATA,__data google::protobuf::FileDescriptorTables::GetEmptyInstance()::file_descriptor_tables
00000000003f87a8 lw    O __DATA,__data guard variable for google::protobuf::FileDescriptorTables::GetEmptyInstance()::file_descriptor_tables
00000000003f8930 lw    O __DATA,__data google::protobuf::internal::ShutdownData::get()::data
00000000003f8938 lw    O __DATA,__data guard variable for google::protobuf::internal::ShutdownData::get()::data

This issue only ever affected Mac OS. Our Linux shared object libraries contain no global weak symbols and (somewhat perplexingly), the .pyd files in our Windows artifacts contain no symbols at all, according to winedump.

@gnossen gnossen marked this pull request as ready for review December 15, 2020 17:37
@gnossen gnossen requested a review from lidizheng December 15, 2020 17:37
Copy link
Copy Markdown
Contributor

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really nice fix! TIL.

s/#4897/#24897/ in description

Comment thread tools/distrib/python/grpcio_tools/setup.py
Comment thread tools/distrib/python/grpcio_tools/setup.py Outdated
@gnossen
Copy link
Copy Markdown
Contributor Author

gnossen commented Dec 15, 2020

Known failure.

@gnossen gnossen merged commit de32de4 into grpc:master Dec 15, 2020
@gnossen gnossen deleted the fix_grpc_tools_so_mac_symbols branch December 15, 2020 23:40
gnossen added a commit to gnossen/grpc that referenced this pull request Dec 15, 2020
Fix visibility of libprotobuf symbols in protoc_compiler.so on Mac
ravidcottopia pushed a commit to ottopia-tech/grpc that referenced this pull request Dec 29, 2020
Fix visibility of libprotobuf symbols in protoc_compiler.so on Mac
rowillia pushed a commit to rowillia/protobuf that referenced this pull request Feb 25, 2021
@gnossen gave a great overview in grpc/grpc#24992 of the overall problem.

If a python process using both protobuf _and_ another native library linking in libprotobuf
frequently can cause crashes.  This seems to frequently affect tensorflow as well:

tensorflow/tensorflow#8394,
tensorflow/tensorflow#9525 (comment)
tensorflow/tensorflow#24976,
tensorflow/tensorflow#35573,
https://github.com/tensorflow/tensorflow/blob/v2.0.0/tensorflow/contrib/makefile/rename_protobuf.sh,
tensorflow/tensorflow#16104

Testing locally this fixes both crashes when linking in multiple versions of protobuf
and fixes `DescriptorPool` clashes as well (e.g. Python and Native code import different versions of the same message).
TeBoring pushed a commit to protocolbuffers/protobuf that referenced this pull request Apr 24, 2021
@gnossen gave a great overview in grpc/grpc#24992 of the overall problem.

If a python process using both protobuf _and_ another native library linking in libprotobuf
frequently can cause crashes.  This seems to frequently affect tensorflow as well:

tensorflow/tensorflow#8394,
tensorflow/tensorflow#9525 (comment)
tensorflow/tensorflow#24976,
tensorflow/tensorflow#35573,
https://github.com/tensorflow/tensorflow/blob/v2.0.0/tensorflow/contrib/makefile/rename_protobuf.sh,
tensorflow/tensorflow#16104

Testing locally this fixes both crashes when linking in multiple versions of protobuf
and fixes `DescriptorPool` clashes as well (e.g. Python and Native code import different versions of the same message).
goaway pushed a commit to envoyproxy/envoy-mobile that referenced this pull request Jun 8, 2021
Description: This PR modifies the build recipe for `envoy_engine.so` to hide all symbols except for `_PyInit_envoy_engine` by default on maOS, so that envoy_engine can be loaded alongside a [protobuf](http://pypi.org/project/protobuf) wheel without breaking.

Loading in protobuf alongside `envoy_engine.so` would register two sets of the same symbols and, on macOS, the runtime linker chooses the first symbol it finds causing problems. See the [similar protobuf change](protocolbuffers/protobuf#8346) and its [sister change in grpc](grpc/grpc#24992) for more information on what's going on.

Risk Level: Low
Testing: See the now-closed #1504 for how I've been testing this.
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Cerek Hillen <chillen@lyft.com>
acozzette pushed a commit to acozzette/protobuf that referenced this pull request Jan 22, 2022
@gnossen gave a great overview in grpc/grpc#24992 of the overall problem.

If a python process using both protobuf _and_ another native library linking in libprotobuf
frequently can cause crashes.  This seems to frequently affect tensorflow as well:

tensorflow/tensorflow#8394,
tensorflow/tensorflow#9525 (comment)
tensorflow/tensorflow#24976,
tensorflow/tensorflow#35573,
https://github.com/tensorflow/tensorflow/blob/v2.0.0/tensorflow/contrib/makefile/rename_protobuf.sh,
tensorflow/tensorflow#16104

Testing locally this fixes both crashes when linking in multiple versions of protobuf
and fixes `DescriptorPool` clashes as well (e.g. Python and Native code import different versions of the same message).
acozzette pushed a commit to protocolbuffers/protobuf that referenced this pull request Jan 25, 2022
@gnossen gave a great overview in grpc/grpc#24992 of the overall problem.

If a python process using both protobuf _and_ another native library linking in libprotobuf
frequently can cause crashes.  This seems to frequently affect tensorflow as well:

tensorflow/tensorflow#8394,
tensorflow/tensorflow#9525 (comment)
tensorflow/tensorflow#24976,
tensorflow/tensorflow#35573,
https://github.com/tensorflow/tensorflow/blob/v2.0.0/tensorflow/contrib/makefile/rename_protobuf.sh,
tensorflow/tensorflow#16104

Testing locally this fixes both crashes when linking in multiple versions of protobuf
and fixes `DescriptorPool` clashes as well (e.g. Python and Native code import different versions of the same message).

Co-authored-by: Roy Williams <roy.williams.iii@gmail.com>
jpsim pushed a commit to envoyproxy/envoy that referenced this pull request Nov 28, 2022
Description: This PR modifies the build recipe for `envoy_engine.so` to hide all symbols except for `_PyInit_envoy_engine` by default on maOS, so that envoy_engine can be loaded alongside a [protobuf](http://pypi.org/project/protobuf) wheel without breaking.

Loading in protobuf alongside `envoy_engine.so` would register two sets of the same symbols and, on macOS, the runtime linker chooses the first symbol it finds causing problems. See the [similar protobuf change](protocolbuffers/protobuf#8346) and its [sister change in grpc](grpc/grpc#24992) for more information on what's going on.

Risk Level: Low
Testing: See the now-closed #1504 for how I've been testing this.
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Cerek Hillen <chillen@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit to envoyproxy/envoy that referenced this pull request Nov 29, 2022
Description: This PR modifies the build recipe for `envoy_engine.so` to hide all symbols except for `_PyInit_envoy_engine` by default on maOS, so that envoy_engine can be loaded alongside a [protobuf](http://pypi.org/project/protobuf) wheel without breaking.

Loading in protobuf alongside `envoy_engine.so` would register two sets of the same symbols and, on macOS, the runtime linker chooses the first symbol it finds causing problems. See the [similar protobuf change](protocolbuffers/protobuf#8346) and its [sister change in grpc](grpc/grpc#24992) for more information on what's going on.

Risk Level: Low
Testing: See the now-closed #1504 for how I've been testing this.
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Cerek Hillen <chillen@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/build kind/bug lang/Python release notes: yes Indicates if PR needs to be in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

grpcio-tools==1.34.0 seg faults on macOS Big Sur, depending on import order

2 participants