Skip to content

Conversation

@coryan
Copy link
Contributor

@coryan coryan commented Oct 9, 2021

No description provided.

@github-actions
Copy link

github-actions bot commented Oct 9, 2021

@coryan coryan marked this pull request as draft October 9, 2021 19:16
@coryan coryan marked this pull request as ready for review October 9, 2021 23:20
private:
Result<FileInfo> GetFileInfoImpl(GcsPath const& path, google::cloud::Status status,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should status be passed with const ref?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@emkornfield
Copy link
Contributor

Left a few comments. boost process seems like a new dependency on boost? I don't know if there are any issue taking it (I guess this is only for tests)? CC @pitrou @kou

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already use boost::process for S3 tests. So it's not a new dependency.

@coryan
Copy link
Contributor Author

coryan commented Oct 11, 2021

We already use boost::process for S3 tests. So it's not a new dependency.

Could you clarify what part of the change are you commenting on? Is it the changes to vcpkg.json? If so, while it is true that boost::process was used already, it was not declared as a vcpkg dependency and thus not built by default.

@kou
Copy link
Member

kou commented Oct 11, 2021

We already use boost::process for S3 tests. So it's not a new dependency.

Could you clarify what part of the change are you commenting on? Is it the changes to vcpkg.json? If so, while it is true that boost::process was used already, it was not declared as a vcpkg dependency and thus not built by default.

Sure. This is an answer of

#11375 (comment)

boost process seems like a new dependency on boost? I don't know if there are any issue taking it (I guess this is only for tests)?

.

I wanted to say that we don't need to think about boost::process in this pull request.

@coryan
Copy link
Contributor Author

coryan commented Oct 11, 2021

I wanted to say that we don't need to think about boost::process in this pull request.

Thanks. I removed the change to vcpkg.json.

@emkornfield
Copy link
Contributor

C++ LGTM, @kou any more concerns?

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the change to vcpkg.json.

Sorry for my poor English skill. It's OK that we add boost-process to vcpkg.json in this pull request. I just wanted to say that there is no problem that we use boost::process for GCS test...

I think that we can merge this with minor Dockerfile related improvements.

@kou
Copy link
Member

kou commented Oct 12, 2021

@github-actions crossbow submit -g nightly

@github-actions
Copy link

Revision: 6aa1b3e36fee6231fe303e0aeafffac416bae6b8

Submitted crossbow builds: ursacomputing/crossbow @ actions-916

Task Status
almalinux-8-amd64 Github Actions
almalinux-8-arm64 TravisCI
amazon-linux-2-amd64 Github Actions
centos-7-amd64 Github Actions
centos-8-amd64 Github Actions
centos-8-arm64 TravisCI
conda-clean Azure
conda-linux-gcc-py36-arm64 Azure
conda-linux-gcc-py36-cpu-r40 Azure
conda-linux-gcc-py36-cuda Azure
conda-linux-gcc-py37-arm64 Azure
conda-linux-gcc-py37-cpu-r41 Azure
conda-linux-gcc-py37-cuda Azure
conda-linux-gcc-py38-arm64 Azure
conda-linux-gcc-py38-cpu Azure
conda-linux-gcc-py38-cuda Azure
conda-linux-gcc-py39-arm64 Azure
conda-linux-gcc-py39-cpu Azure
conda-linux-gcc-py39-cuda Azure
conda-osx-arm64-clang-py38 Azure
conda-osx-arm64-clang-py39 Azure
conda-osx-clang-py36-r40 Azure
conda-osx-clang-py37-r41 Azure
conda-osx-clang-py38 Azure
conda-osx-clang-py39 Azure
conda-win-vs2017-py36-r40 Azure
conda-win-vs2017-py37-r41 Azure
conda-win-vs2017-py38 Azure
conda-win-vs2017-py39 Azure
debian-bookworm-amd64 Github Actions
debian-bookworm-arm64 TravisCI
debian-bullseye-amd64 Github Actions
debian-bullseye-arm64 TravisCI
debian-buster-amd64 Github Actions
debian-buster-arm64 TravisCI
example-cpp-minimal-build-static Github Actions
example-cpp-minimal-build-static-system-dependency Github Actions
homebrew-cpp Github Actions
homebrew-r-autobrew Github Actions
java-jars Github Actions
nuget Github Actions
python-sdist Github Actions
test-build-vcpkg-win Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-conda-python-3.6 Github Actions
test-conda-python-3.6-pandas-0.23 Github Actions
test-conda-python-3.7 Github Actions
test-conda-python-3.7-dask-latest Github Actions
test-conda-python-3.7-dask-master Github Actions
test-conda-python-3.7-hdfs-2.9.2 Github Actions
test-conda-python-3.7-hdfs-3.2.1 Github Actions
test-conda-python-3.7-kartothek-latest Github Actions
test-conda-python-3.7-kartothek-master Github Actions
test-conda-python-3.7-pandas-0.24 Github Actions
test-conda-python-3.7-pandas-latest Github Actions
test-conda-python-3.7-spark-branch-3.0 Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-hypothesis Github Actions
test-conda-python-3.8-pandas-latest Github Actions
test-conda-python-3.8-pandas-nightly Github Actions
test-conda-python-3.8-spark-master Github Actions
test-conda-python-3.9 Github Actions
test-conda-python-3.9-pandas-master Github Actions
test-debian-11-cpp Github Actions
test-debian-11-go-1.15 Azure
test-debian-11-python-3 Azure
test-debian-c-glib Github Actions
test-debian-ruby Github Actions
test-fedora-33-cpp Github Actions
test-fedora-33-python-3 Azure
test-r-arrow-backwards-compatibility Github Actions
test-r-depsource-auto Azure
test-r-depsource-system Github Actions
test-r-devdocs Github Actions
test-r-gcc-11 Github Actions
test-r-install-local Github Actions
test-r-linux-as-cran Github Actions
test-r-linux-rchk Github Actions
test-r-linux-valgrind Azure
test-r-minimal-build Azure
test-r-offline-maximal Github Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-ubuntu-gcc-release-latest Azure
test-r-rocker-r-base-latest Azure
test-r-rstudio-r-base-3.6-bionic Azure
test-r-rstudio-r-base-3.6-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-3.6-centos8 Azure
test-r-rstudio-r-base-3.6-opensuse15 Azure
test-r-rstudio-r-base-3.6-opensuse42 Azure
test-r-ubuntu-21.04 Github Actions
test-r-versions Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-18.04-python-3 Azure
test-ubuntu-18.04-r-sanitizer Azure
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions
test-ubuntu-20.10-docs Azure
test-ubuntu-c-glib Github Actions
test-ubuntu-default-docs Azure
test-ubuntu-ruby Github Actions
ubuntu-bionic-amd64 Github Actions
ubuntu-bionic-arm64 TravisCI
ubuntu-focal-amd64 Github Actions
ubuntu-focal-arm64 TravisCI
ubuntu-hirsute-amd64 Github Actions
ubuntu-hirsute-arm64 TravisCI
wheel-macos-big-sur-cp39-arm64 Github Actions
wheel-macos-big-sur-cp39-universal2 Github Actions
wheel-macos-big-sur-py38-arm64 Github Actions
wheel-macos-high-sierra-cp36-amd64 Github Actions
wheel-macos-high-sierra-cp37-amd64 Github Actions
wheel-macos-high-sierra-cp38-amd64 Github Actions
wheel-macos-high-sierra-cp39-amd64 Github Actions
wheel-macos-mavericks-cp36-amd64 Github Actions
wheel-macos-mavericks-cp37-amd64 Github Actions
wheel-macos-mavericks-cp38-amd64 Github Actions
wheel-macos-mavericks-cp39-amd64 Github Actions
wheel-manylinux2010-cp36-amd64 Github Actions
wheel-manylinux2010-cp37-amd64 Github Actions
wheel-manylinux2010-cp38-amd64 Github Actions
wheel-manylinux2010-cp39-amd64 Github Actions
wheel-manylinux2014-cp36-amd64 Github Actions
wheel-manylinux2014-cp36-arm64 TravisCI
wheel-manylinux2014-cp37-amd64 Github Actions
wheel-manylinux2014-cp37-arm64 TravisCI
wheel-manylinux2014-cp38-amd64 Github Actions
wheel-manylinux2014-cp38-arm64 TravisCI
wheel-manylinux2014-cp39-amd64 Github Actions
wheel-manylinux2014-cp39-arm64 TravisCI
wheel-windows-cp36-amd64 Github Actions
wheel-windows-cp37-amd64 Github Actions
wheel-windows-cp38-amd64 Github Actions
wheel-windows-cp39-amd64 Github Actions

@kou
Copy link
Member

kou commented Oct 12, 2021

Can we set ARROW_GCS=ON like ARROW_S3=ON in our CI?

@pitrou
Copy link
Member

pitrou commented Oct 12, 2021

Can we set ARROW_GCS=ON like ARROW_S3=ON in our CI?

How large is the slowdown? Compiling embedded libraries from scratch adds to CI delays.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @coryan . Here are a number of minor comments.

@@ -70,7 +131,9 @@ bool GcsFileSystem::Equals(const FileSystem& other) const {
}

Result<FileInfo> GcsFileSystem::GetFileInfo(const std::string& path) {
return Status::NotImplemented("The GCS FileSystem is not fully implemented");
auto p = GcsPath::FromString(path);
if (!p.ok()) return std::move(p).status();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simply write:

ARROW_ASSIGN_OR_RAISE(auto p, GcsPath::FromString(path));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"simply" ... hmmm ... Done.

return Status::Invalid(os.str());
case google::cloud::StatusCode::kDeadlineExceeded:
case google::cloud::StatusCode::kNotFound:
// TODO: it is unclear if a better mapping would be possible.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest IOError for all external errors (such as kDeadlineExceeded, kPermissionDenied, kUnauthenticated...).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

server_process_ = bp::child(boost::this_process::environment(), exe_path, "-m",
"testbench", "--port", port_);

// Create a bucket in the testbench so additional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sentence incomplete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -59,6 +155,11 @@ TEST(GcsFileSystem, FileSystemCompare) {
EXPECT_FALSE(a->Equals(*b));
}

TEST_F(GcsIntegrationTest, MakeBucket) {
auto fs = internal::MakeGcsFileSystemForTest(TestGcsOptions());
ASSERT_OK(fs->GetFileInfo(kPreexistingBucket));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note we have utilities for testing in filesystem/test_util.h that would allow this test to be a bit more interesting, e.g.:

  AssertFileInfo(fs.get(), kPreexistingBucket, FileType::Directory);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@pitrou
Copy link
Member

pitrou commented Oct 12, 2021

The tests hang here in GcsFileSystem.FileSystemCompare. Here is the gdb backtrace:

#0  0x00007fa8e4cc8aff in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fa8e542a769 in ?? () from /lib/x86_64-linux-gnu/libcurl.so.4
#2  0x00007fa8e5424342 in ?? () from /lib/x86_64-linux-gnu/libcurl.so.4
#3  0x00007fa8e5424536 in curl_multi_poll () from /lib/x86_64-linux-gnu/libcurl.so.4
#4  0x00007fa8e541ce24 in curl_easy_perform () from /lib/x86_64-linux-gnu/libcurl.so.4
#5  0x00007fa8e83a4b47 in google::cloud::storage::v1::internal::CurlHandle::EasyPerform (this=0x7ffe44de3958)
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/internal/curl_handle.h:130
#6  0x00007fa8e83a460e in google::cloud::storage::v1::internal::CurlRequest::MakeRequestImpl (this=0x7ffe44de3880)
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/internal/curl_request.cc:125
#7  0x00007fa8e83a4031 in google::cloud::storage::v1::internal::CurlRequest::MakeRequest (this=0x7ffe44de3880, payload="")
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/internal/curl_request.cc:77
#8  0x00007fa8e84598db in google::cloud::storage::v1::oauth2::ComputeEngineCredentials<google::cloud::storage::v1::internal::CurlRequestBuilder, std::chrono::_V2::system_clock>::DoMetadataServerGetRequest (this=0x55c6ae7c7700, path="/computeMetadata/v1/instance/service-accounts/default/", recursive=true)
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/oauth2/compute_engine_credentials.h:147
#9  0x00007fa8e845935b in google::cloud::storage::v1::oauth2::ComputeEngineCredentials<google::cloud::storage::v1::internal::CurlRequestBuilder, std::chrono::_V2::system_clock>::RetrieveServiceAccountInfo (this=0x55c6ae7c7700)
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/oauth2/compute_engine_credentials.h:158
#10 0x00007fa8e845835c in google::cloud::storage::v1::oauth2::ComputeEngineCredentials<google::cloud::storage::v1::internal::CurlRequestBuilder, std::chrono::_V2::system_clock>::Refresh (this=0x55c6ae7c7700) at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/oauth2/compute_engine_credentials.h:179
#11 0x00007fa8e8456a21 in google::cloud::storage::v1::oauth2::ComputeEngineCredentials<google::cloud::storage::v1::internal::CurlRequestBuilder, std::chrono::_V2::system_clock>::AuthorizationHeader[abi:cxx11]()::{lambda()#1}::operator()() const (this=0x55c6ae7c7700)
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/oauth2/compute_engine_credentials.h:91
#12 0x00007fa8e845874e in google::cloud::storage::v1::oauth2::RefreshingCredentialsWrapper::AuthorizationHeader<google::cloud::storage::v1::oauth2::ComputeEngineCredentials<google::cloud::storage::v1::internal::CurlRequestBuilder, std::chrono::_V2::system_clock>::AuthorizationHeader()::{lambda()#1}>(std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, google::cloud::storage::v1::oauth2::ComputeEngineCredentials<google::cloud::storage::v1::internal::CurlRequestBuilder, std::chrono::_V2::system_clock>::AuthorizationHeader()::{lambda()#1}) const (this=0x55c6ae7c7738, now=..., refresh_fn=...)
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/oauth2/refreshing_credentials_wrapper.h:49
#13 0x00007fa8e8456b2f in google::cloud::storage::v1::oauth2::ComputeEngineCredentials<google::cloud::storage::v1::internal::CurlRequestBuilder, std::chrono::_V2::system_clock>::AuthorizationHeader[abi:cxx11]() (this=0x55c6ae7c7700)
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/oauth2/compute_engine_credentials.h:91
#14 0x00007fa8e8453294 in google::cloud::storage::v1::oauth2::GoogleDefaultCredentials (options=...)
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/oauth2/google_credentials.cc:168
#15 0x00007fa8e844cf70 in google::cloud::storage::v1::internal::Visitor::visit (this=0x7ffe44de4020)
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/internal/unified_rest_credentials.cc:45
#16 0x00007fa8e8741c8c in google::cloud::v1::internal::GoogleDefaultCredentialsConfig::dispatch (this=0x55c6ae7848d0, v=warning: RTTI symbol not found for class 'google::cloud::storage::v1::internal::MapCredentials(std::shared_ptr<google::cloud::v1::Credentials> const&)::Visitor'
...)
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/internal/credentials_impl.h:71
#17 0x00007fa8e874b020 in google::cloud::v1::internal::CredentialsVisitor::dispatch (credentials=..., visitor=warning: RTTI symbol not found for class 'google::cloud::storage::v1::internal::MapCredentials(std::shared_ptr<google::cloud::v1::Credentials> const&)::Visitor'
...)
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/internal/credentials_impl.cc:24
#18 0x00007fa8e844d362 in google::cloud::storage::v1::internal::MapCredentials (
    credentials=std::shared_ptr<class google::cloud::v1::Credentials> (use count 1, weak count 0) = {...})
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/internal/unified_rest_credentials.cc:71
#19 0x00007fa8e832bd74 in google::cloud::storage::v1::internal::DefaultOptionsWithCredentials (opts=...)
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/client_options.cc:230
#20 0x00007fa8e8311112 in google::cloud::storage::v1::Client::Client (this=0x55c6ae7c76d0, opts=...)
    at /build/cpp/google_cloud_cpp_ep-prefix/src/google_cloud_cpp_ep/google/cloud/storage/client.cc:41
#21 0x00007fa8e7e40a9d in arrow::fs::GcsFileSystem::Impl::Impl (this=0x55c6ae7c7690, o=...) at /arrow/cpp/src/arrow/filesystem/gcsfs.cc:85
#22 0x00007fa8e7e45a69 in __gnu_cxx::new_allocator<arrow::fs::GcsFileSystem::Impl>::construct<arrow::fs::GcsFileSystem::Impl, arrow::fs::GcsOptions const&> (
    this=0x7ffe44de439f, __p=0x55c6ae7c7690) at /usr/include/c++/9/ext/new_allocator.h:147
#23 0x00007fa8e7e45746 in std::allocator_traits<std::allocator<arrow::fs::GcsFileSystem::Impl> >::construct<arrow::fs::GcsFileSystem::Impl, arrow::fs::GcsOptions const&> (__a=..., __p=0x55c6ae7c7690) at /usr/include/c++/9/bits/alloc_traits.h:484
#24 0x00007fa8e7e45314 in std::_Sp_counted_ptr_inplace<arrow::fs::GcsFileSystem::Impl, std::allocator<arrow::fs::GcsFileSystem::Impl>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<arrow::fs::GcsOptions const&> (this=0x55c6ae7c7680, __a=...) at /usr/include/c++/9/bits/shared_ptr_base.h:548
#25 0x00007fa8e7e44a70 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<arrow::fs::GcsFileSystem::Impl, std::allocator<arrow::fs::GcsFileSystem::Impl>, arrow::fs::GcsOptions const&> (this=0x55c6ae7c7670, __p=@0x55c6ae7c7668: 0x0, __a=...) at /usr/include/c++/9/bits/shared_ptr_base.h:679
#26 0x00007fa8e7e43f0a in std::__shared_ptr<arrow::fs::GcsFileSystem::Impl, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<arrow::fs::GcsFileSystem::Impl>, arrow::fs::GcsOptions const&> (this=0x55c6ae7c7668, __tag=...) at /usr/include/c++/9/bits/shared_ptr_base.h:1344
#27 0x00007fa8e7e433a1 in std::shared_ptr<arrow::fs::GcsFileSystem::Impl>::shared_ptr<std::allocator<arrow::fs::GcsFileSystem::Impl>, arrow::fs::GcsOptions const&> (
    this=0x55c6ae7c7668, __tag=...) at /usr/include/c++/9/bits/shared_ptr.h:359
#28 0x00007fa8e7e42a33 in std::allocate_shared<arrow::fs::GcsFileSystem::Impl, std::allocator<arrow::fs::GcsFileSystem::Impl>, arrow::fs::GcsOptions const&> (__a=...)
    at /usr/include/c++/9/bits/shared_ptr.h:702
#29 0x00007fa8e7e41ad3 in std::make_shared<arrow::fs::GcsFileSystem::Impl, arrow::fs::GcsOptions const&> () at /usr/include/c++/9/bits/shared_ptr.h:718
#30 0x00007fa8e7e3ed70 in arrow::fs::GcsFileSystem::GcsFileSystem (this=0x55c6ae7c7620, options=..., context=...) at /arrow/cpp/src/arrow/filesystem/gcsfs.cc:202
#31 0x00007fa8e7e3ede8 in arrow::fs::internal::MakeGcsFileSystemForTest (options=...) at /arrow/cpp/src/arrow/filesystem/gcsfs.cc:209
#32 0x000055c6ac5fca88 in arrow::fs::(anonymous namespace)::GcsFileSystem_FileSystemCompare_Test::TestBody (this=0x55c6ae784800)
    at /arrow/cpp/src/arrow/filesystem/gcsfs_test.cc:145
#33 0x00007fa8e55d1634 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x55c6ae784800, method=&virtual testing::Test::TestBody(), location=0x7fa8e55e857b "the test body") at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2607
#34 0x00007fa8e55c932b in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x55c6ae784800, method=&virtual testing::Test::TestBody(), location=0x7fa8e55e857b "the test body") at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2643
#35 0x00007fa8e559d5ec in testing::Test::Run (this=0x55c6ae784800) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2682
#36 0x00007fa8e559e02e in testing::TestInfo::Run (this=0x55c6ae7c5870) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2861
#37 0x00007fa8e559e948 in testing::TestSuite::Run (this=0x55c6ae7c5470) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:3015
#38 0x00007fa8e55ae3f6 in testing::internal::UnitTestImpl::RunAllTests (this=0x55c6ae7c51d0) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:5855
#39 0x00007fa8e55d2b62 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x55c6ae7c51d0, method=(bool (testing::internal::UnitTestImpl::*)(class testing::internal::UnitTestImpl * const)) 0x7fa8e55adfc2 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x7fa8e55e9068 "auxiliary test code (environments or event listeners)") at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2607
#40 0x00007fa8e55ca569 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x55c6ae7c51d0, method=(bool (testing::internal::UnitTestImpl::*)(class testing::internal::UnitTestImpl * const)) 0x7fa8e55adfc2 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x7fa8e55e9068 "auxiliary test code (environments or event listeners)") at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2643
#41 0x00007fa8e55acb0d in testing::UnitTest::Run (this=0x7fa8e5616ba0 <testing::UnitTest::GetInstance()::instance>) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:5438
#42 0x00007fa8e561829a in RUN_ALL_TESTS () at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/include/gtest/gtest.h:2490
#43 0x00007fa8e561821c in main (argc=1, argv=0x7ffe44de4ba8) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc:52
#44 0x00007fa8e4bda0b3 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#45 0x000055c6ac5fb0de in _start ()

@coryan
Copy link
Contributor Author

coryan commented Oct 12, 2021

Can we set ARROW_GCS=ON like ARROW_S3=ON in our CI?

Which ones? Some of them already have these settings, e.g.:

https://github.com/apache/arrow/pull/11375/checks?check_run_id=3863850315#step:8:2266

How large is the slowdown? Compiling embedded libraries from scratch adds to CI delays.

That obviously depends on how much caching is going on and what the CI machine specs are. For reference, I used vcpkg and only 4 cores to "benchmark" things. I can build google-cloud-cpp[core,storage] in 3 minutes (both the debug and release). Its dependencies take another 3 minutes, most likely you already have some of those (e.g. zlib, curl, openssl, and even abseil for gRPC).

@pitrou
Copy link
Member

pitrou commented Oct 12, 2021

The tests hang here in GcsFileSystem.FileSystemCompare. Here is the gdb backtrace:

I've now run strace on the test process and it seems it spawns a child process that tries to resolve metadata.google.internal. That's where it hangs.

@kou
Copy link
Member

kou commented Oct 12, 2021

Some of them already have these settings

Oh, sorry. I missed them.

@coryan
Copy link
Contributor Author

coryan commented Oct 13, 2021

I think I addressed all the comments, please let me know if I missed something.

@pitrou
Copy link
Member

pitrou commented Oct 13, 2021

Hmm, gcfs_test is not hanging anymore but it's still taking a lot of time:

[ RUN      ] GcsFileSystem.FileSystemCompare
[       OK ] GcsFileSystem.FileSystemCompare (20025 ms)

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 (but would be nice to eliminate the slowness in GcsFileSystem.FileSystemCompare)

@coryan
Copy link
Contributor Author

coryan commented Oct 13, 2021

Hmm, gcfs_test is not hanging anymore but it's still taking a lot of time:

[ RUN      ] GcsFileSystem.FileSystemCompare
[       OK ] GcsFileSystem.FileSystemCompare (20025 ms)

Ugh, that looks painful. On my workstation that particular test completes in milliseconds:

[ RUN      ] GcsFileSystem.FileSystemCompare
[       OK ] GcsFileSystem.FileSystemCompare (5 ms)

And the full test with with archery docker run ubuntu-cpp takes about 3 seconds:

54/85 Test #49: arrow-gcsfs-test ..........................   Passed    3.05 sec

Which is to say: this would probably take some more effort to troubleshoot. I will open a separate bug.

@coryan
Copy link
Contributor Author

coryan commented Oct 13, 2021

I will open a separate bug.

https://issues.apache.org/jira/browse/ARROW-14311

@pitrou
Copy link
Member

pitrou commented Oct 13, 2021

Thank you, I'll merge this one then.

@pitrou
Copy link
Member

pitrou commented Oct 13, 2021

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: f3edc48

Submitted crossbow builds: ursacomputing/crossbow @ actions-935

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@pitrou pitrou changed the title ARROW-14236: [C++] add GCS testbench for testing ARROW-14236: [C++] Add GCS testbench for testing Oct 13, 2021
@pitrou pitrou closed this in 74caee2 Oct 13, 2021
@ursabot
Copy link

ursabot commented Oct 13, 2021

Benchmark runs are scheduled for baseline = 42b0665 and contender = 74caee2. 74caee2 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.85% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

@coryan coryan deleted the ARROW-14236-install-gcs-testbench branch October 13, 2021 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants