Skip to content

Conversation

@lidavidm
Copy link
Member

@lidavidm lidavidm commented Feb 16, 2022

This PR implements a UCX-based transport for Arrow Flight in C++ based on the transport interfaces added in ARROW-15282. Currently, it supports DoExchange/DoGet/DoPut (i.e., all the data methods) and GetFlightInfo. It supports multiple concurrent calls on a single client, like gRPC (though with caveats; see the documentation) and can run the Flight benchmark.

@lidavidm
Copy link
Member Author

lidavidm commented Feb 16, 2022

TODOs

Prerequisite JIRAs:

Remaining work here:

  • Implement (or explicitly deny) concurrent client calls
  • Finish implementing DoPut/DoExchange
  • Document the wire protocol here
  • Document caveats here
  • Split out gRPC changes to slim down this PR
  • Provide benchmark figures

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@lidavidm lidavidm changed the title ARROW-15706: [C++][FlightRPC] Implement a UCX transport ARROW-15706: [C++][FlightRPC] WIP: Implement a UCX transport Feb 16, 2022
@lidavidm lidavidm force-pushed the flight-ucx branch 4 times, most recently from e67b653 to 0a10ba1 Compare February 23, 2022 19:41
@lidavidm lidavidm force-pushed the flight-ucx branch 7 times, most recently from 31f89ca to 9bce9d3 Compare March 9, 2022 18:52
@lidavidm
Copy link
Member Author

Docs: implementation status table

image

Docs: UCX transport section

image

@lidavidm lidavidm changed the title ARROW-15706: [C++][FlightRPC] WIP: Implement a UCX transport ARROW-15706: [C++][FlightRPC] Implement a UCX transport Mar 14, 2022
@lidavidm lidavidm force-pushed the flight-ucx branch 3 times, most recently from 839109c to 3a0279e Compare March 17, 2022 19:28
@lidavidm lidavidm marked this pull request as ready for review March 17, 2022 19:30
@cyb70289 cyb70289 self-requested a review March 18, 2022 01:53
@cyb70289
Copy link
Contributor

Thanks @lidavidm !
I'll do some tests and benchmarks and give feedback, probably in next week.

@cyb70289
Copy link
Contributor

cc @shamisp if you are interested.

@cyb70289
Copy link
Contributor

cyb70289 commented Apr 5, 2022

arrow-threading-utility-test failed on macos: https://github.com/apache/arrow/runs/5818260165?check_suite_focus=true#step:8:420
The nested fork test is supposed to be skipped on macos, but looks we reversed the condtion check: https://github.com/apache/arrow/blob/91c62bdb7e9b802009ac96fedd1985377d00d585/cpp/src/arrow/util/thread_pool_test.cc#L725
cc @westonpace
Please note this PR is not rebased to tip, line number of thread_pool_test.cc doesn't match current code.

Copy link
Contributor

@cyb70289 cyb70289 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with current code.
We can leave TODOs and other improvements as follow up tasks.

Unfortunately I'm not able to benchmark this PR soon and there's some issue of the lab test network but I cannot go to office recently.

@lidavidm
Copy link
Member Author

lidavidm commented Apr 5, 2022

@cyb70289 thanks for all the feedback. I'll file JIRA issues for any TODOs before merging.

@lidavidm
Copy link
Member Author

lidavidm commented Apr 5, 2022

TODOs filed (and noted in the source)

ARROW-10787 [C++][Flight] DoExchange doesn't support dictionary replacement
ARROW-15756 [C++][FlightRPC] Benchmark in-process Flight performance
ARROW-15835 [C++][FlightRPC] Refactor auth, middleware into the transport-agnostic layer
ARROW-15836 [C++][FlightRPC] Refactor remaining methods into transport-agnostic handlers
ARROW-16069 [C++][FlightRPC] Refactor error statuses/codes into the transport-agnostic layer
ARROW-16124 [C++][FlightRPC] UCX server should be able to shed load
ARROW-16125 [C++][FlightRPC] Implement shutdown with deadline for UCX
ARROW-16126 [C++][FlightRPC] Pipeline memory allocation/registration
ARROW-16127 [C++][FlightRPC] Improve concurrent call implementation in UCX client
ARROW-16135 [C++][FlightRPC] Investigate TSAN with gRPC/UCX tests

@cyb70289
Copy link
Contributor

cyb70289 commented Apr 6, 2022

I see many TSAN errors from flight tests (I tested on an Arm server).
Looks we didn't enable TSAN checks for flight tests. Maybe clean it up in a follow up task.

  • Many data race warnings from arrow-flight-test
  • arrow-flight-transport-ucx-test segfaults (from inside ucx library)

@lidavidm
Copy link
Member Author

lidavidm commented Apr 6, 2022

Hmm, indeed. Filed ARROW-16135.

It's old, but grpc/grpc#16749 claims you need to build gRPC itself with TSAN. Maybe this also applies to UCX.

@lidavidm lidavidm closed this in 542158f Apr 7, 2022
@lidavidm lidavidm deleted the flight-ucx branch April 7, 2022 17:31
@ursabot
Copy link

ursabot commented Apr 8, 2022

Benchmark runs are scheduled for baseline = e5072bd and contender = 542158f. 542158f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.21% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.36% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.04% ⬆️0.04%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/464| 542158fa ec2-t3-xlarge-us-east-2>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/449| 542158fa test-mac-arm>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/450| 542158fa ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/459| 542158fa ursa-thinkcentre-m75q>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/463| e5072bda ec2-t3-xlarge-us-east-2>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/448| e5072bda test-mac-arm>
[Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/449| e5072bda ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/458| e5072bda ursa-thinkcentre-m75q>
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants