Skip to content

Conversation

@carols10cents
Copy link
Contributor

@carols10cents carols10cents commented Dec 18, 2020

(repeating context from the Jira issue)

Problem

ProtoBuf proto3 specifies that if a message does not contain a particular singular element, the field should get the default value. However, when the C++ flight-test-integration-server gets a DoPut request with a FlightData message for a record batch containing no items, and the FlightData is missing the data_body field, the server responds with an error "Expected body in IPC message of type record batch".

What happens

If I run the C++ flight-test-integration-server and the C++ flight-test-integration-client with the generated_null_trivial test case, the test passes and I see this in wireshark:

cpp-client-empty-data-body

Note the data_body field is present but has no value.

If I run the Rust flight-test-integration-client that I'm working on developing, it does not send the data_body field at all if there are no bytes to send. I see this in wireshark:

rust-client-missing-data-body

Note the data_body field is not present.

The C++ server then returns the error message Expected body in IPC message of type record batch, which comes from this check for message body called in ReadNext of the record batch stream reader.

What I expect to happen

Instead of returning an error message because of a null pointer, the Message should get the default value of empty bytes.

The fix

@shepmaster and I worked on this fix together, but I'm not at all confident this is the exact right fix.

It's hard for me to trace through the code from a FlightData data_body to an IPC Message body, but this does fix the problem. I'm also not sure what other cases this change might affect.

I don't know how to write a unit test for this because the C++ code doesn't generate this case, but there will be a test coming in the form of a Rust flight integration test client.

Feedback much appreciated!

@github-actions
Copy link

@lidavidm
Copy link
Member

Thanks for catching & taking a look at this!

I think, rather than change the IPC class, we should make the FlightData Protobuf deserializer act more like a real Protobuf parser, and fill in an empty body instead of leaving the buffer as null if there was not a buffer read from the wire. That should probably happen here:

// TODO(wesm): Where and when should we verify that the FlightData is not
// malformed or missing components?

I'll admit that even having written the code, it's hard to trace the path from here to where you currently have the fix...

Here is where we invoke the deserializer:

bool ReadPayload(grpc::ClientReader<pb::FlightData>* reader, FlightData* data) {
// Pretend to be pb::FlightData and intercept in SerializationTraits
return reader->Read(reinterpret_cast<pb::FlightData*>(data));
}

That in turn is called from a peekable reader:
if (!internal::ReadPayload(&*stream_, &peek_)) {

The reader fills in a FlightData which gets passed to the RecordBatchReader here:
RETURN_NOT_OK(batch_reader_->ReadNext(&out->data));

Calling ReadNext implicitly bounces through this glue class:
class FlightIpcMessageReader : public ipc::MessageReader {
public:
explicit FlightIpcMessageReader(
std::shared_ptr<internal::PeekableFlightDataReader<Reader*>> peekable_reader,
std::shared_ptr<Buffer>* app_metadata)
: peekable_reader_(peekable_reader), app_metadata_(app_metadata) {}
::arrow::Result<std::unique_ptr<ipc::Message>> ReadNextMessage() override {
if (stream_finished_) {
return nullptr;
}
internal::FlightData* data;
peekable_reader_->Next(&data);
if (!data) {
stream_finished_ = true;
if (first_message_) {
return Status::Invalid(
"Client provided malformed message or did not provide message");
}
return nullptr;
}
*app_metadata_ = std::move(data->app_metadata);
return data->OpenMessage();
}

Which finally actually constructs the IPC Message in this PR:
::arrow::Result<std::unique_ptr<ipc::Message>> FlightData::OpenMessage() {
return ipc::Message::Open(metadata, body);
}

@lidavidm
Copy link
Member

And FWIW, there's a long-standing issue (ARROW-4419) for testing the Flight implementation against a 'plain gRPC' client/server to catch issues like this; we intend for Flight to still be compatible with regular gRPC clients that haven't been specially optimized.

@carols10cents
Copy link
Contributor Author

@lidavidm Thank you for the pointers! That's very helpful; I'm going to try making the fix in the spot you suggested instead and push an update in a bit.

@lidavidm
Copy link
Member

Note I took a look at the Java side of this and the fix may not be so simple, since some code paths in Java expect there to be 0 buffers, and providing a default buffer breaks them; C++ may have the same issue. (Those paths are arguably also wrong, since Protobuf implementations can write the empty field...)

@carols10cents carols10cents force-pushed the cpp-default-data-body branch 2 times, most recently from df24005 to 71cd7bd Compare December 18, 2020 19:33
@carols10cents
Copy link
Contributor Author

I made the same sort of fix in the spot you pointed out, to each of the 3 byte fields in FlightData. What do you think?

Note I took a look at the Java side of this and the fix may not be so simple, since some code paths in Java expect there to be 0 buffers, and providing a default buffer breaks them; C++ may have the same issue.

Yuck. Should I open a new Jira ticket to track the Java side or should this ticket cover the same problem in the Java and C++ implementations?

(Those paths are arguably also wrong, since Protobuf implementations can write the empty field...)

Hm, isn't the C++ client currently writing the empty field? Or is it writing the field to null, rather than an empty list of bytes?

@lidavidm
Copy link
Member

I made the same sort of fix in the spot you pointed out, to each of the 3 byte fields in FlightData. What do you think?

Should be good once CI passes, thank you!

Note I took a look at the Java side of this and the fix may not be so simple, since some code paths in Java expect there to be 0 buffers, and providing a default buffer breaks them; C++ may have the same issue.

Yuck. Should I open a new Jira ticket to track the Java side or should this ticket cover the same problem in the Java and C++ implementations?

I filed ARROW-10939 and am looking at it right now. I also cross-linked the three outstanding issues on Jira.

(Those paths are arguably also wrong, since Protobuf implementations can write the empty field...)

Hm, isn't the C++ client currently writing the empty field? Or is it writing the field to null, rather than an empty list of bytes?

Sorry, I was talking about the Java implementation - unilaterally filling in an empty buffer on deserialization when not provided there broke other code which assumed there would be no buffer instead of an empty one. (So what I meant is that while the original code would be fine with any Protobuf implementation that doesn't write out empty fields, it would break with one that did write them out.)

@carols10cents
Copy link
Contributor Author

Sorry, I was talking about the Java implementation - unilaterally filling in an empty buffer on deserialization when not provided there broke other code which assumed there would be no buffer instead of an empty one. (So what I meant is that while the original code would be fine with any Protobuf implementation that doesn't write out empty fields, it would break with one that did write them out.)

Yes, I meant that because the C++ client is currently a Protobuf implementation that is writing out an empty field (I think), wouldn't the archery integration tests with the Java server and C++ client on the generated_null test currently be failing?

@lidavidm
Copy link
Member

Sorry, I was talking about the Java implementation - unilaterally filling in an empty buffer on deserialization when not provided there broke other code which assumed there would be no buffer instead of an empty one. (So what I meant is that while the original code would be fine with any Protobuf implementation that doesn't write out empty fields, it would break with one that did write them out.)

Yes, I meant that because the C++ client is currently a Protobuf implementation that is writing out an empty field (I think), wouldn't the archery integration tests with the Java server and C++ client on the generated_null test currently be failing?

Ah sorry - C++ is fine because it only writes out those bytes conditionally, for message types that require a body (e.g. a RecordBatch). But if an implementation wrote out an (empty) body for a Schema, the Java client/server would break.

Comment on lines 379 to 385
// Set default values for unspecified FlightData fields
if (out->app_metadata == nullptr) {
out->app_metadata = std::make_shared<Buffer>(nullptr, 0);
}
if (out->metadata == nullptr) {
out->metadata = std::make_shared<Buffer>(nullptr, 0);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test failures might come from this, actually - if there's a metadata buffer, that causes some code to assume this must be a schema or record batch and try to parse it accordingly (but it'll then encounter the empty buffer and fail to parse it).

Also, the app_metadata buffer can be omitted - it's meant to be optional so having an empty vs null one is no big deal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops! So much for trying to be overly helpful 😅 Pushed a change so this just handles body.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

The three CI failures look like build environment flakes (and are failing on master).

@lidavidm lidavidm changed the title ARROW-10960: [C++] [Flight] Default to empty buffer instead of null ARROW-10960: [C++][FlightRPC] Default to empty buffer instead of null Dec 18, 2020
@lidavidm lidavidm closed this in 5c15af9 Dec 18, 2020
@carols10cents carols10cents deleted the cpp-default-data-body branch December 18, 2020 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants