Skip to content

Conversation

@tongsucn
Copy link
Contributor

@tongsucn tongsucn commented Oct 24, 2022

Fixes #65

Motivation

Improve data consuming performance by adding new interfaces on class Message and class SharedBuffer.

Modifications

Adding two functions:

lib/SharedBuffer.h

void SharedBuffer::pop(std::string &target)

pulsar/Message.h, lib/Message.cc

std::string Message::release()

Verifying this change

This change added tests and can be verified as follows:

Run client's unit test, it's added in MessageTest.cc.

Documentation

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

Copy link
Contributor

@BewareMyPower BewareMyPower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about returning a value? e.g.

std::string releaseData();

Though the output parameter is widely used in Pulsar C++ client, it's because the returned value is Result. In this case, using an output parameter looks ugly IMO.

Copy link
Member

@RobertIndie RobertIndie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some unit tests.

@shibd
Copy link
Member

shibd commented Oct 28, 2022

@tongsucn Please rebase the main branch to pass the failed job.

@tongsucn tongsucn force-pushed the tongsucn_msg_improvement branch from 5df859a to f99c37a Compare October 28, 2022 10:10
@tongsucn
Copy link
Contributor Author

Please add some unit tests.

Done

@tongsucn tongsucn requested review from BewareMyPower and RobertIndie and removed request for BewareMyPower October 31, 2022 10:13
@tongsucn tongsucn requested review from BewareMyPower and removed request for RobertIndie November 1, 2022 02:50
ASSERT_FALSE(strncmp("content", (char*)msg.getData(), msg.getLength()));
ASSERT_EQ(content, (char*)msg.getData());

ASSERT_THROW(msg.releaseData(), std::runtime_error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test case for releasing the data successfully and verify the data is actually swapped.

@tongsucn
Copy link
Contributor Author

tongsucn commented Nov 1, 2022

@BewareMyPower @shibd @RobertIndie I found a problem on the new interface:

When batching is enabled on producer side, Consumer::receive actually creates a shallow copy of the whole message batch and sets reading offset (i.e. SharedBuffer::readIdx_) via SharedBuffer::slice in Commands::deSerializeSingleMessageInBatch.

It means that the messages from a batch cannot be moved into another std::string without exposing the reading offset and the knowledge on batching. Therefore I think the current implementation of Message::releaseData or Message::releaseData itself is not a good design. If you have any better ideas, please tell me, or just close the issue and PR.

@BewareMyPower
Copy link
Contributor

@tongsucn Good point. I think it's determined by how do you use that.

If you only want a view of the Message's value, you can create an instance std::string_view (or a similar class). But you should be careful that the lifetime is the same as Message, i.e. you must keep the Message objects in memory to make std::string_view valid.

If you want some extra in place operations, I think you can expose the non-const data() override.

const void* getData() const;
void* getData();  // new interface

If your process function in application just accepts a std::string, I'm afraid you have to change it to std::string_view.

@BewareMyPower
Copy link
Contributor

I just found the batchReceive method added in #21. Maybe if the producer enables batching, the consumer must combine batchReceive and the releaseData APIs. In this case, I prefer not throwing an exception when data_ is nullptr. Instead, fallback to copying the data rather than moving.

@tongsucn
Copy link
Contributor Author

I just found the batchReceive method added in #21. Maybe if the producer enables batching, the consumer must combine batchReceive and the releaseData APIs. In this case, I prefer not throwing an exception when data_ is nullptr. Instead, fallback to copying the data rather than moving.

Well, I think the result of batchReceive may contain data from multiple message batches, which make the situation more complicated. Falling back without telling caller may confuse the callers.

Under current design, pulsar::Message is actually an agent of the message data rather than a container. An agent should NOT give the ownership of the data to others, because it actually DOES NOT OWN the data. Therefore, I think interfaces like releaseData should be avoided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improve] Add interfaces in class Message to avoid unnecessary memory copy.

4 participants