[Issue#4110] [component/functions] Adding message as source or input of Function #4127

ConcurrencyPractitioner · 2019-04-25T01:18:16Z

Resolves Issue #4110

Motivation

The message metadata that Pulsar Functions use is unavailable to the user; consequently, users cannot use this metadata in their own computations. We wish to expose this metadata.

Modifications

After discussion in the issue, it has been agreed that adding a new getActualMessage() method to PulsarRecord will help fix this problem.

ConcurrencyPractitioner · 2019-04-25T23:41:49Z

ping @jerrypeng @sijie

sijie

Am I missing anything in this PR? I only see that you changed PulsarRecord. How can people use this interface? Also, can you add an example function showing how people can use this method?

ConcurrencyPractitioner · 2019-04-26T03:19:15Z

@sijie Oh, well, wasn't this what was proposed in the issue?

ConcurrencyPractitioner · 2019-04-26T05:20:02Z

Alright, @sijie I added a getCurrentMessage() method (I think in line with your approach 2).

ConcurrencyPractitioner · 2019-04-26T23:53:38Z

Retest this please.

sijie

@ConcurrencyPractitioner

Neither ContextImpl.java nor PulsarRecord.java is a public interface class. How can people use this method? Don't you need to add this method to some interfaces?

The reason I asked you to provide a Pulsar Function example is to show the developers how to use this method.

ConcurrencyPractitioner · 2019-04-27T00:20:28Z

@sijie Oh, I didn't notice at first that we were in the internals. In that case, we just have to add getCurrentMessage() to the Context interface. Since that interface is part of the public API, it should be accessible.

srkukarni · 2019-04-27T00:25:57Z

pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/instance/ContextImpl.java


+    @Override
+    public Object getCurrentMessage() {
+        return ((PulsarRecord<?>) record).getActualMessage();


Shouldn't this just be record.getActualMessage()?

Oh, well, it was getActualMessage() for PulsarRecord, but in ContextImpl, I think getCurrentMessage() was proposed to be used instead. I just took the name from what was proposed in the issue.

ConcurrencyPractitioner · 2019-04-27T00:28:15Z

I think the user should be able to retrieve the current message as needed, which should help resolve the problem the issue initially posed (i.e., the message was unavailable to the user for computation).

sijie · 2019-04-27T03:12:16Z

pulsar-functions/api-java/src/main/java/org/apache/pulsar/functions/api/Context.java

+    /**
+     * Access the message associated with current input value.
+     */
+    Object getCurrentMessage();


@ConcurrencyPractitioner

What @jerrypeng proposed in #4110, "we could also add a getActualMessage() method to PulsarRecord", is the right direction. I guess what he means there is "add getActualMessage() to Record", because PulsarRecord is an internal class that is not publicly available to developers who write functions.

I asked the question "can you add an example function on how people can use this method?" to help you think from a function developer's perspective. You have to be a Pulsar Functions user before you know how to make a good API for Pulsar Functions users. If you write a Pulsar Function example that uses the method you proposed, you will know whether the method is good or not.

You can find the examples under https://github.com/apache/pulsar/tree/master/pulsar-functions/java-examples, which demonstrate how users can use the function API to develop functions. My suggestion is to write a Pulsar Function example first. It would help you a lot in understanding why we need this change and how to provide the right API to developers.

Now back to the discussion on the interface itself.

I don't think we should add another getCurrent* method to Context. It makes the interface very confusing, because there is already a getCurrentRecord. Hence it should be getActualMessage, as @jerrypeng proposed, rather than a new getCurrent* method in Context.

Returning Object doesn't help resolve the problem. The function-api dependency doesn't include pulsar-client-api, so developers don't know what type this object is, and they cannot cast it back to Message<T>.

That means that in order to implement the proposal @jerrypeng made in #4110, you have to do the following:

1. Add Message<T> getActualMessage() to the Record<T> interface.
2. Introduce a pulsar-client-api dependency to pulsar-function-api.

If @srkukarni and @jerrypeng agree on introducing a pulsar-client-api dependency to pulsar-function-api, then you can implement it as I pointed out, and #4110 is done.

However, if there are concerns about the pulsar-client-api dependency in pulsar-function-api, then we have to go back and look at my original proposal: add support for using Message<T> as the input type. That way we don't need to include pulsar-client-api as a dependency of pulsar-function-api, and it also supports people writing functions using the Java native Function interface.
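To make the typing argument concrete, here is a minimal sketch using simplified, hypothetical stand-ins for Message, Record, and an Object-returning context (they are not the real pulsar-client-api or pulsar-functions-api types). With Object, the caller has to assume the payload type and perform an unchecked cast the compiler cannot verify; with Message<T> getActualMessage() on Record<T>, the type flows through to the function author.

```java
import java.util.Map;

// Simplified, hypothetical stand-ins for the types discussed in this PR.
// They are NOT the real pulsar-client-api / pulsar-functions-api interfaces.
class TypedRecordSketch {

    interface Message<T> {
        T getValue();
        Map<String, String> getProperties();
    }

    // Proposed shape: Record<T> exposes the typed message it was built from.
    interface Record<T> {
        T getValue();
        Message<T> getActualMessage();
    }

    // Shape questioned in review: the context returns a bare Object.
    interface ObjectContext {
        Object getCurrentMessage();
    }

    // Builds an in-memory record for the example.
    static <T> Record<T> recordOf(T value, Map<String, String> properties) {
        Message<T> message = new Message<T>() {
            @Override public T getValue() { return value; }
            @Override public Map<String, String> getProperties() { return properties; }
        };
        return new Record<T>() {
            @Override public T getValue() { return value; }
            @Override public Message<T> getActualMessage() { return message; }
        };
    }

    // With Object, the caller must assume the payload type; the compiler
    // cannot check that assumption, hence the unchecked cast.
    @SuppressWarnings("unchecked")
    static String keyFromObject(ObjectContext context) {
        Message<String> message = (Message<String>) context.getCurrentMessage();
        return message.getProperties().get("key");
    }

    // With Message<T>, no cast is needed and the compiler checks the types.
    static String keyFromRecord(Record<String> record) {
        return record.getActualMessage().getProperties().get("key");
    }
}
```

Both helpers read the same property, but only the second one is type-checked end to end.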

Oh, yeah. I was stupid for not understanding what was required. Sorry about that. I took a look at the Pulsar Functions examples and it helped. I added a use case to the examples.

ConcurrencyPractitioner · 2019-04-27T20:37:42Z

Retest this please.

sijie · 2019-04-28T02:03:50Z

@ConcurrencyPractitioner @srkukarni

If pulsar-client-api is not a dependency of pulsar-function-api, how can a function user cast it to Message?

If you are asking people to cast to Message<T>, why not adopt my original proposal in #4110, which is way cleaner than returning Object and asking users to cast to Message<T>?

ConcurrencyPractitioner · 2019-04-28T22:18:31Z

Oh, I'm fine with a change in approach. Any thoughts @srkukarni?

jerrypeng · 2019-04-29T00:39:55Z

@ConcurrencyPractitioner @sijie @srkukarni

Approaches I thought of:

If we don't want to pull in pulsar-client-api, and we want users to be able to get all of the properties of a Message, then we will have to add the corresponding methods to the Record interface to expose all of those properties.

Pros: pulsar-function-api does not need to depend on pulsar-client-api.

Cons: As @sijie mentioned before, whenever properties/methods are added to Message, we will also need to add them to the Record interface. However, my biggest concern with this approach is that we already have methods like getRecordSequence in the Record interface (which is different from the message sequence ID, though they return different forms of the same thing). We would have to add another method, getSequenceId, to the Record interface, which would be confusing. Thus, I am not in favor of this approach.
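A rough sketch of this first approach, using a hypothetical, simplified Record interface (not the real one): Message properties are copied onto Record as default methods, so sources that have no such metadata are not forced to implement them. It shows both the pro (no pulsar-client-api type appears anywhere) and the con described above: getRecordSequence and a mirrored getSequenceId end up side by side. getProducerName is included only as one more example of a mirrored property.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of approach 1: copy Message properties onto Record so that
// pulsar-function-api does not depend on pulsar-client-api. Not the real interface.
class MirroredRecordSketch {

    interface Record<T> {
        T getValue();

        // Already on Record: a source-agnostic position for the record.
        default Optional<Long> getRecordSequence() { return Optional.empty(); }

        // Mirrored Message properties. Defaults let non-Pulsar sources skip them,
        // but every new Message property needs another method here.
        default Optional<Long> getSequenceId() { return Optional.empty(); }
        default Optional<String> getProducerName() { return Optional.empty(); }
        default Map<String, String> getProperties() { return Collections.emptyMap(); }
    }

    // A record read from a topic fills in the mirrored fields. Note the two
    // similarly named sequence accessors that now sit side by side.
    static final class TopicRecord implements Record<String> {
        private final String value;
        private final long recordSequence;
        private final long sequenceId;

        TopicRecord(String value, long recordSequence, long sequenceId) {
            this.value = value;
            this.recordSequence = recordSequence;
            this.sequenceId = sequenceId;
        }

        @Override public String getValue() { return value; }
        @Override public Optional<Long> getRecordSequence() { return Optional.of(recordSequence); }
        @Override public Optional<Long> getSequenceId() { return Optional.of(sequenceId); }
    }

    // A record from another kind of source implements only what it has.
    static final class FileLineRecord implements Record<String> {
        private final String line;

        FileLineRecord(String line) { this.line = line; }

        @Override public String getValue() { return line; }
    }
}
```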

We add an additional method to PulsarRecord, i.e. public Message<T> getRawMessage(), and add a method to the Context interface, i.e. public Message<T> getRawInputMessage(). In the implementation of that method, we just cast Record to PulsarRecord, which is assumed for functions anyway.

Pros: Simple addition

Cons: pulsar-function-api needs to depend on pulsar-client-api, though this is not a deal breaker for me.
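The second approach can be sketched as follows. The types are simplified, hypothetical stand-ins; only the method names getRawMessage and getRawInputMessage come from the comment above. Because Context has no type parameter, the sketch returns Message<?>. The cast from Record to PulsarRecord stays inside ContextImpl, so users never see PulsarRecord, but the Message type they receive still has to come from the client API. Throwing IllegalStateException for a record that is not a PulsarRecord is an assumption of this sketch, not part of the proposal.

```java
import java.util.Map;

// Hypothetical, simplified stand-ins; only the method names getRawMessage and
// getRawInputMessage come from the proposal. Not the real Pulsar classes.
class RawMessageContextSketch {

    interface Message<T> {
        T getValue();
        Map<String, String> getProperties();
    }

    interface Record<T> {
        T getValue();
    }

    // Internal class: wraps the message read from a Pulsar topic.
    static final class PulsarRecord<T> implements Record<T> {
        private final Message<T> message;

        PulsarRecord(Message<T> message) { this.message = message; }

        @Override public T getValue() { return message.getValue(); }

        Message<T> getRawMessage() { return message; }
    }

    // Public interface seen by function authors; PulsarRecord never appears here.
    interface Context {
        Message<?> getRawInputMessage();
    }

    static final class ContextImpl implements Context {
        private final Record<?> currentRecord;

        ContextImpl(Record<?> currentRecord) { this.currentRecord = currentRecord; }

        @Override
        public Message<?> getRawInputMessage() {
            // The Record -> PulsarRecord cast happens inside the runtime, not in user code.
            if (currentRecord instanceof PulsarRecord) {
                return ((PulsarRecord<?>) currentRecord).getRawMessage();
            }
            // Assumption of this sketch: fail loudly for records from other sources.
            throw new IllegalStateException("current record did not come from a Pulsar topic");
        }
    }

    // Builds an in-memory message for the example.
    static <T> Message<T> message(T value, Map<String, String> properties) {
        return new Message<T>() {
            @Override public T getValue() { return value; }
            @Override public Map<String, String> getProperties() { return properties; }
        };
    }
}
```

The explicit non-Pulsar branch matters because the same Record abstraction is also used by source connectors.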

sijie · 2019-04-29T00:43:30Z

@jerrypeng have you checked my initial proposal? the approaches you mentioned are not the proposal I proposed initially in #4110

jerrypeng · 2019-04-29T00:45:33Z

@sijie what is your point?

sijie · 2019-04-29T00:49:14Z

My original proposal provides a cleaner approach. We don't have to deal with the pulsar-client-api and pulsar-function-api dependency, and people don't have to cast to a PulsarRecord to get the actual message. Things are handled properly and gracefully.

jerrypeng · 2019-04-29T00:51:51Z

@sijie first of all, I am not proposing that users cast Record -> PulsarRecord. We can do that internally in the implementation and just return the Message interface to the user.

@sijie the approach you suggest will still require the user to pull in the pulsar-client-api.

jerrypeng · 2019-04-29T00:59:50Z

@sijie I am also OK with your approach if everyone else is on board.

sijie · 2019-04-29T04:16:31Z

the approach you suggest will still require the user to pull in the pulsar-client-api.

@jerrypeng

it is different from adding a dependency to pulsar-function-api though. In my approach, pulsar-client-api will be treated as "user function dependency".

srkukarni · 2019-04-29T18:59:58Z

My biggest concern is about adding a pulsar-client dependency to the functions interface. My inclination would be to keep the interfaces as separate as possible.
@sijie just thinking aloud here: there might be specialized interfaces with closer Pulsar integration. Just like windowing, we could have your MessageFunction api, but keep it at a user layer?

sijie · 2019-04-29T23:40:43Z

Sanjeev

Just like windowing, we could have your MessageFunction api, but keep it at a user layer?

My original proposal doesn’t change function api at all. It is a runtime change to support user behavior.

jerrypeng · 2019-04-30T18:05:20Z

@sijie @srkukarni note, though, that PR #4093 is already trying to add pulsar-client-api as a dependency of pulsar-functions-api.

jerrypeng · 2019-04-30T18:30:27Z

If that PR is going in, then our discussion about whether to add pulsar-client-api as a dependency of pulsar-functions-api becomes irrelevant.

I guess we should discuss whether adding pulsar-client-api as a dependency of the functions API is appropriate.

Ideally we should try to keep these separate so we don't create more of a mess of dependencies.

What potential dependency-related problems (if any) would we see if this happens?

sijie · 2019-04-30T19:13:59Z

If we are going to support publishing Messages in functions or retrieving a Message instance (for accessing the full list of metadata associated with a message), we have to include pulsar-client-api as a dependency of pulsar-function-api. So my vote is to just add the dependency for both #4127 and #4042.

For #4127, I still don't think casting is a good approach, because a record can be a non-Pulsar-message record (we use the same abstraction for source connectors). My vote is to support using Message<T> as the generic type for the function input type, so that users don't have to cast.
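A minimal sketch of what using Message<T> as the input type could look like, with hypothetical, simplified stand-ins for Message, Context, and Function (not the real Pulsar classes). The user function declares Message<String> as its input and needs no cast, while a hypothetical runtime helper, invoke, hands over either the whole message or just its value depending on the declared input type. How the runtime determines that flag is deliberately left out here.

```java
import java.util.Map;

// Hypothetical, simplified stand-ins; not the real Pulsar Functions classes.
class MessageInputSketch {

    interface Message<T> {
        T getValue();
        Map<String, String> getProperties();
    }

    interface Context { }

    interface Function<I, O> {
        O process(I input, Context context) throws Exception;
    }

    // User code: declares Message<String> as the input type, so the value and
    // the metadata are both available without a cast.
    static final class TagWithKey implements Function<Message<String>, String> {
        @Override
        public String process(Message<String> input, Context context) {
            return input.getProperties().getOrDefault("key", "none") + ":" + input.getValue();
        }
    }

    // User code: an ordinary function that only wants the value.
    static final class Upper implements Function<String, String> {
        @Override
        public String process(String input, Context context) {
            return input.toUpperCase();
        }
    }

    // Runtime code (hypothetical): pass the whole message or only its value,
    // depending on whether the function's declared input type is Message.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Object invoke(Function function, boolean inputIsMessage,
                         Message<?> incoming, Context context) {
        Object argument = inputIsMessage ? incoming : incoming.getValue();
        try {
            return function.process(argument, context);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Builds an in-memory message for the example.
    static <T> Message<T> message(T value, Map<String, String> properties) {
        return new Message<T>() {
            @Override public T getValue() { return value; }
            @Override public Map<String, String> getProperties() { return properties; }
        };
    }
}
```

The design point is that the only unchecked code lives in the runtime helper; function authors write fully typed code in either style.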

jerrypeng · 2019-04-30T19:21:02Z

@sijie I am ok with adding pulsar-client-api as a dependency of the function-api. I can't think of a problem that may cause.

jerrypeng · 2019-04-30T19:24:59Z

However, now we will have multiple ways to get the same data: one from Message and the other from context.getRecord. This is not an ideal situation.

What about the Python runtime? There is no way for a user to specify a Message type there. If we were to support this for Python, it would most likely need to be exposed through the context. I would like to see some uniformity in how we do this across languages, though.

sijie · 2019-04-30T20:03:26Z

If we were to support this for python, it will most likely need to be exposed with context. I would like to see some uniformity in how we do this across languages though.

It is some, not all. Most of the features available in the Java client / Java functions are specific to Java, because Java supports generic types. Most other languages, like Python and Go, don't support generic types. The approach I am proposing uses generic types, which is specific to Java.

The concern I have with exposing Message in the Java Record is casting, because the Java function code is also used for pulsar-io (sources and sinks). But there are no sources and sinks in Python or Go.

Hence IMO it is really hard to achieve uniformity across the languages from many aspects.

Also, the original motivation of functions is to let users write a function natively, the way they would write a normal function in their preferred language. If that still stands, we should just consider the most native approach for each language rather than choosing uniformity. If you look into the serverless world, you will find that it is really hard to achieve uniformity across languages.

jerrypeng · 2019-05-13T04:52:56Z

@ConcurrencyPractitioner @sijie @srkukarni let's continue making progress on this, as there are users waiting for this feature. I am for allowing functions to have Message<T> as an input, as @sijie suggested. Is everyone OK with that?

sijie · 2019-05-14T20:28:20Z

I am for with allowing functions to has Message as an input as @sijie suggested. Is everyone ok with that?

+1 from me

ConcurrencyPractitioner · 2019-05-15T02:35:52Z

Cool, I'm fine with implementing that. @srkukarni I guess if you don't have any problems with this, then we could go ahead and get started.

ConcurrencyPractitioner · 2019-05-15T02:52:17Z

Oh, then with this approach, what are the high level steps in adding Message as an acceptable input format?

srkukarni · 2019-05-15T06:14:51Z

Since we have already added client-api dep on functions-api, i will withdraw my concerns about dep. so please go ahead. One thing that also needs to be done is allow Message in windowing api as well.

ConcurrencyPractitioner · 2019-05-16T02:32:54Z

Alright, so I did some digging, and I have some ideas on how to do it, but I want to make sure my thoughts on the implementation are right. @srkukarni @sijie or @jerrypeng, do you mind explaining the high-level steps involved in coding this approach? It doesn't have to be long; maybe just some pointers.

jerrypeng · 2019-05-16T17:54:14Z

@sijie @srkukarni if we go with this approach and a user has something like the following:

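import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class MyFunction implements Function<Message<InputType>, OutputType> {

    @Override
    public OutputType process(Message<InputType> input, Context context) {
        // ... use input.getValue(), input.getProperties(), etc.
        return null; // placeholder for the real output
    }
}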

what should go into the typeClassName field of SourceSpec in the protobuf? Message.class? But then we don't store the actual type of the input.

srkukarni · 2019-05-23T17:09:49Z

@jerrypeng in the protobuf, we should store InputType. The rest of the handling logic should live inside the code.
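One way the runtime could recover InputType from a class declared like MyFunction above, sketched with plain Java reflection (getGenericInterfaces and ParameterizedType) and hypothetical stand-in interfaces: if the first type argument of Function is Message<X>, record X's class name and remember to pass the whole message. This sketch only handles classes that implement Function directly; superclasses, lambdas, and generic subclasses would need more work, and it is not necessarily how Pulsar resolves types.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical, simplified stand-ins; not the real Pulsar Functions classes.
class InputTypeSketch {

    interface Message<T> { T getValue(); }

    interface Context { }

    interface Function<I, O> {
        O process(I input, Context context) throws Exception;
    }

    static final class MessageFunction implements Function<Message<Integer>, String> {
        @Override
        public String process(Message<Integer> input, Context context) {
            return String.valueOf(input.getValue());
        }
    }

    static final class PlainFunction implements Function<Integer, String> {
        @Override
        public String process(Integer input, Context context) {
            return String.valueOf(input);
        }
    }

    // What the runtime would record: the payload class name (the value that
    // would go into something like typeClassName) and whether to pass a Message.
    static final class InputInfo {
        final String typeClassName;
        final boolean wrapInMessage;

        InputInfo(String typeClassName, boolean wrapInMessage) {
            this.typeClassName = typeClassName;
            this.wrapInMessage = wrapInMessage;
        }
    }

    // Only handles classes that implement Function directly.
    static InputInfo inputInfo(Class<?> functionClass) {
        for (Type type : functionClass.getGenericInterfaces()) {
            if (type instanceof ParameterizedType
                    && ((ParameterizedType) type).getRawType() == Function.class) {
                Type input = ((ParameterizedType) type).getActualTypeArguments()[0];
                if (input instanceof ParameterizedType
                        && ((ParameterizedType) input).getRawType() == Message.class) {
                    // Function<Message<X>, O>: store X, not Message.
                    Type payload = ((ParameterizedType) input).getActualTypeArguments()[0];
                    return new InputInfo(payload.getTypeName(), true);
                }
                return new InputInfo(input.getTypeName(), false);
            }
        }
        throw new IllegalArgumentException(functionClass.getName() + " does not implement Function directly");
    }
}
```

Both functions end up with the same stored type name; only the wrapInMessage flag differs, which matches the idea of keeping InputType in the protobuf and handling the Message wrapping in code.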

jerrypeng · 2019-05-23T17:49:02Z

@sijie @srkukarni @ConcurrencyPractitioner I think we can close this?

I think everyone is in favor of just going with #4341 instead.

sijie · 2019-06-20T01:22:52Z

Closing this, since #4341 has already implemented it.

ConcurrencyPractitioner added 2 commits April 24, 2019 18:15

Adding getActualMessage() method

ae092d5

Removing minor details

1bb9eec

Removing unnecessary import

9ac36c3

srkukarni approved these changes Apr 26, 2019


sijie requested changes Apr 26, 2019


Adding method

f6e9ed2

Modifying return type

ad0079a

sijie reviewed Apr 27, 2019


Making some changes

253cb64

srkukarni reviewed Apr 27, 2019


sijie requested changes Apr 27, 2019


ConcurrencyPractitioner added 2 commits April 26, 2019 21:02

Making some changes to better suit approach

af5c6bf

Making some changes

7cbaf89

Changing to object and removinng depedency

a956cb9

ConcurrencyPractitioner added 2 commits April 27, 2019 22:22

changes

62adbed

Fixing bug

0c8488b

ConcurrencyPractitioner changed the title ~~[Issue#4110] [component/functions] Adding getActualMessage() method~~ [Issue#4110] [component/functions] Adding message as source or input of Function Apr 28, 2019

sijie mentioned this pull request Apr 30, 2019

[issue#4042] improve java functions API #4093

Merged

jerrypeng mentioned this pull request May 12, 2019

Expose MessageId as part of Record interface #4237

Closed


jerrypeng mentioned this pull request May 20, 2019

Support Pulsar Message as function input #4313

Closed

sijie closed this Jun 20, 2019
