@candlerb candlerb commented Nov 1, 2019

Motivation

Documentation improvements

Modifications

  • functions-debug: show where function stderr logs are written
  • functions-develop: note that Pulsar only starts "python"; you may need a workaround to use "python3" (python functions cannot find python3 interpreter #5518)
  • functions-overview: give example of deploying python function with --py and --classname
  • functions-state: document Python state API (currently says it's not implemented)
  • functions-api, functions-state: add deleteState to Java API

Note: functions-state and functions-api are currently unlinked from the navbar. Need to decide what to do about these - e.g. move valuable parts into "functions-develop" and then "git rm" them.

@sijie sijie assigned sijie and candlerb and unassigned sijie Nov 1, 2019
@sijie sijie added this to the 2.5.0 milestone Nov 1, 2019

@Jennifer88huang-zz Jennifer88huang-zz left a comment

@candlerb Thank you very much for your feedback and contributions.
We should have removed the deprecated functions-api.md and functions-state.md files from master.
We've adopted the new structure for Pulsar Functions in #4554. It's not complete at the moment. If you have any issues with the new structure, feel free to let us know.

Comment on lines 38 to 50
Note that functions can be written in python2 or python3, but pulsar
currently only looks for "python" as the interpreter to execute them.

A recent Ubuntu system may have only "python3" but not "python", in which
case functions will fail to start. As a workaround you can create a symlink, but beware this has some
[risks](https://askubuntu.com/questions/320996/how-to-make-python-program-command-execute-python-3#answer-475815):

```bash
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 10
```

If you choose to do this, be careful not to install any other package which
depends on "python" (2.x)
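
Before applying the workaround above, it may help to confirm which interpreter names actually resolve on the host. The following is a minimal sketch using only the Python standard library; `find_interpreters` is a hypothetical helper written for illustration and is not part of Pulsar.

```python
import shutil


def find_interpreters(names=("python", "python3")):
    """Map each interpreter name to its resolved path on PATH, or None if absent."""
    # shutil.which returns the full path of the executable, or None when
    # the name cannot be found on the current PATH.
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_interpreters().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If `python` is reported as not found while `python3` resolves, the host is in the situation the paragraph describes.
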
Contributor

Note
You can write Pulsar Functions in python2 or python3. However, Pulsar only looks for "python" as the interpreter.

If you're running Pulsar Functions on an Ubuntu system that provides only python3, the functions might fail to
start. In this case, you can create a symlink. However, doing so carries
potential risks.

```bash
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 10
```

If you create a symlink, avoid installing any other package that depends on "python" (2.x).

Contributor

A small suggestion on this: if we can summarize the risks briefly, let's include that summary instead of an external link.
Reason: the external link might change, or someone might delete the Q&A it points to, which would break the link in our docs. External links are hard to maintain.

Contributor Author

It could link to #5518 on github instead. Actually, I'm hoping #5518 will accept a new config option for setting the path to the python interpreter, in which case we would document that instead.

Contributor

yes, sure.


Since Pulsar 2.1.0 release, Pulsar integrates with Apache BookKeeper [table service](https://docs.google.com/document/d/155xAwWv5IdOitHh1NVMEwCMGgB28M3FyMiQSxEpjE-Y/edit#heading=h.56rbh52koe3f) to store the `State` for functions. For example, a `WordCount` function can store its `counters` state into BookKeeper table service via Pulsar Functions State API.

States are key-value pairs, where the key is a string and the value is arbitrary binary data - counters are stored as 64-bit big-endian binary values. Keys are scoped to an individual pulsar function, but shared between all instances of that function.
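
To make the "64-bit big-endian" wording concrete, here is a minimal sketch of how such a counter value could be packed and unpacked with Python's standard `struct` module. The helper names `encode_counter` and `decode_counter` are hypothetical, and treating the counter as signed (format `>q`) is an assumption; the text above only states the width and byte order.

```python
import struct


def encode_counter(value: int) -> bytes:
    """Pack an integer as 8 bytes, most significant byte first."""
    # ">" selects big-endian byte order; "q" is a signed 64-bit integer
    # (signedness is an assumption, see the note above).
    return struct.pack(">q", value)


def decode_counter(raw: bytes) -> int:
    """Unpack 8 big-endian bytes back into an integer."""
    (value,) = struct.unpack(">q", raw)
    return value


if __name__ == "__main__":
    raw = encode_counter(5)
    print(raw.hex())            # 0000000000000005
    print(decode_counter(raw))  # 5
```

A sketch like this is only needed when reading counter bytes outside the Functions runtime; inside a function, the state API is expected to handle the encoding.
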
Contributor

Suggested change
States are key-value pairs, where the key is a string and the value is arbitrary binary data - counters are stored as 64-bit big-endian binary values. Keys are scoped to an individual pulsar function, but shared between all instances of that function.
States are key-value pairs, where the key is a string and the value is arbitrary binary data - counters are stored as 64-bit big-endian binary values. Thought keys are scoped to an individual Pulsar Function, the keys are shared among all instances of that function.

Contributor Author

s/Thought/Though/ ?

I think the conjunction isn't useful here, as it implies the user already knows the scoping of keys. I don't think it's mentioned earlier, and without this knowledge I might have guessed they were scoped differently (to the pulsar "namespace" that the function executes within, for example)

So as a user, I just want a statement which answers the question: "What's the scope of the key?"

I would be happy with "Keys are scoped to the pulsar function". I thought it worth clarifying functions versus function instances, but maybe that's unnecessary. State storage which wasn't shared between function instances wouldn't be very useful.

Contributor

"Though"; sorry for the typo.
If you think the conjunction isn't useful, you can remove it.


## API

<!--DOCUSAURUS_CODE_TABS-->
Contributor

Thank you very much for adding this valuable information. Could we add it to the [functions-develop.md#state-storage] section?

candlerb commented Nov 1, 2019

> We've adopted the new structure for Pulsar Functions in #4554. It's not complete at the moment.

Do you think I should wait until the structure is finalized before reworking this patch?

@Jennifer88huang-zz
Contributor

> > We've adopted the new structure for Pulsar Functions in #4554. It's not complete at the moment.
>
> Do you think I should wait until the structure is finalized before reworking this patch?

You do not need to wait, just go ahead with your patch.

@candlerb candlerb closed this Nov 1, 2019

candlerb commented Nov 1, 2019

I have force-pushed a new version; it also removes functions-api.md and functions-state.md.

candlerb commented Nov 1, 2019

Never mind, I will open a new PR.
