Skip to content

Conversation

@lhotari
Copy link
Member

@lhotari lhotari commented Feb 22, 2021

Motivation

org.apache.pulsar.functions.LocalRunner currently has several issues:

  • cleanup and shutdown isn't handled in tests
  • it doesn't support .nar files that aren't on the classpath
    • adding support for .nar files that aren't on the classpath will help in the PIP-62 migration so that an integration test can run tests with a .nar file that has been built in some other build (for example)

In addition to fixing real bugs in LocalRunner, this change is important for improving the build and Pulsar GitHub Actions based CI.

One benefit is that it simplifies the building of Pulsar since it reduces the dependencies of pulsar-broker module since it gets rid of the usage of maven-antrun-plugin plugin for copying the .nar files and pulsar-functions-api-examples.jar file to src/test/resources. This is needed so that it's possible to reuse binary artifacts in the Pulsar CI pipeline. For example, in the unit test jobs, it should be possible to reuse the binary artifacts that are built in a previous step instead of building everything with mvn clean install in each step. When it's possible to reuse binary artifacts, we can save about 15 minutes for each unit test and integration test build job in the GitHub Action based CI pipeline.

Modifications

  • Make LocalRunner work properly with .nar files that aren't on the classpath

  • also support the previous choice of providing the implementation classes
    on the Thread context classloader classpath

  • Pass nar file locations as system properties to tests:

  • Pass pulsar-io cassandra and twitter nar file locations as system properties
    to tests

  • Pass pulsar-io-data-generator.nar and pulsar-functions-api-examples.jar
    file locations as system properties to tests

  • Improve LocalRunner cleanup/shutdown which wasn't handled before.

  • Fix invalid test in PackagesApiTest

  • Make PulsarFunction tests and Sink/Source tests work with .nar files which
    aren't on the classpath.

  • Extract FileServer in tests to simplify the test code

@merlimat merlimat added this to the 2.8.0 milestone Feb 22, 2021
@merlimat merlimat requested a review from jerrypeng February 22, 2021 16:48
@lhotari
Copy link
Member Author

lhotari commented Feb 22, 2021

/pulsarbot run-failure-checks

Copy link
Member

@sijie sijie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this pull request is trying to solve two things together.

Can you handle "cleanup and shutdown isn't handled in tests" first?

I am not sure how the refactor will handle PIP-62 yet. Can we defer that until it is actually happening? Let's avoid over-engineering now.

@lhotari
Copy link
Member Author

lhotari commented Feb 24, 2021

Can you handle "cleanup and shutdown isn't handled in tests" first?

Hi @sijie . Thanks for the review.
I can do this, however I think there's a justification to put multiple related changes together because the Pulsar CI is in such bad shape. It's pretty painful to get even a single PR to pass all the flaky tests.

I am not sure how the refactor will handle PIP-62 yet. Can we defer that until it is actually happening? Let's avoid over-engineering now.

There's no need to couple these changes with PIP-62. This PR fixes clear problems in LocalRunner. What makes you think that this is over-engineering?

@lhotari
Copy link
Member Author

lhotari commented Feb 25, 2021

@jerrypeng would you mind reviewing this PR? thank you

@lhotari lhotari force-pushed the lh-improve-local-runner branch from f0650cb to 8f1b797 Compare February 25, 2021 15:47
@lhotari
Copy link
Member Author

lhotari commented Feb 25, 2021

/pulsarbot run-failure-checks

@lhotari lhotari force-pushed the lh-improve-local-runner branch from 8f1b797 to 9719aae Compare March 2, 2021 10:56
@lhotari
Copy link
Member Author

lhotari commented Mar 2, 2021

@sijie @merlimat @david-streamlio @jerrypeng @jiazhai @codelipenghui
Please review this change and provide feedback.
This change is important for improving the build and Pulsar GitHub Actions based CI.

One benefit is that it simplifies the building of Pulsar since it reduces the dependencies of pulsar-broker module since it gets rid of the usage of maven-antrun-plugin plugin for copying the .nar files and pulsar-functions-api-examples.jar file to src/test/resources. This is needed so that it's possible to reuse binary artifacts in the Pulsar CI pipeline. For example, in the unit test jobs, it should be possible to reuse the binary artifacts that are built in a previous step instead of building everything with "mvn clean install" in each step. When it's possible to reuse binary artifacts, we can save about 15 minutes for each unit test and integration test build job in the GitHub Action based CI pipeline.

In addition to these benefits, this PR fixes real bugs and problems in LocalRunner. Those are described in the description of this PR.

Since the improvements of Pulsar CI are blocked by this PR, I'm asking for more reviews and addressable feedback. Thank you

@lhotari
Copy link
Member Author

lhotari commented Mar 3, 2021

/pulsarbot run-failure-check

1 similar comment
@lhotari
Copy link
Member Author

lhotari commented Mar 3, 2021

/pulsarbot run-failure-check

@lhotari
Copy link
Member Author

lhotari commented Mar 3, 2021

/pulsar-bot run-failure-checks

Copy link
Contributor

@rdhabalia rdhabalia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.. loading nar from external dir will be helpful for the tests.

@rdhabalia
Copy link
Contributor

/pulsarbot run-failure-checks

1 similar comment
@lhotari
Copy link
Member Author

lhotari commented Mar 5, 2021

/pulsarbot run-failure-checks

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
great to see such CI enhancements

@codelipenghui
Copy link
Contributor

@nlu90 @freeznet could you please help also review this PR?

…sspath

- also support the previous choice of providing the implementation classes
  on the Thread context classloader classpath

- Pass nar file locations as system properties to tests:
 - Pass pulsar-io cassandra and twitter nar file locations as system properties
   to tests
 - Pass pulsar-io-data-generator.nar and pulsar-functions-api-examples.jar
   file locations as system properties to tests

- Improve LocalRunner cleanup/shutdown which wasn't handled before.

- Fix invalid test in PackagesApiTest

- Make PulsarFunction tests and Sink/Source tests work with .nar files which
  aren't on the classpath.

- Extract FileServer in tests to simplify the test code
@lhotari lhotari force-pushed the lh-improve-local-runner branch from 0384e2e to ec1bc45 Compare March 5, 2021 08:53
@lhotari
Copy link
Member Author

lhotari commented Mar 5, 2021

/pulsarbot run-failure-checks

1 similar comment
@lhotari
Copy link
Member Author

lhotari commented Mar 5, 2021

/pulsarbot run-failure-checks

@lhotari
Copy link
Member Author

lhotari commented Mar 5, 2021

@merlimat This PR has been approved by @rdhabalia . Would you also mind checking this?

@lhotari
Copy link
Member Author

lhotari commented Mar 5, 2021

/pulsarbot run-failure-checks

4 similar comments
@lhotari
Copy link
Member Author

lhotari commented Mar 5, 2021

/pulsarbot run-failure-checks

@lhotari
Copy link
Member Author

lhotari commented Mar 5, 2021

/pulsarbot run-failure-checks

@lhotari
Copy link
Member Author

lhotari commented Mar 5, 2021

/pulsarbot run-failure-checks

@lhotari
Copy link
Member Author

lhotari commented Mar 5, 2021

/pulsarbot run-failure-checks

@merlimat merlimat merged commit 602137f into apache:master Mar 8, 2021
codelipenghui pushed a commit that referenced this pull request Jun 12, 2021
…r and make SchemaInfo an interface (#10878)

### Motivation

The java-instance.jar generated by the pulsar-functions-runtime-all module should only contain interfaces that Pulsar Function's framework uses to interact with user code.  The module should on have the following dependencies
    1. pulsar-io-core
    2. pulsar-functions-api
    3. pulsar-client-api
    4. slf4j-api
    5. log4j-slf4j-impl
    6. log4j-api
    7. log4j-core

*Explain here the context, and why you're making that change. What is the problem you're trying to solve.*

### Modifications

Change dep pulsar-client-original to pulsar-client-api

Slight changes in the top level pom for what is included in all sub-modules so that additional deps don't land into java-instance.jar

There is also a fix for an issue introduced by #9673. The thread context class loader was set incorrectly in ThreadRuntime.

### Future improvements

1. We should also add a test in the future to make sure external libraries don't get add accidentally this module and java-instance.jar

2. Rename the module "pulsar-functions-runtime-all" to something that describes its function better.  The current name can be confusing
codelipenghui pushed a commit that referenced this pull request Jun 12, 2021
…r and make SchemaInfo an interface (#10878)

### Motivation

The java-instance.jar generated by the pulsar-functions-runtime-all module should only contain interfaces that Pulsar Function's framework uses to interact with user code.  The module should on have the following dependencies
    1. pulsar-io-core
    2. pulsar-functions-api
    3. pulsar-client-api
    4. slf4j-api
    5. log4j-slf4j-impl
    6. log4j-api
    7. log4j-core

*Explain here the context, and why you're making that change. What is the problem you're trying to solve.*

### Modifications

Change dep pulsar-client-original to pulsar-client-api

Slight changes in the top level pom for what is included in all sub-modules so that additional deps don't land into java-instance.jar

There is also a fix for an issue introduced by #9673. The thread context class loader was set incorrectly in ThreadRuntime.

### Future improvements

1. We should also add a test in the future to make sure external libraries don't get add accidentally this module and java-instance.jar

2. Rename the module "pulsar-functions-runtime-all" to something that describes its function better.  The current name can be confusing


(cherry picked from commit d81b5f8)
eolivelli pushed a commit to datastax/pulsar that referenced this pull request Jun 14, 2021
…r and make SchemaInfo an interface (apache#10878)

### Motivation

The java-instance.jar generated by the pulsar-functions-runtime-all module should only contain interfaces that Pulsar Function's framework uses to interact with user code.  The module should on have the following dependencies
    1. pulsar-io-core
    2. pulsar-functions-api
    3. pulsar-client-api
    4. slf4j-api
    5. log4j-slf4j-impl
    6. log4j-api
    7. log4j-core

*Explain here the context, and why you're making that change. What is the problem you're trying to solve.*

### Modifications

Change dep pulsar-client-original to pulsar-client-api

Slight changes in the top level pom for what is included in all sub-modules so that additional deps don't land into java-instance.jar

There is also a fix for an issue introduced by apache#9673. The thread context class loader was set incorrectly in ThreadRuntime.

### Future improvements

1. We should also add a test in the future to make sure external libraries don't get add accidentally this module and java-instance.jar

2. Rename the module "pulsar-functions-runtime-all" to something that describes its function better.  The current name can be confusing

(cherry picked from commit d81b5f8)
(cherry picked from commit 89ac98e)
yangl pushed a commit to yangl/pulsar that referenced this pull request Jun 23, 2021
…r and make SchemaInfo an interface (apache#10878)

### Motivation

The java-instance.jar generated by the pulsar-functions-runtime-all module should only contain interfaces that Pulsar Function's framework uses to interact with user code.  The module should on have the following dependencies
    1. pulsar-io-core
    2. pulsar-functions-api
    3. pulsar-client-api
    4. slf4j-api
    5. log4j-slf4j-impl
    6. log4j-api
    7. log4j-core

*Explain here the context, and why you're making that change. What is the problem you're trying to solve.*

### Modifications

Change dep pulsar-client-original to pulsar-client-api

Slight changes in the top level pom for what is included in all sub-modules so that additional deps don't land into java-instance.jar

There is also a fix for an issue introduced by apache#9673. The thread context class loader was set incorrectly in ThreadRuntime.

### Future improvements

1. We should also add a test in the future to make sure external libraries don't get add accidentally this module and java-instance.jar

2. Rename the module "pulsar-functions-runtime-all" to something that describes its function better.  The current name can be confusing
ivankelly pushed a commit to ivankelly/pulsar that referenced this pull request Aug 10, 2021
…sspath (apache#9673)

- also support the previous choice of providing the implementation classes
  on the Thread context classloader classpath

- Pass nar file locations as system properties to tests:
 - Pass pulsar-io cassandra and twitter nar file locations as system properties
   to tests
 - Pass pulsar-io-data-generator.nar and pulsar-functions-api-examples.jar
   file locations as system properties to tests

- Improve LocalRunner cleanup/shutdown which wasn't handled before.

- Fix invalid test in PackagesApiTest

- Make PulsarFunction tests and Sink/Source tests work with .nar files which
  aren't on the classpath.

- Extract FileServer in tests to simplify the test code

Co-authored-by: Matteo Merli <mmerli@apache.org>
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
…r and make SchemaInfo an interface (apache#10878)

### Motivation

The java-instance.jar generated by the pulsar-functions-runtime-all module should only contain interfaces that Pulsar Function's framework uses to interact with user code.  The module should on have the following dependencies
    1. pulsar-io-core
    2. pulsar-functions-api
    3. pulsar-client-api
    4. slf4j-api
    5. log4j-slf4j-impl
    6. log4j-api
    7. log4j-core

*Explain here the context, and why you're making that change. What is the problem you're trying to solve.*

### Modifications

Change dep pulsar-client-original to pulsar-client-api

Slight changes in the top level pom for what is included in all sub-modules so that additional deps don't land into java-instance.jar

There is also a fix for an issue introduced by apache#9673. The thread context class loader was set incorrectly in ThreadRuntime.

### Future improvements

1. We should also add a test in the future to make sure external libraries don't get add accidentally this module and java-instance.jar

2. Rename the module "pulsar-functions-runtime-all" to something that describes its function better.  The current name can be confusing
lhotari pushed a commit to apache/pulsar-sql that referenced this pull request Oct 18, 2024
…r and make SchemaInfo an interface (#10878)

### Motivation

The java-instance.jar generated by the pulsar-functions-runtime-all module should only contain interfaces that Pulsar Function's framework uses to interact with user code.  The module should on have the following dependencies
    1. pulsar-io-core
    2. pulsar-functions-api
    3. pulsar-client-api
    4. slf4j-api
    5. log4j-slf4j-impl
    6. log4j-api
    7. log4j-core

*Explain here the context, and why you're making that change. What is the problem you're trying to solve.*

### Modifications

Change dep pulsar-client-original to pulsar-client-api

Slight changes in the top level pom for what is included in all sub-modules so that additional deps don't land into java-instance.jar

There is also a fix for an issue introduced by apache/pulsar#9673. The thread context class loader was set incorrectly in ThreadRuntime.

### Future improvements

1. We should also add a test in the future to make sure external libraries don't get add accidentally this module and java-instance.jar

2. Rename the module "pulsar-functions-runtime-all" to something that describes its function better.  The current name can be confusing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants