-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[BEAM-9577] Artifact v2 support for uber jars. #11708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ibzib
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will review the Python part in a bit, but wanted to comment on Java first because I feel I'm missing something.
sdks/java/core/src/main/java/org/apache/beam/sdk/io/ClassLoaderFileSystem.java
Outdated
Show resolved
Hide resolved
sdks/java/core/src/main/java/org/apache/beam/sdk/io/ClassLoaderFileSystem.java
Show resolved
Hide resolved
| InputStream inputStream = newInputStream(channel); | ||
| byte[] magic = new byte[4]; | ||
| inputStream.read(magic); | ||
| assertArrayEquals(magic, new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did I never know this? 😆
sdks/java/core/src/main/java/org/apache/beam/sdk/io/ClassLoaderFileSystem.java
Outdated
Show resolved
Hide resolved
|
Still trying to figure out why the test fails on jenkins but passes locally, but other than that it should be ready to be looked at again. |
| self._artifact_staging_service.close() | ||
| self._artifact_manifest_location = ( | ||
| self._artifact_staging_service.retrieval_token(self._job_id)) | ||
| self._artifact_manifest_location = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we remove this now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Done.
| url='localhost:%d' % port) | ||
| self._artifact_staging_server.start() | ||
| _LOGGER.info('Artifact server started on port %s', port) | ||
| _LOGGER.error('Artifact server started on port %s', port) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover debugging. Removed.
| def close(self): | ||
| self._zipfile_handle.close() | ||
|
|
||
| def file_writer(self, path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a type annotation and/or comment on the return value here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; | ||
|
|
||
| /** A read-only {@link FileSystem} implementation looking up resources using a ClassLoader. */ | ||
| public class ClassLoaderFileSystem extends FileSystem<ClassLoaderFileSystem.ClassLoaderResourceId> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change PortablePipelineJarUtils to use ClassLoaderFileSystem? (Maybe in a follow-up PR) https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/PortablePipelineJarUtils.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we'll want to do that too (and use the new artifact api).
robertwb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, PTAL.
| import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; | ||
|
|
||
| /** A read-only {@link FileSystem} implementation looking up resources using a ClassLoader. */ | ||
| public class ClassLoaderFileSystem extends FileSystem<ClassLoaderFileSystem.ClassLoaderResourceId> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we'll want to do that too (and use the new artifact api).
| def close(self): | ||
| self._zipfile_handle.close() | ||
|
|
||
| def file_writer(self, path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| url='localhost:%d' % port) | ||
| self._artifact_staging_server.start() | ||
| _LOGGER.info('Artifact server started on port %s', port) | ||
| _LOGGER.error('Artifact server started on port %s', port) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover debugging. Removed.
| self._artifact_staging_service.close() | ||
| self._artifact_manifest_location = ( | ||
| self._artifact_staging_service.retrieval_token(self._job_id)) | ||
| self._artifact_manifest_location = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Done.
ibzib
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| def file_writer(self, path): | ||
| # type: (str) -> Tuple[BinaryIO, str] | ||
| """Given a relative path, returns an open handle that can be written to | ||
| and an reference that can later be used to read this file.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| and an reference that can later be used to read this file.""" | |
| and a reference that can later be used to read this file.""" |
* Support ZetaSQL DATE type as a Beam LogicalType * [BEAM-6733] Add pipeline option to flush bundle data before checkpointing We had a couple of PRs in which we wanted to remove the buffering of bundle output during checkpointing: #7940 #9652. Ultimately, we didn't merge any of those because we weren't sure how the change would affect the checkpoint performance. As a better migration path, this introduces a pipeline option to change the default, buffering bundle output during checkpointing, to finishing the bundle and flushing all data before checkpointing. * Remove all answer placeholder checks as they can be confusing at times for some learners * Update course in Stepik * [BEAM-10018] Fix timestamps in windowing kata In this Kata, the timestamp was calculated from time objects, and converted to a timestamp in the local timezone. Thus, the results of the test depended on the configuration of the local timezone in the running system. The tests were hardcoded with a timezone different to mine, and thus I always failed to pass this Kata. The changes in this commit change the type in Event to be a datetime, the timestamps are set in UTC, and the output in the tests is hardcoded in UTC too. This should ensure that the kata works regardless the timezone configured in the system running the kata. * [BEAM-10018] Kata failing due to failed parsing Parsing the timestamps as strings using fromisoformat was failing, and the Kata failed silently regardless the code written in the boxes. This change sets the same timestamps, with UTC timezone, without parsing strings. * Convert html task description to md for "Hello Beam" and "Core Transforms/Map" * Remove unused import * Add missing dependency * Fix member variable name in Kata documentation * Fix placeholder location * Convert html task description to md for "Core Transforms" remaining lessons * Convert html task description to md for "Common Transforms" lessons * Convert html task description to md for remaining Python Katas lessons * Convert html task description to md for most of Java Katas lessons * Convert html task description to md for Java Katas "Common Transforms" lessons * Convert html task description to md for Java Katas "Core Transforms" lessons * [BEAM-2530] Implement Zeta SQL precommit compile tests and run on java 11 (#11692) [BEAM-2530] Implement Zeta SQL precommit compile tests and run on java 11 * Python3 fix - convert dict.keys() to list before indexing (#11733) * Updates google-apitools and httplib2 (#11726) * [BEAM-9964] Update CHANGES.md (#11743) Co-authored-by: Omar Ismail <omarismail@omarismail-macbookpro.roam.corp.google.com> * [BEAM-9577] Artifact v2 support for uber jars. (#11708) * Adds a "filesystem" for artifacts placed on the classpath (e.g. within the uberjar). * Updates the flink and spark uberjars to use artifact staging v2, leveraging the above filesystem. * Populate all SpannerIO batching parameters in display data. Add all the grouping/batching parameters in SpannerIO populateDisplayData(). * Fix capitalization, clarify descriptions * fix capitalization, clarify description Grouped * Refactor to extract single method for popuplating displayData * Convert html task description to md for "Hello Beam" and "Core Transforms/Map" * Convert html task description to md for "Core Transforms" remaining lessons * Convert html task description to md for "Common Transforms" lessons * Convert html task description to md for remaining Python Katas lessons * Convert html task description to md for most of Java Katas lessons * Convert html task description to md for Java Katas "Common Transforms" lessons * Convert html task description to md for Java Katas "Core Transforms" lessons * Resolve merge conflict * Update Python Katas on Stepik * Update Beam Katas Java on Stepik Co-authored-by: Yueyang Qiu <robinyqiu@gmail.com> Co-authored-by: Maximilian Michels <mxm@apache.org> Co-authored-by: Israel Herraiz <ihr@google.com> Co-authored-by: pawelpasterz <32893017+pawelpasterz@users.noreply.github.com> Co-authored-by: Chamikara Jayalath <chamikara@apache.org> Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com> Co-authored-by: Pablo <pabloem@users.noreply.github.com> Co-authored-by: omarismail94 <44980219+omarismail94@users.noreply.github.com> Co-authored-by: Omar Ismail <omarismail@omarismail-macbookpro.roam.corp.google.com> Co-authored-by: Andrew Pilloud <apilloud@users.noreply.github.com> Co-authored-by: Robert Bradshaw <robertwb@google.com> Co-authored-by: nielm <nielm@google.com> Co-authored-by: Brian Hulette <hulettbh@gmail.com> Co-authored-by: Brian Hulette <bhulette@google.com>
* Adds a "filesystem" for artifacts placed on the classpath (e.g. within the uberjar). * Updates the flink and spark uberjars to use artifact staging v2, leveraging the above filesystem.
* Support ZetaSQL DATE type as a Beam LogicalType * [BEAM-6733] Add pipeline option to flush bundle data before checkpointing We had a couple of PRs in which we wanted to remove the buffering of bundle output during checkpointing: apache#7940 apache#9652. Ultimately, we didn't merge any of those because we weren't sure how the change would affect the checkpoint performance. As a better migration path, this introduces a pipeline option to change the default, buffering bundle output during checkpointing, to finishing the bundle and flushing all data before checkpointing. * Remove all answer placeholder checks as they can be confusing at times for some learners * Update course in Stepik * [BEAM-10018] Fix timestamps in windowing kata In this Kata, the timestamp was calculated from time objects, and converted to a timestamp in the local timezone. Thus, the results of the test depended on the configuration of the local timezone in the running system. The tests were hardcoded with a timezone different to mine, and thus I always failed to pass this Kata. The changes in this commit change the type in Event to be a datetime, the timestamps are set in UTC, and the output in the tests is hardcoded in UTC too. This should ensure that the kata works regardless the timezone configured in the system running the kata. * [BEAM-10018] Kata failing due to failed parsing Parsing the timestamps as strings using fromisoformat was failing, and the Kata failed silently regardless the code written in the boxes. This change sets the same timestamps, with UTC timezone, without parsing strings. * Convert html task description to md for "Hello Beam" and "Core Transforms/Map" * Remove unused import * Add missing dependency * Fix member variable name in Kata documentation * Fix placeholder location * Convert html task description to md for "Core Transforms" remaining lessons * Convert html task description to md for "Common Transforms" lessons * Convert html task description to md for remaining Python Katas lessons * Convert html task description to md for most of Java Katas lessons * Convert html task description to md for Java Katas "Common Transforms" lessons * Convert html task description to md for Java Katas "Core Transforms" lessons * [BEAM-2530] Implement Zeta SQL precommit compile tests and run on java 11 (apache#11692) [BEAM-2530] Implement Zeta SQL precommit compile tests and run on java 11 * Python3 fix - convert dict.keys() to list before indexing (apache#11733) * Updates google-apitools and httplib2 (apache#11726) * [BEAM-9964] Update CHANGES.md (apache#11743) Co-authored-by: Omar Ismail <omarismail@omarismail-macbookpro.roam.corp.google.com> * [BEAM-9577] Artifact v2 support for uber jars. (apache#11708) * Adds a "filesystem" for artifacts placed on the classpath (e.g. within the uberjar). * Updates the flink and spark uberjars to use artifact staging v2, leveraging the above filesystem. * Populate all SpannerIO batching parameters in display data. Add all the grouping/batching parameters in SpannerIO populateDisplayData(). * Fix capitalization, clarify descriptions * fix capitalization, clarify description Grouped * Refactor to extract single method for popuplating displayData * Convert html task description to md for "Hello Beam" and "Core Transforms/Map" * Convert html task description to md for "Core Transforms" remaining lessons * Convert html task description to md for "Common Transforms" lessons * Convert html task description to md for remaining Python Katas lessons * Convert html task description to md for most of Java Katas lessons * Convert html task description to md for Java Katas "Common Transforms" lessons * Convert html task description to md for Java Katas "Core Transforms" lessons * Resolve merge conflict * Update Python Katas on Stepik * Update Beam Katas Java on Stepik Co-authored-by: Yueyang Qiu <robinyqiu@gmail.com> Co-authored-by: Maximilian Michels <mxm@apache.org> Co-authored-by: Israel Herraiz <ihr@google.com> Co-authored-by: pawelpasterz <32893017+pawelpasterz@users.noreply.github.com> Co-authored-by: Chamikara Jayalath <chamikara@apache.org> Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com> Co-authored-by: Pablo <pabloem@users.noreply.github.com> Co-authored-by: omarismail94 <44980219+omarismail94@users.noreply.github.com> Co-authored-by: Omar Ismail <omarismail@omarismail-macbookpro.roam.corp.google.com> Co-authored-by: Andrew Pilloud <apilloud@users.noreply.github.com> Co-authored-by: Robert Bradshaw <robertwb@google.com> Co-authored-by: nielm <nielm@google.com> Co-authored-by: Brian Hulette <hulettbh@gmail.com> Co-authored-by: Brian Hulette <bhulette@google.com>
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username).[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.