[BEAM-3207] Create a standard location to enumerate and document URNs. #4310
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,132 @@ | ||
| <!-- | ||
|
|
||
| Licensed to the Apache Software Foundation (ASF) under one or more | ||
| contributor license agreements. See the NOTICE file distributed with | ||
| this work for additional information regarding copyright ownership. | ||
| The ASF licenses this file to You under the Apache License, Version 2.0 | ||
| (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software | ||
| distributed under the License is distributed on an "AS IS" BASIS, | ||
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| See the License for the specific language governing permissions and | ||
| limitations under the License. | ||
|
|
||
| --> | ||
|
|
||
| # Apache Beam URNs | ||
|
|
||
| This file serves as a central place to enumerate and document the various | ||
|
Member
Maybe YAML would allow easier parsing and machine-readable association of metadata and commentary with the URNs? This file already suggests a
Contributor
Author
Even if we made the comment and payload fields machine-readable, there's not much we could do with them automatically. The greater need is to unify and document these URNs, which is why I chose Markdown (for easy human production and consumption).
||
| URNs used in the Beam portability APIs. | ||
|
|
||
|
|
||
| ## Core Transforms | ||
|
|
||
| ### urn:beam:transform:pardo:v1 | ||
|
|
||
| TODO(BEAM-3595): Change this to beam:transform:pardo:v1. | ||
|
|
||
| Represents Beam's parallel do operation. | ||
|
|
||
| Payload: A serialized ParDoPayload proto. | ||
|
|
||
| ### beam:transform:group_by_key:v1 | ||
|
|
||
| Represents Beam's group-by-key operation. | ||
|
|
||
| Payload: None | ||
|
|
||
| ### beam:transform:window_into:v1 | ||
|
|
||
| Payload: A windowing strategy id. | ||
|
|
||
| ### beam:transform:flatten:v1 | ||
|
|
||
| ### beam:transform:read:v1 | ||
|
|
||
|
|
||
| ## Combining | ||
|
|
||
| If any of the combine operations are produced by an SDK, it is assumed that | ||
| the SDK understands the last three combine helper operations. | ||
|
|
||
| ### beam:transform:combine_globally:v1 | ||
|
|
||
| ### beam:transform:combine_per_key:v1 | ||
|
|
||
| ### beam:transform:combine_grouped_values:v1 | ||
|
|
||
| ### beam:transform:combine_pgbkcv:v1 | ||
|
|
||
| ### beam:transform:combine_merge_accumulators:v1 | ||
|
|
||
| ### beam:transform:combine_extract_outputs:v1 | ||
|
|
||
|
|
||
| ## Other common transforms | ||
|
|
||
| ### beam:transform:reshuffle:v1 | ||
|
|
||
|
|
||
| ## WindowFns | ||
|
|
||
| ### beam:windowfn:global_windows:v0.1 | ||
|
|
||
| TODO(BEAM-3595): Change this to beam:windowfn:global_windows:v1 | ||
|
|
||
| ### beam:windowfn:fixed_windows:v0.1 | ||
|
|
||
| TODO(BEAM-3595): Change this to beam:windowfn:fixed_windows:v1 | ||
|
|
||
| ### beam:windowfn:sliding_windows:v0.1 | ||
|
|
||
| TODO(BEAM-3595): Change this to beam:windowfn:sliding_windows:v1 | ||
|
|
||
| ### beam:windowfn:session_windows:v0.1 | ||
|
|
||
| TODO(BEAM-3595): Change this to beam:windowfn:session_windows:v1 | ||
|
|
||
|
|
||
| ## Coders | ||
|
|
||
| ### beam:coder:bytes:v1 | ||
|
|
||
| Components: None | ||
|
|
||
| ### beam:coder:varint:v1 | ||
|
|
||
| Components: None | ||
|
|
||
| ### beam:coder:kv:v1 | ||
|
|
||
| Components: The key and value coder, in that order. | ||
|
|
||
| ### beam:coder:iterable:v1 | ||
|
|
||
| Encodes an iterable of elements. | ||
|
|
||
| Components: Coder for a single element. | ||
|
|
||
| ## Internal coders | ||
|
|
||
| The following coders are typically not specified manually by the user, | ||
| but are used at runtime and must be supported by every SDK. | ||
|
|
||
| ### beam:coder:length_prefix:v1 | ||
|
|
||
| ### beam:coder:global_window:v1 | ||
|
|
||
| ### beam:coder:interval_window:v1 | ||
|
|
||
| ### beam:coder:windowed_value:v1 | ||
|
|
||
|
|
||
| ## Side input access | ||
|
|
||
| ### beam:side_input:iterable:v1 | ||
|
|
||
| ### beam:side_input:multimap:v1 | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,63 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.beam.runners.core.construction; | ||
|
|
||
| import com.google.common.io.CharStreams; | ||
| import java.io.IOException; | ||
| import java.io.InputStreamReader; | ||
| import java.util.HashSet; | ||
| import java.util.Set; | ||
| import java.util.regex.Matcher; | ||
| import java.util.regex.Pattern; | ||
|
|
||
| /** Utilities for dealing with URNs. */ | ||
| public class UrnUtils { | ||
|
|
||
| private static final String STANDARD_URNS_PATH = "/org/apache/beam/model/common_urns.md"; | ||
| private static final Pattern URN_REGEX = Pattern.compile("\\b(urn:)?beam:\\S+:v[0-9.]+"); | ||
| private static final Set<String> COMMON_URNS = extractUrnsFromPath(STANDARD_URNS_PATH); | ||
|
|
||
| private static Set<String> extractUrnsFromPath(String path) { | ||
| String contents; | ||
| try { | ||
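| // Decode explicitly as UTF-8 rather than relying on the platform default charset. | ||
| contents = CharStreams.toString(new InputStreamReader( | ||
| UrnUtils.class.getResourceAsStream(path), java.nio.charset.StandardCharsets.UTF_8)); | ||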
| } catch (IOException exn) { | ||
| throw new RuntimeException(exn); | ||
| } | ||
| Set<String> urns = new HashSet<>(); | ||
| Matcher m = URN_REGEX.matcher(contents); | ||
| while (m.find()) { | ||
| urns.add(m.group()); | ||
| } | ||
| return urns; | ||
| } | ||
|
|
||
| public static String validateCommonUrn(String urn) { | ||
| if (!URN_REGEX.matcher(urn).matches()) { | ||
| throw new IllegalArgumentException( | ||
| String.format("'%s' does not match '%s'", urn, URN_REGEX)); | ||
| } | ||
| if (!COMMON_URNS.contains(urn)) { | ||
| throw new IllegalArgumentException( | ||
| String.format("'%s' is not found in '%s'", urn, STANDARD_URNS_PATH)); | ||
| } | ||
| return urn; | ||
| } | ||
| } |
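To make the behaviour of `URN_REGEX` concrete, here is a minimal stand-alone sketch that applies the same pattern to a few lines copied from `common_urns.md` as it appears in this diff. The class name `UrnRegexSketch` and the method `extractUrns` are hypothetical and exist only for illustration; the sketch reads from an in-memory string rather than a classpath resource.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-alone sketch; not part of the PR. */
class UrnRegexSketch {

  // Same pattern as URN_REGEX in UrnUtils.
  static final Pattern URN_REGEX = Pattern.compile("\\b(urn:)?beam:\\S+:v[0-9.]+");

  /** Collects every substring of the text that URN_REGEX finds, in order of appearance. */
  static Set<String> extractUrns(String text) {
    Set<String> urns = new LinkedHashSet<>();
    Matcher m = URN_REGEX.matcher(text);
    while (m.find()) {
      urns.add(m.group());
    }
    return urns;
  }

  public static void main(String[] args) {
    // Lines taken from the Markdown file in this diff.
    String sample = String.join("\n",
        "### urn:beam:transform:pardo:v1",
        "TODO(BEAM-3595): Change this to beam:transform:pardo:v1.",
        "### beam:windowfn:global_windows:v0.1",
        "TODO(BEAM-3595): Change this to beam:windowfn:global_windows:v1",
        "### beam:coder:bytes:v1");
    for (String urn : extractUrns(sample)) {
      System.out.println(urn);
    }
  }
}
```

Two things stand out when tracing the pattern over this sample. Because the version part is `[0-9.]+`, a URN followed by a sentence-ending period is captured with the period attached (`beam:transform:pardo:v1.`). And because the whole file is scanned, URNs that appear only in TODO lines are collected alongside the ones in headings. Whether either matters for `validateCommonUrn` depends on how strict the check is meant to be.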
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.beam.runners.core.construction; | ||
|
|
||
| import static org.junit.Assert.assertEquals; | ||
| import static org.junit.Assert.fail; | ||
|
|
||
| import org.junit.Test; | ||
|
|
||
| /** | ||
| * Tests for UrnUtils. | ||
| */ | ||
| public class UrnUtilsTest { | ||
|
|
||
| private static final String GOOD_URN = "beam:coder:bytes:v1"; | ||
| private static final String MISSING_URN = "beam:fake:v1"; | ||
| private static final String BAD_URN = "Beam"; | ||
|
|
||
| @Test | ||
| public void testGoodUrnSucceeds() { | ||
| assertEquals(GOOD_URN, UrnUtils.validateCommonUrn(GOOD_URN)); | ||
| } | ||
|
|
||
| @Test | ||
| public void testMissingUrnFails() { | ||
| try { | ||
| UrnUtils.validateCommonUrn(MISSING_URN); | ||
| fail("Should have rejected " + MISSING_URN); | ||
| } catch (IllegalArgumentException exn) { | ||
| // expected | ||
| } | ||
| } | ||
|
|
||
| @Test | ||
| public void testBadUrnFails() { | ||
| try { | ||
| UrnUtils.validateCommonUrn(BAD_URN); | ||
| fail("Should have rejected " + BAD_URN); | ||
| } catch (IllegalArgumentException exn) { | ||
| // expected | ||
| } | ||
| } | ||
| } |
Is it necessary to have a generated file? Can't you just reflectively and lazily generate the module/class/constants? (whichever is easiest)
I thought about that, but in that case we'd have to copy/distribute the .md file for PyPI anyway (as most users won't be running this from within the GitHub tree), so I figured this approach is easier. (It also has advantages for IDEs, and is similar to what we're doing for proto files and will want to do for Java.)
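For readers weighing the two options in this thread, the generated-file approach could look roughly like the following on the Java side. This is only a sketch under stated assumptions: the class name `UrnConstantsSketch`, the output class name `CommonUrns`, and the naming scheme (drop the `beam:` prefix, upper-case, replace separators with underscores) are all hypothetical and do not come from the PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of a constants generator; all names are illustrative only. */
class UrnConstantsSketch {

  /** Derives a constant name from a URN using an assumed scheme, not one defined by the PR. */
  static String constantName(String urn) {
    // Drop the optional "urn:" and the "beam:" prefix, then normalise separators.
    String name = urn.replaceFirst("^(urn:)?beam:", "");
    return name.replaceAll("[^A-Za-z0-9]+", "_").toUpperCase(Locale.ROOT);
  }

  /** Renders Java source text for a class holding one String constant per URN. */
  static String generate(String className, List<String> urns) {
    StringBuilder out = new StringBuilder();
    out.append("public final class ").append(className).append(" {\n");
    for (String urn : urns) {
      out.append("  public static final String ")
          .append(constantName(urn))
          .append(" = \"")
          .append(urn)
          .append("\";\n");
    }
    out.append("}\n");
    return out.toString();
  }

  public static void main(String[] args) {
    List<String> urns =
        Arrays.asList("beam:coder:bytes:v1", "beam:windowfn:global_windows:v0.1");
    System.out.print(generate("CommonUrns", urns));
  }
}
```

Emitting real source gives IDEs something to index and complete against, which is the advantage mentioned above; the cost is a build step that has to be rerun whenever the Markdown file changes.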