-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: add parsers for events and unbounded expressions #4308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Event Parser and Logging Added Expression parser in event parser and adding error logging for notify. * Removed unused imports * Update scan test logic and SNS error message Combine calls for scan event logic in test. Add topic name in error message for SNS topic. * Remove unused import Did not remove Tablescan import after changing scan to one inline operation * Fix error throwing
* Added Event Types in Json * Changed Reflection Initialization * Changed Event Type Input in Json * Fixed Listener Initialization
* Added Event Types in Json * Changed Reflection Initialization * Changed Event Type Input in Json * Fixed Listener Initialization * Added Partial fromJson in ExpressionParser Still missing Literal serialization. * Added Partial fromJson in ExpressionParser Still missing Literal serialization. * Added Literal Serialization * Fixed DecimalLiteral Serialization * Changed to Iceberg Private Classes (Original)
|
Can we limit this PR to only event parsers? The purpose is to breakdown the draft, not republish it |
| } | ||
| } | ||
|
|
||
| @SuppressWarnings("CyclomaticComplexity") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not needed?
| } | ||
| } | ||
|
|
||
| public static Predicate<?, ?> fromJsonToPredicate(JsonNode json, String predicateType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please keep all methods private except for the main one for Expression
| public static Predicate<?, ?> fromJsonToPredicate(JsonNode json, String predicateType) { | ||
| if (UNBOUNDED_PREDICATE.equals(predicateType)) { | ||
| return fromJsonUnboundPredicate(json); | ||
| } else if (BOUNDED_LITERAL_PREDICATE.equals(predicateType)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we are only targeting unbounded, we can just use a generic else block saying we don't support bounded expression yet
|
|
||
| public static void toJson(And expression, JsonGenerator generator) throws IOException { | ||
| generator.writeStartObject(); | ||
| generator.writeStringField(TYPE, AND); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here can we reuse the word "operation"? Because phrases like "and" and "or" are operations
| generator.writeStringField(OPERATION, predicate.op().name().toLowerCase()); | ||
| generator.writeFieldName(TERM); | ||
| toJson(predicate.term(), generator); | ||
| if (!(ONE_INPUTS.contains(predicate.op()))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant brackets after !
| generator.writeEndObject(); | ||
| } | ||
|
|
||
| public static void toJson(List<Literal<?>> literals, JsonGenerator generator) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method seems unnecessary, can be combined to parent method, also can use enhanced for loop
| private static final String LEFT_OPERAND = "left-operand"; | ||
| private static final String RIGHT_OPERAND = "right-operand"; | ||
| private static final String OPERAND = "operand"; | ||
| private static final String AND = "and"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these can just use operation.name().toLowerCase(), no need to define extra variables
| return True.INSTANCE; | ||
| } else if (FALSE.equals(expressionType)) { | ||
| return False.INSTANCE; | ||
| } else if (PREDICATE_TYPES.contains(expressionType)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need a set of predicate types, we just need to have a single else block, delegating all the rest of the conditions to fromJsonToPredicate(json, expressionType);
|
|
||
| public class EventParser { | ||
| private static final String EVENT_TYPE = "event-type"; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra newline
| } | ||
| } | ||
|
|
||
| public static ScanEvent fromJsonToScanEvent(JsonNode json) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep methods below private
| } | ||
|
|
||
| @Test | ||
| public void testParserBothWays() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "roundTrip" instead of "bothWays"
|
Looks like this is proposing JSON serialization. Can you write up the proposal for those formats? It is much easier to review the proposed formats that way, rather than looking at code that does it. |
| private static final String NOT = "not"; | ||
| private static final String TRUE = "true"; | ||
| private static final String FALSE = "false"; | ||
| private static final String UNBOUNDED_PREDICATE = "unbounded-predicate"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The term is "unbound", meaning that the expression has not been bound to a particular incoming struct/row type, not "unbounded".
| private static final String FALSE = "false"; | ||
| private static final String UNBOUNDED_PREDICATE = "unbounded-predicate"; | ||
| private static final String BOUNDED_LITERAL_PREDICATE = "bounded-literal-predicate"; | ||
| private static final String BOUNDED_SET_PREDICATE = "bounded-set-predicate"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to serialize bound predicates?
| } | ||
| } | ||
|
|
||
| public static void toJson(Expression expression, JsonGenerator generator) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iceberg separates the recursive logic to process a tree structure from the logic to do something by using visitors. I think that this should use an in-order expression visitor to produce a JSON representation.
| import org.apache.iceberg.expressions.Expression; | ||
| import org.apache.iceberg.expressions.ExpressionParser; | ||
|
|
||
| public class EventParser { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of event is this parser for? It appears to accept Object in toJson, which is suspicious.
| if (pretty) { | ||
| generator.useDefaultPrettyPrinter(); | ||
| } | ||
| if (event instanceof ScanEvent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We add newlines after control flow blocks and the following statement.
| toJson((CreateSnapshotEvent) event, generator); | ||
| } else if (event instanceof IncrementalScanEvent) { | ||
| toJson((IncrementalScanEvent) event, generator); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the event is not handled, this will just fail?
Made Parser function private that aren't called outside the classes and changed Operation type to "operation" instead of "type"
|
@kunal0829, can you write up a doc that summarizes the proposed serialization formats? |
|
@rdblue Here is an example write up that explains the proposed format: https://docs.google.com/document/d/1CvOfsVAMezqQyyAwIn0qfew9EvlX-5PIvKGHPzlHldA/edit?usp=sharing Please let me know if you have any questions about the proposed format |
|
I've not been able to comment on the doc since it isn't opened up for discussion. I'll add a few of the comments here instead. My main concern about the proposed JSON format is that it is needlessly verbose and relies heavily on the JVM classes rather than representing the expressions more directly. For example, it is scoped to only unbound expressions, but still uses "unbound-predicate" and "named-reference". It also uses "left-operand" and "right-operand". I think it would be better to use fewer words and make better use of "type". Each node in the expression tree should be represented as a JSON object. Those are distinguished by the Then there is some specialization by
I would also allow the reference to be replaced with a string field name, and a literal to be replaced with the value when nested. I think this format is more direct: # example 1: event_ts = 2022-04-24T11:42:12.578391 OR bucket(16, id) IN (0, 1)
{
"type": "or",
"left": {
"type": "eq",
"term": "event_ts", # equivalent to { "type": "reference", "name": "event_ts" }
"value": "2022-04-24T11:42:12.578391" # equivalent to { "type": "literal", "value": "2022-04-24T11:42:12.578391" }
}
"right": {
"type": "in"
"term": {
"type": "transform",
"transform": "bucket[16]",
"term": "id"
},
"values": [
{ "type": "literal", "value": 0 },
1
]
}
}# example 2: true
{ "type": "literal", "value": true } |
|
@kunal0829 and @jackye1995, an expression JSON parser was just merged in #5602. |
In this PR, the Iceberg events are now serializable into JSON by introducing an
EventParseralong with the addition ofExpressionParser.