This helps avoid dealing with bytes in downstream clients such as Python (without Avro) and when loading CSV/JSON into downstream databases.
Marked as [WIP] because
7c53447 to 5b7166d
fixed
Most of this looks fine, but I think just adding BYTES to this list isn't quite right. The way this was originally written assumes that every type this transformation supports has a sensible translation to every other type (which is why it stuck to numbers, booleans, and strings). I think you're trying to add support for an additional set of casts that make sense (e.g. most things have a sensible toString()), but now the input and output are not symmetric: some types make sense as an input but not as an output (e.g. BYTES). So a single list might not be sufficient to do these checks correctly. As it is today, if you tried something like a bytes:string conversion, instead of catching it as a config issue with an easy-to-understand error, it would be caught later and just report that it encountered an unexpected type. If the symmetry is no longer going to be there, I think we need to separate what we support as input from what we support as output (and I'm not even sure that will be sufficient, but it seems like the minimum we'd need).
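To make the input/output split concrete, here is a minimal sketch in plain Java. `CastType`, `CastValidation`, and its methods are hypothetical names, not the Connect API, and `IllegalArgumentException` stands in for whatever config or data exception the real transform raises. The point is that BYTES is accepted as a source type but rejected as a target type when the `field:type` spec is parsed, so the user gets a config-time error instead of a later "unexpected type" failure.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for Connect's Schema.Type, trimmed to the types relevant here.
enum CastType { INT8, INT16, INT32, INT64, FLOAT32, FLOAT64, BOOLEAN, STRING, BYTES, STRUCT }

final class CastValidation {
    // Types a field may be cast FROM: BYTES is allowed because it has a sensible string form.
    static final Set<CastType> SUPPORTED_INPUT_TYPES = EnumSet.of(
            CastType.INT8, CastType.INT16, CastType.INT32, CastType.INT64,
            CastType.FLOAT32, CastType.FLOAT64, CastType.BOOLEAN, CastType.STRING,
            CastType.BYTES);

    // Types a field may be cast TO: BYTES is excluded, since most inputs have no sensible bytes form.
    static final Set<CastType> SUPPORTED_OUTPUT_TYPES = EnumSet.of(
            CastType.INT8, CastType.INT16, CastType.INT32, CastType.INT64,
            CastType.FLOAT32, CastType.FLOAT64, CastType.BOOLEAN, CastType.STRING);

    private CastValidation() {}

    // Run while parsing the "field:type" spec, so a bad target type is reported as a config error.
    static CastType validateOutput(CastType target) {
        if (!SUPPORTED_OUTPUT_TYPES.contains(target)) {
            throw new IllegalArgumentException("Cast transformation does not support casting to "
                    + target + "; supported output types are " + SUPPORTED_OUTPUT_TYPES);
        }
        return target;
    }

    // Run per record against the actual type of the incoming value.
    static CastType validateInput(CastType source) {
        if (!SUPPORTED_INPUT_TYPES.contains(source)) {
            throw new IllegalArgumentException("Cast transformation does not support casting from "
                    + source + "; supported input types are " + SUPPORTED_INPUT_TYPES);
        }
        return source;
    }
}
```

Keeping two sets lets each check report the right list of supported types; a single shared list cannot express that BYTES is valid on one side only.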
Makes sense. Let me check it out and push an update.
@ewencp I think this makes more sense now.
What do you think is still missing? On a separate note, WDYT about my note on
a71579d to cbfa8f9
I don't particularly like
True. To me it makes sense to say that casting to string uses the Object's
cbfa8f9 to 50783f8
Resolved conflicts, squashed, and rebased.
As @ewencp suggested, for consistency perhaps we can use the same string formats as … Having them be inconsistent would be confusing, IMHO.
Sure thing! I'll update the PR.
retest this please
ewencp left a comment
seems very close... just a couple of minor comments
hmm, is this just an underlying bug? should this bit be backported?
yeah that's just a bug I happened to stumble upon, probably should be backported.
this is a public API change -- everything public in the Connect API is.
do we actually need this here? Scanning through this PR, it looks like it is used in one specific class and then in test classes in the same package. Making it package-private where it is actually used seems sufficient to me, no?
package-private is not enough, since Cast, which lives in package org.apache.kafka.connect.transforms, uses Values#dateFormatter(java.util.Date) from org.apache.kafka.connect.data.
I think it's fine, since we'd want anyone/anything that's manipulating "data values" to use the same formatter, no? So we give access to that method.
infer schema type from schema as fallback
tests
split cast validation to input/output
use Values date formatter if java.util.Date
18cbc78 to cfd9cb6
@ewencp I squashed and split the bug fix into its own commit so you can cherry-pick it easily.
test failure seems unrelated; I'll kick Jenkins to try again
jenkins retest this please
public class CastTest {
    private final Cast<SourceRecord> xformKey = new Cast.Key<>();
    private final Cast<SourceRecord> xformValue = new Cast.Value<>();
    private static final long MILLIS_PER_DAY = 24 * 60 * 60 * 1000;
minor: consider using TimeUnit.DAYS.toMillis(1)
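For reference, a quick standalone check (class and field names hypothetical) that both spellings of the constant yield the same value:

```java
import java.util.concurrent.TimeUnit;

final class MillisPerDay {
    // Style used in CastTest/Values: int arithmetic, widened to long on assignment.
    // 86,400,000 fits in an int, so this particular product does not overflow.
    static final long ARITHMETIC = 24 * 60 * 60 * 1000;

    // Suggested alternative: self-describing, and the conversion is done in long arithmetic.
    static final long TIME_UNIT = TimeUnit.DAYS.toMillis(1);

    private MillisPerDay() {}
}
```

For a single day both yield 86,400,000. The `TimeUnit` form also stays safe if the expression grows: `30 * 24 * 60 * 60 * 1000` would overflow `int` (2,592,000,000 exceeds `Integer.MAX_VALUE`), whereas `TimeUnit.DAYS.toMillis(30)` would not.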
Just being consistent with the project's coding conventions: https://github.com/apache/kafka/blob/trunk/connect/api/src/main/java/org/apache/kafka/connect/data/Values.java#L71
Could you do a static import for that Values class, then?
If the committer wants to, they can. This PR has had a ridiculous amount of back and forth and has dragged on for ages compared to what it covers.
Hey @ewencp, if this PR is close to being merged, would you have time to review it for the 2.1.0 release?
Switches to normal year format instead of week date years and day of month instead of day of year. This is directly from #4820, but separated into a different JIRA/PR to keep the fixes independent. Original authorship should be maintained in the commit. Author: Amit Sela <amitsela33@gmail.com> Reviewers: Ewen Cheslack-Postava <ewen@confluent.io> Closes #5718 from ewencp/fix-header-converter-date-format (cherry picked from commit c1457be) Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
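The fix comes down to two `SimpleDateFormat` pattern letters: `Y` is the week-based year and `D` is the day of year, whereas `y` and `d` are the calendar year and day of month. A small standalone demonstration (class name hypothetical; locale and time zone pinned so the output is deterministic) shows how the old pattern goes wrong near a year boundary:

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class DatePatternDemo {
    private DatePatternDemo() {}

    // Formats with an explicit locale and UTC so the result does not depend on JVM defaults.
    static String format(String pattern, Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        Date newYearsEve = Date.from(Instant.parse("2018-12-31T00:00:00Z"));
        // Y = week-based year, D = day of year: Monday 2018-12-31 already belongs to week 1 of 2019.
        System.out.println(format("YYYY-MM-DD", newYearsEve)); // 2019-12-365
        // y = calendar year, d = day of month: the intended ISO-8601 date.
        System.out.println(format("yyyy-MM-dd", newYearsEve)); // 2018-12-31
    }
}
```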
ewencp left a comment
@amitsela Thank you! Merging to trunk for 2.1.0.
Sorry for the (slow) back and forth. We prefer to be very certain of patches than to merge them too quickly, and in this case review throughput was limited.
I think there's still substantial follow-up to be done here re: handling of types and test coverage, but this is a significant improvement regardless.
private static String castToString(Object value) {
    return value.toString();
    if (value instanceof java.util.Date) {
Given we're doing this in the core api/transform code now, this probably warrants a more generalizable approach. On first iteration, the logical type classes (in o.a.k.connect.data) were sufficient, but it seems they may not quite expose enough info. In particular, other than the fromLogical return type and toLogical parameter type, they may not make it clear enough what the conversion to primitive type is. Also, they probably aren't sufficiently isolated.
We shouldn't block getting this in, since Values currently does its best to determine the correct type, though even that is subject to failure modes (e.g. @rhauch, just because a timestamp falls exactly on a day boundary or within the first day after the epoch doesn't mean it actually is that type -- we should really be passing through schema info to determine that). Given that the currently public APIs don't receive that info, I don't think there are real compatibility issues -- if in the future we want a more correct alternative, we'd provide a separate multi-argument version that includes schema info and falls back to the single-argument version.
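The `castToString` fragment in this thread is only a partial diff. A possible completed shape is sketched below, assuming a hypothetical `dateFormatFor` helper standing in for the `Values` date formatter discussed earlier, with ISO-8601-style patterns chosen for illustration. It also makes the failure modes described here visible: without schema info, a timestamp at exactly midnight UTC is rendered as a date, and any value below one day's worth of milliseconds, including every pre-1970 value, is rendered as a time of day.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class CastToStringSketch {
    static final long MILLIS_PER_DAY = 24 * 60 * 60 * 1000;

    private CastToStringSketch() {}

    // Hypothetical helper: with no schema to consult, guess Time, Date, or Timestamp
    // from the raw millisecond value alone (the heuristic discussed in this thread).
    static DateFormat dateFormatFor(Date value) {
        String pattern;
        if (value.getTime() < MILLIS_PER_DAY) {
            // Also catches every pre-epoch (negative) value: one of the failure modes.
            pattern = "HH:mm:ss.SSS'Z'";
        } else if (value.getTime() % MILLIS_PER_DAY == 0) {
            // A timestamp that happens to be exactly midnight UTC is also rendered as a date.
            pattern = "yyyy-MM-dd";
        } else {
            pattern = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'";
        }
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    }

    static String castToString(Object value) {
        if (value instanceof Date) {
            Date date = (Date) value;
            return dateFormatFor(date).format(date);
        }
        // Numbers, booleans, strings, and BigDecimal (Decimal) fall back to toString().
        return value.toString();
    }
}
```

Passing the schema alongside the value, as suggested above, would let the caller choose the pattern directly instead of relying on these guesses.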
That makes sense.
One way to do this would be to create an abstraction o.a.k.connect.data.Logical, which would avoid the whole "if instanceof..." code in Cast or any other place you're looking to convert to/from logical.
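The proposed `Logical` abstraction could look roughly like the sketch below. Everything here is hypothetical: the `Logical` interface, its `of` factory, and the `Logicals` registry do not exist in Connect, and the schema names are assumed to match Connect's logical type names. Keying the lookup on the schema name also follows the point above that schema info, rather than the value alone, should decide the conversion.

```java
import java.math.BigDecimal;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;
import java.util.function.Function;

// Hypothetical abstraction: each logical type describes its own string conversion.
interface Logical<T> {
    String name();
    Class<T> javaType();
    String format(T value);

    static <T> Logical<T> of(String name, Class<T> javaType, Function<T, String> formatter) {
        return new Logical<T>() {
            public String name() { return name; }
            public Class<T> javaType() { return javaType; }
            public String format(T value) { return formatter.apply(value); }
        };
    }
}

final class Logicals {
    // Schema names assumed to match Connect's logical type names; illustrative only.
    static final Logical<BigDecimal> DECIMAL =
            Logical.of("org.apache.kafka.connect.data.Decimal", BigDecimal.class, BigDecimal::toString);
    static final Logical<Date> DATE =
            Logical.of("org.apache.kafka.connect.data.Date", Date.class, Logicals::isoDate);

    private static final Map<String, Logical<?>> BY_NAME = new HashMap<>();
    static {
        BY_NAME.put(DECIMAL.name(), DECIMAL);
        BY_NAME.put(DATE.name(), DATE);
    }

    private Logicals() {}

    private static String isoDate(Date value) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(value);
    }

    // The schema name selects the conversion, so no instanceof chain or value-based guessing.
    static String castToString(String schemaName, Object value) {
        Logical<?> logical = BY_NAME.get(schemaName);
        return logical == null ? String.valueOf(value) : render(logical, value);
    }

    // Captures the wildcard so the value can be cast to the logical type's Java class.
    private static <T> String render(Logical<T> logical, Object value) {
        return logical.format(logical.javaType().cast(value));
    }
}
```

Adding a new logical type then means registering one more `Logical` instance rather than editing every caller that converts values.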
I agree both with being thorough, and with the fact that there's a lot that could be done to further improve this area of the code.
Allow to cast LogicalType to string by calling the serialized (Java) object's toString(). Added tests for `BigDecimal` and `Date` as whole record and as fields. Author: Amit Sela <amitsela33@gmail.com> Reviewers: Randall Hauch <rhauch@gmail.com>, Robert Yokota <rayokota@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io> Closes apache#4820 from amitsela/cast-transform-bytes
Allow casting LogicalType values to string by calling the serialized (Java) object's toString().
Added tests for `BigDecimal` and `Date` as whole record and as fields.
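Since the cast relies on `toString()`, it may be worth checking what that yields for `Decimal` values, which are `BigDecimal` in Java, before loading the output into a downstream database. This standalone check (class name hypothetical) uses only `java.math.BigDecimal`; whether `toPlainString()` would be preferable is not something this PR decides.

```java
import java.math.BigDecimal;

final class DecimalToStringDemo {
    private DecimalToStringDemo() {}

    // What a toString()-based cast produces for a Decimal value.
    static String viaToString(String decimal) {
        return new BigDecimal(decimal).toString();
    }

    // Alternative that never uses an exponent.
    static String viaToPlainString(String decimal) {
        return new BigDecimal(decimal).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(viaToString("12.50"));           // 12.50 (scale is preserved)
        System.out.println(viaToString("0.00000001"));      // 1E-8 (scientific notation)
        System.out.println(viaToPlainString("0.00000001")); // 0.00000001
    }
}
```

`BigDecimal.toString()` switches to scientific notation when the scale is negative or the adjusted exponent is below -6, so very small values may not load as plain decimals in a CSV/JSON pipeline.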