@@ -300,8 +300,7 @@ public <T> Literal<T> to(Type type) {
case TIMESTAMP:
return (Literal<T>) new TimestampLiteral(value());
case TIMESTAMP_NANO:
- // assume micros and convert to nanos to match the behavior in the timestamp case above
- return new TimestampLiteral(value()).to(type);
+ return (Literal<T>) new TimestampNanoLiteral(value());
Contributor:
This change seems correct to me. The previous behavior was to assume the value was in microseconds and pass it through to TimestampLiteral, but that can overflow and does not actually represent a nanosecond timestamp!
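The overflow mentioned here is easy to demonstrate: scaling any microsecond timestamp later than roughly the year 2262 to nanoseconds exceeds Long.MAX_VALUE. A standalone sketch (not Iceberg code):

```java
// Long.MAX_VALUE nanoseconds is only ~292 years from the epoch, so any
// microsecond timestamp past ~2262-04-11 cannot be scaled to nanoseconds.
public class NanoOverflowDemo {
  public static void main(String[] args) {
    long micros = 10_000_000_000_000_000L; // roughly the year 2287, in micros
    long wrapped = micros * 1_000L;        // silently wraps around
    System.out.println(wrapped < 0);       // true: the product overflowed

    try {
      Math.multiplyExact(micros, 1_000L);  // overflow-checked alternative
    } catch (ArithmeticException e) {
      System.out.println("overflow detected: " + e.getMessage());
    }
  }
}
```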

Contributor:
FWIW I still think this is correct, but it's worth getting others' perspectives, since we are changing an assumption about how the value is interpreted when the type to convert to is nanoseconds.

CC @nastra @epgif @jacobmarble @rdblue thoughts?

Contributor:
I think the behavior both before and after this change can be considered correct. In Iceberg we had the assumption that everything is in microseconds, but that no longer holds now that we have nanos. I do think the version after the change is more correct and more closely aligns with my expectations. If we can make sure that folks are not using this yet, I think this change is a good one 👍

Contributor:
I had a chat with @rdblue, who reviewed the PR that introduced this, and it is actually on purpose. Spark always passes in microseconds; changing this would break that assumption for Spark. So I think we have to revert this line. That said, I do think we need to check (and raise an error) when it overflows. The easiest way of doing this is to convert the value to nanos, convert it back to micros, and check whether it is still the same value.
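The round-trip check suggested here could look like the following sketch. The method name is hypothetical, not the actual Iceberg change:

```java
// Sketch of the suggested validation: still interpret the long as
// microseconds, but fail loudly when scaling to nanoseconds overflows
// instead of silently wrapping. The method name is hypothetical.
public class MicrosToNanos {
  static long microsToNanosChecked(long micros) {
    long nanos = micros * 1_000L;      // may silently wrap on overflow
    if (nanos / 1_000L != micros) {    // round-trip back to micros and compare
      throw new IllegalArgumentException(
          "Timestamp value overflows nanosecond precision: " + micros);
    }
    return nanos;
  }

  public static void main(String[] args) {
    System.out.println(microsToNanosChecked(1510842660123456L)); // fits in nanos
  }
}
```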

Member Author:
Thank you for sharing the context. What about other query engines? I actually found this issue while trying to support nanosecond precision in the Trino Iceberg connector. As you may know, the maximum timestamp precision in Trino is picoseconds (12).

Member Author (@ebyhr, Jun 20, 2025):

@stevenzwu I don't think we want to discuss best practices for partitioning here. The Iceberg spec allows such a partition spec, right? It's just one of our test scenarios at Starburst, not a customer's table definition.

Contributor (@stevenzwu, Jun 23, 2025):

Sure. But at least the broken behavior of using a string literal with the identity partition transform on a timestamp_ns column has minimal or no impact in practice, since it is not really a viable setup for a production workload. The string literal problem can be tracked separately from the long literal issue.

Coming back to this PR/issue, we need to decide what to do with long literals for timestamp_ns. There are two options: (1) derive/imply the precision (micro or nano) from the timestamp column type, which is the current approach of this PR, or (2) always assume microsecond precision for consistent interpretation regardless of the column's timestamp precision, which is the current behavior and where Ryan is coming from.

It seems that there is no consensus on adopting the behavior change of option 1. To move forward without changing current behavior and causing confusion for users, we can adopt these two steps:

  1. As an immediate step, fail the conversion of a long literal (assumed microsecond precision) to a nano timestamp with an explicit and friendlier error message (suggested by Russell). For now, users should use a string literal with the hour/day/week/month transform on a timestamp_nano partition column.
  2. For the longer term, we can expose public methods for engines/clients to construct TimestampNanoLiteral directly. Then a long literal can be used correctly with the timestamp nano type, e.g. ts < timestamp_nano(1230219000123123)

Contributor (@stevenzwu, Jun 26, 2025):

What about these two options for a new public literal-construction API in the Expressions class? With the new public constructor, would it be enough for engines to support nano timestamps?

  1. public static Literal<TimestampNanoLiteral> litTimestampNano(long value)

With this API, clients/engines could construct literals like this when they know the long value contains a timestamp in nanos:

Literal<?> columnLiteral;
if (icebergType.typeId() == TypeID.TIMESTAMP_NANO && columnValue instanceof Long) {
  columnLiteral = Expressions.litTimestampNano((Long) columnValue);
} else {
  columnLiteral = Expressions.lit(columnValue);
}
  2. public static <T> Literal<T> lit(T value, TypeID type)

Basically, move the above if-else inside this method, which assumes a long value for the nano timestamp type has nanosecond precision.
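The dispatch logic of option 2 can be sketched with stand-in types. Everything here (TypeID, Lit, the "litTimestampNano path") is a simplified stand-in for Iceberg's real Literal/Expressions classes, not its actual API:

```java
// Self-contained sketch of option 2: a single factory that interprets a
// long as nanoseconds only when the declared column type is TIMESTAMP_NANO,
// and as microseconds otherwise. All names are simplified stand-ins.
public class LitDispatchSketch {
  enum TypeID { LONG, TIMESTAMP, TIMESTAMP_NANO }

  // Records the raw value and how it was interpreted.
  static final class Lit {
    final long value;
    final String unit;
    Lit(long value, String unit) { this.value = value; this.unit = unit; }
  }

  static Lit lit(Object value, TypeID typeId) {
    if (typeId == TypeID.TIMESTAMP_NANO && value instanceof Long) {
      return new Lit((Long) value, "nanos");  // proposed litTimestampNano path
    }
    return new Lit((Long) value, "micros");   // existing default interpretation
  }

  public static void main(String[] args) {
    System.out.println(lit(1510842660123456789L, TypeID.TIMESTAMP_NANO).unit); // nanos
    System.out.println(lit(1510842660123456L, TypeID.TIMESTAMP).unit);         // micros
  }
}
```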

Member Author:

Adding such a method is insufficient, if I understand correctly. For instance, the problematic place in Trino depends on Expressions#in(java.lang.String, T...) taking values, not literals.

Contributor:

I just opened #13747 with fixes for the problems raised by this thread. It adds millis, micros, and nanos methods to Expressions that construct timestamp literals from long values. I also updated Literals.from so that these literals can be passed when creating in expressions (see the test).

case DATE:
if ((long) Integer.MAX_VALUE < value()) {
return aboveMax();
@@ -21,6 +21,7 @@
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

+ import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import org.apache.iceberg.types.Types;
import org.apache.iceberg.util.DateTimeUtil;
@@ -107,6 +108,38 @@ public void testTimestampMicrosToDateConversion() {
assertThat(dateOrdinal).isEqualTo(-1);
}

@Test
public void testTimestampNanoWithLongLiteral() {
// verify round-trip between timestamp_ns and long
Literal<Long> timestampNanoFromStringLiteral =
Literal.of("2017-11-16T14:31:00.123456789").to(Types.TimestampNanoType.withoutZone());
assertThat(timestampNanoFromStringLiteral.getClass())
.isEqualTo(Literals.TimestampNanoLiteral.class);
assertThat(timestampNanoFromStringLiteral.value()).isEqualTo(1510842660123456789L);

Literal<Long> timestampNanoFromLongLiteral =
Literal.of(1510842660123456789L).to(Types.TimestampNanoType.withoutZone());
assertThat(timestampNanoFromLongLiteral.getClass())
.isEqualTo(Literals.TimestampNanoLiteral.class);
assertThat(timestampNanoFromLongLiteral).isEqualTo(timestampNanoFromStringLiteral);

// cast long literal to temporal types
assertThat(timestampNanoFromLongLiteral.to(Types.DateType.get()).value())
.isEqualTo((int) LocalDate.of(2017, 11, 16).toEpochDay());

assertThat(timestampNanoFromLongLiteral.to(Types.TimestampType.withoutZone()).value())
.isEqualTo(1510842660123456L);

assertThat(timestampNanoFromLongLiteral.to(Types.TimestampType.withZone()).value())
.isEqualTo(1510842660123456L);

assertThat(timestampNanoFromLongLiteral.to(Types.TimestampNanoType.withoutZone()).value())
.isEqualTo(1510842660123456789L);

assertThat(timestampNanoFromLongLiteral.to(Types.TimestampNanoType.withZone()).value())
.isEqualTo(1510842660123456789L);
}

@Test
public void testTimestampNanoToTimestampConversion() {
Literal<Long> timestamp =
@@ -158,6 +191,33 @@ public void testTimestampNanosToDateConversion() {
assertThat(dateOrdinal).isEqualTo(-1);
}

@Test
public void testTimestampNanoWithZoneWithLongLiteral() {
// verify round-trip between timestamptz_ns and long
Literal<Long> timestampNanoWithZone =
Literal.of("2017-11-16T14:31:08.000000001+01:00").to(Types.TimestampNanoType.withZone());
assertThat(timestampNanoWithZone.value()).isEqualTo(1510839068000000001L);

Literal<Long> longLiteral =
Literal.of(1510839068000000001L).to(Types.TimestampNanoType.withZone());
assertThat(longLiteral).isEqualTo(timestampNanoWithZone);

// cast long literal to temporal types
assertThat(longLiteral.to(Types.DateType.get()).value())
.isEqualTo((int) LocalDate.of(2017, 11, 16).toEpochDay());

assertThat(longLiteral.to(Types.TimestampType.withoutZone()).value())
.isEqualTo(1510839068000000L);

assertThat(longLiteral.to(Types.TimestampType.withZone()).value()).isEqualTo(1510839068000000L);

assertThat(longLiteral.to(Types.TimestampNanoType.withoutZone()).value())
.isEqualTo(1510839068000000001L);

assertThat(longLiteral.to(Types.TimestampNanoType.withZone()).value())
.isEqualTo(1510839068000000001L);
}

@Test
public void testTimestampNanosWithZoneConversion() {
Literal<CharSequence> isoTimestampNanosWithZoneOffset =
10 changes: 4 additions & 6 deletions api/src/test/java/org/apache/iceberg/types/TestConversions.java
@@ -104,16 +104,14 @@ public void testByteBufferConversions() {
.isEqualTo(new byte[] {-128, 26, 6, 0, 0, 0, 0, 0});
assertThat(Literal.of(400000L).to(TimestampType.withZone()).toByteBuffer().array())
.isEqualTo(new byte[] {-128, 26, 6, 0, 0, 0, 0, 0});
- // values passed to assertConversion and Literal.of differ because Literal.of(...) assumes
- // the value is in micros, which gets converted when to(TimestampNanoType) is called
assertConversion(
- 400000000L, TimestampNanoType.withoutZone(), new byte[] {0, -124, -41, 23, 0, 0, 0, 0});
+ 400000L, TimestampNanoType.withoutZone(), new byte[] {-128, 26, 6, 0, 0, 0, 0, 0});
assertConversion(
- 400000000L, TimestampNanoType.withZone(), new byte[] {0, -124, -41, 23, 0, 0, 0, 0});
+ 400000L, TimestampNanoType.withZone(), new byte[] {-128, 26, 6, 0, 0, 0, 0, 0});
assertThat(Literal.of(400000L).to(TimestampNanoType.withoutZone()).toByteBuffer().array())
- .isEqualTo(new byte[] {0, -124, -41, 23, 0, 0, 0, 0});
+ .isEqualTo(new byte[] {-128, 26, 6, 0, 0, 0, 0, 0});
assertThat(Literal.of(400000L).to(TimestampNanoType.withZone()).toByteBuffer().array())
- .isEqualTo(new byte[] {0, -124, -41, 23, 0, 0, 0, 0});
+ .isEqualTo(new byte[] {-128, 26, 6, 0, 0, 0, 0, 0});

// strings are stored as UTF-8 bytes (without length)
// 'A' -> 65, 'B' -> 66, 'C' -> 67
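The expected byte arrays in the assertions above come from serializing the long value as 8 little-endian bytes (Iceberg's binary single-value serialization is little-endian). A standalone sketch of where {-128, 26, 6, 0, ...} comes from:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// 400000 = 0x061A80, so its 8-byte little-endian encoding starts with
// 0x80 (-128 as a signed byte), 0x1A (26), 0x06 (6), followed by zeros,
// matching the expected arrays in the assertions above.
public class LittleEndianDemo {
  public static void main(String[] args) {
    byte[] bytes = ByteBuffer.allocate(8)
        .order(ByteOrder.LITTLE_ENDIAN)
        .putLong(400000L)
        .array();
    System.out.println(Arrays.toString(bytes)); // [-128, 26, 6, 0, 0, 0, 0, 0]
  }
}
```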