Add Primitive time/timestamp_nanos/timestampntz_nanos/uuid test file #92

Merged
mapleFU merged 6 commits into apache:master from klion26:primitive_time
Aug 14, 2025

Member

klion26 commented Aug 12, 2025

This PR adds binary artifacts for
primitive_time.[metadata/value], primitive_timestamp_nanos.[metadata/value], primitive_timestampntz_nanos.[metadata/value] and primitive_uuid.[metadata/value]. They were generated with modified Iceberg code, because Spark does not currently support these primitives.

Modified code used to generate the binary artifacts:
  private String writeVariantFile(int rowId, Variant variant) throws IOException {
    String variantFile = String.format("case-%03d_row-%d.variant.bin", caseNumber, rowId);

    // Write the Variant metadata to its own file.
    try (OutputStream out =
        IO.newOutputFile(CASE_LOCATION + "/" + caseNumber + ".metadata").createOrOverwrite()) {
      ByteBuffer bufferMeta =
          ByteBuffer.allocate(variant.metadata().sizeInBytes()).order(ByteOrder.LITTLE_ENDIAN);
      variant.metadata().writeTo(bufferMeta, 0);
      out.write(bufferMeta.array());
    }

    // Write the Variant value to its own file.
    try (OutputStream out =
        IO.newOutputFile(CASE_LOCATION + "/" + caseNumber + ".value").createOrOverwrite()) {
      ByteBuffer bufferValue =
          ByteBuffer.allocate(variant.value().sizeInBytes()).order(ByteOrder.LITTLE_ENDIAN);
      variant.value().writeTo(bufferValue, 0);
      out.write(bufferValue.array());
    }

    // Write the combined metadata and value buffer to the .variant.bin file.
    ByteBuffer buffer = ParquetVariantUtil.toByteBuffer(variant.metadata(), variant.value());
    try (OutputStream out =
        IO.newOutputFile(CASE_LOCATION + "/" + variantFile).createOrOverwrite()) {
      out.write(buffer.array());
    }

    return variantFile;
  }

Binary artifact contents:

$ xxd variant/primitive_time.metadata
00000000: 0100 00                                  ...
$ xxd variant/primitive_time.value
00000000: 44c0 f229 880a 0000 00                   D..).....
$ xxd variant/primitive_uuid.metadata
00000000: 0100 00                                  ...
$ xxd variant/primitive_uuid.value
00000000: 50f2 4f9b 6481 fa49 d1b7 4e8c 09a6 e31c  P.O.d..I..N.....
00000010: 56                                       V
$ xxd variant/primitive_timestamp_nanos.metadata
00000000: 0100 00                                  ...
$ xxd variant/primitive_timestamp_nanos.value
00000000: 4815 413a 6cb7 af05 18                   H.A:l....
$ xxd variant/primitive_timestampntz_nanos.metadata
00000000: 0100 00                                  ...
$ xxd variant/primitive_timestampntz_nanos.value
00000000: 4c15 413a 6cb7 af05 18                   L.A:l....
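
For readers who want to sanity-check the value dumps, here is a minimal Python sketch that decodes them. It is not part of the PR. It assumes the usual Variant primitive layout: the low 2 bits of the first byte are the basic type (0 for primitive), the high 6 bits are the type ID (17 to 20 here, as listed in variant/regen.py), integers are little-endian int64, and the UUID payload is 16 big-endian bytes. The names VALUES and decode_primitive are made up for illustration.

```python
import struct
import uuid
from datetime import timedelta

# Value bytes copied from the xxd dumps (one header byte followed by the payload).
VALUES = {
    "primitive_time": bytes.fromhex("44c0f229880a000000"),
    "primitive_uuid": bytes.fromhex("50f24f9b6481fa49d1b74e8c09a6e31c56"),
    "primitive_timestamp_nanos": bytes.fromhex("4815413a6cb7af0518"),
    "primitive_timestampntz_nanos": bytes.fromhex("4c15413a6cb7af0518"),
}


def decode_primitive(raw: bytes):
    """Return (type_id, payload) for one Variant primitive value (hypothetical helper)."""
    header = raw[0]
    if header & 0b11 != 0:  # low 2 bits: basic type, 0 means primitive
        raise ValueError("not a primitive value")
    type_id = header >> 2  # high 6 bits: primitive type ID
    payload = raw[1:]
    if type_id in (17, 18, 19):
        # 17: microseconds since midnight; 18/19: nanoseconds since the epoch.
        (number,) = struct.unpack("<q", payload)
        return type_id, number
    if type_id == 20:
        # Assumption: the 16 UUID bytes are stored big-endian.
        return type_id, uuid.UUID(bytes=payload)
    raise ValueError(f"type ID {type_id} is not handled by this sketch")


for name, raw in VALUES.items():
    type_id, payload = decode_primitive(raw)
    if type_id == 17:
        payload = str(timedelta(microseconds=payload))  # render as H:MM:SS.ffffff
    print(name, type_id, payload)
```

Under those assumptions the time value decodes to 45234123456 microseconds, i.e. 12:33:54.123456, and both timestamp files carry the same integer, 1730982834123456789 nanoseconds; only the header byte (0x48 vs 0x4c) differs.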

Member Author

klion26 commented Aug 13, 2025

This PR adds the binary artifacts for Variant::TimeNTZ so that we can use them for cross-language testing.

cc @mapleFU @alamb and @aihuaxu

Comment thread variant/data_dictionary.json Outdated
"short_string": "Less than 64 bytes (\u2764\ufe0f with utf8)"
} No newline at end of file
"short_string": "Less than 64 bytes (\u2764\ufe0f with utf8)",
"primitive_time": "12:33:54:123456"
Member Author

The value is copied from the Iceberg test.
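
As a cross-check, the dictionary string can be tied back to the bytes of primitive_time.value shown in the PR description. The sketch below is not from the PR; encode_time is a hypothetical helper, and it assumes TimeNTZ is stored as a header byte (type ID 17 shifted left by 2) followed by a little-endian int64 of microseconds since midnight. It also assumes the last field of the string is exactly six digits of microseconds.

```python
import re
import struct

TIME_TYPE_ID = 17  # TimeNTZ, per the TODO list in variant/regen.py


def encode_time(text: str) -> bytes:
    """Encode 'HH:MM:SS' plus a six-digit microsecond field (hypothetical helper)."""
    # Accept either ':' or '.' in front of the microsecond field.
    hours, minutes, seconds, micros = (int(part) for part in re.split(r"[:.]", text))
    total_micros = ((hours * 60 + minutes) * 60 + seconds) * 1_000_000 + micros
    # Header byte: basic type 0 (primitive) in the low 2 bits, type ID above it.
    return bytes([TIME_TYPE_ID << 2]) + struct.pack("<q", total_micros)


print(encode_time("12:33:54:123456").hex())
```

The printed hex should match the xxd dump of variant/primitive_time.value (44c0 f229 880a 0000 00).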

Comment thread variant/regen.py Outdated
@@ -78,6 +78,7 @@
-- https://github.com/apache/parquet-testing/issues/79
-- is not clear how to create the following types using Spark SQL
-- TODO TimeNTZ (Type ID 17)
Member

Remove this line?

Member Author

OK, will do.

Comment thread variant/regen.py Outdated
Comment on lines 82 to 84
-- TODO 'timestamp with timezone (NANOS)' (Type ID 18)
-- TODO 'timestamp with time zone (NANOS)' (Type ID 19)
-- TODO 'UUID' (Type ID 20)
Member

Would you mind generating the other types as well?

Member Author

OK, I will update the PR.

Member Author

klion26 commented Aug 13, 2025

@mapleFU I've updated the PR, please have another look when you're free, thanks.

$ xxd variant/primitive_uuid.metadata
00000000: 0100 00                                  ...
$ xxd variant/primitive_uuid.value
00000000: 50f2 4f9b 6481 fa49 d1b7 4e8c 09a6 e31c  P.O.d..I..N.....
00000010: 56                                       V
$ xxd variant/primitive_timestamp_nanos.metadata
00000000: 0100 00                                  ...
$ xxd variant/primitive_timestamp_nanos.value
00000000: 4815 413a 6cb7 af05 18                   H.A:l....
$ xxd variant/primitive_timestampntz_nanos.metadata
00000000: 0100 00                                  ...
$ xxd variant/primitive_timestampntz_nanos.value
00000000: 4c15 413a 6cb7 af05 18                   L.A:l....
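
Every .metadata file in these dumps is the same three bytes, 01 00 00. A rough Python sketch of how those bytes can be read follows. It is not from the PR; parse_metadata is a hypothetical helper, and the layout is assumed to be: a header byte (version in the low 4 bits, a sorted-strings flag in bit 4, offset size minus one in the top 2 bits), then the dictionary size, then dictionary size plus one offsets, then the string bytes.

```python
def parse_metadata(raw: bytes) -> dict:
    """Parse a Variant metadata buffer under the assumed layout (hypothetical helper)."""
    header = raw[0]
    version = header & 0x0F
    sorted_strings = bool((header >> 4) & 1)
    offset_size = ((header >> 6) & 0b11) + 1  # stored as size minus one

    pos = 1
    dict_size = int.from_bytes(raw[pos:pos + offset_size], "little")
    pos += offset_size

    # dict_size + 1 offsets, each offset_size bytes, little-endian.
    offsets = [
        int.from_bytes(raw[pos + i * offset_size:pos + (i + 1) * offset_size], "little")
        for i in range(dict_size + 1)
    ]
    pos += (dict_size + 1) * offset_size

    # Offsets are relative to the start of the string bytes.
    strings = [
        raw[pos + offsets[i]:pos + offsets[i + 1]].decode("utf-8")
        for i in range(dict_size)
    ]
    return {
        "version": version,
        "sorted_strings": sorted_strings,
        "offset_size": offset_size,
        "strings": strings,
    }


print(parse_metadata(bytes.fromhex("010000")))
```

Read this way, 01 00 00 is version 1 with an empty dictionary and a single offset of 0, which is what one would expect for primitives that carry no field names.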

@klion26 klion26 requested a review from mapleFU August 13, 2025 03:13
Member

mapleFU commented Aug 13, 2025

I will check these files after work, at about 8 PM UTC-8.

@klion26 klion26 changed the title Add Primitive time test file Add Primitive time/timestamp_nanos/timestampntz_nanos/uudi test file Aug 13, 2025
Contributor

alamb commented Aug 13, 2025

Thanks @mapleFU and @klion26

@mapleFU mapleFU changed the title Add Primitive time/timestamp_nanos/timestampntz_nanos/uudi test file Add Primitive time/timestamp_nanos/timestampntz_nanos/uuid test file Aug 13, 2025
Comment thread variant/regen.py Outdated
-- TODO 'timestamp with timezone (NANOS)' (Type ID 18)
-- TODO 'timestamp with time zone (NANOS)' (Type ID 19)
-- TODO 'UUID' (Type ID 20)
-- binary artifacts of 'TimeNTZ'/'timestamp with timezone (NANOS)'/'timestamp with time zone (NANOS)'/'UUID' was generated by the iceberg test code, please ref to https://github.com/apache/parquet-testing/pull/92 for more detail
Member

Is 'timestamp with timezone' duplicated?

Member Author

Ah, one of them should be 'without time zone'; fixed it.

Comment thread variant/data_dictionary.json Outdated
} No newline at end of file
"primitive_timestamp_nanos": "2024-11-07T12:33:54.123456789+00:00",
"primitive_timestampntz_nanos": "2024-11-07T12:33:54.123456789",
"short_string": "Less than 64 bytes (\u2764\ufe0f with utf8)",
Member

The ordering here is a bit odd. Can we sort these keys alphabetically?

Member Author

Thanks for pointing this out, fixed it.
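
For reference, the two timestamp strings quoted in this thread can be tied back to the .value bytes in the PR description. The Python sketch below is not from the PR; iso_to_epoch_nanos and encode_timestamp_nanos are hypothetical helpers. It assumes both types store nanoseconds since the Unix epoch as a little-endian int64 after a header byte built from the type ID (18 with time zone, 19 without), and that the NTZ string is interpreted as if it were UTC. The string is split by hand because datetime does not keep nanosecond precision.

```python
import calendar
import struct
from datetime import datetime


def iso_to_epoch_nanos(text: str) -> int:
    """Convert 'YYYY-MM-DDTHH:MM:SS.fffffffff[+00:00]' to epoch nanoseconds."""
    # This sketch only handles UTC: drop an explicit +00:00 suffix if present.
    if text.endswith("+00:00"):
        text = text[:-6]
    whole, _, fraction = text.partition(".")
    parsed = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    seconds = calendar.timegm(parsed.timetuple())  # integer seconds, no float rounding
    return seconds * 10**9 + int(fraction.ljust(9, "0"))


def encode_timestamp_nanos(text: str, type_id: int) -> bytes:
    """Header byte (type ID shifted left by 2) followed by a little-endian int64."""
    return bytes([type_id << 2]) + struct.pack("<q", iso_to_epoch_nanos(text))


print(encode_timestamp_nanos("2024-11-07T12:33:54.123456789+00:00", 18).hex())
print(encode_timestamp_nanos("2024-11-07T12:33:54.123456789", 19).hex())
```

Under these assumptions the two results differ only in the first byte (0x48 vs 0x4c), which is consistent with the xxd dumps of primitive_timestamp_nanos.value and primitive_timestampntz_nanos.value.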

Member

mapleFU commented Aug 13, 2025

I verified the others, and they look OK to me.

Member

mapleFU commented Aug 14, 2025

Before merging, can you update the readme with the latest info?

Member

mapleFU commented Aug 14, 2025

(Sorry, I mean pr description, my bad...)

Member Author

klion26 commented Aug 14, 2025

@mapleFU I've updated the readme, please take another look when you're free, thanks.

Member Author

klion26 commented Aug 14, 2025

@mapleFU, thanks for the reply. I've updated the PR description. Do I need to revert the latest commit?

@mapleFU mapleFU merged commit 5cbfc43 into apache:master Aug 14, 2025
@klion26 klion26 deleted the primitive_time branch August 14, 2025 06:30
Member Author

klion26 commented Aug 14, 2025

@mapleFU Thanks for the review and merging!
