Skip to content

build: Add basic CI test pipelines#18

Merged
sunchao merged 2 commits intoapache:mainfrom
sunchao:add-ci
Feb 14, 2024
Merged

build: Add basic CI test pipelines#18
sunchao merged 2 commits intoapache:mainfrom
sunchao:add-ci

Conversation

@sunchao
Copy link
Copy Markdown
Member

@sunchao sunchao commented Feb 13, 2024

This adds some basic CI test pipelines for the project. Basically run tests in the Rust and Java/Scala side within the repo.

Closes #7

@sunchao
Copy link
Copy Markdown
Member Author

sunchao commented Feb 13, 2024

Need to find a way to test this. Also this only run tests in Linux for Spark 3.4 atm, and we need more combinations.

@sunchao sunchao force-pushed the add-ci branch 7 times, most recently from 9a6cd49 to 1d82548 Compare February 14, 2024 05:11
@sunchao sunchao marked this pull request as ready for review February 14, 2024 05:38
@sunchao
Copy link
Copy Markdown
Member Author

sunchao commented Feb 14, 2024

@viirya @andygrove this is ready now. Tested here.

Comment thread core/src/errors.rs
/// See [`object_panic_exception`] for a test which involves generating a panic and verifying
/// that the resulting stack trace includes the offending call.
#[test]
#[ignore]
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the golden file for this test is not added yet, so ignore for now


tpcdsQueries.foreach { q =>
test(s"check simplified (tpcds-v1.4/$q)") {
ignore(s"check simplified (tpcds-v1.4/$q)") {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depends on #10

Comment thread .github/workflows/pr_build.yml Outdated
run: |
cd core
# This is required to run some JNI related tests on the Rust side
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/lib:$JAVA_HOME/lib/server cargo test
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RUST_BACKTRACE=1?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea added


- name: Run tests
run: |
SPARK_HOME=`pwd` ./mvnw clean install
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we use ./mvnw verify in Makefile, do we need to do that?

Copy link
Copy Markdown
Member

@viirya viirya Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need BOSON_CONF_DIR?

Copy link
Copy Markdown
Member Author

@sunchao sunchao Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

install covers verify: https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html#a-build-lifecycle-is-made-up-of-phases

COMET_CONF_DIR is optional and by default, Rust test outputs are directed to stdout.

@viirya
Copy link
Copy Markdown
Member

viirya commented Feb 14, 2024

There are a log message like this:

Downloaded from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.22/plexus-utils-3.0.22.jar (245 kB at 2.3 MB/s)
Progress (1): 1.0/2.3 MB
Progress (1): 1.1/2.3 MB
Progress (1): 1.1/2.3 MB
...

It is really distracting for looking at the logs. Can we silent it?

@sunchao
Copy link
Copy Markdown
Member Author

sunchao commented Feb 14, 2024

It is really distracting for looking at the logs. Can we silent it?

Hmm perhaps we can cache the Maven dependencies so these will only show up in the first time. Let me check

@sunchao sunchao merged commit 311ef6b into apache:main Feb 14, 2024
@sunchao
Copy link
Copy Markdown
Member Author

sunchao commented Feb 14, 2024

Thanks! merged. Let me check if the Maven cache did work. I'll create followups if it doesn't.

@sunchao sunchao deleted the add-ci branch February 14, 2024 15:40
schenksj added a commit to schenksj/datafusion-comet that referenced this pull request Apr 12, 2026
Fixes all 14 previously-deferred review findings:

apache#4  Case-sensitivity in DV column detection: isDeltaDvFilterPattern
    and findAndStripDeltaScanBelow now use equalsIgnoreCase for the
    __delta_internal_is_row_deleted column name match.

apache#8  S3 key documentation: added comment in JNI documenting the
    Hadoop-style key names that storageOptions carries and how
    extract_storage_config maps them.

apache#10 Proto comment inaccuracy: updated reserved field number comments
    to describe purpose rather than referencing (now-stale) phase
    numbers. Added field numbering strategy note on DeltaScanCommon.

apache#11 Module quarantine docs: updated delta/mod.rs doc comment to note
    that create_object_store returns Arc<dyn ObjectStore> from
    object_store_kernel 0.12, and that it never escapes the module.
    Updated public API listing to match current exports.

apache#12 Optimizer rule double-init: added synchronized double-checked
    locking on the CometDeltaDvConfigRule to prevent concurrent
    threads from racing on the config set.

apache#14 Incomplete partition type support: castPartitionString now throws
    IllegalArgumentException for unsupported types (STRUCT, ARRAY,
    MAP, etc.) instead of silently converting to UTF8String.

apache#6  DV materialization clarity: added comment explaining why
    .unwrap_or_default() is safe (get_row_indexes returns Ok(None)
    only if has_vector() lied, which kernel guarantees doesn't happen;
    Err propagates via ?).

apache#17 Consistent JNI null handling: extracted read_string_array helper
    for reading Java String[] into Vec<String>, consolidating the
    null-check + iteration pattern.

apache#18-19 Proto field ordering: added numbering strategy comment to
    DeltaScanCommon and DeltaScanTask messages.

apache#20 Memory note: added comment about potential driver OOM on
    extremely large tables (millions of files) with suggestion for
    future streaming/chunked processing.

Tests: succeeded 35, failed 0, canceled 0, ignored 0, pending 0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create CI/CD pipelines

3 participants