@AritraDey-Dev
Member

Description

Running a workflow that has a db connection and attempts to download met data is failing silently after the meta-analysis step with the following error:

 Error in sample.int(length(x), size, replace, prob) : 
  invalid first argument
Calls: <Anonymous> -> <Anonymous> -> <Anonymous> -> sample -> sample.int

The same issue occurs in the Docker stack tests.

This PR fixes a regression in convert_input.R where met file downloads were silently skipped, causing downstream sample.int errors in ensemble.R when the workflow fell back to missing met data after meta-analysis. The fix relaxes an incorrect NULL check and ensures check_missing_files returns a named list.

Motivation and Context

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I agree that PEcAn Project may distribute my contribution under any or all of
    • the same license as the existing code,
    • and/or the BSD 3-clause license.
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Signed-off-by: Aritra Dey <adey01027@gmail.com>
@AritraDey-Dev
Member Author

Just to add, after digging into the git history, this issue seems to have been introduced in PR #3338.

Member

@mdietze mdietze left a comment

I'm fine with sorting out the requested fixes, but ultimately they shouldn't be causing the issues you are discussing -- the meta-analysis code should not depend on the met data from convert.inputs in any way.

# Get machine information
machine.info <- get_machine_info(host, input.args = input.args, input.id = input.id, con = con)

if (any(sapply(machine.info, is.null))) {
Member

Not sure about this change. If there's only 1 machine info it seems like the old version should still work, but if there's more than 1 the new version will definitely fail

Member Author

get_machine_info returns a single list with three named elements: list(machine = ..., input = ..., dbfile = ...). It does not return a list of multiple machine infos.

For a new input (which is the case failing here), input.id is NULL. In this scenario, get_machine_info explicitly sets input and dbfile to NULL and returns:

list(machine = <data.frame>, input = NULL, dbfile = NULL)

This is a valid state for a new file download.

However, the old check any(sapply(machine.info, is.null)) iterates over these three elements. Since input and dbfile are NULL, it evaluates to TRUE, incorrectly flagging this valid state as a fatal error and stopping the workflow.

The new check is.null(machine.info) is correct because get_machine_info returns NULL (the whole object) only when the machine lookup itself fails, which is the actual error condition we want to catch.
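
To make the difference between the two checks concrete, here is a minimal sketch. It assumes the return shape described above; the `machine` data frame and its columns are hypothetical stand-ins, not the real query result.

```r
# Hypothetical stand-in for what get_machine_info() is described as returning
# for a new input (input.id is NULL); the data.frame columns are made up.
machine.info <- list(
  machine = data.frame(id = 1L, hostname = "localhost"),
  input = NULL,
  dbfile = NULL
)

# Old check: looks inside the list, so the two NULL elements trip it.
old_check <- any(sapply(machine.info, is.null))

# New check: only asks whether the lookup returned nothing at all.
new_check <- is.null(machine.info)

print(c(old = old_check, new = new_check))
```

Under these assumptions the old check flags the valid new-input state as an error, while the new check lets it through.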

Member

The old check also behaved badly in the single null case because sapply(NULL, is.null) returns an empty list and then any(list()) returns FALSE.
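
A small sketch of that case, simulating a failed lookup by setting the whole object to NULL (the variable names are only for illustration):

```r
# Simulate a failed machine lookup, where the whole object is NULL.
machine.info <- NULL

# sapply() over NULL has nothing to iterate, so it returns a zero-length list.
per_element <- sapply(machine.info, is.null)

# Old check: any() of a zero-length input is FALSE, so the failure slips by.
old_check <- any(per_element)

# New check: catches the failed lookup directly.
new_check <- is.null(machine.info)

print(c(old = old_check, new = new_check))
```

So in the one situation that should stop the workflow, the old check stays quiet and the new one fires.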

Member

@infotroph infotroph left a comment

Thanks for tracking this down!

@AritraDey-Dev
Member Author

meta-analysis code should not depend on the met data from convert.inputs in any way.

Yes, I agree it is not a meta-analysis issue. The problem is that convert_input fails silently, so no met data is downloaded. The meta-analysis still runs, since it doesn't require met data. The workflow then proceeds to the ensemble run, which does need met data, and because the met input is empty at that point it fails with the sample.int error.
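
As a rough illustration of that last step: sampling from an empty vector is an error in base R. This is not the actual ensemble.R code, and `met_paths` is a hypothetical name for the empty met input.

```r
# Hypothetical: an empty set of met paths, as if the download had been skipped.
met_paths <- character(0)

# Drawing one element from nothing cannot succeed; capture whether it errors.
attempt <- try(sample(met_paths, 1), silent = TRUE)
failed <- inherits(attempt, "try-error")

print(failed)
```

An explicit length check on the met input before sampling would surface the missing download much closer to its cause.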

@infotroph infotroph added this pull request to the merge queue Dec 18, 2025
Merged via the queue into PecanProject:develop with commit 3b6cc67 Dec 18, 2025
19 of 26 checks passed
@AritraDey-Dev AritraDey-Dev deleted the fix-convert-input-regression branch December 18, 2025 18:42
@infotroph
Member

@robkooper should we pull this into release too?

3 participants