Add OCS/GCS backend support #342

Open
ernst-bablick wants to merge 1 commit into mschubert:master from ernst-bablick:feature/ocs-gcs-backend

@ernst-bablick

As discussed in #341, this PR adds native backend support for OCS and GCS.

Summary of changes:

  • Added OCS and GCS as separate scheduler backends.
  • Implemented dedicated submission templates for both backends.
  • Switched the primary job identifier from job name to job ID (job names are not guaranteed to be unique).
  • Added a finalize handler.
  • Adjusted cleanup handling (the current SGE implementation appears to reset the flag before finalize() is executed).
  • Added documentation, including references to OCS/GCS manuals and inline template documentation.
  • Added CLion's .idea/ to .gitignore.

Testing:

  • Verified functionality with OCS/GCS 9.1.0beta1.
  • Added an automated test for clustermq + GCS integration.
  • Added an R Mandelbrot example to GCS that demonstrates usage.
  • Opened a Jira issue regarding qsub -terse handling in 9.0.x (expected to be fixed in 9.0.12).
  • Added a how-to to the GCS distribution, with references to you and this project.

Many thanks for the helpful feedback and suggestions.

Best regards,
Ernst

Owner

@mschubert left a comment

Thanks for the PR! I added some comments, please have a look and explain/amend as appropriate.

Comment on lines +13 to +15
qname = c("SLURM", "LSF", "SGE", "GCS", "OCS", "LOCAL")
exec = Sys.which(c("sbatch", "bsub", "qsub", "qsub", "qsub"))
select = c(which(nchar(exec) > 0), 6)[1]
Owner

This will not work: we check which shell commands are available to guess the scheduler. If qsub is available, we assume SGE. We cannot distinguish between PBS, Torque, GCS, and OCS, so it makes no sense to check for them here.

Author

What is your recommendation here?

Owner

I think it's best to leave the code as it was before.
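
For illustration, the detection logic discussed in this thread can be sketched as a small standalone function. `guess_scheduler` is a hypothetical name, not part of clustermq, and the three-command list is an assumption about what the code looked like before this PR. The sketch only shows why every `qsub`-based scheduler ends up being reported as SGE.

```r
# Hypothetical helper, not part of clustermq: guess a scheduler from the
# submission commands found on the PATH. The command list is an assumption.
guess_scheduler = function(exec = Sys.which(c("sbatch", "bsub", "qsub"))) {
    qname = c("SLURM", "LSF", "SGE", "LOCAL")
    # index of the first command that was found, otherwise fall back to LOCAL
    select = c(which(nchar(exec) > 0), length(qname))[1]
    qname[[select]]
}

# Only qsub is visible, so PBS, Torque, GCS and OCS would all look like SGE
guess_scheduler(c(sbatch = "", bsub = "", qsub = "/usr/bin/qsub"))

# No submission command found at all
guess_scheduler(c(sbatch = "", bsub = "", qsub = ""))
```

Passing the paths in as an argument keeps the sketch testable on machines without any scheduler installed.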

Comment on lines +85 to +86
* [GCS](#gcs) - *works without setup*
* [OCS](#ocs) - *works without setup*
Owner

This is incorrect: clustermq.scheduler needs to be set (see PBS/Torque).

Comment on lines +54 to +55
* [GCS](https://mschubert.github.io/clustermq/articles/userguide.html#gcs) - *works without setup*
* [OCS](https://mschubert.github.io/clustermq/articles/userguide.html#ocs) - *works without setup*
Owner

This is incorrect: clustermq.scheduler needs to be set (see PBS/Torque).

Author

Ok. Now I understand. From the scheduler end there is no change required. A default OCS/GCS installation is sufficient.
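
A minimal sketch of the R-side setup that the docs would need to mention instead of "works without setup". It assumes the backends are registered under the names "GCS" and "OCS" as this PR proposes. `clustermq.scheduler` is the option named in the review, and placing the call in `~/.Rprofile` is only one possible choice.

```r
# Assumption: the scheduler name matches the backend class proposed in this PR
options(clustermq.scheduler = "OCS")

# Check what a later clustermq call would pick up
getOption("clustermq.scheduler")
```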

the missing options.


### GCS
Owner

The templates and docs for GCS/OCS are a lot more verbose than those for the other schedulers. We should try to be consistent here.

Author

Does this mean that I should remove helpful comments?

Owner

Can you make it more in the style of the SGE docs?

Comment on lines +61 to +64
OCS = R6::R6Class("OCS",
inherit = QSys,

public = list(
Owner

@mschubert, Mar 8, 2026

Why is the SGE initializer not reused here? Job names are guaranteed to be unique within clustermq; but if IDs are better, we should use them in SGE as well

Author

I have no access to SGE and do not know if job names were unique back then with the old Sun Microsystems release.

log_worker=FALSE, log_file=NULL, verbose=TRUE) {
super$initialize(addr=addr, master=master, template=template)

# fill the template with options and required fields
Owner

Please only add comments/whitespace where they add value. If the function is called fill_template, a comment with fill the template is superfluous.

Author

ok

private$template_error(class(self)[1], status, filled)

# try to read the job ID from stdout. On error stop
private$job_id <- regmatches(private$qsub_stdout, regexpr("^[0-9]+", private$qsub_stdout))
Owner

Please be consistent with assignments; we use = by convention in this project

Author

ok
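
As a sketch of the job ID extraction under review, written with `=` assignment per the project convention: `parse_job_id` is a hypothetical helper name, and the sample strings are assumed shapes of terse `qsub` output, not captured from a real OCS/GCS run.

```r
# Hypothetical helper: extract the leading numeric job ID from qsub output
parse_job_id = function(qsub_stdout) {
    job_id = regmatches(qsub_stdout, regexpr("^[0-9]+", qsub_stdout))
    # regmatches() drops non-matching elements, so require exactly one match
    if (length(job_id) != 1)
        stop("could not read a job ID from qsub output: ", qsub_stdout)
    job_id
}

parse_job_id("4711")          # plain job (assumed output shape)
parse_job_id("4711.1-10:1")   # array job (assumed output shape)
```

Stopping when there is no match keeps a failed submission from being stored as an empty job ID.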

Comment on lines +100 to +101
# first call finalize to send qdel ...
private$finalize()
Owner

qdel should only be called if there are still running jobs, not otherwise. I believe the cleanup implementation in SGE is correct.

Author

Ok. This did not work for me when I had the same code for OCS/GCS. qdel was not triggered although the worker tasks were processed and there were still pending jobs...
