@LizBaldo LizBaldo commented Apr 16, 2025

Jira ticket: https://broadworkbench.atlassian.net/browse/AN-503

Summary of changes

What

Why

  • We need to keep up with the latest releases from the Hail team so that our users can continue doing their analyses on a tool that remains supported
  • Being able to cache both Dataproc 2.1.x and 2.2.x will help the AOU team test and transition to their new 2.2.16 image on their own timeline

Testing these changes

I pointed my BEE to this PR and was able to successfully launch both a Hail and an AOU image, each with a single-node Spark setup and with a 2-node Spark cluster.
When opening a Jupyter notebook, I can import and initialize the new version of Hail:

Screenshot 2025-05-02 at 2 40 03 PM

I was also able to launch the AOU image that is currently in prod using the legacy Dataproc 2.1 image. So we should be safe to merge this, as it won't impact RWB and they can move over to Dataproc 2.2 when they want:

Screenshot 2025-05-06 at 3 38 31 PM
  • This change is covered by automated tests
    • NB: Rerun automation tests on this PR by commenting jenkins retest or jenkins multi-test.
  • I validated this change
  • Primary reviewer validated this change
  • I validated this change in the dev environment

codecov bot commented Apr 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.67%. Comparing base (f58279b) to head (7c9abd7).
Report is 1 commits behind head on develop.

@@             Coverage Diff             @@
##           develop    #4839      +/-   ##
===========================================
- Coverage    74.67%   74.67%   -0.01%     
===========================================
  Files          166      166              
  Lines        14623    14622       -1     
  Branches      1156     1143      -13     
===========================================
- Hits         10920    10919       -1     
  Misses        3703     3703              
Files with missing lines                               Coverage Δ
...bench/leonardo/config/ClusterResourcesConfig.scala  100.00% <ø> (ø)
...titute/dsde/workbench/leonardo/config/Config.scala  97.78% <100.00%> (+<0.01%) ⬆️
...te/dsde/workbench/leonardo/util/BucketHelper.scala  60.00% <100.00%> (+0.42%) ⬆️
.../workbench/leonardo/util/DataprocInterpreter.scala  63.15% <100.00%> (-0.46%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

STEP_TIMINGS=($(date +%s))


## Installs Google Cloud Ops Agent that is now required for Dataproc 2.2.X ###
Collaborator Author

This is the main change in addition to updating docker compose, and that sneaky change in external IP assignment behavior.

It is annoying that the new log agent does not come pre-built into the Dataproc image itself, but the install and setup was not too bad in the end.
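For readers who have not seen this kind of init-script step, here is a minimal sketch of what an Ops Agent install section could look like. It assumes the installer script that Google publishes for the Ops Agent (`add-google-cloud-ops-agent-repo.sh` with `--also-install`). The helper names `log_step`, `run`, and `install_ops_agent`, and the `DRY_RUN` switch, are hypothetical and are not taken from this PR; only the `STEP_TIMINGS` line mirrors the diff above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sketch, not the script merged in this PR.
# STEP_TIMINGS mirrors the array in the diff: one epoch timestamp per step.
STEP_TIMINGS=($(date +%s))

# Record the end of a step and print how long it took since the previous one.
log_step() {
  local now prev
  now=$(date +%s)
  prev=${STEP_TIMINGS[${#STEP_TIMINGS[@]}-1]}
  STEP_TIMINGS+=("$now")
  echo "step '$1' took $((now - prev))s"
}

# Run a command, or only print it when DRY_RUN=1 (assumed switch for safe testing).
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "DRY_RUN: $*"
  else
    "$@"
  fi
}

# Download Google's repo-setup script and let it install the agent.
# On a real node this needs root privileges and network access.
install_ops_agent() {
  run curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
  run bash add-google-cloud-ops-agent-repo.sh --also-install
  log_step "install ops agent"
}
```

Calling `install_ops_agent` with `DRY_RUN=1` set prints the two commands instead of executing them, which makes the step easy to check before it runs on a cluster node.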

@LizBaldo LizBaldo marked this pull request as ready for review May 2, 2025 18:59
@LizBaldo LizBaldo requested a review from a team as a code owner May 2, 2025 18:59

@aednichols aednichols left a comment

I appreciate the detailed comments.

@lucymcnatt lucymcnatt self-requested a review May 5, 2025 15:20

LizBaldo commented May 5, 2025

@Qi77Qi I modified the PR to make sure that Leonardo can support both the deployment of the AOU 2.2.13 image on Dataproc 2.1.x (aka what you currently have in production) and the AOU 2.2.16 image on Dataproc 2.2.x. I will do some testing on my BEE, but this should let us release the new Hail/Dataproc version on Terra without impacting RWB (you can switch your pre-prod / prod environments on your own timeline).
Note that the Dataproc 2.1.x image will expire in 60 days, so please let the team know if you have not migrated to the new AOU / Dataproc image by then.
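The pairing described in this comment can be pictured as a simple lookup from the AOU image version to the Dataproc line it runs on. The sketch below is illustrative only: the function name `dataproc_line_for_aou_image` is hypothetical, and the actual selection logic lives in Leonardo's Scala code and config, which this conversation does not show. Only the two pairings named in the comment are taken from the source.

```shell
# Hypothetical helper: map an AOU image version to the Dataproc line it is paired with.
# Only the two pairings mentioned in the comment above are encoded here.
dataproc_line_for_aou_image() {
  case "$1" in
    2.2.13) echo "2.1.x" ;;   # legacy pairing, currently in production
    2.2.16) echo "2.2.x" ;;   # new pairing enabled by this PR
    *)
      echo "unknown AOU image version: $1" >&2
      return 1
      ;;
  esac
}
```

Failing loudly on an unknown version, instead of defaulting to one Dataproc line, keeps an unexpected image from silently landing on the 2.1.x line that is about to expire.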

@LizBaldo LizBaldo merged commit 7fa8d78 into develop May 6, 2025
23 checks passed
@LizBaldo LizBaldo deleted the AN-503-update-to-dataproc-2.2.x branch May 6, 2025 20:27

4 participants