Add mepo and geos-gcm-env #372
|
@mathomp4 This is still a draft PR, but I wanted to get your input on the changes and a hint or two for how to test my build! |
|
I hope to be able to look at/work on this next week. But for @climbfuji's benefit, here is an attempt I made a few months ago for both geosgcm and mepo: jcsda_emc_spack_stack...GMAO-SI-Team:spack:feature/mathomp4/add-geosgcm. It's quite out of date, in that at the time we still used FLAP instead of fArgParse, but I am happy I got pretty close on the mepo package :) |
|
As for how to test the build, well you can try to follow along with: https://github.com/GEOS-ESM/GEOSgcm/wiki/Setting-up-an-AMIP-Experiment but the easier path might be:
That should work...though I'm probably missing some "new user issue" that we'll have to debug when you try (might need to add, say, |
I'll try that. But the real challenge is to repeat the same on my macOS, which is where I built geos-gcm with the changes in this PR. |
Ahh. For macOS (or non-NASA systems) you need something like my TinyBCs. On discover you'll find: copy that locally and extract it somewhere. For example on my mac it's at Then, you can build the model and do the same as above but use (using my path): TinyBCs only does C12 and C24, so you use a lower res here. But that should work. |
Hmm, it seems like a lot of paths in the gcm_run.j script are hardcoded to copy data from Discover etc. Maybe we can touch base next week and weed through the script? |
Indeed they are, but if you run |
Hmm, I still see things like in the resulting |
Oh. You don't care about those. Those are for a very specific use case (EMIPs) that you don't run. Heck, I can't run them. I know they exist...but that's it. I guess I can look at deleting those from the experiment on non-discover machines...but I guess I'm used to ignoring them. |
|
I commented out the lines for the emip stuff, also removed the basedir logic: When I run the script, I get: and then more errors afterwards. I'll look into this when I have time, but wanted to let you know for now. |
|
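I guess I first need to know how to set `setenv RSTDATE @RSTDATE` and `setenv GCMEMIP @GCMEMIP`. Like this? `setenv RSTDATE "20231101"` and `setenv GCMEMIP "FALSE"` |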
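No. Again. Those are only used by EMIPs so those don't matter (you can see why we are rewriting our setup and run scripts in Python!) But something weird is happening. This: `cp: /ford1/share/gmao_SIteam/ModelData/GWD_RIDGE/gwd_internal_c12: No such file or directory` makes me think makeoneday.bash didn't do its job. What was the console output when you ran that? Also this: `setenv DYLD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${BASEDIR}/${ARCH}/lib:${GEOSDIR}/lib` needs to at least be: `setenv DYLD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${GEOSDIR}/lib` I think. You need the GEOSDIR libraries I'd imagine. (Unless we run from the install dir...) |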
|
I believe I got rid of the ford thing by installing gsed. I started over and now I am getting this in gcm_run.j after removing the BASEDIR path from setenv DYLD_LIBRARY_PATH:
(I used `#!/bin/csh -f -v -x` to get more information, the error is `cat: fvcore_layout.rc: No such file or directory`)
```
umask 022
limit stacksize unlimited
setenv ARCH `uname`
setenv SITE JCSDA-L-18146
setenv GEOSDIR /Users/heinzell/scratch/geos-gcm-spack-stack-1.5.1/GEOSgcm/install
setenv GEOSBIN /Users/heinzell/scratch/geos-gcm-spack-stack-1.5.1/GEOSgcm/install/bin
setenv GEOSETC /Users/heinzell/scratch/geos-gcm-spack-stack-1.5.1/GEOSgcm/install/etc
setenv GEOSUTIL /Users/heinzell/scratch/geos-gcm-spack-stack-1.5.1/GEOSgcm/install
setenv DYLD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${GEOSDIR}/lib
setenv RUN_CMD "mpirun -np "
setenv GCMVER `cat $GEOSETC/.AGCM_VERSION`
echo VERSION: $GCMVER
VERSION: GEOSgcm-v11.3.3
setenv EXPID test-c12
setenv EXPDIR ../../experiments/test-c12
setenv HOMDIR ../../experiments/test-c12
setenv RSTDATE @RSTDATE
setenv GCMEMIP @GCMEMIP
if ( ! -e $EXPDIR/restarts ) mkdir -p $EXPDIR/restarts
if ( ! -e $EXPDIR/holding ) mkdir -p $EXPDIR/holding
if ( ! -e $EXPDIR/archive ) mkdir -p $EXPDIR/archive
if ( ! -e $EXPDIR/post ) mkdir -p $EXPDIR/post
if ( ! -e $EXPDIR/plot ) mkdir -p $EXPDIR/plot
if ( $GCMEMIP == TRUE ) then
setenv SCRDIR $EXPDIR/scratch
endif
if ( ! -e $SCRDIR ) mkdir -p $SCRDIR
set NX = `grep '^\s*NX:' $HOMDIR/AGCM.rc | cut -d: -f2`
set NY = `grep '^\s*NY:' $HOMDIR/AGCM.rc | cut -d: -f2`
set AGCM_IM = `grep '^\s*AGCM_IM:' $HOMDIR/AGCM.rc | cut -d: -f2`
set AGCM_JM = `grep '^\s*AGCM_JM:' $HOMDIR/AGCM.rc | cut -d: -f2`
set AGCM_LM = `grep '^\s*AGCM_LM:' $HOMDIR/AGCM.rc | cut -d: -f2`
set OGCM_IM = `grep '^\s*OGCM\.IM_WORLD:' $HOMDIR/AGCM.rc | cut -d: -f2`
set OGCM_JM = `grep '^\s*OGCM\.JM_WORLD:' $HOMDIR/AGCM.rc | cut -d: -f2`
set USE_IOSERVER = 0
set NUM_OSERVER_NODES = `grep '^\s*IOSERVER_NODES:' $HOMDIR/AGCM.rc | cut -d: -f2`
set NUM_BACKEND_PES = `grep '^\s*NUM_BACKEND_PES:' $HOMDIR/AGCM.rc | cut -d: -f2`
if ( $?SLURM_NTASKS ) then
if ( $?PBS_NODEFILE ) then
set NCPUS = NULL
endif
@ MODEL_NPES = $NX * $NY
set NCPUS_PER_NODE = 10
set NUM_MODEL_NODES=`echo "scale=6;($MODEL_NPES / $NCPUS_PER_NODE)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`
if ( $NCPUS != NULL ) then
@ TOTAL_PES = $MODEL_NPES
endif
if ( $GCMEMIP == TRUE & ! -e $EXPDIR/restarts/$RSTDATE/cap_restart ) then
cd $SCRDIR
/bin/rm -rf *
cp -f $EXPDIR/RC/* .
cp: No match.
cp $EXPDIR/cap_restart .
cp: ../../experiments/test-c12/cap_restart: No such file or directory
cp -f $HOMDIR/*.rc .
cp: No match.
cp -f $HOMDIR/*.nml .
cp: No match.
cp -f $HOMDIR/*.yaml .
cp: No match.
cp $GEOSBIN/bundleParser.py .
cat fvcore_layout.rc >> input.nml
cat: fvcore_layout.rc: No such file or directory
if ( -z input.nml ) then
echo "try cat for input.nml again"
try cat for input.nml again
cat fvcore_layout.rc >> input.nml
cat: fvcore_layout.rc: No such file or directory
endif
if ( -z input.nml ) then
echo "input.nml is zero-length"
input.nml is zero-length
exit 0
```
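For readers less familiar with csh, the `grep`/`cut` and `bc`/`awk` lines in the trace above read the layout from `AGCM.rc` and round the node count up. A minimal Python sketch of the same arithmetic follows. The function names and the sample `NX`/`NY` values are made up for illustration; only the `NX:`/`NY:` key format and `NCPUS_PER_NODE = 10` come from the trace.

```python
import math
import re


def read_rc_value(rc_text: str, key: str) -> str:
    """Return the text after 'key:' on the first matching line.

    Roughly what `grep '^\\s*KEY:' file | cut -d: -f2` does for simple values
    (values that themselves contain a colon would differ).
    """
    match = re.search(rf"^\s*{re.escape(key)}:(.*)$", rc_text, flags=re.MULTILINE)
    if match is None:
        raise KeyError(f"{key} not found in resource file")
    return match.group(1).strip()


def model_nodes(nx: int, ny: int, cpus_per_node: int) -> int:
    """Nodes needed for NX * NY model processes, rounded up."""
    return math.ceil(nx * ny / cpus_per_node)


# Hypothetical AGCM.rc fragment; the real file has many more keys.
sample_rc = "NX: 1\nNY: 6\n"
nx = int(read_rc_value(sample_rc, "NX"))
ny = int(read_rc_value(sample_rc, "NY"))
print(model_nodes(nx, ny, cpus_per_node=10))
```

Note that none of this arithmetic is what fails in the trace; the run stops earlier because the `cp` commands find nothing to copy and `input.nml` ends up empty.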
… On Nov 27, 2023, at 5:10 PM, Matthew Thompson ***@***.***> wrote:
I guess I first need to know how to set
setenv RSTDATE @RSTDATE
setenv GCMEMIP @GCMEMIP
Like this?
setenv RSTDATE "20231101"
setenv GCMEMIP "FALSE"
No. Again. Those are only used by EMIPs so those don't matter (you can see why we are rewriting our setup and run scripts in Python!)
But something weird is happening. This:
cp: /ford1/share/gmao_SIteam/ModelData/GWD_RIDGE/gwd_internal_c12: No such file or directory
makes me think makeoneday.bash didn't do its job. What was the console output when you ran that?
Also this:
setenv DYLD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${BASEDIR}/${ARCH}/lib:${GEOSDIR}/lib
needs to at least be:
setenv DYLD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${GEOSDIR}/lib
I think. You need the GEOSDIR libraries I'd imagine. (Unless we run from the install dir...)
|
|
Again, what is the output from running |
|
This is the output of the TinyBCs And this is the output of |
|
Huh. That all looks good to me. Do you have a |
Thanks for checking. Yes, I do: |
|
Huh. Well, can you send me the output from I mean, if the model builds, we should be good... |
Sure, the output is in comment #372 (comment) (it looks like the script itself, because I added |
|
This is the bit I think that screws it up: Can you try running I'm not sure anyone has ever tried passing in a relative path and that might be causing havoc. If that's the issue, I can see how to fix that in python. |
|
Sorry for my slow progress here. I used absolute paths for both create_expt.py and makeoneday.bash and that got me further! Now it's back to segfaulting: I'll take a look but wanted to let you know. |
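The relative-path theory is consistent with the trace posted earlier, where `EXPDIR` was `../../experiments/test-c12` and the script changed directory before copying from `$EXPDIR`. The sketch below shows why a relative path stops resolving after a `cd`, and the kind of normalization a Python setup script could apply. The helper name `normalize_expdir` and the temporary directory layout are hypothetical and not taken from `create_expt.py`.

```python
import os
import tempfile
from pathlib import Path


def normalize_expdir(expdir: str) -> Path:
    """Resolve a possibly relative experiment directory against the current directory."""
    return Path(expdir).expanduser().resolve()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "experiments" / "test-c12" / "scratch").mkdir(parents=True)
    (root / "build" / "bin").mkdir(parents=True)
    start = Path.cwd()
    try:
        # Start where a user might invoke the setup script.
        os.chdir(root / "build" / "bin")
        relative = "../../experiments/test-c12"
        absolute = normalize_expdir(relative)

        # After changing directory, the relative path points somewhere else,
        # while the absolute path still names the experiment directory.
        os.chdir(absolute / "scratch")
        relative_still_valid = Path(relative).is_dir()
        absolute_still_valid = absolute.is_dir()
    finally:
        os.chdir(start)

print(relative_still_valid, absolute_still_valid)
```

Resolving once, at the point where the user's argument is read, keeps every later `cd` in the run script from changing what the path means.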
|
Okay. That means I might need to try this myself. But first: what compilers/mpi/etc. is this using? |
|
***@***.***, ***@***.*** - thanks for your help!
… On Nov 29, 2023, at 9:52 AM, Matthew Thompson ***@***.***> wrote:
Okay. That means I might need to try this myself.
But first: what compilers/mpi/etc. is this using?
|
|
@climbfuji I think something got...nuked in your reply. It's all blank. |
Ahh well ... apple-clang@13.1.6 and openmpi@4.1.5, sorry |
|
Also, if you move away/rebuild, probably a good idea to rebuild GEOS. I don't think CMake found that .so, but if it did, maybe the build got wonky? |
|
It failed again, but at least it looks like it is getting the correct mapl libraries: The errors for both |
|
@climbfuji Yeah maybe try removing MAPL from spack-stack and testing that. You might have to manually then install GFE libraries if MAPL was bringing them in. |
Yes - pflogger, fargparse and gftl-shared are already installed in spack-stack, all I had to do was load the modules. It's building. |
|
Little difference after building from scratch w/o mapl in the environment: |
Actually, the model ran! Huzzah! Now So a few things. First, I'm going to mention @bena-nasa here as I'm on leave next week and he might be able to try things while I'm gone. Second, can you point me to your modules and how to get this setup? I just want to see if MPI Hello World falls apart. Third, can you try this same combo of modules but with a Debug build of GEOS? Fourth, what is the backing gcc compiler for your Intel stack? That is, we usually load GCC 11.2.0 and Intel 2021.6.0 so that icc and icpc have gcc/g++-11 as the backer to Intel. |
|
Lots of good questions, and again thank you for your help. Answers inline below.
Thanks!
Will do - stay tuned
|
|
Okay. Well, Hello World works just fine. Perhaps the issue is the fact you are backing with gcc 10.1 instead of gcc 12.1? Maybe that's the next test if the Debug shows nothing? |
|
Debug output is here: At first glance, I didn't see much more useful information than w/o debug. The only change I made for the debug build was adding |
Are you sure that GEOS is that finicky in terms of compiler versions? I can give this a try in order to nail down the problem, but I am worried that if that is really the case, we'll have a very hard time getting geos to run on other systems. |
Maybe worth adding that the stack itself is very likely ok - we run all sorts of JEDI experiments with it: with fv3 (dycore), with geos (compiled externally and pulled in), ... - all 2000-ish ctests that come with jedi-bundle pass as well. |
…. Add prerequisites for building mapl instead
|
@mathomp4 I rebuilt the spack-stack geos-gcm-env with Intel and gcc12 as backend, and redid the tests. Same result. |
|
@climbfuji I've finally recovered from all the craziness that happened after the holiday break. Have you figured anything else out with this? Or still the same problems? In some ways, I think focusing on |
I have not made any progress on this. The only distant memory that came to my mind was that we had problems with the UFS depending on whether ESMF or MAPL (here is where my memory is faint) are compiled dynamically or statically, or with certain variants on or off. @AlexanderRichert-NOAA Do you recall what we had to change to avoid the segfault at the end of the model runs with ESMF/MAPL in the UFS? |
…into feature/geos_gcm_dependencies
…into feature/geos_gcm_dependencies
|
@mathomp4 I think we can finally merge this PR to prepare spack-stack for supporting geos. The actual changes to the spack stack templates and esmf/mapl configs will be in JCSDA/spack-stack#953, which is still being worked on (debugging shared ESMF build errors in CI ...). |
Sounds good as a first go. My guess is at some point there will be a lot more added to |
Thanks! At that point we should try to consolidate the common Python packages between geos-gcm-env, ewok-env and gmao-swell-env to simplify dependency trees. |
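One way to find consolidation candidates is to intersect the dependency lists of the three environments. The sketch below assumes each environment's Python dependencies have been collected into a set by hand. The package names are placeholders, not the actual contents of geos-gcm-env, ewok-env or gmao-swell-env.

```python
from functools import reduce


def common_dependencies(envs: dict[str, set[str]]) -> set[str]:
    """Return the packages that every environment depends on."""
    if not envs:
        return set()
    return reduce(set.intersection, envs.values())


# Placeholder dependency sets for illustration only.
envs = {
    "geos-gcm-env": {"py-a", "py-b", "py-c"},
    "ewok-env": {"py-b", "py-c", "py-d"},
    "gmao-swell-env": {"py-c", "py-b", "py-e"},
}
print(sorted(common_dependencies(envs)))
```

Packages in the intersection could then move into a shared base environment that the three bundles depend on.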
…cm-env/package.py
|
I'll merge this after fixing the style errors that black is complaining about. |
Description
This PR adds the mepo package and geos-gcm-env. I was able to build geos-gcm develop on my macOS after building an environment with just geos-gcm-env and loading the modules.
I didn't run any tests though, and certainly need help from @mathomp4 with that.
Update. After many hours of debugging and back and forth, I was able to run geos on Discover using the spack-stack libraries (note that geos builds mapl internally, therefore not using the spack-stack version at the moment). Thanks very much to @mathomp4 for all the help.
Issue(s) addressed
Resolves JCSDA/spack-stack#242 (turns out not all of these packages are needed for geos-gcm)
Dependencies
n/a
Impact
n/a
Checklist
I have made corresponding changes to the documentation