Fix FAST.Farm issues with OMP (segfaults mostly) #2711

Merged
andrew-platt merged 14 commits into OpenFAST:rc-4.0.3 from andrew-platt:b/OMP_file_open_fixes
Apr 8, 2025


@andrew-platt (Collaborator) commented Mar 31, 2025

Ready to merge

Feature or improvement description
Unit number collisions under OMP parallelization around ReadHighResWindFile have been causing FAST.Farm to fail frequently, for reasons that were previously unknown. The problem was tracked down using the IFX compiler and may also partially exist with GCC when OpenMP is enabled.

A few changes:

  • speed up GetNewUnit with revised logic
  • increase the maximum number of units to 2^16-1 = 65535 - this should work for most clusters, but may cause issues on smaller machines (raise the limit with ulimit -n # on *nix machines, no clue how to change it on Windows)
  • wrap close(Un) in !$OMP critical(fileopen_critical) - this was the main problem
  • add a note about the IntelLLVM debug flag nouninit (forgot to add it in Add "nouninit" to debug flags for IntelLLVM #2709)
  • add troubleshooting info for FAST.Farm segfaults due to OMP stacksize
  • add (in x.y days) to the simulation status message (useful for really long simulations where the end day is ambiguous)
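
For the unit-limit bullet above, here is a minimal sketch of how one might check a *nix machine before a large run. It assumes the limit in question is the shell's soft open-file limit reported by `ulimit -n`; the variable names (`required`, `current`, `fd_status`) are illustrative only and are not part of OpenFAST.

```shell
#!/bin/sh
# Sketch only: compare the soft open-file limit with the 65535 ceiling
# mentioned in this PR. Variable names here are hypothetical.
required=65535
current=$(ulimit -n)

if [ "$current" = "unlimited" ]; then
    fd_status="ok"
elif [ "$current" -ge "$required" ]; then
    fd_status="ok"
else
    fd_status="low"
fi

echo "open-file soft limit: $current (status: $fd_status)"

# If the status is "low", the limit can be raised for the current shell
# with `ulimit -n <value>`. This fails if <value> exceeds the hard limit,
# which can be inspected with `ulimit -Hn`.
```

The check only reports the limit. Whether a given run actually needs that many open files depends on the simulation.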

Related issue, if one exists
Many issues in the past, some reported on GH (sorry, you'll have to search as I'm feeling lazy right now)

Impacted areas of the software
FAST.Farm should see a small increase in speed, but more importantly should no longer have unit number collisions between threads

Additional supporting information
NOTE: there are likely other close(Unit) locations that need this !$OMP critical(fileopen_critical) wrapping

Test results, if applicable
Tests are run single-threaded to reduce load on GH actions, so no changes there.

Thanks to @deslaughter for help in debugging.

The `close(Un)` is not atomic, so it may not have released the file
before declaring the unit available.  This can cause issues when
opening a whole bunch of files simultaneously.

There are other places that need this fix as well.
@andrew-platt andrew-platt added this to the v4.0.3 milestone Mar 31, 2025
@andrew-platt andrew-platt self-assigned this Mar 31, 2025
andrew-platt and others added 11 commits April 3, 2025 22:59
Co-authored-by: Derek Slaughter <deslaughter@gmail.com>
Changed fileopenNWTCio_critical to fileopen_critical so all file opens
use the same OMP critical section
Also remove from Read84AryWDefault.  Somehow this was triggering a
segfault with IFX.  No idea how.
Segmentation faults can occur if the OMP PARALLEL DO has enough private
memory per thread that it exceeds the default OMP_STACKSIZE="4 M".  If
this happens, `export OMP_STACKSIZE="32 M"` or another suitably large value.

The calculated values for this don't exactly work out as I would
expect, but they are in the ballpark (see code note)
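
As a minimal sketch of the workaround in the commit message above, the stack size can be set in the shell before launching a run. The thread count and the input file name in the comment are placeholders, not values from this PR.

```shell
#!/bin/sh
# Sketch only: raise the per-thread OpenMP stack size before a run.
# "32 M" is the value suggested in the commit message; tune as needed.
export OMP_STACKSIZE="32 M"

# Hypothetical thread count, chosen for illustration.
export OMP_NUM_THREADS=4

echo "OMP_STACKSIZE=$OMP_STACKSIZE OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Then launch the simulation from the same shell, for example:
#   FAST.Farm my_case.fstf    # placeholder input file name
```

The variables must be exported in the same shell (or job script) that starts the executable, since they are read from the process environment at startup.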
The routine isn't actually used... yet.  But for completeness, adding the critical around the close so it isn't an issue later when I actually use the routine.
@andrew-platt andrew-platt force-pushed the b/OMP_file_open_fixes branch from 2468228 to 5361499 April 4, 2025 05:01
@andrew-platt andrew-platt changed the title from "Modify OMP parrallelization in FF ReadHighResWindFile" to "Fix FAST.Farm issues with OMP (segfaults mostly)" Apr 4, 2025
@andrew-platt andrew-platt merged commit c81d4d5 into OpenFAST:rc-4.0.3 Apr 8, 2025
22 checks passed
This was referenced Apr 8, 2025