FAST.Farm speedup with OMP #2730

Closed
andrew-platt wants to merge 17 commits into OpenFAST:rc-4.0.3 from andrew-platt:f/OMP_speedup_FF

@andrew-platt
Collaborator

Ready to merge

Feature or improvement description
Added `!$OMP` parallel directives around several loops that were obvious candidates for parallelization.

A few changes:

  • Increase parallelization with COLLAPSE(2) around ReadHighResWindFile in AWAE.f90 to include inner loop
  • Add parallelization to ComputeLocals
  • Add parallelization to LowResGridCalcOutput (second part)
  • Add parallelization around writing of planes
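
As a rough illustration of the `COLLAPSE(2)` change in the first bullet, here is a minimal standalone sketch, not the actual AWAE.f90 code. The loop bounds `n_turbines` and `n_steps`, the `grid` array, and the arithmetic that stands in for reading a file are all hypothetical.

```fortran
program collapse_sketch
   implicit none
   integer, parameter :: n_turbines = 4   ! hypothetical outer loop extent
   integer, parameter :: n_steps    = 6   ! hypothetical inner loop extent
   real    :: grid(n_steps, n_turbines)
   integer :: i_t, i_s

   ! COLLAPSE(2) merges both loops into one iteration space of
   ! n_turbines*n_steps items, so more threads than n_turbines can get work.
   ! The loops must be perfectly nested, with no statements between them.
   !$OMP PARALLEL DO COLLAPSE(2) DEFAULT(SHARED)
   do i_t = 1, n_turbines
      do i_s = 1, n_steps
         grid(i_s, i_t) = real(100*i_t + i_s)   ! stand-in for reading one file
      end do
   end do
   !$OMP END PARALLEL DO

   ! Self-check: the result does not depend on the number of threads.
   do i_t = 1, n_turbines
      do i_s = 1, n_steps
         if (grid(i_s, i_t) /= real(100*i_t + i_s)) error stop 'unexpected value'
      end do
   end do
   print '(a)', 'collapse sketch ok'
end program collapse_sketch
```

Without an OpenMP flag (for example `-fopenmp` with gfortran) the directives are treated as comments and the program runs serially with the same result.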

Related issue, if one exists
#2711

Impacted areas of the software
FAST.Farm may see a speed increase, and should see more effective multi-threading.

Additional supporting information
In the process of profiling, we found a few places that were obviously missing parallelization. One place that remains is the low-resolution wind grid reading: it is all contained in a single VTK file, so there is no way to speed it up in its present form.

Test results, if applicable
Test results will not change.

andrew-platt and others added 15 commits March 31, 2025 13:43
The `close(Un)` is not atomic, so it may not have released the file
before declaring the unit available.  This can cause issues when
opening a large number of files simultaneously.

There are other places that need this fix as well.
Co-authored-by: Derek Slaughter <deslaughter@gmail.com>
Changed fileopenNWTCio_critical to fileopen_critical so that all file opens
use the same OMP critical section
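
A minimal sketch of the pattern these commits describe: wrapping both `open` and `close` in one named critical section. This is not the OpenFAST routine; the scratch files, `newunit=`, and `n_files` are assumptions for illustration only. The name `fileopen_critical` is taken from the commit message above.

```fortran
program critical_io_sketch
   implicit none
   integer, parameter :: n_files = 8   ! hypothetical number of files
   integer :: i, un, val
   integer :: readback(n_files)

   readback = 0
   !$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(un, val)
   do i = 1, n_files
      ! Sharing one critical name serializes every open and close
      ! against every other open and close.
      !$OMP CRITICAL(fileopen_critical)
      open(newunit=un, status='scratch', form='unformatted', action='readwrite')
      !$OMP END CRITICAL(fileopen_critical)

      ! Reads and writes on distinct units stay outside the critical section.
      write(un) i
      rewind(un)
      read(un) val
      readback(i) = val

      !$OMP CRITICAL(fileopen_critical)
      close(un)
      !$OMP END CRITICAL(fileopen_critical)
   end do
   !$OMP END PARALLEL DO

   ! Self-check: every file returned the value written to it.
   do i = 1, n_files
      if (readback(i) /= i) error stop 'unexpected read-back value'
   end do
   print '(a)', 'critical I/O sketch ok'
end program critical_io_sketch
```

Using a single name for both operations means a unit cannot be handed out by one thread while another thread is still partway through closing it, which is the situation the first commit message describes.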
Also remove from Read84AryWDefault.  Somehow this was triggering a
segfault with IFX.  No idea how.
Segmentation faults can occur if the OMP PARALLEL DO has enough private
memory per thread that it exceeds the default OMP_STACKSIZE="4 M".  If
this happens, set `export OMP_STACKSIZE="32 M"` or a suitably large value.

The calculated values for this don't work out exactly as I would
expect, but they are in the ballpark (see code note)
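
For a sense of scale, here is a back-of-the-envelope sketch of this kind of estimate. It is not the calculation referred to in the code note; the array dimensions are hypothetical and it counts only a single private array per thread.

```fortran
program stacksize_sketch
   use, intrinsic :: iso_fortran_env, only: int64, real32
   implicit none
   ! Hypothetical private work array: one copy per thread.
   integer, parameter :: nx = 128, ny = 64, nz = 64, n_comp = 3
   integer(int64), parameter :: mib = 1024_int64 * 1024_int64
   integer(int64) :: bytes_per_thread, mib_needed

   ! storage_size returns bits, so divide by 8 for bytes per element.
   bytes_per_thread = int(nx, int64) * ny * nz * n_comp &
                      * (storage_size(1.0_real32) / 8)
   ! Round up to whole MiB. Other locals and compiler temporaries are
   ! ignored, so treat this as a lower bound and leave generous headroom.
   mib_needed = (bytes_per_thread + mib - 1) / mib

   if (bytes_per_thread /= 6291456_int64) error stop 'unexpected size estimate'
   print '(a,i0,a,i0,a)', 'private array: ', bytes_per_thread, &
         ' bytes per thread (at least ', mib_needed, ' MiB of thread stack)'
end program stacksize_sketch
```

With these assumed dimensions the single array already needs 6 MiB per thread, more than the 4 M default quoted above, which is why a larger setting such as `export OMP_STACKSIZE="32 M"` would be needed in a case like this.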
The routine isn't actually used... yet.  But for completeness adding the critical around the close so it isn't an issue later when I actually use the routine
@andrew-platt
Collaborator Author

Merge after #2711

andrew-platt force-pushed the f/OMP_speedup_FF branch 3 times, most recently from a174b2f to 05ec200, on April 4, 2025 14:48
andrew-platt marked this pull request as draft on April 7, 2025 21:43
@andrew-platt
Collaborator Author

Going to cherry-pick pieces out of this for 4.0.3, then revamp the parallelization changes (too much OMP overhead the way I set it up).

@andrew-platt
Collaborator Author

Closing this. Testing showed very little additional speed increase.

andrew-platt deleted the f/OMP_speedup_FF branch on April 17, 2025 22:27