consider this model, which has 10,000 segments, each with one reach:
import flopy.modflow as fm
size = 100
model = fm.Modflow('junk')
dis = fm.ModflowDis(model, nlay=1, nrow=size, ncol=size, top=1, botm=0)
rd = fm.ModflowSfr2.get_empty_reach_data(size**2)
rd['iseg'] = range(len(rd))
rd['ireach'] = 1
sd = fm.ModflowSfr2.get_empty_segment_data(size**2)
sd['nseg'] = range(len(sd))
sfr = fm.ModflowSfr2(reach_data=rd, segment_data=sd, model=m)
The last call to ModflowSfr2.__init__ takes about 21 sec on my machine, but for another model with 15k segments and 25k reaches, loading the SFR package took 4 minutes, with almost all that time spent in __init__. Looking at results from cprofile:
import cProfile
pr = cProfile.Profile()
pr.enable()
sfr = fm.ModflowSfr2(reach_data=rd, segment_data=sd, model=m)
pr.disable()
pr.print_stats(sort='cumtime')
It looks like the culprit is the all_segments method, which accounts for most of the time. Looking at all_segments, it looks like it is only used in the context of nseg and outseg, in other words, the routing. It seems like it was created because SFR2 doesn't require all of the segments to be listed in the first stress period. Is this really the case?
Regardless, I would propose replacing all_segments with the graph property (a dictionary that is intended to define the segment routing, and therefore should be including all of the segments anyways). Using a dictionary should be much more efficient than getting and setting recarrays in a for: loop, and the graph property could be configured so that it only gets executed once during __init__, or when the segment numbering changes. @jlarsen-usgs, would this work for what you intended all_segments to do?
consider this model, which has 10,000 segments, each with one reach:
The last call to
ModflowSfr2.__init__takes about 21 sec on my machine, but for another model with 15k segments and 25k reaches, loading the SFR package took 4 minutes, with almost all that time spent in__init__. Looking at results from cprofile:It looks like the culprit is the
all_segmentsmethod, which accounts for most of the time. Looking atall_segments, it looks like it is only used in the context ofnsegandoutseg, in other words, the routing. It seems like it was created because SFR2 doesn't require all of the segments to be listed in the first stress period. Is this really the case?Regardless, I would propose replacing
all_segmentswith thegraphproperty (a dictionary that is intended to define the segment routing, and therefore should be including all of the segments anyways). Using a dictionary should be much more efficient than getting and setting recarrays in a for: loop, and the graph property could be configured so that it only gets executed once during__init__, or when the segment numbering changes. @jlarsen-usgs, would this work for what you intendedall_segmentsto do?