failure running splitAndBalanceMesh on large mixed mesh #35

@cwsmith

After splitting, during hypergraph creation (mesh elements are graph nodes, mesh vertices are hyperedges), the pin connection process aborts with a bad allocation (std::bad_array_new_length) here:

lid_t* pl = pin_list[type] = new lid_t[nlp];

The following print statement was added before that line to help determine the cause:

fprintf(stderr, "%d nlp %d nle %d\n", PCU_Comm_Self(), nlp, nle);

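The output from a run on Pleiades is pasted below. In the output shown, several ranks print invalid values for nlp: ranks 56 to 59 print large negative values, and ranks 99 and 101 print implausibly large positive values. Two of the negative values: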

57 nlp -1146837624 nle 51071
56 nlp -841074879 nle 52681
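
A guard in front of the allocation could turn the abort into a per-rank diagnostic. The following is a minimal sketch, not code from EnGPar: `pinCountIsPlausible` and `allocatePinList` are hypothetical helpers, `lid_t` is assumed to be a signed 32-bit integer, and the `max_pins_per_edge` bound is an illustrative value that would need tuning for the mesh at hand.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Assumption: lid_t is a signed 32-bit local id type; the real typedef may differ.
typedef std::int32_t lid_t;

// Hypothetical check: a pin count is usable as an array length only if it is
// non-negative and not wildly larger than the number of local hyperedges.
// max_pins_per_edge is an illustrative bound, not a value taken from EnGPar.
bool pinCountIsPlausible(long long nlp, long long nle,
                         long long max_pins_per_edge) {
  if (nlp < 0 || nle < 0) return false;
  return nlp <= nle * max_pins_per_edge;
}

// Hypothetical wrapper around the failing allocation: reports the rank and the
// offending counts instead of letting new[] throw std::bad_array_new_length.
// The caller owns the returned array and must release it with delete[].
lid_t* allocatePinList(int rank, long long nlp, long long nle,
                       long long max_pins_per_edge) {
  if (!pinCountIsPlausible(nlp, nle, max_pins_per_edge)) {
    throw std::runtime_error("rank " + std::to_string(rank) +
                             ": implausible nlp " + std::to_string(nlp) +
                             " for nle " + std::to_string(nle));
  }
  return new lid_t[static_cast<std::size_t>(nlp)];
}
```

With a bound of 1000 pins per hyperedge, this check would reject the negative counts above as well as the 79993306 reported by rank 99, while accepting the counts from the other ranks in the log, whose nlp is well under 100 times nle.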

input mesh and model

/projects/tools/Models/BoeingBump/LES_DNS_Meshing/FPS-MTW-6-15/MGEN

stdout/err

> mpiexec -n 120 ~/buildengpar/test/splitAndBalanceMesh outModel.dmg outMesh/ 0 2 
ENGPAR Git hash 68853710c2dc936006d0eb9446434724f8a21ce6
mesh outMesh/ loaded in 3.733180 seconds
number of tet 18685187 hex 0 prism 9682240 pyramid 0
mesh entity counts: v 8009471 e 41503118 f 61787347 r 28367427
planned Zoltan split factor 2 to target imbalance 1.100000 in 2.935652 seconds
mesh expanded from 60 to 120 parts in 1.240766 seconds
mesh migrated from 60 to 120 in 11.678687 seconds
PARMA_STATUS disconnected <max avg> 0 0.000
PARMA_STATUS neighbors <max avg> 19 9.983
PARMA_STATUS smallest side of max neighbor part 29
PARMA_STATUS num parts with max neighbors 1
PARMA_STATUS empty parts 0
PARMA_STATUS small neighbor counts 1:0 2:2 3:0 4:0 5:0 6:2 7:0 8:0 9:2 10:0 
PARMA_STATUS weighted vtx <tot max min avg> 8682895.0 112373.0 34920.0 72357.458
PARMA_STATUS weighted edge <tot max min avg> 43045935.0 435313.0 229585.0 358716.125
PARMA_STATUS weighted face <tot max min avg> 62730587.0 589500.0 355227.0 522754.892
PARMA_STATUS weighted rgn <tot max min avg> 28367427.0 290030.0 142546.0 236395.225
PARMA_STATUS owned bdry vtx <tot max min avg> 576843 12960 0 4807.025
PARMA_STATUS shared bdry vtx <tot max min avg> 1176539 16528 1510 9804.492

error rPARMA_STATUS model bdry vtx <tot max min avg> 266000 10172 0 2216.667
PARMA_STATUS sharedSidesToElements <max min avg> 0.111 0.010 0.068
PARMA_STATUS entity imbalance <v e f r>: 1.55 1.21 1.13 1.23
99 nlp 79993306 nle 39737
59 nlp -267509587 nle 50000
terminate called after throwing an instance of ‘std::bad_array_new_length’
 what(): std::bad_array_new_length
108 nlp 952788 nle 41193
57 nlp -1146837624 nle 51071
112 nlp 1046825 nle 45320
terminate called after throwing an instance of ‘std::bad_array_new_length’
 what(): std::bad_array_new_length
51 nlp 927740 nle 39326
102 nlp 1192183 nle 51189
109 nlp 838136 nle 34920
33 nlp 977030 nle 41241
92 nlp 1045943 nle 44215
93 nlp 1111342 nle 47043
118 nlp 1160873 nle 49411
113 nlp 1070532 nle 45309
117 nlp 1082932 nle 45770
MPT ERROR: Rank 59(g:59) received signal SIGABRT/SIGIOT(6).
	Process ID: 91212, Host: r575i6n4, Program: /home5/kjansen/buildengpar/test/splitAndBalanceMesh
	MPT Version: HPE MPT 2.17 11/30/17 08:08:29
MPT: --------stack traceback-------
36 nlp 1061060 nle 44968
37 nlp 1110348 nle 46773
104 nlp 997048 nle 42346
105 nlp 1089882 nle 46158
98 nlp 1240290 nle 53230
34 nlp 1147389 nle 48452
90 nlp 1083250 nle 45724
54 nlp 1159270 nle 48648
107 nlp 1133343 nle 47904
101 nlp 1915291810 nle 49541
119 nlp 1222668 nle 51540
100 nlp 1231264 nle 52035
27 nlp 1228810 nle 52584
43 nlp 1174608 nle 49490
52 nlp 1156365 nle 48880
103 nlp 1221699 nle 51715
91 nlp 1162613 nle 100807
47 nlp 1180836 nle 49838
39 nlp 1229520 nle 51761
55 nlp 1359121 nle 52562
106 nlp 1097030 nle 92104
96 nlp 1047691 nle 71326
58 nlp -1569407485 nle 52478
45 nlp 1156956 nle 48946
53 nlp 1238186 nle 52231
110 nlp 1140810 nle 48316
115 nlp 1232744 nle 52366
terminate called after throwing an instance of ‘std::bad_array_new_length’
46 nlp 1206098 nle 50625
48 nlp 1204564 nle 50627
32 nlp 1150728 nle 48542
83 nlp 990491 nle 50397
111 nlp 1124040 nle 46817
 what(): std::bad_array_new_length
42 nlp 1238392 nle 52403
31 nlp 1247900 nle 52705
50 nlp 1231181 nle 51662
38 nlp 1261028 nle 52832
56 nlp -841074879 nle 52681
44 nlp 1262759 nle 53363
30 nlp 1254895 nle 52612
35 nlp 1267186 nle 53494
terminate called after throwing an instance of ‘std::bad_array_new_length’
41 nlp 1229760 nle 51681
64 nlp 1256866 nle 75894
28 nlp 1253121 nle 52910
 what(): std::bad_array_new_length
114 nlp 1265824 nle 53089
24 nlp 1217616 nle 50886
49 nlp 1260068 nle 52762
82 nlp 890976 nle 68804
29 nlp 1265666 nle 53055
17 nlp 1211961 nle 62727
66 nlp 1097870 nle 93892
78 nlp 923136 nle 77637
116 nlp 1266674 nle 53357
40 nlp 1212876 nle 50810
80 nlp 971814 nle 82007
84 nlp 1239423 nle 105484
15 nlp 1290996 nle 95015
26 nlp 1268241 nle 52986
75 nlp 1067598 nle 73259
77 nlp 1116858 nle 79411
73 nlp 1038174 nle 87581
69 nlp 1223016 nle 104134
61 nlp 1242092 nle 81230
97 nlp 1175911 nle 74222
6 nlp 1276641 nle 108635
60 nlp 1260873 nle 108003
95 nlp 1265598 nle 108429
8 nlp 1203720 nle 102109
22 nlp 1282856 nle 70829
86 nlp 1124802 nle 94993
88 nlp 1283457 nle 90131
21 nlp 1297034 nle 91652
25 nlp 1300891 nle 74589
70 nlp 1161988 nle 97269
63 nlp 1293464 nle 84892
89 nlp 1295568 nle 110380
3 nlp 1227291 nle 104297
18 nlp 1307321 nle 96492
74 nlp 1299909 nle 110652
13 nlp 1263841 nle 105356
68 nlp 1268243 nle 82514
4 nlp 1208871 nle 103201
81 nlp 1314015 nle 102439
19 nlp 1289680 nle 89270
72 nlp 1243826 nle 81195
71 nlp 1315848 nle 110669
85 nlp 1256709 nle 106822
79 nlp 1252092 nle 106791
65 nlp 1251204 nle 106011
76 nlp 1221766 nle 95822
94 nlp 1321065 nle 111892
20 nlp 1310938 nle 92005
23 nlp 1310037 nle 88603
9 nlp 1261650 nle 106179
1 nlp 1309599 nle 111764
2 nlp 1317027 nle 112373
10 nlp 1307088 nle 110946
67 nlp 1228118 nle 94014
62 nlp 1227882 nle 88072
14 nlp 1276752 nle 103633
12 nlp 1210623 nle 95865
16 nlp 1331516 nle 96747
87 nlp 1343556 nle 111219
7 nlp 1304574 nle 109850
11 nlp 1275648 nle 106917
5 nlp 1325682 nle 111950
0 nlp 1319316 nle 107363
