Please take a look at the following example:
ref = "
if ( db . getCollectionNames () . contains ( collectionName ) ) {
db . getCollection ( collectionName ) . drop () ;
mongoDBCollections . remove ( collectionName ) ;
}"
example1 = "
if ( ( collectionName != null ) && ( ! ( db . getCollectionNames () . contains ( collectionName ) ) ) ) {
db . getCollection ( collectionName ) . drop () ;
mongoDBCollections . remove ( collectionName ) ;
}"
example2 = "
if ( ( ( db ) != null ) && ( ! ( db . getCollectionNames () . contains ( collectionName ) ) ) ) {
db . getCollection ( collectionName ) . drop () ;
mongoDBCollections . remove ( collectionName ) ;
}"
For example 1, here are the data flow graphs and the score:
ref dfg:
[('db', 2, 'comesFrom', [], []), ('collectionName', 10, 'comesFrom', [], []), ('db', 14, 'comesFrom', ['db'], [2]), ('collectionName', 18, 'comesFrom', ['collectionName'], [10]), ('collectionName', 29, 'comesFrom', ['collectionName'], [10])]
cand dfg:
[('collectionName', 3, 'comesFrom', [], []), ('db', 11, 'comesFrom', [], []), ('collectionName', 19, 'comesFrom', ['collectionName'], [3]), ('db', 25, 'comesFrom', ['db'], [11]), ('collectionName', 29, 'comesFrom', ['collectionName'], [3]), ('collectionName', 40, 'comesFrom', ['collectionName'], [3])]
Normalized ref dfg:
[('var_0', 'comesFrom', []), ('var_1', 'comesFrom', []), ('var_0', 'comesFrom', ['var_0']), ('var_1', 'comesFrom', ['var_1']), ('var_1', 'comesFrom', ['var_1'])]
Normalized cand dfg:
[('var_0', 'comesFrom', []), ('var_1', 'comesFrom', []), ('var_0', 'comesFrom', ['var_0']), ('var_1', 'comesFrom', ['var_1']), ('var_0', 'comesFrom', ['var_0']), ('var_0', 'comesFrom', ['var_0'])]
0.709 | 0.973 | 0.875 | 0.800 > 0.839
83.91522695531542
For example 2, here are the data flow graphs and the score:
ref dfg:
[('db', 2, 'comesFrom', [], []), ('collectionName', 10, 'comesFrom', [], []), ('db', 14, 'comesFrom', ['db'], [2]), ('collectionName', 18, 'comesFrom', ['collectionName'], [10]), ('collectionName', 29, 'comesFrom', ['collectionName'], [10])]
cand dfg:
[('db', 4, 'comesFrom', [], []), ('db', 13, 'comesFrom', ['db'], [4]), ('collectionName', 21, 'comesFrom', [], []), ('db', 27, 'comesFrom', ['db'], [4]), ('collectionName', 31, 'comesFrom', ['collectionName'], [21]), ('collectionName', 42, 'comesFrom', ['collectionName'], [21])]
Normalized ref dfg:
[('var_0', 'comesFrom', []), ('var_1', 'comesFrom', []), ('var_0', 'comesFrom', ['var_0']), ('var_1', 'comesFrom', ['var_1']), ('var_1', 'comesFrom', ['var_1'])]
Normalized cand dfg:
[('var_0', 'comesFrom', []), ('var_0', 'comesFrom', ['var_0']), ('var_1', 'comesFrom', []), ('var_0', 'comesFrom', ['var_0']), ('var_1', 'comesFrom', ['var_1']), ('var_1', 'comesFrom', ['var_1'])]
0.675 | 0.973 | 0.875 | 1.000 > 0.881
88.08105911288919
As you can see, the data flow match is 0.8 for example 1 but 1.0 for example 2. Based on the algorithm, both should score 1.0, because both candidates cover all the data flows in the reference. The difference comes from the order in which variables are numbered during normalization: collectionName becomes var_0 in example 1 but var_1 in example 2, while in the reference it is var_1. Is this unexpected behavior?
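To make this easier to reproduce, here is a minimal sketch of what I believe is happening. It assumes that normalization renames variables to var_0, var_1, ... in order of first appearance and that matching consumes candidate flows one by one; the function names (`normalize`, `by_name`, `multiset_match`) are my own and are not taken from the evaluator's code. The DFG lists are copied from the output above.

```python
from collections import Counter

REF = [('db', 2, 'comesFrom', [], []), ('collectionName', 10, 'comesFrom', [], []),
       ('db', 14, 'comesFrom', ['db'], [2]),
       ('collectionName', 18, 'comesFrom', ['collectionName'], [10]),
       ('collectionName', 29, 'comesFrom', ['collectionName'], [10])]
CAND1 = [('collectionName', 3, 'comesFrom', [], []), ('db', 11, 'comesFrom', [], []),
         ('collectionName', 19, 'comesFrom', ['collectionName'], [3]),
         ('db', 25, 'comesFrom', ['db'], [11]),
         ('collectionName', 29, 'comesFrom', ['collectionName'], [3]),
         ('collectionName', 40, 'comesFrom', ['collectionName'], [3])]
CAND2 = [('db', 4, 'comesFrom', [], []), ('db', 13, 'comesFrom', ['db'], [4]),
         ('collectionName', 21, 'comesFrom', [], []),
         ('db', 27, 'comesFrom', ['db'], [4]),
         ('collectionName', 31, 'comesFrom', ['collectionName'], [21]),
         ('collectionName', 42, 'comesFrom', ['collectionName'], [21])]


def normalize(dfg):
    """Assumed behavior: rename variables by order of first appearance."""
    names = {}
    flows = []
    for name, _idx, rel, parents, _parent_idx in dfg:
        for var in [*parents, name]:
            names.setdefault(var, f"var_{len(names)}")
        flows.append((names[name], rel, tuple(names[p] for p in parents)))
    return flows


def by_name(dfg):
    """Alternative for comparison: keep the original variable names."""
    return [(name, rel, tuple(parents)) for name, _idx, rel, parents, _pidx in dfg]


def multiset_match(ref_flows, cand_flows):
    """Fraction of reference flows found in the candidate, each used once."""
    remaining = Counter(cand_flows)
    hits = 0
    for flow in ref_flows:
        if remaining[flow] > 0:
            remaining[flow] -= 1
            hits += 1
    return hits / len(ref_flows)


print(multiset_match(normalize(REF), normalize(CAND1)))  # 0.8
print(multiset_match(normalize(REF), normalize(CAND2)))  # 1.0
print(multiset_match(by_name(REF), by_name(CAND1)))      # 1.0
print(multiset_match(by_name(REF), by_name(CAND2)))      # 1.0
```

With first-appearance renaming, the sketch gives the same 0.8 and 1.0 as reported above. When the original variable names are kept, both candidates cover every reference flow.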
Thanks!