Randomness in CodeBLEU computation #152

https://github.com/microsoft/CodeXGLUE/blob/6744a7f6ab658a15382f842df6b9c5f148423a49/Code-Code/code-to-code-trans/evaluator/CodeBLEU/dataflow_match.py#L100

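The order of the list returned by `list(set(...))` is not stable across interpreter runs. Python randomizes string hashes per process unless `PYTHONHASHSEED` is fixed, so a set of strings can iterate in a different order each time. This is easy to reproduce by running `python -c 'print(list(set(["fa", "dsa", "dsa", "w"])))'` several times.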
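
The effect can be made repeatable by pinning the hash seed. The sketch below is only an illustration: the helper name `run_with_seed` is made up for this example, and it assumes CPython 3.7+ with `PYTHONHASHSEED` honored. It runs the same one-liner in fresh interpreters under different seeds and prints the order each one produces.

```python
import os
import subprocess
import sys

SNIPPET = 'print(list(set(["fa", "dsa", "dsa", "w"])))'

def run_with_seed(seed, snippet=SNIPPET):
    """Run `snippet` in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# The same seed reproduces the same order; different seeds may not.
orders = {seed: run_with_seed(seed) for seed in range(5)}
for seed, order in orders.items():
    print(seed, order)
```

Setting `PYTHONHASHSEED` to a fixed value before running the evaluator should therefore make the score reproducible, although it only hides the ordering dependence rather than removing it.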

---

In some cases, the line above changes the DFG returned by `get_data_flow()` from one run to the next, which in turn makes the CodeBLEU score vary (specifically `dataflow_match_score`):
https://github.com/microsoft/CodeXGLUE/blob/6744a7f6ab658a15382f842df6b9c5f148423a49/Code-Code/code-to-code-trans/evaluator/CodeBLEU/calc_code_bleu.py#L64-L67
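
To see why the order matters, here is a small sketch. The edge layout `(variable, relationship, parent variables)` and both helper functions are hypothetical, and whether the scorer compares edges exactly this way is an assumption. The point is only that two edges with the same parents in a different order are unequal whenever the parent lists are compared directly.

```python
# Hypothetical DFG edges: (variable, relationship, parent variables).
# Both describe the same data flow; only the parent order differs.
edge_run_a = ("read", "computedFrom", ["b", "off", "len"])
edge_run_b = ("read", "computedFrom", ["off", "len", "b"])

def same_edge_ordered(x, y):
    """Direct comparison: sensitive to the order of the parent list."""
    return x == y

def same_edge_unordered(x, y):
    """Comparison that ignores the order of the parent list."""
    return x[:2] == y[:2] and sorted(x[2]) == sorted(y[2])

print(same_edge_ordered(edge_run_a, edge_run_b))    # False
print(same_edge_unordered(edge_run_a, edge_run_b))  # True
```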

---

I compared the two functions linked below and found that the `GraphCodeBERT` version has no "merge nodes" step.

https://github.com/microsoft/CodeBERT/blob/ac04c77ca7cda9dc757dc8b4360e358731c8708e/GraphCodeBERT/codesearch/run.py#L68-L104

https://github.com/microsoft/CodeXGLUE/blob/6744a7f6ab658a15382f842df6b9c5f148423a49/Code-Code/code-to-code-trans/evaluator/CodeBLEU/dataflow_match.py#L64-L105

---

My reference and candidate are:
```
  candidate = \
  '''
  throws IOException {
      int read = super.read(b, off, len);
      if (read > 0) {
          bytesRead.incrementAndGet();
      }
      return read;
  }
  '''
  reference = \
  '''
  throws IOException {
      // Obey InputStream contract.
      checkPositionIndexes(off, off + len, b.length);
      if (len == 0) {
      return 0;
      }

      // The rest of this method implements the process described by the CharsetEncoder javadoc.
      int totalBytesRead = 0;
      boolean doneEncoding = endOfInput;

      DRAINING:
      while (true) {
      // We stay in draining mode until there are no bytes left in the output buffer. Then we go
      // back to encoding/flushing.
      if (draining) {
          to
  '''
```

I was wondering if #104 ran into the same problem.

Thank you for your reply! @JiyangZhang @Imagist-Shuo @celbree


`get_data_flow()` from `dataflow_match.py` (lines 64-105 at the commit linked above):

	def get_data_flow(code, parser):
	    try:
	        tree = parser[0].parse(bytes(code,'utf8'))
	        root_node = tree.root_node
	        tokens_index=tree_to_token_index(root_node)
	        code=code.split('\n')
	        code_tokens=[index_to_code_token(x,code) for x in tokens_index]
	        index_to_code={}
	        for idx,(index,code) in enumerate(zip(tokens_index,code_tokens)):
	            index_to_code[index]=(idx,code)
	        try:
	            DFG,_=parser[1](root_node,index_to_code,{})
	        except:
	            DFG=[]
	        DFG=sorted(DFG,key=lambda x:x[1])
	        indexs=set()
	        for d in DFG:
	            if len(d[-1])!=0:
	                indexs.add(d[1])
	            for x in d[-1]:
	                indexs.add(x)
	        new_DFG=[]
	        for d in DFG:
	            if d[1] in indexs:
	                new_DFG.append(d)
	        codes=code_tokens
	        dfg=new_DFG
	    except:
	        codes=code.split()
	        dfg=[]
	    #merge nodes
	    dic={}
	    for d in dfg:
	        if d[1] not in dic:
	            dic[d[1]]=d
	        else:
	            dic[d[1]]=(d[0],d[1],d[2],list(set(dic[d[1]][3]+d[3])),list(set(dic[d[1]][4]+d[4])))
	    DFG=[]
	    for d in dic:
	        DFG.append(dic[d])
	    dfg=DFG
	    return dfg
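
One possible way to make the "merge nodes" step deterministic is to deduplicate while preserving first-seen order instead of going through `set()`. This is a sketch, not the maintainers' fix. The function name `merge_nodes` and the sample entries are made up for illustration, and it assumes that first-seen order is acceptable for the parent lists (`sorted(set(...))` would be another option).

```python
def merge_nodes(dfg):
    """Merge DFG entries that share the same node index (d[1]).

    Parent names (d[3]) and parent indexes (d[4]) are deduplicated with
    dict.fromkeys, which keeps first-seen order, so the result does not
    depend on string hashing.
    """
    merged = {}
    for d in dfg:
        key = d[1]
        if key not in merged:
            merged[key] = d
        else:
            prev = merged[key]
            names = list(dict.fromkeys(prev[3] + d[3]))
            idxs = list(dict.fromkeys(prev[4] + d[4]))
            merged[key] = (d[0], d[1], d[2], names, idxs)
    return list(merged.values())

# Hypothetical entries: (name, index, relationship, parent names, parent indexes).
sample = [
    ("read", 3, "computedFrom", ["b", "off"], [7, 9]),
    ("read", 3, "computedFrom", ["off", "len"], [9, 11]),
    ("len", 11, "comesFrom", [], []),
]
print(merge_nodes(sample))
```

As in the original code, names and indexes are deduplicated independently, so this only changes the ordering behavior, not which elements are kept.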


`calc_code_bleu.py` (lines 64-67 at the commit linked above):

	dataflow_match_score = dataflow_match.corpus_dataflow_match(references, hypothesis, args.lang)

	print('ngram match: {0}, weighted ngram match: {1}, syntax_match: {2}, dataflow_match: {3}'.\
	        format(ngram_match_score, weighted_ngram_match_score, syntax_match_score, dataflow_match_score))
