Description
By default, in XLearner (XLearner.py), the parameter resolve_fd in the learn function is set to True.
As a result, the resolve_fd method is called. From my understanding, if the user does not provide any functional dependencies (FDs), resolve_fd creates FD edges itself. An edge is created between two variables X and Y if, for any $x_i, x_j \in X$, the condition $x_i = x_j \Rightarrow y_i = y_j$ holds, i.e. every value of X is paired with a single value of Y in the data.
However, this behavior is not explained in the paper (unless I missed something).
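For reference, the condition X → Y means that any two rows agreeing on X must also agree on Y. Below is a minimal brute-force sketch of that definition in plain Python. The function name `fd_holds`, the list-of-dicts row format and the toy rows are illustrative only and do not come from XLearner.py.

```python
from itertools import combinations


def fd_holds(rows, x, y):
    """Return True if the functional dependency x -> y holds on rows.

    Literal check of the definition: for every pair of rows i, j,
    x_i == x_j must imply y_i == y_j. Runs in O(n^2) over the rows.
    """
    for r_i, r_j in combinations(rows, 2):
        if r_i[x] == r_j[x] and r_i[y] != r_j[y]:
            return False
    return True


# Hypothetical toy data
rows = [
    {"city": "Athens", "country": "GR", "temp": 31},
    {"city": "Athens", "country": "GR", "temp": 29},
    {"city": "Patras", "country": "GR", "temp": 27},
]
print(fd_holds(rows, "city", "country"))  # True
print(fd_holds(rows, "city", "temp"))     # False: Athens pairs with 31 and 29
print(fd_holds(rows, "country", "city"))  # False: GR pairs with Athens and Patras
print(fd_holds(rows, "temp", "city"))     # True: every temp value is unique
```

The last line shows why the categorical question matters: a column whose values never repeat satisfies the condition trivially.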
Questions
- Is there a specific reason behind this logic?
- Should this be kept as the default behavior?
- Does this apply only to categorical values?
Code from resolve_fd in XLearner.py (under XDA/src):
```python
# fd_list, cols and count are defined outside this excerpt
if fd_edges is not None:
    fd_list = set(fd_edges)
else:
    for cmb in permutations(cols, 2):
        col1 = cmb[0]
        col2 = cmb[1]
        # skip if the reverse edge (col2, col1) was already added
        if (col2, col1) in fd_list:
            continue
        mapper = {}
        has_dup = False
        if count[col2] == 1 or count[col2] > count[col1]:
            continue
        # check that each value of col1 maps to a single value of col2
        for index, row in self.df.iterrows():
            key = row[col1]
            val = row[col2]
            if key in mapper and mapper[key] != val:
                has_dup = True
                break
            if key not in mapper:
                mapper[key] = val
        if not has_dup:
            fd_list.add(cmb)
```
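To see what the default does on data containing a continuous column, the loop above can be reproduced outside XLearner. This is a sketch under two assumptions: `count[c]` is taken to be the number of distinct values in column `c` (its definition is not part of the excerpt), and `fd_list` starts out empty. The helper `infer_fd_edges` and the toy table are hypothetical.

```python
from itertools import permutations


def infer_fd_edges(columns):
    """Mirror of the quoted resolve_fd loop on a dict {column: list of values}.

    Assumption: count[c] is the number of distinct values in column c.
    """
    cols = list(columns)
    n_rows = len(next(iter(columns.values())))
    count = {c: len(set(columns[c])) for c in cols}  # assumed meaning of count
    fd_list = set()
    for col1, col2 in permutations(cols, 2):
        if (col2, col1) in fd_list:
            continue  # reverse edge already accepted
        if count[col2] == 1 or count[col2] > count[col1]:
            continue  # constant target, or more distinct values than the source
        mapper = {}
        has_dup = False
        for i in range(n_rows):
            key, val = columns[col1][i], columns[col2][i]
            if key in mapper and mapper[key] != val:
                has_dup = True
                break
            mapper.setdefault(key, val)  # first value seen for this key
        if not has_dup:
            fd_list.add((col1, col2))
    return fd_list


data = {
    "group": ["a", "a", "b", "b"],
    "label": ["x", "y", "x", "y"],
    "score": [0.12, 0.57, 0.33, 0.91],  # continuous, all values distinct
}
print(sorted(infer_fd_edges(data)))  # [('score', 'group'), ('score', 'label')]
```

Under this reading of `count`, any column whose values are all distinct (a continuous measurement or an ID) is accepted as determining every other non-constant column, even though the categorical columns here are unrelated to it.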