Resolving Functional Dependency (FD) in XLearner #10


By default, in **XLearner** (`XLearner.py`), the parameter `resolve_fd` in the `learn` function is set to `True`.  

As a result, the `resolve_fd` method is called. From my understanding, if the user does not provide any functional dependencies (FDs), `resolve_fd` infers FD edges from the data. An edge $X \rightarrow Y$ is created between two variables **X** and **Y** if, for any two rows $i$ and $j$, $x_i = x_j$ implies $y_i = y_j$; in other words, each value of **X** is always paired with the same value of **Y**.  

However, this behavior is not explained in the paper (unless I missed something).  
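A minimal sketch of the pairwise check, in plain Python over two equal-length lists, may help pin down the direction of the dependency: $X \rightarrow Y$ holds when rows that agree on X always agree on Y, which is what the `mapper` loop in `resolve_fd` compares. The function name `functionally_determines` and the city/country data are hypothetical illustrations, not part of XLearner.

```python
from collections.abc import Hashable, Sequence


def functionally_determines(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> bool:
    """Return True if X -> Y holds: rows with equal x values always have equal y values."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    seen: dict = {}  # maps each x value to the first y value observed with it
    for x, y in zip(xs, ys):
        if x in seen and seen[x] != y:
            return False  # same x, different y: X -> Y is violated
        seen.setdefault(x, y)
    return True


city = ["Paris", "Lyon", "Paris", "Berlin"]
country = ["FR", "FR", "FR", "DE"]

print(functionally_determines(city, country))  # True: each city maps to one country
print(functionally_determines(country, city))  # False: "FR" maps to both Paris and Lyon
```

The second call shows why the direction matters: the FD holds from city to country but not the other way around.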

## Questions

1. Is there a specific reason behind this logic?  
2. Should this be kept as the default behavior?  
3. Does this apply only to categorical values? 


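Code from `resolve_fd` in `XDA/src/XLearner.py`:
```python
# `cols`, `count` (presumably per-column distinct-value counts) and `fd_list`
# are initialized earlier in resolve_fd; that part is not shown here.
if fd_edges is not None:
    # User-supplied FDs are used as-is.
    fd_list = set(fd_edges)
else:
    # Otherwise, test every ordered pair (col1, col2) as a candidate FD col1 -> col2.
    for cmb in permutations(cols, 2):
        col1 = cmb[0]
        col2 = cmb[1]
        # Skip if the reverse FD col2 -> col1 has already been accepted.
        if (col2, col1) in fd_list: continue
        mapper = {}
        has_dup = False
        # Skip constant targets and targets with more values than col1.
        if count[col2] == 1 or count[col2] > count[col1]:
            continue
        # Record the first col2 value seen for each col1 value;
        # any conflicting col2 value means col1 -> col2 does not hold.
        for index, row in self.df.iterrows():
            key = row[col1]
            val = row[col2]
            if key in mapper and mapper[key] != val:
                has_dup = True
                break
            if key not in mapper:
                mapper[key] = val
        if not has_dup:
            fd_list.add(cmb)
```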
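For experimenting outside the class, the loop above can be reproduced as a standalone function. This is a sketch under assumptions: columns are passed as a plain `dict` mapping names to equal-length lists instead of `self.df`, `count[col]` is taken to be the number of distinct values in a column (the excerpt does not show how `count` is built), and `infer_fd_edges` is a hypothetical name.

```python
from collections.abc import Hashable
from itertools import permutations


def infer_fd_edges(columns: dict[str, list[Hashable]]) -> set[tuple[str, str]]:
    """Infer FD edges (col1, col2), meaning col1 -> col2, mirroring the resolve_fd loop."""
    # Assumption: count[col] is the number of distinct values in col.
    count = {name: len(set(values)) for name, values in columns.items()}
    fd_list: set[tuple[str, str]] = set()
    for col1, col2 in permutations(columns, 2):
        if (col2, col1) in fd_list:
            continue  # reverse edge already accepted; keep only one direction
        if count[col2] == 1 or count[col2] > count[col1]:
            continue  # constant target, or more target values than a function allows
        mapper: dict = {}
        has_dup = False
        for key, val in zip(columns[col1], columns[col2]):
            if key in mapper and mapper[key] != val:
                has_dup = True  # same col1 value paired with two col2 values
                break
            mapper.setdefault(key, val)
        if not has_dup:
            fd_list.add((col1, col2))
    return fd_list


data = {
    "price": [10.5, 12.25, 9.99, 14.0],  # continuous, every value distinct
    "city": ["Paris", "Lyon", "Paris", "Berlin"],
    "country": ["FR", "FR", "FR", "DE"],
}
print(sorted(infer_fd_edges(data)))
# [('city', 'country'), ('price', 'city'), ('price', 'country')]
```

With these toy columns, the continuous `price` column, whose values are all distinct, trivially determines every other column, which is the situation question 3 is about.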
