Resolving Functional Dependency (FD) in XLearner #10

@agiannoul

By default, in XLearner (XLearner.py), the parameter resolve_fd in the learn function is set to True.

As a result, the resolve_fd method is called. From my understanding, if the user does not provide any functional dependencies (FDs), resolve_fd creates FD edges itself. An edge (X, Y) is added if, for any two rows $i$ and $j$, $x_i = x_j$ implies $y_i = y_j$, i.e. each value of X maps to exactly one value of Y.

However, this behavior is not explained in the paper (unless I missed something).

Questions

  1. Is there a specific reason behind this logic?
  2. Should this be kept as the default behavior?
  3. Does this apply only to categorical values?

Code from resolve_fd in XLearner.py (under XDA/src):

        if fd_edges is not None: fd_list = set(fd_edges)
        else:
            for cmb in permutations(cols, 2):
                col1 = cmb[0]
                col2 = cmb[1]
                if (col2, col1) in fd_list: continue
                mapper = {}
                has_dup = False
                if count[col2] == 1 or count[col2] > count[col1]:
                    continue
                for index, row in self.df.iterrows():
                    key = row[col1]
                    val = row[col2]
                    if key in mapper and mapper[key] != val:
                        has_dup = True
                        break
                    if key not in mapper:
                        mapper[key] = val
                if not has_dup:
                    fd_list.add(cmb)
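
To make the check above easier to reason about, here is a minimal standalone sketch of how I read it, run on a toy DataFrame. The function name detect_fd_edges and the toy data are hypothetical, and I am assuming that count holds the number of distinct values per column (the excerpt does not show how it is built). The row loop is swapped for a pandas groupby, which should give the same result when there are no missing values.

```python
from itertools import permutations

import pandas as pd


def detect_fd_edges(df: pd.DataFrame) -> set:
    """Return (col1, col2) pairs where each col1 value maps to one col2 value."""
    # Assumption: count is the number of distinct values per column.
    count = {col: df[col].nunique() for col in df.columns}
    fd_list = set()
    for col1, col2 in permutations(df.columns, 2):
        # Skip if the reverse edge was already added.
        if (col2, col1) in fd_list:
            continue
        # Same pruning as in the excerpt.
        if count[col2] == 1 or count[col2] > count[col1]:
            continue
        # col1 -> col2 holds when every group of equal col1 values
        # contains a single distinct col2 value.
        if (df.groupby(col1)[col2].nunique() <= 1).all():
            fd_list.add((col1, col2))
    return fd_list


# Hypothetical toy data: two categorical columns and one continuous column.
df = pd.DataFrame({
    "city": ["Athens", "Athens", "Paris", "Lyon"],
    "country": ["GR", "GR", "FR", "FR"],
    "temp": [30.1, 28.4, 22.0, 19.5],
})
edges = detect_fd_edges(df)
print(sorted(edges))
```

On this data the sketch returns (city, country), which is the expected dependency, but also (temp, city) and (temp, country): a column whose values are all distinct trivially determines every other column. This is part of why I am asking whether the logic is meant for categorical values only.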
