Clustering: Optimal k #1

@bdiptesh

Description

Is your feature request related to a problem? Please describe.
A clustering module that clusters any given data (categorical, continuous, or ordinal) and returns the optimal clustering solution.

Describe the solution you'd like
Compute the optimal clustering solution using the gap statistic.
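
For reference, the gap statistic as commonly defined (Tibshirani, Walther and Hastie) compares the observed within-cluster dispersion to its expectation under a reference distribution. The symbols below are not from this issue: W_k is the pooled within-cluster dispersion for k clusters, W*_kb is the same quantity on the b-th of B reference datasets, and sd_k is the standard deviation of log W*_kb over those datasets. The issue does not say which dispersion or distance measure is intended for categorical or ordinal data, so this is only the generic form.

```latex
\mathrm{Gap}_n(k) = \frac{1}{B}\sum_{b=1}^{B} \log W^{*}_{kb} - \log W_k,
\qquad
s_k = \mathrm{sd}_k \sqrt{1 + \frac{1}{B}}
```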

Methods:

  1. First SE
  2. Maximum Gap
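
The two selection rules could be sketched as follows, assuming the gap values and their standard errors have already been computed for k = 1..max_cluster. The function names `select_k_first_se` and `select_k_max_gap` are hypothetical and not part of the requested API. "First SE" is taken here to mean the usual rule: the smallest k with Gap(k) >= Gap(k+1) - s(k+1).

```python
from typing import Sequence


def select_k_max_gap(gap: Sequence[float]) -> int:
    """Return the k (1-based) with the largest gap value.

    Ties resolve to the smallest such k.
    """
    return max(range(len(gap)), key=lambda i: gap[i]) + 1


def select_k_first_se(gap: Sequence[float], se: Sequence[float]) -> int:
    """Return the smallest k with Gap(k) >= Gap(k+1) - s(k+1).

    gap[i] and se[i] hold the values for k = i + 1.
    Falls back to the largest k if no k satisfies the rule
    (an assumption of this sketch, not stated in the issue).
    """
    if len(gap) != len(se):
        raise ValueError("gap and se must have the same length")
    for i in range(len(gap) - 1):
        if gap[i] >= gap[i + 1] - se[i + 1]:
            return i + 1
    return len(gap)


# Illustrative values only, not results from real data.
gap_values = [0.10, 0.30, 0.55, 0.60, 0.58]
se_values = [0.02, 0.03, 0.08, 0.06, 0.05]

k_first_se = select_k_first_se(gap_values, se_values)
k_max_gap = select_k_max_gap(gap_values)
```

With these illustrative values the two rules disagree: the first SE rule stops at the smaller k because the improvement from k = 3 to k = 4 is within one standard error, while the maximum gap rule picks the peak.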

Expected input

df: pandas.DataFrame
x_var: List[str]
max_cluster: int
method: Union[str]

Expected API

opt_k

Acceptance criteria

Integration tests:

  • Categorical variables only
  • Continuous variables only
  • Ordinal variables only
  • Combination of categorical/ordinal/continuous

Tasks

  • Define integration tests
  • First pass implementation of Gap statistic
  • Modular implementation

Metadata

Labels

feature: New feature or request
stats: Statistics
tests: Integration/Unit tests

Projects

Status

Done
