From bec4c43f5d2cb39a8471f5b37b0398e9da9441d0 Mon Sep 17 00:00:00 2001
From: Pierre Alexandre Tremblay
Date: Fri, 1 Nov 2024 18:46:23 +0100
Subject: [PATCH 1/2] definition of the new initialize parameter

---
 doc/SKMeans.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/SKMeans.rst b/doc/SKMeans.rst
index 5871d31..ae47e37 100644
--- a/doc/SKMeans.rst
+++ b/doc/SKMeans.rst
@@ -24,6 +24,10 @@
 
    The maximum number of iterations the algorithm will use whilst fitting.
 
+:control initialize:
+
+   The method used to initialize the clustering process. 0 is the default random assignment. 1 is using a small sample from the dataset as seed cluster centres.
+
 :message fit:
 
    :arg dataSet: A :fluid-obj:`DataSet` of data points.

From 800d9a6e152ed5416ed53f5d8f6162da0fd2767a Mon Sep 17 00:00:00 2001
From: Pierre Alexandre Tremblay
Date: Thu, 29 May 2025 18:57:26 +0200
Subject: [PATCH 2/2] doc with the 3 initMethods

---
 doc/KMeans.rst  | 15 +++++++++++++++
 doc/SKMeans.rst | 15 +++++++++++++--
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/doc/KMeans.rst b/doc/KMeans.rst
index 357f25f..0a7e9c6 100644
--- a/doc/KMeans.rst
+++ b/doc/KMeans.rst
@@ -19,6 +19,21 @@
 
    The maximum number of iterations the algorithm will use whilst fitting.
 
+:control initMethod:
+
+   The method used to initialize the clustering process.
+
+   :enum:
+
+      :0:
+         random partition: each input point is randomly assigned to a cluster.
+
+      :1:
+         random means: the initial means are sampled at random from the input points, which are then assigned to their nearest mean.
+
+      :2:
+         sampling: the initial means are sampled from the input points, weighted by an approximation of the input data distribution.
+
 :message fit:
 
    :arg dataSet: A :fluid-obj:`DataSet` of data points.
diff --git a/doc/SKMeans.rst b/doc/SKMeans.rst
index ae47e37..47d1576 100644
--- a/doc/SKMeans.rst
+++ b/doc/SKMeans.rst
@@ -24,10 +24,21 @@
 
    The maximum number of iterations the algorithm will use whilst fitting.
 
-:control initialize:
+:control initMethod:
 
-   The method used to initialize the clustering process. 0 is the default random assignment. 1 is using a small sample from the dataset as seed cluster centres.
+   The method used to initialize the clustering process.
+
+   :enum:
+
+      :0:
+         random partition: each input point is randomly assigned to a cluster.
+      :1:
+         random means: the initial means are sampled at random from the input points, which are then assigned to their nearest mean.
+
+      :2:
+         sampling: the initial means are sampled from the input points, weighted by an approximation of the input data distribution.
+
 
 :message fit:
 
    :arg dataSet: A :fluid-obj:`DataSet` of data points.
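
Reviewer note: the three `initMethod` values documented above can be illustrated with a small standard-library Python sketch. It is not the FluCoMa implementation. All function names are hypothetical, the empty-cluster fallback in method 0 is an assumption, and method 2 is read here as k-means++-style seeding (each new mean drawn with probability proportional to squared distance from the nearest mean already chosen), which the patch text does not confirm. Distances are Euclidean, as in plain k-means; a spherical variant such as SKMeans would presumably compare points by cosine similarity instead.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def init_random_partition(points, k, rng):
    """initMethod 0: assign every point to a random cluster, then average."""
    labels = [rng.randrange(k) for _ in points]
    means = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        # Assumption: an empty cluster falls back to a random input point.
        means.append(mean(members) if members else list(rng.choice(points)))
    return means


def init_random_means(points, k, rng):
    """initMethod 1: use k distinct input points as the initial means."""
    return [list(p) for p in rng.sample(points, k)]


def init_weighted_sampling(points, k, rng):
    """initMethod 2, read as k-means++-style seeding (an assumption)."""
    means = [list(rng.choice(points))]
    while len(means) < k:
        # Weight each point by its squared distance to the nearest chosen mean.
        weights = [min(sq_dist(p, m) for m in means) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen mean: pick uniformly.
            means.append(list(rng.choice(points)))
        else:
            means.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return means


def assign(points, means):
    """Label each point with the index of its nearest mean."""
    return [min(range(len(means)), key=lambda c: sq_dist(p, means[c]))
            for p in points]
```

Each initializer returns `k` starting means; `assign(points, means)` then performs the "assigned to their nearest mean" step that the description of method 1 mentions. Passing a seeded `random.Random` makes runs repeatable.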