This repository was archived by the owner on Sep 28, 2021. It is now read-only.
par(mfrow = c(1, 2))  # place the two histograms side by side
hist(fractionalshortening, freq = TRUE, right = FALSE, col = 'seagreen', main = 'Fractional shortening')
hist(wallmotionscore, freq = TRUE, right = FALSE, col = 'slateblue1', main = 'Wallmotion score')
Calculate Mean Values, Dispersions and Standard Deviations
Welch Two Sample t-test
data: fractionalshortening and wallmotionscore
t = -32.053, df = 127.12, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-15.09936 -13.34342
sample estimates:
mean of x mean of y
0.2167339 14.4381250
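The Welch test output above can be reproduced with R's `t.test`. A minimal sketch — the two vectors below are hypothetical stand-ins generated to mimic the reported group means, since the original data loading is not shown:

```r
# Hypothetical stand-ins for the two echocardiogram measurement vectors.
set.seed(1)
fractionalshortening <- rnorm(130, mean = 0.22, sd = 0.1)
wallmotionscore      <- rnorm(130, mean = 14.4, sd = 5)

# Welch two-sample t-test (unequal variances is t.test's default).
tt <- t.test(fractionalshortening, wallmotionscore)
tt$p.value   # very small: the two means clearly differ
tt$conf.int  # 95% confidence interval for the difference in means
```

With real data, `t.test(fractionalshortening, wallmotionscore)` on the loaded columns yields exactly the table printed above.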
          Mean value  Standard deviation  Mode       Median
Picture-1 0.5100096   0.2356856           0.9007059  0.4631373
Picture-2 0.5879227   0.2512607           0.9976863  0.5405882
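Base R has `mean`, `sd` and `median`, but no built-in statistical mode, so one helper has to be written. A minimal sketch with a hypothetical intensity vector standing in for one picture's data; the rounding-based mode helper is an assumption about how the mode was obtained:

```r
# Hypothetical pixel-intensity vector standing in for one picture.
set.seed(2)
picture1 <- runif(1000)

# R has no built-in statistical mode; this helper returns the most
# frequent value after rounding (an assumed binning choice).
mode_value <- function(x, digits = 2) {
  r <- round(x, digits)
  as.numeric(names(which.max(table(r))))
}

c(mean   = mean(picture1),
  sd     = sd(picture1),
  mode   = mode_value(picture1),
  median = median(picture1))
```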
Pearson Test
Since the Pearson statistic measures the difference between the empirical and theoretical distributions, the larger its observed value K_obs, the stronger the evidence against the null hypothesis.
We can get the df (degrees of freedom) values for the two data frames. Since the pictures are the same size, df = 260. Using the Excel function ХИ2ОБР we can calculate the critical value Pc; it equals 205.02.
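ХИ2ОБР is the Russian-localized Excel CHIINV, the right-tail inverse of the chi-square distribution, so the same critical value can be obtained in R with `qchisq`. A sketch, assuming the probability passed to ХИ2ОБР was 0.995 (which reproduces the reported 205.02):

```r
# Right-tail inverse chi-square with 260 degrees of freedom;
# lower.tail = FALSE makes qchisq behave like Excel's ХИ2ОБР/CHIINV.
Pc <- qchisq(0.995, df = 260, lower.tail = FALSE)
round(Pc, 2)  # matches the critical value reported above
```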
D E J Name
Min. : 0.03765 Min. : 0.01439 Min. :35.01 Baxan :749
1st Qu.:44.71768 1st Qu.:10.82736 1st Qu.:56.32 Dum :749
Median :62.92686 Median :18.06381 Median :64.80 Eugene:749
Mean :56.61332 Mean :19.20471 Mean :63.92
3rd Qu.:71.65353 3rd Qu.:24.29174 3rd Qu.:72.14
Max. :99.98825 Max. :49.99397 Max. :86.99
The optimal number of clusters corresponds to the elbow point of the plot. In this case it is best to split the records into 2 clusters. Now let us run the cluster analysis itself, using the k-means method with 2 clusters via the kmeans function:
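A minimal k-means sketch for 2 clusters; the two-group matrix below is hypothetical synthetic data standing in for the real records, which are not shown here:

```r
# Hypothetical two-group data standing in for the real records.
set.seed(100)
records <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
                 matrix(rnorm(100, mean = 5), ncol = 2))

# k-means with k = 2, as suggested by the elbow point.
fit <- kmeans(records, centers = 2)
fit$size  # sizes of the two clusters
```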
The total number of correctly classified instances is 743 + 749 + 599 = 2091. The total number of incorrectly classified instances is 6 + 150 = 156. Accuracy = 2091/(2091 + 156) = 0.93, i.e. our model has achieved 93% accuracy.
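The accuracy arithmetic above, using the counts from the confusion summary:

```r
# Correct and incorrect counts taken from the confusion summary above.
correct   <- 743 + 749 + 599   # 2091
incorrect <- 6 + 150           # 156
accuracy  <- correct / (correct + incorrect)
round(accuracy, 2)             # 0.93
```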
To verify the results of the analysis, let us determine the mean values of all analysed parameters in each of the clusters:
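Per-cluster means can be computed with `aggregate`. A sketch with hypothetical data and a hypothetical fit object, since the original data frame is not shown:

```r
# Hypothetical data frame and k-means fit standing in for the real ones.
set.seed(3)
df  <- data.frame(x = c(rnorm(30, 0), rnorm(30, 5)),
                  y = c(rnorm(30, 0), rnorm(30, 5)))
fit <- kmeans(df, centers = 2)

# Mean of every analysed parameter within each cluster.
aggregate(df, by = list(cluster = fit$cluster), FUN = mean)
```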
From the resulting table, the difference between records in different clusters is already visible. Now let us assign a cluster number to each record of the original dataset:
We'll use the silhouette coefficient (silhouette width) to evaluate the goodness of our clustering.
The silhouette coefficient is calculated as follows:
For each observation i, it calculates the average dissimilarity between i and all the other points within the cluster to which i belongs. Let's call this average dissimilarity Di. Then we do the same dissimilarity calculation between i and each of the other clusters, and take the lowest value among them. That is, we find the dissimilarity between i and the cluster that is closest to i after its own cluster. Let's call that value Ci. The silhouette width Si is the difference between Ci and Di divided by the greater of the two values: Si = (Ci − Di) / max(Di, Ci).
So, the interpretation of the silhouette width is the following:
Si > 0 means that the observation is well clustered. The closer it is to 1, the better it is clustered.
Si < 0 means that the observation was placed in the wrong cluster.
Si = 0 means that the observation is between two clusters.
The silhouette plot above gives us evidence that our clustering using four groups is good because there’s no negative silhouette width and most of the values are bigger than 0.5.
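The silhouette widths discussed above can be computed with `silhouette` from the `cluster` package (shipped with R as a recommended package). A sketch on hypothetical well-separated data:

```r
library(cluster)  # provides silhouette()

# Hypothetical well-separated two-cluster data.
set.seed(4)
pts <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
             matrix(rnorm(60, mean = 6), ncol = 2))
km  <- kmeans(pts, centers = 2)

# Silhouette width S_i for every observation, then the average.
sil <- silhouette(km$cluster, dist(pts))
mean(sil[, "sil_width"])  # close to 1 for well-separated clusters
plot(sil)                 # the silhouette plot discussed above
```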
K J G B
Min. : 9.002 Min. :35.01 Min. :45.01 Min. :30.03
1st Qu.:10.487 1st Qu.:44.90 1st Qu.:55.73 1st Qu.:40.23
Median :11.610 Median :55.78 Median :58.37 Median :43.18
Mean :11.490 Mean :57.96 Mean :62.34 Mean :47.45
3rd Qu.:12.504 3rd Qu.:70.02 3rd Qu.:64.07 3rd Qu.:53.21
Max. :14.000 Max. :86.99 Max. :99.93 Max. :79.98
A Name
Min. :10.06 Baxan :749
1st Qu.:40.69 Choluteco:749
Median :57.08 Eugene :749
Mean :56.91
3rd Qu.:77.44
Max. :84.99
set.seed(100)
clusterdata <- getCodes(som_model)
wss <- (nrow(clusterdata) - 1) * sum(apply(clusterdata, 2, var))
for (i in 2:35) {  # i must be less than 6*6, the grid size defined at the beginning
  wss[i] <- sum(kmeans(clusterdata, centers = i)$withinss)
}
par(mar = c(5.1, 4.1, 4.1, 2.1))
plot(wss, type = 'l',
     xlab = 'Number of Clusters',
     ylab = 'Within groups sum of squares',
     main = 'Within cluster sum of squares (WCSS)')
abline(v = 3, col = 'red')
set.seed(100)
fit_kmeans <- kmeans(clusterdata[, 1:5], 3)
cl_assignment_k <- fit_kmeans$cluster[som_model$unit.classif]
# The above assigns units to clusters based on their class id (code id) in the SOM model.
dataframe$clustersKm <- cl_assignment_k  # back to the original data
head(dataframe)
dataframe$testkm <- 1  # make everything "wrong" (i.e. 1)
# A cluster assignment can only be right in one way, but wrong in k - 1 ways,
# so it is easier to look for the "rights" rather than the "wrongs".
dataframe$testkm[dataframe$clustersKm == 2 & dataframe$trueN == 1] <- 0  # cluster 2 is right for true 1
dataframe$testkm[dataframe$clustersKm == 3 & dataframe$trueN == 2] <- 0  # cluster 3 is right for true 2
dataframe$testkm[dataframe$clustersKm == 1 & dataframe$trueN == 3] <- 0  # cluster 1 is right for true 3
testKM <- sum(dataframe$testkm)
as.matrix(list(accuracy = 1 - (testKM / 36)))
A data.frame: 6 × 8

    K          J         G         B         A         Name    clustersKm  trueN
    <dbl>      <dbl>     <dbl>     <dbl>     <dbl>     <fct>   <int>       <dbl>
4   13.156304  64.50461  85.11172  78.42686  67.07043  Eugene  1           3
5   12.098432  74.68590  83.20605  48.36484  72.76105  Eugene  1           3
6   12.670125  73.80838  54.31572  78.88656  59.54595  Eugene  1           3
7    9.064479  49.24287  55.86420  32.18144  43.06241  Baxan   2           1
8   11.276959  85.93160  71.59996  63.68944  36.47152  Eugene  1           3
9    9.249312  41.79568  48.30359  39.14198  13.87603  Baxan   2           1
A matrix: 1 × 1

accuracy
0.9444444
Clustering
According to the plots printed above, the optimal value of k equals 3.