Algorithms: KMeans#

%load ../../rapaio-bootstrap.ipynb
Adding dependency io.github.padreati:rapaio-lib:7.0.1
Solving dependencies
Resolved artifacts count: 1
Add to classpath: /home/ati/work/rapaio-jupyter-kernel/target/mima_cache/io/github/padreati/rapaio-lib/7.0.1/rapaio-lib-7.0.1.jar

// old faithful
Frame old = Datasets.loadOldFaithful();
old.printSummary()
Frame Summary
=============
* rowCount: 272
* complete: 272/272
* varCount: 2
* varNames: 

0. eruptions : dbl | 
1.   waiting : int | 

* summary: 
 eruptions [dbl]       waiting [int]          Mean : 3.4877831    Mean : 70.8970588 
    Min. : 1.6000000    Min. : 43.0000000  2nd Qu. : 4.4542500 2nd Qu. : 82.0000000 
 1st Qu. : 2.1627500 1st Qu. : 58.0000000     Max. : 5.1000000    Max. : 96.0000000 
  Median : 4.0000000  Median : 76.0000000                                           
WS.image(points(old.rvar("eruptions"), old.rvar("waiting"), pch.circleFull(), fill(3)))
../_images/8ae9b0056ce4e9b5368f49b8065c8ad32a7deb9e0edc704086d94d50b86465e4.png
KMCluster model1 = KMCluster.newKMeans().k.set(2);
var res1 = model1.fit(old).predict(old);
res1.printSummary()
Overall errors: 
> count: 272
> mean: 32.7270909
> var: 1,622.7494621
> sd: 40.2833646
> inertia/error:8,901.7687209
> iterations:6

Per cluster: 
    ID count    mean         var      var/total     sd     
[0]  2   172 31.6604119 1,763.2348144 1.0865724 41.9908897 
[1]  1   100 34.5617787 1,391.1074822 0.8572534 37.2975533 
WS.image(
    points(old.rvar("eruptions"), old.rvar("waiting"), pch.circleFull(), fill(res1.assignment().darray_().add(3.0).dv()), color(0))
        .points(model1.getCentroids().rvar("eruptions"), model1.getCentroids().rvar("waiting"), pch.circleFull(), sz(7), fill(3, 4)))
../_images/91748d828ae1f67d37d3deba52fce2bace275b02aa6150e5bc29db6c30afd357.png