josh.meanings

josh.meanings is a Clojure library for k-means clustering. It is built for medium-data workloads: datasets that are too large to fit in memory, but not so large that the computation cannot be persisted to disk.

Unlike most other k-means implementations, this one employs several techniques intended to make it faster:

We leverage memory mapping of the datasets.
We do our distance calculations on the GPU.
We implement initialization schemes from more recent research.
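The README does not say which initialization schemes the library implements. As an illustration of what a seeding scheme does, here is a minimal CPU-only sketch of k-means++ seeding, one well-known scheme, in plain Clojure. Every name in it (`squared-distance`, `weighted-pick`, `k-means-pp-seeds`) is hypothetical and not part of the josh.meanings API.

```clojure
(import 'java.util.Random)

(defn squared-distance
  "Sum of squared coordinate differences between points a and b."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn nearest-squared-distance
  "Squared distance from point p to the closest of the centroids."
  [centroids p]
  (apply min (map #(squared-distance % p) centroids)))

(defn weighted-pick
  "Picks one of items with probability proportional to its weight."
  [^Random rng items weights]
  (let [cumulative (reductions + weights)
        target     (* (.nextDouble rng) (last cumulative))]
    (or (first (keep (fn [[item c]] (when (< target c) item))
                     (map vector items cumulative)))
        ;; Fallback when every weight is zero.
        (last items))))

(defn k-means-pp-seeds
  "Chooses k seeds from the vector points: the first uniformly at random,
  each later one weighted by squared distance to its nearest chosen seed."
  [^Random rng points k]
  (let [first-seed (nth points (.nextInt rng (int (count points))))]
    (loop [centroids [first-seed]]
      (if (= (count centroids) k)
        centroids
        (let [weights (map #(nearest-squared-distance centroids %) points)]
          (recur (conj centroids (weighted-pick rng points weights))))))))

;; Example data, made up for illustration.
(def points [[0 0] [0 1] [10 10] [10 11] [20 0]])
(def seeds (k-means-pp-seeds (Random. 42) points 3))
```

Points far from every seed chosen so far get a larger weight, so the seeds tend to spread across the data rather than bunch together.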

Note

GPU acceleration is available for several distance functions, including earth mover's distance (EMD), Euclidean, Manhattan, Chebyshev, and squared Euclidean.
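For reference, four of these distances can be written as plain CPU functions over sequences of numbers. This is a sketch of the standard mathematical definitions only. It is not the library's GPU code, the function names are hypothetical, and EMD is omitted because it needs more machinery than fits in a short example.

```clojure
(defn euclidean-squared
  "Sum of squared coordinate differences."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn euclidean
  "Square root of the squared Euclidean distance."
  [a b]
  (Math/sqrt (double (euclidean-squared a b))))

(defn manhattan
  "Sum of absolute coordinate differences."
  [a b]
  (reduce + (map (fn [x y] (Math/abs (double (- x y)))) a b)))

(defn chebyshev
  "Largest absolute coordinate difference."
  [a b]
  (reduce max (map (fn [x y] (Math/abs (double (- x y)))) a b)))

(euclidean-squared [0 0] [3 4]) ;; => 25
(euclidean [0 0] [3 4])         ;; => 5.0
(manhattan [0 0] [3 4])         ;; => 7.0
(chebyshev [0 0] [3 4])         ;; => 4.0
```

Squared Euclidean distance skips the square root, and because the square root is monotonic it ranks centroids in the same order as Euclidean distance.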

Installation

If you use the Clojure CLI, add the library to your deps.edn:

org.clojars.joshua/josh.meanings {:mvn/version "3.0.14"}
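In a deps.edn file the coordinate goes under the :deps key. A minimal file, assuming the project has no other dependencies, might look like this:

```edn
{:deps {org.clojars.joshua/josh.meanings {:mvn/version "3.0.14"}}}
```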

Getting Started

(require '[josh.meanings.kmeans :refer [k-means k-means-seq]]
         '[josh.meanings.protocols.savable :refer [save-model]]
         '[josh.meanings.protocols.classifier :refer [classify load-centroids load-assignments]])


;; Get a dataset. You can pass in your dataset in a variety of formats.
;; See the docs for more details on supported formats.
(def dataset "your_dataset.csv")

;; Choose the number of clusters you want
(def k 10)


;; To get a single cluster model
(def model (k-means dataset k))

;; Alternatively, run k-means several times and keep the lowest-cost model.
;; This is recommended because some k-means initializations give no guarantee
;; on the quality of a solution, so the best of several runs is often better
;; than a single run. The value of k-tries below is only an example.
(def k-tries 5)
(def model (apply min-key :cost (take k-tries (k-means-seq dataset k))))

;; Once you have a model you can save it.
(def model-path (.save-model model))

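;; Later you can load that model. Note that load-model is not among the
;; names referred above; refer it from the namespace that defines it.
(def model (load-model model-path))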

;; To load the assignments just
(.load-assignments model)

;; To classify a new entry
(.classify model [1 2 3])

;; To view the centroids
(.load-centroids model)
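The best-of-several-runs step in Getting Started relies only on each result exposing a :cost key. The selection logic can be tried without a dataset or GPU using stand-in maps. In this sketch the `candidate-models` data and the `best-of` helper are hypothetical; only the :cost key comes from the example above.

```clojure
;; Made-up stand-ins for the results that k-means-seq yields.
(def candidate-models
  [{:run 1 :cost 412.7}
   {:run 2 :cost 388.1}
   {:run 3 :cost 401.9}])

(defn best-of
  "Returns the lowest-cost model among the first n of models. n must be at
  least 1, since min-key needs at least one candidate."
  [n models]
  (apply min-key :cost (take n models)))

(best-of 3 candidate-models) ;; => {:run 2, :cost 388.1}
(best-of 1 candidate-models) ;; => {:run 1, :cost 412.7}

;; take realizes only the first n elements, so this also works on an
;; unbounded sequence of results.
(best-of 2 (cycle candidate-models)) ;; => {:run 2, :cost 388.1}
```

A larger n costs more clustering runs but gives the initialization more chances to land on a good solution.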

Testing

Run the project's unit tests with:

lein test

Tests exercising the GPU code paths require an Nvidia GPU with CUDA support.

License

Distributed under the terms of the MIT License.
