|
|
|
|
|
### Gaussian Mixtures
|
|
|
|
|
|
_Note: Visit the [statistics](DDE-1/Statistics) page if you are not familiar with the concepts of probability or likelihood._
|
|
|
|
|
|
We have just discussed that k-means sees everything in black or white (similar to hard classification). This is reflected in the r matrix: in each row, all but one element is zero. From a probabilistic view, this means we assign 100% probability to one cluster and zero to all others, which is too much idealization. Alternatively, we can learn a probabilistic representation of the matrix r, where each row holds the probability of belonging to each cluster. This is, in short, what we do with Gaussian mixtures (GM). Compared with k-means, the other important effect of adding probabilities is the way the model builds its decision boundaries. In 2D, k-means effectively draws "circles" around the stereotypes (centres); a GM is not restricted to circular boundaries.
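The contrast between hard and soft assignments can be seen directly in scikit-learn, where k-means returns a single label per point while a Gaussian mixture returns a probability per cluster. A minimal sketch on toy data (the blob locations and parameters below are illustrative assumptions, not from the text):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Toy data: two well-separated 2D blobs of 50 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
               rng.normal(loc=[5, 5], scale=0.5, size=(50, 2))])

# k-means: hard assignment -- each point gets exactly one cluster label
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
hard_labels = km.labels_            # shape (100,), integer 0 or 1 per point

# Gaussian mixture: soft assignment -- a probability per cluster
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
resp = gm.predict_proba(X)          # shape (100, 2), each row sums to 1
```

Each row of `resp` is the probabilistic counterpart of a row of the r matrix: instead of a single 1 and the rest zeros, it contains the responsibilities of the two clusters for that point.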
|
|
|
|
|
|
How does the model work? As the name implies, we define the cluster boundaries by combining multiple Gaussian distributions. Each stereotype is therefore represented by two quantities: a mean and a covariance, the latter of which lets us express the uncertainty (and the shape) of each cluster.
|