em0787 · 6c2c8f87
--- a/DDE-1/Clustering.md
+++ b/DDE-1/Clustering.md
@@ -35,11 +35,13 @@ Since the observations (n) are independent, we can find iteratively the best r f

 <img src="uploads/4ba45b79f70386fef8acbd14a7b30527/cluster_2.png"  width="600">

-In the descriptive learning process, we perform above two steps several time, until we reach a convergence criterion for J, typically user defined:
+In the descriptive learning process, we repeat the two steps above several times until we reach a convergence criterion for J, which is typically user-defined:

 <img src="uploads/95dab4e8fcb08691b5f97a0012521dc1/cluster_3.png"  width="300">

-Another input from the user is the number of clusters to find. This is found iteratively as well, if we do not have a prior knowledge. We simple increase the number of clusters to be found gradually, which also gives a curve similar to iteration plot. Here the user decides (manually or automatically) the number of clusters in the dataset. We should be aware that the similarity measure is distance-based, Euclidean to be precise. It has three practical limitations to remember: (i) it is not robust to outliers, (ii) it starts to become useless if features have categorical data (either 0 or 1 for example), (iii) everything is categorized as “either white or black”, meaning that there is no measure of closeness / confidence levels in model predictions (instance at the stereotype centre and at the boundary is considered to be the same). 
+Another input from the user is the number of clusters to find. If we have no prior knowledge, this is also found iteratively. We simply increase the number of clusters gradually, which gives a curve similar to the iteration plot. Here the user decides (manually or automatically) the number of clusters in the dataset. We should remember that in supervised learning tasks we used a cross-validation strategy to decide model complexity, but this is not possible for unsupervised learning because we do not have any labels. For a non-probabilistic model such as k-means, we simply use the knee method as a guide.
+
+We should be aware that the similarity measure is distance-based, Euclidean to be precise. It has three practical limitations we should remember: (i) it is not robust to outliers, (ii) it becomes much less useful when features are categorical (for example, either 0 or 1), (iii) everything is categorized as “either white or black”, meaning that there is no measure of closeness or confidence in the model predictions (an instance at the stereotype centre and one at the boundary are treated the same).

 ### Gaussian Mixtures