_C. Ates_
## Why is clustering useful?
Everything starts with observations. We collect information on M features over N instances, say measurements. So far, we have known exactly why we take such measurements; we have had a goal. In our example, we collect data by varying 5 different features and record the resulting noise levels. Then we ask questions such as "Considering the measurements I have collected so far, what would the noise be if I doubled the velocity?". This is a rather straightforward analysis. In unsupervised learning, we focus on the observations themselves and try to see how they are related, even before looking into the relationships between these features and the goal (i.e., the label). We basically take a step back and investigate how these measured features are related to one another in our system in the first place. Herein, clustering refers to grouping observations based on how similar / different they are. What can the use cases for such an effort be?
Note that we are interested in system analysis for which we have lots of data / observations. These observations can also be collected for a well-defined goal (supervised tasks). We have seen that the training of supervised models is affected by the quality and the nature of the data. We experienced that if the model is too complex, it may not generalize well in real-life applications; it will depend too much on the training data space, particularly if we do not have millions of labelled instances for training. In other words, if we want a strong model capable of solving complex nonlinear problems, we need lots of data to match the model complexity. So we first need to generate labelled data, by human supervision. Alternatively, we can use simpler models or a set of simple models (ensemble learning). In this case, the models may still suffer from generalization error due to outliers in the data. We can, and did, apply regularization strategies. This time, however, the model becomes less accurate (we sacrifice training accuracy to generalize better). Otherwise, outliers can dominate, or our simple models may capture some trends in the data but not all of them at once, due to their mathematical simplicity. It is like solving nonlinear PDEs via linearization; model complexity is related to the number of elements under the infinite summation signs (remember the eigenvalues and eigenvectors?). So, what can we do about it?
One answer here might be clustering. By clustering (grouping) our observations, we can address both problems. It can be used to group similar instances (e.g., images) before labelling, which lets us extrapolate the labels to similar unlabelled instances. Alternatively, we can group our observations to detect outliers and eliminate them before training sensitive, simple models. We can create expert simple models by training each one on a certain cluster, so that it learns the patterns in a sub-dataspace (instead of doing regression directly, we cluster first and then calculate). Another important use case is anomaly / novelty detection. You can establish a baseline of regular behaviour via clustering algorithms and detect unusual cases / instances. In the lecture, we will use clustering for detecting defective products in a production line. You can imagine using the method for fraud detection (e.g., shopping tendencies with credit cards). Another good example is fitness monitoring via wearable devices such as smart watches. You can establish a baseline for health status with features such as heart rate, sleep patterns and temperature, and detect the onset of going into a different state, such as getting sick.
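The anomaly-detection idea above can be sketched in a few lines of plain NumPy. This is a minimal illustration, not the lecture's actual pipeline: the two-cluster synthetic data, the hand-rolled `kmeans` helper, and the 3-sigma distance threshold are all assumptions made for the example. The baseline of "regular behaviour" is the set of cluster centres; anything unusually far from its assigned centre is flagged.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-centroid assignment
    and centroid update (illustrative helper, not a library API)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # distance of every point to every centroid -> nearest one wins
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

rng = np.random.default_rng(42)
# two dense groups of "regular" two-feature observations (synthetic)
normal = np.vstack([
    rng.normal([0.0, 0.0], 0.3, size=(100, 2)),
    rng.normal([5.0, 5.0], 0.3, size=(100, 2)),
])
# two injected "defective" instances, far from both groups
X = np.vstack([normal, [[2.5, 8.0], [-4.0, 4.0]]])

centers, labels = kmeans(X, k=2)
# baseline behaviour = distance to the assigned cluster centre;
# points far beyond the typical distance are flagged as anomalies
dist = np.linalg.norm(X - centers[labels], axis=1)
threshold = dist.mean() + 3.0 * dist.std()
flagged = np.where(dist > threshold)[0]
print(flagged)  # the two injected points (indices 200 and 201) stand out
```

The same distance-to-centre score also serves the outlier-elimination use case: instead of flagging the extreme points, you would simply drop them before training a sensitive, simple model.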
## Clustering