...

```math
S = 1/N \sum_{n=1}^{N} (x_n - \overline{x})(x_n - \overline{x})^T
```
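The covariance matrix defined above can be computed directly. Below is a minimal NumPy sketch; the toy data set and variable names are made up for illustration:

```python
import numpy as np

# Toy data set: N = 4 points, each of dimension M = 2.
X = np.array([[2.0, 0.0],
              [0.0, 2.0],
              [-2.0, 0.0],
              [0.0, -2.0]])
N = X.shape[0]

x_bar = X.mean(axis=0)                 # sample mean, an M-vector
# S = 1/N * sum_n (x_n - x_bar)(x_n - x_bar)^T, written as a matrix product
S = (X - x_bar).T @ (X - x_bar) / N    # M x M covariance matrix
```

Stacking the centered points into a matrix lets the whole sum collapse into one matrix product, which is both shorter and faster than looping over the outer products.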






In the third step, we maximize the variance of the data projected onto the new coordinate system. This step consists of several smaller steps.








Let's think about what we just said. We are after a new coordinate system, and we will project our data onto these new coordinates (think of the conversion from Cartesian to spherical coordinates). This new space has its own unit vectors indicating the directions of its coordinates (like x, y, z => r, θ, Φ). Let's call this direction vector $`u_{1}`$. Note that this vector has M dimensions, like our data points.








If we want to project a data point $`x_{n}`$ onto $`u_{1}`$, we simply compute $`u_{1}^Tx_{n}`$. Remember that our objective is to maximize the variance of the projected data. So we first need to calculate that variance, for which we need the mean of the projected data, $`u_{1}^T\overline{x}`$:








```math
\overline{x} = 1/N \sum_{n=1}^{N} x_n
```
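The mean of the projected data is just the projection of the mean, because projection is linear. A quick numerical check of this, as a minimal NumPy sketch (the data and the direction vector are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # N = 100 data points, M = 3 dimensions
u1 = np.array([1.0, 2.0, 2.0]) / 3.0     # an example unit direction (norm 1)

x_bar = X.mean(axis=0)                   # mean of the data
mean_of_projections = (X @ u1).mean()    # average of u1^T x_n over all n
projection_of_mean = u1 @ x_bar          # u1^T x_bar

# The two quantities agree (up to floating-point error).
```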








Having obtained this mean, we can calculate the variance of the projected data:








```math
1/N \sum_{n=1}^{N} (u_{1}^Tx_{n} - u_{1}^T\overline{x})^2 = u_{1}^TSu_{1}
```
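This identity can be verified numerically. The sketch below (assuming NumPy; `S` is recomputed so the snippet is self-contained) compares the left-hand sum against the quadratic form $`u_{1}^TSu_{1}`$ on random data:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 200, 4
X = rng.normal(size=(N, M))              # random data set

x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / N      # covariance matrix, as defined earlier

u1 = rng.normal(size=M)
u1 /= np.linalg.norm(u1)                 # a random unit direction

# Left-hand side: 1/N * sum_n (u1^T x_n - u1^T x_bar)^2
lhs = np.mean((X @ u1 - u1 @ x_bar) ** 2)
# Right-hand side: the quadratic form u1^T S u1
rhs = u1 @ S @ u1
```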








We now have an expression for the variance of the data projected onto $`u_1`$, and we are ready to maximize it. But as stated, this is not a well-posed optimization problem: if we simply tried to maximize the expression above, the norm of $`u_1`$ would grow to $`\infty`$.
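The usual fix is to constrain $`u_1`$ to be a unit vector, $`u_{1}^Tu_{1} = 1`$. Under that constraint, a standard result is that $`u_{1}^TSu_{1}`$ is maximized by the eigenvector of $`S`$ with the largest eigenvalue. A minimal NumPy check on synthetic data (the data and names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
# Anisotropic synthetic data: much more spread along the first axis.
X = rng.normal(size=(300, 3)) @ np.diag([3.0, 1.0, 0.5])
x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / X.shape[0]

eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
u1 = eigvecs[:, -1]                      # leading eigenvector (already unit norm)

var_u1 = u1 @ S @ u1                     # variance along u1 = largest eigenvalue

# No random unit vector achieves a larger projected variance.
for _ in range(1000):
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    assert v @ S @ v <= var_u1 + 1e-12
```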






















## Dimensionality reduction: why is it useful?








... 

... 