...  ...  @@ 69,7 +69,7 @@ S = X^T X 


S = 1/N \sum_{n=1}^{N} (x_n  \overline{x})(x_n\overline{x})^T



```






In the third step, we maximize the variance of the projected data onto new coordinate system. This step consists of several smaller steps.



In the third step, we maximize the variance of the data projected onto the new coordinate system. This step consists of several smaller steps.






Let's think about what we just said. We are after a new coordinate system and we will project our data onto this new coordinates (think about the conversion from the Cartesian to spherical coordinates). In this new space, we will again have its own unit vectors indicating the directions of the coordinates (like x,y,z => r, θ, Φ). Let call the direction vector as $`u_{1}`$ . Note that this vector will have M dimension, like our data set.




...  ...  