@@ -71,7 +71,7 @@ S = 1/N \sum_{n=1}^{N} (x_n - \overline{x})(x_n - \overline{x})^T





In the third step, we maximize the variance of the data projected onto the new coordinate system. This step consists of several smaller steps.









Let's think about what we just said. We are after a new coordinate system, and we will project our data onto these new coordinates (think of the conversion from Cartesian to spherical coordinates). This new space has its own unit vectors indicating the directions of its coordinates (like x, y, z => r, θ, Φ). Let us call one such direction vector $`u_{1}`$. Note that this vector has M dimensions, like each point in our data set.






If we want to project a data point $`x_{n}`$ onto $`u_{1}`$, we just compute $`u_{1}^Tx_{n}`$. Remember that our objective is to maximize the variance of the projected data. So, we first need to calculate the variance, for which we will need the mean of the projected data, $`u_{1}^T\overline{x}`$:
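
As a quick sanity check, here is a minimal NumPy sketch of the projection. It assumes the data are stored as an N x M array `X` with one data point per row; the sizes, the random data, and the names `X`, `u1`, `projected`, and `x_bar` are illustrative choices, not part of the original text. It shows that the mean of the projected data equals the projection of the mean.

```python
import numpy as np

# Illustrative data only: sizes and values are assumptions for this sketch.
rng = np.random.default_rng(0)
N, M = 200, 3                      # N data points, each with M dimensions
X = rng.normal(size=(N, M))        # row n holds the data point x_n

# An arbitrary direction vector with M dimensions, scaled to unit length.
u1 = rng.normal(size=M)
u1 /= np.linalg.norm(u1)

# Projection of every data point onto u1: u1^T x_n for n = 1..N.
projected = X @ u1                 # shape (N,)

# Sample mean of the original data, shape (M,).
x_bar = X.mean(axis=0)

# The mean of the projected data and the projection of the mean agree.
mean_of_projection = projected.mean()
projection_of_mean = u1 @ x_bar
print(np.isclose(mean_of_projection, projection_of_mean))
```

The two quantities agree because projection is a linear operation, so averaging before or after projecting gives the same number.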



