...  ...  @@ 237,7 +237,7 @@ Here, U and V are [unitary matrix]( https://en.wikipedia.org/wiki/Unitary_matrix 


We start with the last rotation, U, which we need to undo via $`U^*`$. How can we do that? We know that U orients the data according to its variances, in a hierarchical way. So we can look in X for the direction with the largest variance, which gives us the new coordinates. From those coordinates, we can find how much the data was rotated ($`θ`$). The next question is whether we can estimate how much it was stretched in the second step ($`Σ`$). How did $`Σ`$ operate in the first place? It scaled each direction according to the singular values, which reflect the variances. Since we can calculate the variances from the data, we can stretch it back ($`Σ^{-1}`$) using these variances (i.e. the second moment). In the third step, we should rotate back ($`V`$). Here we use another moment, kurtosis. Since we do not know how much to rotate, we approximate: we rotate in “a way that minimizes the kurtosis”. In short, we need to find the $`V`$ that minimizes the kurtosis.
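To keep the order of the three steps straight, it may help to write them as one expression. This is only a sketch under assumed notation: A stands for the mixing matrix with SVD $`A = UΣV^*`$, S for the unknown sources, and X for the observed data; A and S are names introduced here for illustration.

```math
X = A S = U \Sigma V^{*} S
\quad \Longrightarrow \quad
S = A^{-1} X = V \, \Sigma^{-1} \, U^{*} X
```

Reading the right-hand side from right to left gives the order used in the text: first rotate back with $`U^*`$, then rescale with $`Σ^{-1}`$, and finally rotate with $`V`$.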






<div align="center">



<img src="uploads/1b71bb1a15e8c1a2e1849e8ea14a9b55/ica_3.png" width="600">



<img src="uploads/5c79fabd20013efc8b565f2e497067a7/ica_3.png" width="600">



</div>






We are ready to explore the math behind the curtain. The first step is to figure out the last rotation, U; here we are after the angle $`θ`$. If you look at the 2D example above, it is easy to see that after these transformations the data is oriented with respect to its variance (this was the objective in PCA: maximize the variance). In other words, we are looking for the angle that gives the maximum variance. So we need to formulate how the variance changes as we look at different $`θ`$ values. In the 2D example above:

...  ...  @@ 282,11 +282,15 @@ giving $`\Sigma^{-1}`$ as: 


\end{bmatrix}



```
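The two steps so far (rotating back by $`U^*`$ and rescaling by $`Σ^{-1}`$) can be sketched in plain Python. This is a minimal sketch, not the code behind the figures: the closed-form angle comes from setting the derivative of the variance with respect to $`θ`$ to zero, the function names (`max_variance_angle`, `whiten`) and the mixing coefficients are made up for illustration, and the data is assumed to be 2D, zero-mean, and not degenerate (non-zero variance in both directions).

```python
import math
import random


def max_variance_angle(x1, x2):
    """Angle theta that maximizes sum_n (x1[n]*cos(theta) + x2[n]*sin(theta))**2."""
    a = sum(u * u for u in x1)
    b = sum(v * v for v in x2)
    c = sum(u * v for u, v in zip(x1, x2))
    # The sum equals (a+b)/2 + (a-b)/2*cos(2*theta) + c*sin(2*theta),
    # which peaks at 2*theta = atan2(2c, a-b).
    return 0.5 * math.atan2(2.0 * c, a - b)


def whiten(x1, x2):
    """Undo the rotation and the scaling for zero-mean 2D data (illustrative helper)."""
    theta = max_variance_angle(x1, x2)
    ct, st = math.cos(theta), math.sin(theta)
    # Rotate back with U^* = [[cos, sin], [-sin, cos]].
    y1 = [ct * u + st * v for u, v in zip(x1, x2)]
    y2 = [-st * u + ct * v for u, v in zip(x1, x2)]
    # Standard deviations along the two principal directions.
    n = len(x1)
    sigma1 = math.sqrt(sum(u * u for u in y1) / n)
    sigma2 = math.sqrt(sum(v * v for v in y2) / n)
    # Rescale so that both directions have unit variance.
    return [u / sigma1 for u in y1], [v / sigma2 for v in y2], theta


# Hypothetical data: two independent uniform sources, mixed linearly.
rng = random.Random(0)
src1 = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
src2 = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mix1 = [2.0 * p + 1.0 * q for p, q in zip(src1, src2)]
mix2 = [1.0 * p + 1.5 * q for p, q in zip(src1, src2)]

# Remove the mean (first moment) before looking at the variance.
m1 = sum(mix1) / len(mix1)
m2 = sum(mix2) / len(mix2)
x1 = [u - m1 for u in mix1]
x2 = [v - m2 for v in mix2]

z1, z2, theta = whiten(x1, x2)
```

After this step `z1` and `z2` have unit variance and zero correlation, but the data may still be rotated by an unknown angle.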









At this point, we have inverted the rotation and scaling in the principal component directions. So far we have only used the second moment, the variance, to decorrelate the data. To obtain a separable probability distribution we still need to use a higher-order moment. That will be the next step.









Previously we mentioned that we would use kurtosis for this purpose but did not explain why. With SVD, we assumed that the mean (the first moment) is zero. We used the second moment, the variance, to decorrelate above. The next option is the third moment, skewness. However, we cannot say anything in general about the asymmetry of the probability distributions, so we skip it to keep the solution general. The next moment (fourth order) is the kurtosis, and this is what we will minimize in the objective function. Since we are approximating anyway, we say "fourth order is accurate enough for me". [Kurtosis](https://en.wikipedia.org/wiki/Kurtosis) is given by:






```math



K(\phi) = \sum_{n=1}^{N} \left( \begin{bmatrix} \overline{x}_1(n) & \overline{x}_2(n) \end{bmatrix} \begin{bmatrix} \cos(\phi) \\ \sin(\phi) \end{bmatrix} \right)^4



```






where $`\phi`$ is the angle of the rotation applied with V.
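The search for the angle that minimizes $`K(\phi)`$ can be sketched with a simple grid search in plain Python. This is a minimal sketch under stated assumptions, not a reference implementation: the function names (`fourth_moment`, `argmin_phi`), the grid resolution, and the test data are made up for illustration, and the data is assumed to be 2D, zero-mean, and already decorrelated with equal variance in both directions. Because of the even power, $`K(\phi)`$ repeats every $`\pi`$, so it is enough to search $`[0, \pi)`$.

```python
import math


def fourth_moment(x1, x2, phi):
    """K(phi) = sum_n (x1[n]*cos(phi) + x2[n]*sin(phi))**4."""
    c, s = math.cos(phi), math.sin(phi)
    return sum((c * u + s * v) ** 4 for u, v in zip(x1, x2))


def argmin_phi(x1, x2, steps=1800):
    """Grid search for the angle in [0, pi) with the smallest K(phi)."""
    grid = [k * math.pi / steps for k in range(steps)]
    return min(grid, key=lambda phi: fourth_moment(x1, x2, phi))


# Hypothetical test data: two independent, evenly spread sources on a
# regular grid, rotated by a known angle alpha.
levels = [(k - 10) / 10 for k in range(21)]
s1 = [a for a in levels for b in levels]
s2 = [b for a in levels for b in levels]
alpha = math.pi / 6
x1 = [math.cos(alpha) * a - math.sin(alpha) * b for a, b in zip(s1, s2)]
x2 = [math.sin(alpha) * a + math.cos(alpha) * b for a, b in zip(s1, s2)]

phi = argmin_phi(x1, x2)
```

For this kind of flat (sub-Gaussian) source the minimum lines up with one of the source axes, so `phi` should match `alpha` up to a multiple of $`\pi/2`$: the method cannot tell which source is which, nor its sign. A grid search is used here only because it is easy to read; a closed-form or gradient-based solution would be the more usual choice.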






...



to be contd

...  ...  