At this point, we have inverted the rotation and scaling in the principal components.


Previously we mentioned that we would use kurtosis for this purpose but did not explain why. With SVD, we assumed that the mean (the first moment) is zero. We used the second moment, the variance, above to decorrelate the signals. The next option is the third moment, skewness. However, we cannot assume anything about the asymmetry of the probability distributions, so we skip it for a general solution. The next moment (fourth order) is the kurtosis, and this is what we will minimize in the objective function. Since we are approximating, we say "fourth order is accurate enough". [Kurtosis](https://en.wikipedia.org/wiki/Kurtosis) is given by:






```math



K(\phi)= \sum_{n}^{N} {\left[ x^{'}_1(n)\; x^{'}_2(n) \right] \begin{bmatrix} \cos(\phi) \\ \sin(\phi) \end{bmatrix}}^4



```






where $`\phi`$ is the rotation applied with $`U`$.
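To make the objective concrete, here is a small numerical sketch (NumPy; the whitened signals $`x^{'}_1, x^{'}_2`$ are simulated from hypothetical uniform sources, so everything below is illustrative, not the document's data): a projection aligned with a source direction has a smaller fourth-moment sum than a misaligned one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical whitened signals: two independent, unit-variance uniform
# (sub-Gaussian) sources under a residual rotation of t = 0.6 rad.
N = 2000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, N))
t = 0.6
x1 = np.cos(t) * s[0] - np.sin(t) * s[1]
x2 = np.sin(t) * s[0] + np.cos(t) * s[1]

def K(phi):
    # K(phi) = sum_n ( [x'_1(n)  x'_2(n)] [cos(phi), sin(phi)]^T )^4
    proj = x1 * np.cos(phi) + x2 * np.sin(phi)
    return np.sum(proj ** 4)

print(K(t))              # projection aligned with a source: smaller K
print(K(t + np.pi / 4))  # maximally misaligned projection: larger K
```

For sub-Gaussian sources the aligned projection keeps the sources' small fourth moment, while a misaligned mixture drifts toward Gaussian, which is why minimizing $`K`$ points at the right angle here.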



In order to perform the minimization, we first normalize $`K`$:









```math



K(\phi)= \sum_{n}^{N} \frac{1}{x^{'2}_1(n)+x^{'2}_2(n)} {\left[ x^{'}_1(n)\; x^{'}_2(n) \right] \begin{bmatrix} \cos(\phi) \\ \sin(\phi) \end{bmatrix}}^4



```
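As a sketch of the minimization (NumPy; `x1` and `x2` are simulated stand-ins for the whitened signals, and a simple grid search replaces solving the derivative condition analytically), minimizing the normalized $`K`$ over $`\phi`$ recovers the residual rotation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical whitened signals with a residual rotation of t = 0.4 rad.
N = 4000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, N))
t = 0.4
x1 = np.cos(t) * s[0] - np.sin(t) * s[1]
x2 = np.sin(t) * s[0] + np.cos(t) * s[1]

def K_norm(phi):
    # sum_n (x'_1 cos(phi) + x'_2 sin(phi))^4 / (x'_1^2 + x'_2^2)
    proj = x1 * np.cos(phi) + x2 * np.sin(phi)
    return np.sum(proj ** 4 / (x1 ** 2 + x2 ** 2))

# Grid search over one quarter period; at the grid minimum dK/dphi = 0
# up to the grid spacing.
phis = np.linspace(0.0, np.pi / 2, 1001)
phi_hat = phis[np.argmin([K_norm(p) for p in phis])]
print(phi_hat)  # close to the true residual rotation 0.4
```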






Then, we take the derivative with respect to $`\phi`$ and set it to zero. Once $`\phi`$ is found, we can express $`V`$ for back-rotating into (approximately) statistically independent signals (the yellow square S in the sketch):






```math



V = \begin{bmatrix} \cos(\phi) & \sin(\phi) \\ -\sin(\phi) & \cos(\phi)\end{bmatrix}



```
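A quick sanity check (NumPy; `phi` plays the role of the angle returned by the optimizer, and the data are simulated, not the document's): applying $`V`$, a rotation by $`-\phi`$, back-rotates the mixed signals onto the sources.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sources mixed by a rotation of phi = 0.4 rad, standing in
# for the whitened signals after U and the scaling have been undone.
N = 1000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, N))
phi = 0.4
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])
x = R @ s

# V rotates by -phi, so it undoes R exactly in this synthetic setup.
V = np.array([[ np.cos(phi), np.sin(phi)],
              [-np.sin(phi), np.cos(phi)]])
s_hat = V @ x
print(np.allclose(s_hat, s))  # True
```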






Implementation of this strategy can also be seen in the [projection pursuit approach](https://en.wikipedia.org/wiki/Projection_pursuit). Another way is to use the concept of entropy as a measure. We know that signals that have maximum joint entropy are mutually independent, and this is what goes into the optimizer. The approach is called [Infomax](https://en.wikipedia.org/wiki/Infomax). You can find the details of the proposed method [here](http://www.inf.fu-berlin.de/lehre/WS05/Mustererkennung/infomax/infomax.pdf).






It is highly recommended to check this group for [more on ICA](http://research.ics.aalto.fi/ica/).













Note: In ICA, we assume that the source signals are non-Gaussian (at most one source may be Gaussian); otherwise the rotation cannot be identified.
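The reason is easy to see with kurtosis: a Gaussian has zero excess kurtosis, so for Gaussian sources the objective above is flat in $`\phi`$ and there is nothing to minimize. A quick check on simulated samples (NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

def excess_kurtosis(z):
    # Standardize, then compare the fourth moment against the Gaussian's 3.
    z = (z - z.mean()) / z.std()
    return np.mean(z ** 4) - 3.0

g = rng.standard_normal(n)     # Gaussian: excess kurtosis ~ 0
u = rng.uniform(-1.0, 1.0, n)  # uniform: excess kurtosis ~ -1.2

print(excess_kurtosis(g))
print(excess_kurtosis(u))
```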






Interesting material on ICA:






* [Separating reflections from images by use of independent component analysis](https://www.osapublishing.org/josaa/fulltext.cfm?uri=josaa1692136&id=1263)
