@@ 155,9 +155,13 @@ s_2 = \alpha_{21}x_1 + \alpha_{22}x_2


S = WX



```









All we need to do is find the values of $`\alpha_{ij}`$ in W. Note that we need to find W when A is unknown, so we cannot simply invert A. What we do know is that the rows of W are vectors in the mixture space, and each of them (e.g. $`[\alpha_{11},\alpha_{12}]`$) extracts one source signal (here, $`s_{1}`$). If you look at the sketch of ICA above, you will see that each of these vectors must be orthogonal to the directions along which the samples of all the other sources lie. So, we need to find W such that each vector in W is orthogonal to all sources but one. Okay, now we are getting closer to defining an optimization problem.
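To make the orthogonality argument concrete, here is a minimal Python sketch under a strong assumption: the mixing matrix A is known, which is never the case in a real ICA problem. The values in `A`, the helper names `inverse_2x2` and `dot`, and the uniformly distributed sources are all made up for illustration. With A known, W is simply its inverse, and we can check that the first vector in W is orthogonal to the direction along which the samples of $`s_{2}`$ lie in the mixture space (the second column of A).

```python
import random

# Hypothetical mixing matrix A (X = AS). Unknown in practice; known here only
# so that the geometry of W can be checked.
A = [[2.0, 1.0],
     [1.0, 3.0]]


def inverse_2x2(m):
    """Return the inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det],
            [-c / det, a / det]]


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


W = inverse_2x2(A)          # unmixing matrix, S = WX
w1, w2 = W                  # w1 = [alpha_11, alpha_12], w2 = [alpha_21, alpha_22]
a1 = [A[0][0], A[1][0]]     # direction of the s_1 samples in the mixture space
a2 = [A[0][1], A[1][1]]     # direction of the s_2 samples in the mixture space

# Two made-up source signals and their mixtures.
random.seed(0)
s1 = [random.uniform(-1, 1) for _ in range(5)]
s2 = [random.uniform(-1, 1) for _ in range(5)]
x1 = [A[0][0] * p + A[0][1] * q for p, q in zip(s1, s2)]
x2 = [A[1][0] * p + A[1][1] * q for p, q in zip(s1, s2)]

# w1 ignores everything along a2, so it extracts s_1 from the mixtures.
s1_hat = [dot(w1, [p, q]) for p, q in zip(x1, x2)]

print("w1 . a2 =", dot(w1, a2))   # numerically zero: w1 is orthogonal to a2
print("w2 . a1 =", dot(w2, a1))   # numerically zero: w2 is orthogonal to a1
```

In the actual ICA setting, neither `A` nor the columns `a1` and `a2` are available, so this orthogonality cannot be checked directly; it only shows what a good W looks like.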






We also said that we are after independent signals. In saying so, we assume that the sources are statistically independent, or at least more independent than the mixed signals. With this constraint, we can say, "I will find a W such that the independence of the extracted signals is maximized."

In the simplest (brute-force) approach, we can take a vector $`w_{1}`$ ($`[\alpha_{11},\alpha_{12}]`$) and try different directions by rotating it around the origin. For each candidate $`w_{1}`$, we compute the corresponding $`s_{1}`$ and keep the candidate that yields the most independent signal. A smarter move would be to use a gradient-based search algorithm, but this is the core idea.

At this point, after our discussions on regression, classification and clustering, you should ask, "How can I measure this statistical independence?" The short answer is the [moments](https://en.wikipedia.org/wiki/Moment_(mathematics)) of [probability density functions](https://en.wikipedia.org/wiki/Probability_density_function): the first moment is the expected value, the second central moment is the variance, the third standardized moment is the [skewness](https://en.wikipedia.org/wiki/Skewness), and the fourth standardized moment is the [kurtosis](https://en.wikipedia.org/wiki/Kurtosis).
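The brute-force search described above can be sketched in a few lines of Python. This is only an illustration under several assumptions that the text does not fix: the score is the absolute excess kurtosis of the extracted signal (one common moment-based proxy, not necessarily the measure intended here), the two sources are uniformly distributed, and the mixing matrix values are invented. The function names `excess_kurtosis` and `correlation` are likewise hypothetical helpers.

```python
import math
import random


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (zero for a Gaussian signal)."""
    n = len(values)
    mean = sum(values) / n                       # first moment
    centered = [v - mean for v in values]
    var = sum(c * c for c in centered) / n       # second central moment
    fourth = sum(c ** 4 for c in centered) / n   # fourth central moment
    return fourth / (var * var) - 3.0


def correlation(u, v):
    """Pearson correlation, used only to check the result against the truth."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


# Made-up independent sources and an invented mixing matrix [[2, 1], [1, 3]].
random.seed(0)
n = 2000
s1 = [random.uniform(-1, 1) for _ in range(n)]
s2 = [random.uniform(-1, 1) for _ in range(n)]
x1 = [2.0 * p + 1.0 * q for p, q in zip(s1, s2)]
x2 = [1.0 * p + 3.0 * q for p, q in zip(s1, s2)]

# Rotate a unit vector w1 around the origin in 1-degree steps and keep the
# direction whose extracted signal has the largest absolute excess kurtosis.
best_deg, best_score, best_y = None, -1.0, None
for deg in range(180):
    theta = math.radians(deg)
    w1 = (math.cos(theta), math.sin(theta))
    y = [w1[0] * a + w1[1] * b for a, b in zip(x1, x2)]
    score = abs(excess_kurtosis(y))
    if score > best_score:
        best_deg, best_score, best_y = deg, score, y

print("best angle (degrees):", best_deg)
print("|corr| with s1:", abs(correlation(best_y, s1)))
print("|corr| with s2:", abs(correlation(best_y, s2)))
```

Because the standardized kurtosis does not change when a signal is rescaled or sign-flipped, only the direction of $`w_{1}`$ matters, which is why sweeping a unit vector over half a circle is enough. The extracted signal should match one of the two sources up to scale and sign; which one it matches depends on the data.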


























@@ 174,6 +178,10 @@ Interesting material on ICA:


* [Independent component analysis of ocular artifacts](https://hssopus.ub.rub.de/opus4/frontdoor/deliver/index/docId/2223/file/diss.pdf)









* Lecture on ICA for those who are curious, about 2.5 hours in total: [Part I](https://www.youtube.com/watch?v=_e4SN4TWlgY), [Part II](https://www.youtube.com/watch?v=olKgmOuAvrc), [Part III](https://www.youtube.com/watch?v=Ad6kMhJbqoY)









## Additional resources






