




## Dimensionality reduction methods









### Principal Component Analysis






PCA is one of the most popular dimensionality reduction methods. It is a linear, orthogonal projection method in which high-dimensional data is projected onto a lower-dimensional space such that the variance of the projected data is maximized. We can again make an analogy with the shadow game. This time our objective is to find the right direction for the light so that the features of the high-dimensional (3D) object are preserved as much as possible in the lower-dimensional (2D) space. In other words, we perform the projection in a way that minimizes the information loss.
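As an illustration, here is a minimal sketch using NumPy and scikit-learn (the data and all names are hypothetical): 3D points lying close to a 2D plane are projected down to 2D while keeping almost all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical 3D data that lies close to a 2D plane
plane = np.array([[2.0, 0.5, 1.0],
                  [0.3, 1.5, 0.2]])
X = rng.normal(size=(500, 2)) @ plane          # points on the plane
X += 0.05 * rng.normal(size=X.shape)           # small off-plane noise

# Project onto the 2D subspace of maximum variance ("the best shadow")
pca = PCA(n_components=2)
Z = pca.fit_transform(X)

# Fraction of the original variance kept by the projection
print(pca.explained_variance_ratio_.sum())     # close to 1.0
```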




```math
u_{1}^TSu_{1} = u_{1}^T\lambda_1u_{1} = \lambda_1u_{1}^Tu_{1} = \lambda_1
```





In other words, the quantity to be maximized is the eigenvalue: the variance is largest when $`u_{1}`$ is the eigenvector corresponding to the largest eigenvalue of S, $`\lambda_1`$.









For M principal components, we just need to calculate the eigenvectors and eigenvalues of the covariance matrix S and keep the eigenvectors corresponding to the M largest eigenvalues.
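That recipe can be sketched directly in NumPy (hypothetical data; `eigh` suits the symmetric covariance matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))  # correlated data

Xc = X - X.mean(axis=0)               # center the data
S = np.cov(Xc, rowvar=False)          # covariance matrix S (5 x 5)

# Eigendecomposition of S; eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]     # re-sort: largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

M = 2
U = eigvecs[:, :M]                    # top-M principal directions
Z = Xc @ U                            # data projected onto M components

# The variance along each component equals its eigenvalue
print(np.allclose(Z.var(axis=0, ddof=1), eigvals[:M]))   # True
```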






### Independent Component Analysis




```math
X = AS
```





Note that we know neither A nor S. We also want this decomposition to work for any problem, so the constraint we apply should be generalizable.
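To make the setup concrete, here is a sketch with synthetic sources; scikit-learn's `FastICA` stands in for the unmixing procedure (the sources, the mixing matrix, and the choice of algorithm are all assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
t = np.linspace(0, 8, 2000)
# Two independent, non-Gaussian sources (columns of S)
S = np.c_[np.sign(np.sin(3 * t)),      # square wave
          np.mod(t, 1.0) - 0.5]        # sawtooth
A = np.array([[1.0, 0.5],              # unknown mixing matrix
              [0.6, 1.2]])
X = S @ A.T                            # observed mixtures (X = AS, sample-major)

# Recover the sources knowing neither A nor S
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)
# S_est matches S only up to permutation, sign, and scale
```

Note the inherent ambiguity: ICA can recover the sources only up to their order, sign, and amplitude.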









We have already discussed one component analysis method, PCA. At this point, you may ask: what is the difference between PCA and ICA? First of all, note that the aim is different. PCA seeks maximum variance, a weaker constraint than independence. In PCA, it is the principal components that are independent of one another, while each component may still carry information from more than one source dimension. This is typically the case, as multiple coordinates are projected onto each principal component. The difference is best seen on a plot:






<div align="center">



<img src="uploads/876dc45febab4a5593bcce12383aa455/ica_1.png" width="600">

We are ready to explore the math behind the curtains. The first step was to figure out the first principal direction, the angle θ that maximizes the variance of the projected data:

```math
\sigma(θ) = \sum_{n}^{N} {[x_1(n) x_2(n)]\begin{bmatrix} cos(θ) \\ sin(θ) \end{bmatrix}}^2
```









where $`x`$ is the measured signal. Note that n indexes the elements of X. In the next step, we take the derivative with respect to θ and set it equal to zero. The maximum gives us the first principal component, while the minimum gives the second principal component; the two are orthogonal. In practice, it does not matter whether the derivative finds the minimum or the maximum, as they differ by 90 degrees here. After a couple of calculation steps, we get the angle θ:






```math
θ = 0.5\, tan^{-1}(2\sum{x_1x_2}/\sum(x_1^2-x_2^2))
```

This extremum defines the first action, the inverse rotation:

```math
\begin{bmatrix}
cos(θ) & sin(θ) \\
-sin(θ) & cos(θ)
\end{bmatrix}
```
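As a numerical sanity check (hypothetical 2D data), the closed-form angle can be compared with the eigenvectors of the covariance matrix; `np.arctan2` resolves the quadrant of the arctangent:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.array([[2.0, 1.0],
              [0.4, 0.8]]) @ rng.normal(size=(2, 1000))  # correlated 2D signal
x1 = x[0] - x[0].mean()
x2 = x[1] - x[1].mean()

# theta = 0.5 * atan(2*sum(x1*x2) / sum(x1^2 - x2^2)), quadrant-aware
theta = 0.5 * np.arctan2(2 * np.sum(x1 * x2),
                         np.sum(x1**2 - x2**2))
u = np.array([np.cos(theta), np.sin(theta)])   # candidate principal direction

# Leading eigenvector of the covariance matrix for comparison
w, V = np.linalg.eigh(np.cov(np.vstack([x1, x2])))
v = V[:, np.argmax(w)]
print(abs(u @ v))    # ~1.0: theta points along the first principal direction
```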









The second action was the stretching ($`\Sigma^{-1}`$, see the figure above). This step is easy to calculate, as we have already found the first principal direction and its variance. Let's call it $`\sigma_1`$ this time:






```math



\sigma_1(θ) = \sum_{n}^{N} {[x_1(n) x_2(n)]\begin{bmatrix} cos(θ) \\ sin(θ) \end{bmatrix}}^2



```






The other principal component will be orthogonal:






```math



\sigma_2(θ) = \sum_{n}^{N} {[x_1(n) x_2(n)]\begin{bmatrix} cos(θ-\pi/2) \\ sin(θ-\pi/2) \end{bmatrix}}^2

giving $`\Sigma^{-1}`$ as:

```math
\Sigma^{-1} = \begin{bmatrix}
1/\sqrt{\sigma_1} & 0 \\
0 & 1/\sqrt{\sigma_2}
\end{bmatrix}
```
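Combining the two actions, a small NumPy sketch (hypothetical data) rotates into the principal directions and rescales each axis, leaving an approximately identity covariance:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.array([[2.0, 1.0],
              [0.4, 0.8]]) @ rng.normal(size=(2, 5000))
x = x - x.mean(axis=1, keepdims=True)      # center

# First action: rotate into the principal directions
theta = 0.5 * np.arctan2(2 * np.sum(x[0] * x[1]),
                         np.sum(x[0]**2 - x[1]**2))
U_inv = np.array([[ np.cos(theta), np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])
r = U_inv @ x

# Second action: stretch by Sigma^{-1} = diag(1/sqrt(sigma_1), 1/sqrt(sigma_2))
sigma1, sigma2 = r.var(axis=1)
Sigma_inv = np.diag([1 / np.sqrt(sigma1), 1 / np.sqrt(sigma2)])
z = Sigma_inv @ r

print(np.round(np.cov(z), 2))    # approximately the identity matrix
```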









At this point, we have inverted the rotation and the scaling along the principal component directions. However, this process has only decorrelated the images; a separable (independent) probability distribution has not yet been produced.
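A tiny example (hypothetical data) shows the gap: a rotated square of independent uniform sources is already uncorrelated, yet its coordinates remain dependent, so decorrelation alone cannot separate them.

```python
import numpy as np

rng = np.random.default_rng(5)
s = rng.uniform(-1, 1, size=(2, 20000))     # independent uniform sources
c = np.cos(np.pi / 4)
R = np.array([[c, -c], [c, c]])             # 45-degree rotation (mixing)
x = R @ s

# Uncorrelated: the covariance matrix is (nearly) diagonal
print(np.round(np.cov(x), 2))

# ...but NOT independent: a fourth-moment check fails for independence
d = np.mean(x[0]**2 * x[1]**2) - np.mean(x[0]**2) * np.mean(x[1]**2)
print(d)    # clearly negative; it would be ~0 for independent coordinates
```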









To be continued.
