...  ...  @@ 17,7 +17,7 @@ The machine learning models we have covered so far can also be interpreted from 





Many real-life observations exhibit nonlinear behavior, so the density of the data in feature space usually cannot be described by a single distribution function such as a Gaussian. In engineering, it is common practice to express a finite, nonlinear system as an infinite combination of linear functions so that we can analyze and predict the system's behavior. For instance, transient heat/mass/momentum transfer with source terms:






<img src="uploads/10baf30be90723e9cc23730b2309f920/dmd_1.png" width="500">



<img src="uploads/10baf30be90723e9cc23730b2309f920/dmd_1.png" width="600">






The same strategy can also be applied in data-driven learning. We can, for instance, approximate a nonlinear observation distribution as a linear combination of basic distribution functions, as in the case of Gaussian mixtures. With this linearization, we can further convert the observed variables X into discrete latent variables. In a GMM, for instance, we create latent variables by assigning the observations to specific components of the mixture model (via the EM algorithm).
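
To make the assignment step concrete, here is a minimal sketch of EM for a two-component, one-dimensional Gaussian mixture, written with the Python standard library only. The synthetic data, the initial guesses, the iteration count, and all variable names (`weights`, `means`, `stds`, `resp`, `labels`) are assumptions made for illustration; they do not come from the text above, and a real application would more likely rely on a tested library implementation.

```python
import math
import random

random.seed(0)

# Synthetic 1-D observations drawn from two Gaussians
# (hypothetical parameters, chosen only for this illustration).
data = ([random.gauss(-2.0, 0.7) for _ in range(300)]
        + [random.gauss(3.0, 1.0) for _ in range(200)])


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Initial guesses for mixture weights, means and standard deviations.
weights = [0.5, 0.5]
means = [min(data), max(data)]
stds = [1.0, 1.0]

for _ in range(100):
    # E-step: responsibility of each component for each observation.
    resp = []
    for x in data:
        p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
        total = sum(p)
        resp.append([pk / total for pk in p])

    # M-step: re-estimate the parameters from the weighted observations.
    for k in range(2):
        nk = sum(r[k] for r in resp)
        weights[k] = nk / len(data)
        means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
        stds[k] = math.sqrt(var)

# Discrete latent variable: index of the most responsible component,
# taken from the responsibilities of the final E-step.
labels = [max(range(2), key=lambda k: r[k]) for r in resp]

print("weights:", weights)
print("means:  ", means)
print("stds:   ", stds)
```

The list `labels` plays the role of the discrete latent variable: each continuous observation in `data` is replaced by the index of the mixture component most likely to have generated it.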




...  ...  @@ 134,6 +134,7 @@ Note that we do not know neither A nor S. We also want to perform this decomposi 





We have already discussed a component analysis method, PCA. At this point, you may ask: what is the difference between PCA and ICA? First of all, note that the aim here is different. PCA seeks directions of maximum variance, which is a weaker constraint than independence. In PCA, it is the principal components that are decoupled from one another (they are uncorrelated), while each component may still carry information from more than one source dimension; this is typically the case, since we project multiple coordinates onto the principal components. It is better seen on a plot:






<img src="uploads/876dc45febab4a5593bcce12383aa455/ica_1.png" width="600">
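
The same contrast can be checked numerically. The sketch below, which assumes NumPy is available, mixes two independent uniform sources with a hypothetical 2×2 mixing matrix `A`, computes the PCA components, and then applies a deliberately simplified ICA step: the whitened data are rotated by the angle that maximizes the total absolute excess kurtosis. This brute-force angle search is a stand-in chosen for brevity, not the algorithm used by production ICA implementations, and the sources, sample size, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two independent, non-Gaussian sources (uniform) and a hypothetical mixing matrix.
S = rng.uniform(-1.0, 1.0, size=(n, 2))
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = S @ A.T
X = X - X.mean(axis=0)

# PCA: eigen-decomposition of the covariance matrix, then scaling to unit variance.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
Z = (X @ eigvecs) / np.sqrt(eigvals)  # whitened PCA scores (uncorrelated)


def rotation(theta):
    """2-D rotation matrix for the angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def non_gaussianity(Y):
    """Sum of absolute excess kurtosis over the columns of unit-variance data."""
    return np.abs(np.mean(Y ** 4, axis=0) - 3.0).sum()


# Simplified ICA step: search the rotation of the whitened data that is
# most non-Gaussian. Angles in [0, pi/2] suffice up to sign and ordering.
angles = np.linspace(0.0, np.pi / 2.0, 181)
best = max(angles, key=lambda t: non_gaussianity(Z @ rotation(t).T))
Y = Z @ rotation(best).T


def abs_corr(est, src):
    """|correlation| of each estimated component (rows) with each source (columns)."""
    c = np.corrcoef(est.T, src.T)
    return np.abs(c[:2, 2:])


corr_pca = abs_corr(Z, S)
corr_ica = abs_corr(Y, S)
print("PCA components vs. sources:\n", corr_pca.round(2))
print("ICA components vs. sources:\n", corr_ica.round(2))
```

Each row of the printed matrices corresponds to one estimated component. A row with a single entry close to 1 means the component tracks one source; a row with two comparable entries means the component still blends both sources, even though the components themselves are uncorrelated.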









...

...  ...  