Clustering · Changes

Update Clustering authored Nov 24, 2021 by cihan.ates
DDE-1/Clustering.md
View page @ 608e9419
......@@ -82,7 +82,15 @@ If you want to learn more about the model, you can follow the details from the [
In both classification and clustering models, we talked about "similarities" between instances, as well as "distances". It is important to take a breath at this point and think about what these mean in the mathematical framework of our models.
...
Any example we pass to our models is a vector, where each element corresponds to a feature. Our objective is usually to figure out how these observations (vectors) are distributed in our data space, and/or whether they follow certain patterns (e.g., a hyperplane). In regression, we assess our guesses about the label by comparing model predictions with the true values, using distance-based measures such as the l1 and l2 norms. In classification, we try to find decision boundaries that divide the data space into meaningful regions. Here the labels provide us with an absolute frame of reference, a way to compare. This again depends on how similar or distant one class is from the other. The same is true for clustering, but this time we look at our samples in a relative frame of reference.
Therefore, the predictive / descriptive capabilities of the models are strongly affected by how we define the distance, e.g., the l1 or l2 norm in regression. This is also the case in clustering.
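To make the effect of the distance definition concrete, here is a minimal sketch in plain Python. The points, the names `query` and `candidates`, and the two helper functions are all hypothetical and chosen only for illustration. It shows that the same query point can have a different nearest neighbour depending on whether we measure with the l1 or the l2 norm.

```python
import math

def l1_distance(u, v):
    """Sum of absolute per-feature differences (l1 norm of u - v)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def l2_distance(u, v):
    """Square root of the sum of squared per-feature differences (l2 norm of u - v)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical 2-feature instances, picked so that the two norms disagree.
query = (0.0, 0.0)
candidates = {"a": (3.0, 3.0), "b": (0.0, 5.0)}

# l1: a -> 6.0, b -> 5.0, so "b" is the closest instance.
nearest_l1 = min(candidates, key=lambda name: l1_distance(query, candidates[name]))

# l2: a -> sqrt(18) ~ 4.24, b -> 5.0, so "a" is the closest instance.
nearest_l2 = min(candidates, key=lambda name: l2_distance(query, candidates[name]))

print(nearest_l1, nearest_l2)
```

A clustering model that groups instances by proximity could therefore assign `query` to a different group under each norm, which is why the choice of distance deserves attention.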
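One of the most common measures is the Euclidean distance. More generally, the dissimilarity between two instances is obtained by summing a per-feature dissimilarity Δ_m over the features (m = 1, 2, ..., M); with the squared difference (x_im - x_i'm)^2 as the per-feature term, the sum is the squared Euclidean distance: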
```math
Distance(x_{i},x_{i'}) = \sum_{m=1}^{M} \Delta_m (x_{im},x_{i'm})
```
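The sum above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (`distance`, `squared_difference`, `absolute_difference`) and the example vectors are hypothetical, and the per-feature term is passed in as an argument so that different choices can be compared.

```python
import math

def squared_difference(a, b):
    """Per-feature dissimilarity that leads to the squared Euclidean distance."""
    return (a - b) ** 2

def absolute_difference(a, b):
    """Per-feature dissimilarity that leads to the Manhattan (l1) distance."""
    return abs(a - b)

def distance(x_i, x_j, delta=squared_difference):
    """Sum the per-feature dissimilarity delta over all M features."""
    if len(x_i) != len(x_j):
        raise ValueError("both instances must have the same number of features")
    return sum(delta(a, b) for a, b in zip(x_i, x_j))

# Two hypothetical instances with M = 3 features.
x_i = [1.0, 2.0, 3.0]
x_j = [4.0, 6.0, 3.0]

squared_euclidean = distance(x_i, x_j)                # 9 + 16 + 0 = 25.0
euclidean = math.sqrt(squared_euclidean)              # 5.0
manhattan = distance(x_i, x_j, absolute_difference)   # 3 + 4 + 0 = 7.0

print(squared_euclidean, euclidean, manhattan)
```

Note that the plain Euclidean distance is the square root of the summed squared differences, so it is obtained from the sum in one extra step.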
You can also check:
[17 types of similarity and dissimilarity measures](https://towardsdatascience.com/17-types-of-similarity-and-dissimilarity-measures-used-in-data-science-3eb914d2681)
......