Understanding Clustering in Unsupervised Learning
A simple explanation of clustering in unsupervised learning
Remember unsupervised learning?
In the previous article, I explained unsupervised learning. Unsupervised learning is the discovery of patterns given only input data, without any labels.
According to Wikipedia:
Unsupervised learning is a type of machine learning that looks for previously undetected patterns in a data set with no pre-existing labels and with a minimum of human supervision. In contrast to supervised learning that usually makes use of human-labeled data, unsupervised learning, also known as self-organization, allows for modeling of probability densities over inputs.
The architecture, or framework, of unsupervised learning is shown in the figure below:
There are three cases in unsupervised learning:
Clustering, Dimensionality Reduction, and Association Rules
Clustering: grouping data based on similarity patterns
There are several methods or algorithms that can be used for clustering: K-Means Clustering, Affinity Propagation, Mean Shift, Spectral Clustering, Hierarchical Clustering, DBSCAN, etc.
In this section, I will explain only the intuition of clustering in unsupervised learning.
Clustering: Intuition
Clustering data based on similarity patterns into 1 group
Clustering data based on similarity patterns into 2 groups
Clustering data based on similarity patterns into 3 groups
Clustering data based on similarity patterns into 4 groups
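To make this intuition concrete, here is a minimal sketch that groups the same points into 1, 2, 3, and 4 clusters with K-Means. Both the synthetic dataset and the choice of K-Means here are illustrative assumptions, not something fixed by the discussion above:

```python
# A minimal sketch of the intuition above, using K-Means from
# scikit-learn on a small synthetic dataset (both are assumptions
# made for illustration).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 2-D points with some natural grouping structure.
X, _ = make_blobs(n_samples=100, centers=4, random_state=42)

# Cluster the same data into 1, 2, 3, and 4 groups.
for k in [1, 2, 3, 4]:
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(f"k={k}: first 10 labels -> {labels[:10].tolist()}")
```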
How do we know a point belongs to the same group as another point?
As mentioned above: based on similarity patterns.
How do we measure the similarity of one point to another?
The answer is: based on distance.
How do we measure the distance from one point to another?
There are several ways to measure distance:
- Euclidean Distance
- Manhattan Distance
- Minkowski Distance
- Hamming Distance
Euclidean Distance
Euclidean Distance represents the shortest distance between two points.
Mathematically, we can write this formula as:

$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

Example case:
In this case, the Euclidean Distance between the points is 6.3
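As an illustration, here is a short Python sketch. The points (1, 1) and (3, 7) are an assumption (the original example points were shown in a figure), chosen so they reproduce the distance above:

```python
import math

# Illustrative points (an assumption; the original article showed the
# example points in a figure). These reproduce the distance of ~6.3.
p = (1, 1)
q = (3, 7)

# Euclidean distance: square root of the sum of squared differences.
euclidean = math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
print(round(euclidean, 1))  # 6.3
```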
Manhattan Distance
Manhattan Distance is the sum of absolute differences between points across all the dimensions.
Mathematically, we can write this formula as:

$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$

Example case:
In this case, the Manhattan Distance between the points is 8
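Using the same assumed points (1, 1) and (3, 7), which also reproduce the stated distance of 8:

```python
# Manhattan distance: sum of absolute coordinate differences.
# Same illustrative points as in the Euclidean sketch above.
p = (1, 1)
q = (3, 7)

manhattan = sum(abs(pi - qi) for pi, qi in zip(p, q))
print(manhattan)  # 8 (|1 - 3| + |1 - 7| = 2 + 6)
```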
Minkowski Distance
Minkowski Distance is the generalized form of Euclidean and Manhattan Distance.
Mathematically, we can write this formula as:

$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$

Minkowski distance can work like Manhattan or Euclidean distance; the chosen value of p determines how it behaves:
- p = 1: Manhattan distance
- p = 2: Euclidean distance
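A small sketch in pure Python (same assumed points as above) showing that p = 1 recovers the Manhattan distance and p = 2 recovers the Euclidean distance:

```python
# Minkowski distance in pure Python. With p = 1 it behaves like
# Manhattan distance; with p = 2, like Euclidean distance.
# The points are the same illustrative assumption as above.
point_a = (1, 1)
point_b = (3, 7)

def minkowski(u, v, p):
    return sum(abs(ui - vi) ** p for ui, vi in zip(u, v)) ** (1 / p)

print(minkowski(point_a, point_b, p=1))            # 8.0 (Manhattan)
print(round(minkowski(point_a, point_b, p=2), 1))  # 6.3 (Euclidean)
```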
Hamming Distance
Hamming Distance measures the difference between two strings of the same length: it is the number of positions at which the corresponding characters are different.
Mathematically, we can write this formula as:

$d(x, y) = \sum_{i=1}^{n} \mathbb{1}(x_i \neq y_i)$

Example case:
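For illustration, here is a short sketch; the strings "karolin" and "kathrin" are an assumption, since the original example was shown as an image:

```python
# Hamming distance between two equal-length strings: count the
# positions where the characters differ. The strings are illustrative.
a = "karolin"
b = "kathrin"

hamming = sum(c1 != c2 for c1, c2 in zip(a, b))
print(hamming)  # 3 ('r' vs 't', 'o' vs 'h', 'l' vs 'r')
```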
Continue Learning
- How to implement unsupervised learning with K-Means Clustering?
- How to implement unsupervised learning with DBSCAN Clustering?
- How to implement unsupervised learning with Gaussian Mixture Model Clustering?
- Dimensionality Reduction
About Me
I'm a data scientist focused on machine learning and deep learning. You can reach me on Medium and LinkedIn.
My Website : https://komuternak.com/