DBScan:A density-based clustering algorithm (KDD’96)
- Major features
- Discover clusters of arbitrary shape
- Handle noise
- One scan
- Need density parameters as termination condition
- Two parameters
- Eps: Maximum radius of the neighborhood
- MinPts: Minimum number of points in an Eps-neighborhood of that point
- Four principles of DBScan
- Each cluster contains at least one core point.
- Given any two core points p and q, if N(p) contains q (or N(q) contains p), then p and q are in the same cluster.
- Consider a border point p to be assigned to one of the clusters formed by Principle 1 and Principle 2. Suppose N(p) contains multiple core points. A border point p is assigned arbitrarily to one of the clusters containing these core points (formed by Principle 1 and Principle 2).
- All noise points do not belong to any clusters.
- Advantage:
- Clusters can have arbitrary shape and size, i.e., clusters are not restricted to have convex shapes.
- Number of clusters is determined automatically, as opposed to k-means.
- Can separate clusters from noise. Robust to outliers.
- Can be supported by spatial index structures (R*-tree).
- Disadvantage:
- Input parameter may be difficult to determine
- In some situations very sensitive to input parameter setting
- DBSCAN is not entirely deterministic: border points are reachable from more than one clusters.
- Determining EPS and MinPts
Todo
- Algorithms of DBScan