DBScan

DBScan:A density-based clustering algorithm (KDD’96)

  1. Major features
  • Discover clusters of arbitrary shape
  • Handle noise
  • One scan
  • Need density parameters as termination condition
  1. Two parameters
  • Eps: Maximum radius of the neighborhood
  • MinPts: Minimum number of points in an Eps-neighborhood of that point
  1. Four principles of DBScan
  • Each cluster contains at least one core point.
  • Given any two core points p and q, if N(p) contains q (or N(q) contains p), then p and q are in the same cluster.
  • Consider a border point p to be assigned to one of the clusters formed by Principle 1 and Principle 2. Suppose N(p) contains multiple core points. A border point p is assigned arbitrarily to one of the clusters containing these core points (formed by Principle 1 and Principle 2).
  • All noise points do not belong to any clusters.
  1. Advantage:
  • Clusters can have arbitrary shape and size, i.e., clusters are not restricted to have convex shapes.
  • Number of clusters is determined automatically, as opposed to k-means.
  • Can separate clusters from noise. Robust to outliers.
  • Can be supported by spatial index structures (R*-tree).
  1. Disadvantage:
  • Input parameter may be difficult to determine
  • In some situations very sensitive to input parameter setting
  • DBSCAN is not entirely deterministic: border points are reachable from more than one clusters.
  1. Determining EPS and MinPts

Todo

  • Algorithms of DBScan

Reference

  1. Wiki: DBSCAN

  2. DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation*