
Cluster Analysis


What is Cluster Analysis?

Let us first understand what cluster analysis is. Cluster analysis is a technique that sorts similar objects into groups known as clusters. The end result of a cluster analysis is a set of clusters in which each cluster is distinct from the other clusters and the objects within each cluster are broadly similar to each other. For example, in the scatterplot given below, two clusters are shown: one cluster is drawn with filled circles while the other is drawn with unfilled circles.

[Image will be Uploaded Soon]

The objective of cluster analysis is to identify groups of similar objects, where the similarity between each pair of objects is an overall measure computed over the whole range of their characteristics. In this article, we will study cluster analysis, cluster analysis examples, types of cluster analysis, cluster CBSE, etc.
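To make the idea of an overall similarity measure concrete, here is a minimal sketch that uses Euclidean distance, one common choice among many. The customers and their two characteristics (age, visits per month) are hypothetical values chosen only for illustration.

```python
from math import sqrt

def euclidean_distance(a, b):
    """Overall dissimilarity of two objects across all of their characteristics."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical customers described by (age, visits per month).
alice, bob, carol = (25, 4), (27, 5), (60, 1)

d_ab = euclidean_distance(alice, bob)    # small distance: similar customers
d_ac = euclidean_distance(alice, carol)  # large distance: dissimilar customers
```

A smaller distance means the two objects are more alike, so alice and bob would be more likely to fall in the same cluster than alice and carol. In practice the characteristics are often rescaled first so that no single one dominates the distance.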

Cluster CBSE

A cluster CBSE refers to a group of data points combined together because of certain similarities.

Types of Cluster Analysis

Some of the different types of cluster analysis are:

1. Hierarchical Cluster Analysis

In hierarchical cluster analysis, clusters are built up step by step: a cluster is formed and then merged with the cluster that is most similar to it to form one single cluster. This process is repeated until all objects are found in one single cluster. This method is known as the Agglomerative method. Agglomerative clustering starts with single objects, each treated as its own cluster, and keeps grouping them into larger clusters.

The divisive method is the other type of hierarchical cluster analysis. Here clustering starts with the complete data set as one cluster and then keeps splitting it into smaller partitions.
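The agglomerative procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `agglomerative`, the sample points, and the choice of single linkage (the distance between two clusters is the distance between their closest members) are ours, and other linkage rules are equally valid.

```python
from itertools import combinations
from math import dist

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain (single linkage)."""
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters whose closest members are nearest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                dist(a, b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        # Merge cluster j into cluster i (i < j, so removing j keeps i valid).
        clusters[i].extend(clusters.pop(j))
    return clusters

# Hypothetical 2-D data: three points near (1, 1) and two near (5, 5).
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.8)]
result = agglomerative(points, 2)
```

Stopping at `n_clusters=1` would reproduce the full process described above, where everything ends up in one single cluster. Stopping earlier simply cuts the hierarchy at the desired number of groups.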

2. Centroid-based Clustering

In centroid-based clustering, each cluster is represented by a central entity (the centroid), which may or may not be a member of the given data set. The K-Means method is the best-known example of centroid-based clustering: k is the number of cluster centers, and each object is allocated to its nearest cluster center.

[Image will be Uploaded Soon]
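A minimal K-Means sketch is given below. It assumes Euclidean distance and starting centers supplied by the caller. The function name `k_means` and the sample data are hypothetical, and real implementations usually add smarter initialisation.

```python
from math import dist

def k_means(points, centers, max_iter=100):
    """Repeat: assign each point to its nearest center, then recompute centers."""
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:  # no center moved, so we have converged
            break
        centers = new_centers
    return centers, groups

# Hypothetical 2-D data with two obvious groups and rough starting guesses.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, groups = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
```

Note that the final centers are averages of the points in each group, so they need not coincide with any object in the data set.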

3. Distribution-based Clustering

Distribution-based clustering is closely linked to statistics, since it is based on distribution models. Objects that most likely belong to the same distribution are grouped into a single cluster. This type of cluster analysis can capture some complex properties of objects, such as correlation and dependence between attributes.

[Image will be Uploaded Soon]
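One common distribution model is a mixture of Gaussians fitted with the expectation-maximisation (EM) procedure. The sketch below is a deliberately simplified version that assumes one-dimensional data and exactly two components. The function names, starting means, and sample data are hypothetical.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)

def fit_two_gaussians(data, initial_means, n_iter=50):
    """EM for a 1-D mixture of two Gaussians."""
    means = list(initial_means)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: probability that each point belongs to each component.
        resp = []
        for x in data:
            likes = [
                w * normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M-step: re-estimate each component from its weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # guard against a collapsing component
            weights[k] = nk / len(data)
    return means, variances, weights, resp

# Hypothetical measurements drawn from two well-separated groups.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]
means, variances, weights, resp = fit_two_gaussians(data, [0.0, 6.0])

# Hard cluster labels: the most probable component for each point.
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
```

Unlike K-Means, each object receives a probability of belonging to every cluster, and a hard label is taken only at the end.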

4. Density-based Clustering

In density-based cluster analysis, clusters are defined as areas of higher density than the remainder of the data set. Objects in sparse areas, which are what separate the clusters from each other, are usually treated as noise or as border points.

[Image will be Uploaded Soon]
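DBSCAN is a widely used density-based method, and a compact sketch of it follows. The radius `eps`, the threshold `min_pts`, and the sample points are hypothetical values chosen for illustration. A point counts as dense (a core point) when at least `min_pts` points, itself included, lie within `eps` of it.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE if it lies in a sparse area."""
    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and grow it outwards.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (9.0, 1.0),
]
labels = dbscan(points, eps=0.5, min_pts=3)
```

The isolated point at (9.0, 1.0) has no neighbors within `eps`, so it is labeled as noise instead of being forced into a cluster. The number of clusters is not specified in advance; it follows from the density of the data.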

Cluster Analysis Examples

Some cluster analysis examples are given below:

  1. Marketing - Cluster analysis helps marketers to find distinct groups in their customer bases and then use this information to develop targeted marketing programs.

  2. Land - It is used to identify areas of similar land use in an earth observation database.

  3. Insurance - Cluster analysis helps to identify groups of motor insurance policyholders with a high average claim cost.

  4. Earthquake Studies - Clustering the locations of observed earthquakes helps to identify the zones in which earthquakes are concentrated.

  5. City-Planning - Cluster analysis helps to recognize houses on the basis of their types, house value and geographical location.

Quiz Time

1. What are the Two Types of Hierarchical Clustering Analysis?

  1. Top-down clustering (Divisive)

  2. Bottom-up clustering (Agglomerative)

  3. Dendrogram

  4. K-means

2. Which of the Following is Needed by K-means Clustering?

  1. Defined distance metric

  2. Number of clusters

  3. Initial guess as to cluster centroids

  4. All of the above answers are correct

3. Clustering Should be Initiated on Samples of 300 or More.

  1. True

  2. False

Fun Facts

  • Cluster analysis was first introduced in anthropology by Driver and Kroeber in 1932. 

  • Cluster analysis was further introduced in psychology by Joseph Zubin in 1938 and Robert Tryon in 1939.

  • Cattell used cluster analysis in 1943 for trait theory classification in personality psychology.

FAQs on Cluster Analysis

1. What are the Applications of Cluster Analysis?

Cluster analysis is used in various fields. Some of the applications of cluster analysis are:

  1. Cluster analysis is frequently used in outlier detection applications, for example to detect credit card fraud.

  2. Cluster analysis helps to classify documents on the web for the discovery of information.

  3. Cluster analysis is used in market research, data analysis, pattern recognition, and image processing.

  4. Cluster analysis is often used by insurance companies when they find a high number of claims in a particular region. This helps them to understand why the claims are increasing.

  5. As a data mining function, cluster analysis serves as a tool to gain insight into the distribution of data and to observe the characteristics of each cluster.

2. When to use Cluster Analysis?

Cluster analysis is used to divide objects into groups such that objects in one group are similar to each other and different from objects in other groups.

It is primarily used to perform segmentation, be it of customers, products or stores. In business, products are clustered together on the basis of their features such as size, brand, flavor, etc. Stores with similar characteristics, such as sales, size, and customer base, can be clustered together.

Cluster analysis can also be used for anomaly detection, for example to identify fraudulent transactions.

It is often used to divide large data into smaller groups that are more amenable to other techniques. For example, logistic regression outcomes can be improved by performing it individually on smaller clusters that behave differently and may follow slightly different distributions.