Karl Pearson's Coefficient of Correlation: Calculation and Use

Karl Pearson Coefficient of Correlation – A Statistical Study

Karl Pearson's Coefficient of Correlation is a central topic in Statistics, and many statistical analyses rely on it. The coefficient measures the linear correlation between two variables and always takes a value in the range -1 to +1.


This is a quantitative method that gives a numeric value for the intensity of the linear relationship between the X and Y variables. But is it really useful for economic calculations? Let us delve into the topic to find out more about the Karl Pearson Coefficient of Correlation.


Do You Know?

  • The concept of correlation was introduced by Francis Galton in the 1880s!

  • Karl Pearson was a British statistician who is regarded as one of the leading founders of modern statistics.

  • His coefficient is widely regarded as a standard method of measuring the association between two variables of interest, as it is built on their covariance.

  • Karl Pearson's method is highly affected by extreme values, so conclusions should not be drawn from it without first checking the data for such values.


What do You mean by Correlation Coefficient?

Before looking into details about Karl Pearson's Coefficient of Correlation, it is vital to brush up on fundamental concepts about correlation and its coefficient in general.


The correlation coefficient can be defined as a measure of the relationship between two quantitative variables, i.e., X and Y. It serves as a statistical tool that helps to analyze and, in turn, measure the degree of the linear relationship between the variables.


For example, a change in the monthly income (X) of a person leads to a change in their monthly expenditure (Y). With the help of correlation, you can measure the degree to which a change in one variable is associated with a change in the other.


Types of Correlation Coefficient

Depending on the direction of the relationship between variables, correlation can be of three types, namely –

  • Positive Correlation (0 to +1)

  • Negative Correlation (0 to -1)

  • Zero Correlation (0)


Positive Correlation (0 to +1)

In this case, the direction of change between X and Y is the same. For instance, an increase in the duration of a workout leads to an increase in the number of calories one burns.


Negative Correlation (0 to -1)

Here, the direction of change between X and Y variables is opposite. For example, when the price of a commodity increases its demand decreases.


Zero Correlation (0)

There is no relationship between the variables in this case. For instance, an increase in height has no impact on one’s intelligence.
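The three cases above can be illustrated with a short Python sketch. The helper names `pearson_r` and `direction` and all the numbers are made up for illustration; they are not taken from any real study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from raw sums of the paired values."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def direction(r, tol=1e-9):
    """Label the sign of r; values within tol of 0 count as zero."""
    if r > tol:
        return "positive"
    if r < -tol:
        return "negative"
    return "zero"

# Hypothetical data chosen only to show each type of correlation.
workout_minutes = [10, 20, 30, 40, 50]
calories_burned = [80, 150, 240, 310, 400]
price = [10, 20, 30, 40, 50]
demand = [90, 75, 60, 40, 20]
height_rank = [1, 2, 3, 4, 5]
test_score = [2, 1, 3, 1, 2]

for label, xs, ys in [
    ("workout vs calories", workout_minutes, calories_burned),
    ("price vs demand", price, demand),
    ("height vs score", height_rank, test_score),
]:
    print(label, "->", direction(pearson_r(xs, ys)))
```

The small tolerance in `direction` is a design choice: with real data, r is rarely exactly 0, so "zero correlation" in practice means a value very close to 0.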


Now that we have refreshed our memory of these basics, let’s move on to Karl Pearson Coefficient of Correlation.


What is Karl Pearson’s Coefficient of Correlation?

This method is also known as the Product Moment Correlation Coefficient and was developed by Karl Pearson. It is one of the three most potent and extensively used methods to measure the level of correlation, besides the Scatter Diagram and Spearman’s Rank Correlation.


The Karl Pearson correlation coefficient method is quantitative and offers a numerical value to establish the intensity of the linear relationship between X and Y. The coefficient is denoted by ‘r’.


The Karl Pearson Coefficient of Correlation formula is expressed as 

r = \[\frac{n\left ( \sum xy \right )-\left ( \sum x \right )\left ( \sum y \right )}{\sqrt{\left [ n\sum x^{2}-\left (\sum x  \right )^{2} \right ]\left [ n\sum y^{2}-\left (\sum y  \right )^{2} \right ]}}\]


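In this formula, n is the number of paired observations, and x and y are the observed values of the two variables.


Writing \[\bar{X}\] for the mean of the X variable and \[\bar{Y}\] for the mean of the Y variable, the same coefficient can also be expressed through the deviations \[X-\bar{X}\] and \[Y-\bar{Y}\], as in the Actual Mean Method described later.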
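
As a short worked step (a sketch using the same symbols as the formula above, with \[\bar{x}=\frac{\sum x}{n}\] and \[\bar{y}=\frac{\sum y}{n}\]), the sum-based formula and the deviation-based form agree because

\[\sum \left ( x-\bar{x} \right )\left ( y-\bar{y} \right )=\sum xy-\bar{y}\sum x-\bar{x}\sum y+n\bar{x}\bar{y}=\sum xy-\frac{\left ( \sum x \right )\left ( \sum y \right )}{n}\]

\[\sum \left ( x-\bar{x} \right )^{2}=\sum x^{2}-\frac{\left ( \sum x \right )^{2}}{n}\]

and likewise for y. Dividing the first expression by the square root of the product of the two sums of squares, and then multiplying numerator and denominator by n, gives the formula for r stated above.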


Methods of Karl Pearson’s Coefficient of Correlation and its Calculation 

The Karl Pearson coefficient can be obtained using various methods, which are mentioned below.


  1. Actual Mean Method

  2. Direct Method

  3. Short Cut/Assumed/Indirect Method

  4. Step Deviation Method


1. Actual Mean Method Which is Expressed as

r = \[\frac{\sum \left ( X-\bar{X} \right )\left ( Y-\bar{Y} \right )}{\sqrt{\sum \left ( X-\bar{X} \right )^{2}}\sqrt{\sum \left ( Y-\bar{Y} \right )^{2}}}\]


Where, \[\bar{X}\] = mean of X variable


  \[\bar{Y}\] = mean of Y variable


In this Karl Pearson formula, the deviations from the means are written as


x = \[X-\bar{X}\]


y = \[Y-\bar{Y}\]


so that the formula can be written compactly as r = \[\frac{\sum xy}{\sqrt{\sum x^{2}}\sqrt{\sum y^{2}}}\]
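
The Actual Mean Method can be sketched in Python as follows. The function name `pearson_actual_mean` and the sample data are assumptions made for illustration only.

```python
from math import sqrt

def pearson_actual_mean(xs, ys):
    """Pearson's r using deviations from the actual means of X and Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]   # x = X - mean of X
    dev_y = [y - mean_y for y in ys]   # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / (sqrt(sum_x2) * sqrt(sum_y2))

# Hypothetical data with whole-number means (6 and 3) to keep hand checks easy.
X = [2, 4, 6, 8, 10]
Y = [1, 3, 2, 5, 4]

r = pearson_actual_mean(X, Y)
print(round(r, 4))  # deviations give 16 / (sqrt(40) * sqrt(10)), i.e. about 0.8
```

This method is most convenient by hand when both means are whole numbers, because the deviations are then free of fractions.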


2. Direct Method

Steps to Calculate the Coefficient of Correlation Using the Direct Method:


1. Calculate the sum of the X series ($ \Sigma X $).

2. Calculate the sum of the Y series ($ \Sigma Y $).

3. Square each value in the X series and find their total ($\Sigma X^2$).

4. Square each value in the Y series and find their total ($\Sigma Y^2 $).

5. Multiply corresponding values of the X and Y series, then find the total ($\Sigma XY$).

6. Use the formula below to compute the Coefficient of Correlation:

$ r = \frac{N \Sigma XY - \Sigma X \cdot \Sigma Y}{\sqrt{N \Sigma X^2 - (\Sigma X)^2} \cdot \sqrt{N \Sigma Y^2 - (\Sigma Y)^2}}$


Where:

  • N is the number of paired values in the dataset.

  • r is the coefficient of correlation.
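
The six steps of the Direct Method map onto code almost line by line. The following is a minimal Python sketch; the function name `pearson_direct` and the data are hypothetical.

```python
from math import sqrt

def pearson_direct(xs, ys):
    """Pearson's r by the direct method, following the six steps in order."""
    n = len(xs)                                   # N: number of paired values
    sum_x = sum(xs)                               # Step 1: sum of X
    sum_y = sum(ys)                               # Step 2: sum of Y
    sum_x2 = sum(x * x for x in xs)               # Step 3: sum of X squared
    sum_y2 = sum(y * y for y in ys)               # Step 4: sum of Y squared
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # Step 5: sum of XY
    # Step 6: substitute the totals into the formula.
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical data: the totals are 30, 15, 220, 55 and 106.
X = [2, 4, 6, 8, 10]
Y = [1, 3, 2, 5, 4]

r = pearson_direct(X, Y)
print(round(r, 4))  # (530 - 450) / (sqrt(200) * sqrt(50)), i.e. about 0.8
```

Because no means are computed, this method avoids fractional deviations, at the cost of working with larger totals.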


3. Assumed Mean Method Which is Expressed as

\[d_{x}\] = X - A


\[d_{y}\] = Y - A


r = \[\frac{N\sum d_{x}d_{y}-\left ( \sum d_{x} \right )\left ( \sum d_{y} \right )}{\sqrt{N\sum d_{x}^{2}-\left ( \sum d_{x} \right )^{2}}\sqrt{N\sum d_{y}^{2}-\left ( \sum d_{y} \right )^{2}}}\]


In this Karl Pearson Correlation formula,

  • A is the assumed mean of the respective series (the X-series and the Y-series may use different assumed means).

  • dx = X-series’ deviation from its assumed mean = (X - A)

  • dy = Y-series’ deviation from its assumed mean = (Y - A)

  • Σdx.dy is the summation of the products of dx and dy.

  • Σdx² is the summation of the squares of dx.

  • Σdy² is the summation of the squares of dy.

  • Σdx is the summation of the X-series' deviations.

  • Σdy is the summation of the Y-series' deviations.

  • N is the number of observations in pairs.
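
A minimal Python sketch of the Assumed Mean Method is given here. The function name `pearson_assumed_mean`, its parameters `a_x` and `a_y` (the assumed means for the two series), and the data are all assumptions made for illustration.

```python
from math import sqrt

def pearson_assumed_mean(xs, ys, a_x, a_y):
    """Pearson's r using deviations from assumed means a_x and a_y."""
    n = len(xs)
    dx = [x - a_x for x in xs]   # deviations of X from its assumed mean
    dy = [y - a_y for y in ys]   # deviations of Y from its assumed mean
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt(n * sum_dx2 - sum_dx ** 2) * sqrt(n * sum_dy2 - sum_dy ** 2)
    return numerator / denominator

# Hypothetical data; the assumed means are arbitrary convenient values.
X = [2, 4, 6, 8, 10]
Y = [1, 3, 2, 5, 4]

r_first = pearson_assumed_mean(X, Y, a_x=5, a_y=2)
r_second = pearson_assumed_mean(X, Y, a_x=8, a_y=4)
print(round(r_first, 4), round(r_second, 4))  # both are about 0.8
```

The choice of assumed mean does not change r; it only changes how easy the intermediate arithmetic is, which is why a round value near the middle of each series is usually picked.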


4. Step Deviation Method Which is Expressed as

r = \[\frac{\sum d'_{x}d'_{y}-\frac{\left ( \sum d'_{x} \right )\left ( \sum d'_{y} \right )}{N}}{\sqrt{\sum {d'_{x}}^{2}-\frac{\left ( \sum d'_{x} \right )^{2}}{N}}\sqrt{\sum {d'_{y}}^{2}-\frac{\left ( \sum d'_{y} \right )^{2}}{N}}}\]

In this particular Karl Pearson Method,

\[d'_{x}=\frac{d_{x}}{C_{1}}\]

\[d'_{y}=\frac{d_{y}}{C_{2}}\]

C1 = Common factor for series -x

C2 = Common factor for series -y

dx is the X-series’ deviation from its assumed mean, i.e. (X - A)

dy is the Y-series’ deviation from its assumed mean, i.e. (Y - A)

Σd'x.d'y is the summation of the products of d'x and d'y.

Σd'x² is the summation of the squares of d'x.

Σd'y² is the summation of the squares of d'y.

Σd'x is the summation of the X-series' step deviations.

Σd'y is the summation of the Y-series' step deviations.

N is the number of observations in pairs.
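
The Step Deviation Method can be sketched in Python as shown here. The function name `pearson_step_deviation`, its parameters (`a_x`, `a_y` for the assumed means and `c1`, `c2` for the common factors), and the data are hypothetical.

```python
from math import sqrt

def pearson_step_deviation(xs, ys, a_x, a_y, c1, c2):
    """Pearson's r using deviations from assumed means, scaled by common factors."""
    n = len(xs)
    dx = [(x - a_x) / c1 for x in xs]   # step deviations d'x = dx / C1
    dy = [(y - a_y) / c2 for y in ys]   # step deviations d'y = dy / C2
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = sum_dxdy - (sum_dx * sum_dy) / n
    denominator = sqrt(sum_dx2 - sum_dx ** 2 / n) * sqrt(sum_dy2 - sum_dy ** 2 / n)
    return numerator / denominator

# Hypothetical data where X is in steps of 10 and Y is in steps of 100.
X = [10, 20, 30, 40, 50]
Y = [100, 300, 200, 500, 400]

r = pearson_step_deviation(X, Y, a_x=30, a_y=300, c1=10, c2=100)
print(round(r, 4))  # step deviations give 8 / (sqrt(10) * sqrt(10)), i.e. about 0.8
```

Dividing by the common factors turns the deviations into small whole numbers (here -2 to 2), which is the whole point of the method when values are large but evenly spaced.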


Characteristics of Karl Pearson's Coefficient of Correlation

  • Value Range: It ranges from -1 to +1. A value close to +1 shows a strong positive relationship, close to -1 shows a strong negative relationship, and 0 means no linear relationship.

  • Direction: A positive value means both variables move in the same direction (e.g. if one increases, the other increases). A negative value means the variables move in opposite directions (e.g. if one increases, the other decreases).

  • Measures Linear Relationship: It only shows the strength of a straight-line (linear) relationship between two variables. It doesn’t work well for curved or complex relationships.

  • Unit-Free: The coefficient doesn’t depend on the units of measurement (e.g., cm, kg). It’s a pure number that shows strength and direction.

  • Symmetry: The correlation between X and Y is the same as the correlation between Y and X. This means the order of variables doesn’t matter.
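
The unit-free and symmetry characteristics can be checked numerically. The sketch below uses a hypothetical helper `pearson_r` and made-up height and weight values; it converts the units and swaps the variables, and r stays the same in both cases.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r using deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)

# Hypothetical measurements for five people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 74, 83]

# Same measurements expressed in different units.
heights_m = [h / 100 for h in heights_cm]
weights_g = [w * 1000 for w in weights_kg]

r_original = pearson_r(heights_cm, weights_kg)
r_converted = pearson_r(heights_m, weights_g)   # unit-free: same value
r_swapped = pearson_r(weights_kg, heights_cm)   # symmetry: same value

print(round(r_original, 4), round(r_converted, 4), round(r_swapped, 4))
```

The comparison is made after rounding because floating-point arithmetic can introduce tiny differences in the last decimal places.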


Advantages and Disadvantages of Karl Pearson's Coefficient of Correlation:

Advantages

  • The formula is straightforward and easy to compute with basic statistical tools. 

  • It not only shows how strongly two variables are related but also whether the relationship is positive or negative.

  • The coefficient is not affected by the units of measurement, making it easy to compare relationships across different datasets.

  • The correlation between X and Y is the same as the correlation between Y and X, making the calculation unbiased with respect to the order of variables.

  • It is a well-established and commonly used method, making it reliable for comparing relationships in many fields.


Disadvantages 

  • It cannot detect non-linear relationships, so it may give misleading results if the relationship is not straight-line.

  • Extreme values in the data can distort the result and give an incorrect measure of correlation.

  • Standard significance tests based on r assume the data are approximately normally distributed, which may not always be the case in real-life scenarios.

  • It only shows association, not cause and effect. Two variables might be correlated but not necessarily related in a meaningful way.

  • It only works with numerical data and cannot be used for categorical variables.
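
The first two disadvantages can be demonstrated with a small Python sketch. The helper `pearson_r` and the data are made up for illustration: one dataset has a perfect but curved relationship, and the other shows a single extreme pair reversing the sign of r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from raw sums of the paired values."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Non-linear case: Y is exactly X squared, yet r comes out as 0.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
r_curved = pearson_r(xs, ys)

# Outlier case: a perfect negative relationship ...
base_x = [1, 2, 3, 4, 5]
base_y = [5, 4, 3, 2, 1]
r_clean = pearson_r(base_x, base_y)

# ... becomes strongly positive after adding one extreme pair (20, 20).
r_with_outlier = pearson_r(base_x + [20], base_y + [20])

print(r_curved, r_clean, round(r_with_outlier, 4))
```

A scatter plot of the data before computing r is a simple way to catch both situations.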


Solving a Few Karl Pearson Coefficient of Correlation Questions

Task 1: Refer to the table below and find out ‘r’ with the help of the provided data. Use the Actual Mean Method to solve it.

| Price of Mango (Rs.) | 15 | 25 | 35 | 40 | 50 | 65 | 75 |
| Supply of Mango (units) | 2 | 5 | 6 | 8 | 9 | 10 | 14 |


Task 2: With the help of this table below, find out ‘r’ using Karl Pearson Coefficient of Correlation Direct Method Formula.

| Age of husband | 21 | 24 | 27 | 29 | 31 | 35 | 38 |
| Age of wife | 19 | 21 | 25 | 26 | 29 | 32 | 34 |
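
One way to check hand-computed answers for the two tasks is a short Python sketch such as the following. The helper `pearson_r` is a hypothetical name and uses the sum-based formula; since every method gives the same r, it can be used to check either task.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from raw sums of the paired values."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Task 1 data: price and supply of mango.
price = [15, 25, 35, 40, 50, 65, 75]
supply = [2, 5, 6, 8, 9, 10, 14]

# Task 2 data: ages of husband and wife.
husband = [21, 24, 27, 29, 31, 35, 38]
wife = [19, 21, 25, 26, 29, 32, 34]

r_task1 = pearson_r(price, supply)
r_task2 = pearson_r(husband, wife)
print(round(r_task1, 3), round(r_task2, 3))
```

Both values come out close to +1, so each pair of variables shows a strong positive linear relationship.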


Pro Tip: Try to solve one or two Karl Pearson coefficient of correlation problems using all the methods to figure out which is the easiest and shortest method of the lot. However, make sure to be thorough with all the formulas of the Karl Pearson coefficient of correlation, so that you can attempt them in your exams with greater confidence.


Once you have solved the Karl Pearson Coefficient of Correlation sums, you will be able to understand the degree of relationship between discussed variables and relate it with reality better.


Now that we have gained a fair idea of Pearson’s coefficient of correlation and have become familiar with its question format, let’s learn about its properties as well.


In case you are wondering, “Why should I check out the properties of coefficient of correlation?” - Note that a clear idea about correlation coefficient will come in handy both during exam preparation and while solving Karl Pearson Coefficient of Correlation sums. It will help you retain every minute yet vital pointer about this ratio and would further prevent you from making any silly mistake.


That being said, let’s glance through these significant properties in brief –

  • The Correlation Coefficient (r) does not have any unit.

  • r with a positive value signifies that both X and Y move along the same direction.

  • r with a negative value indicates an inverse relation between X and Y.

  • X and Y are said to be linearly uncorrelated if the value of r is 0.

  • r with a magnitude close to 1 signifies a strong linear relationship between two variables.

  • r with a magnitude close to 0 signifies a weak linear relationship between two variables.

  • Correlation between two variables is said to be perfect if the value of r is either +1 or -1.


Assumptions of Karl Pearson's Coefficient of Correlation

When we calculate the Karl Pearson Correlation, we are required to keep a few assumptions in mind.


Following are the two main assumptions:

  • The relationship between the two variables is linear.

  • Outliers are kept to a minimum or removed entirely.


Outliers are data points that contrast drastically with the rest of the data. They are extreme values that do not fit in with the rest of the set. You can spot an outlier by plotting the data on graph paper and looking for any point that lies far away from the others.


Use of Karl Pearson Coefficient in Real Life 

The Karl Pearson Coefficient of Correlation is used extensively in statistical work. In economic problems it is valuable because, given paired observations of two variables X and Y, it quantifies the intensity of the relationship between them.


To understand logically and accurately how a change in one variable relates to a change in another, we can use this method. For example, a shoe manufacturer who wants to plan production across shoe sizes first needs to collect data on common foot sizes; after placing this data in the Karl Pearson Coefficient of Correlation formula, he can estimate the requirement accordingly.

FAQs on Karl Pearson's Coefficient of Correlation: Calculation and Use

1. How to find Karl Pearson's Coefficient of Correlation?

The coefficient of correlation can be found using this formula:

r = \[\frac{n\left ( \sum xy \right )-\left ( \sum x \right )\left ( \sum y \right )}{\sqrt{\left [ n\sum x^{2}-\left ( \sum x \right )^{2} \right ]\left [ n\sum y^{2}-\left ( \sum y \right )^{2} \right ]}}\]

2. Give Examples of Positive and Negative Correlation.

For negative correlation (0 to -1) – As the cost of flight tickets increases, it leads to a decrease in its demand. For positive correlation (0 to +1) – An increase in temperature increases the demand for soft drinks and ice cream.

3. Give an Example for Zero Correlation.

An increase in the cost of mangoes and an increase in the demand for shirts are not related, and hence this is a Karl Pearson coefficient of correlation example for zero correlation.

4. What are the limits of Karl Pearson's correlation coefficient?

The limits are between -1 and +1. A value of +1 shows perfect positive correlation, -1 shows perfect negative correlation, and 0 indicates no linear relationship.

5. Karl Pearson's Coefficient of Correlation is denoted by the symbol?

It is denoted by the symbol r.

6. What is the Karl Pearson's coefficient of correlation formula?

The formula is:

r =$ \dfrac{N \sum XY - \sum X \cdot \sum Y}{\sqrt{N \sum X^2 - (\sum X)^2} \cdot \sqrt{N \sum Y^2 - (\sum Y)^2}}$


where r is the correlation coefficient.

7. Give a Karl Pearson's Coefficient of Correlation Solved Example

Consider the following data for two variables X (e.g., hours studied) and Y (e.g., marks obtained):


| X | Y |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 6 |
| 4 | 8 |
| 5 | 10 |


Formula for Pearson's Coefficient:

r = $\dfrac{n\sum(XY) - \sum X \sum Y}{\sqrt{\left[n\sum(X^2) - (\sum X)^2\right]\left[n\sum(Y^2) - (\sum Y)^2\right]}}$

Where:

  • n: Number of data points

  • $\sum X$: Sum of X values

  • $\sum Y$: Sum of Y values

  • $\sum(XY)$: Sum of the product of X and Y

  • $\sum(X^2)$: Sum of the squares of X

  • $\sum(Y^2)$: Sum of the squares of Y

Step-by-Step Calculation:


1. Prepare the table:


| X | Y | $X \cdot Y$ | $X^2$ | $Y^2$ |
|---|---|---|---|---|
| 1 | 2 | 2 | 1 | 4 |
| 2 | 4 | 8 | 4 | 16 |
| 3 | 6 | 18 | 9 | 36 |
| 4 | 8 | 32 | 16 | 64 |
| 5 | 10 | 50 | 25 | 100 |


2. Calculate sums:

  • $\sum X $= 1 + 2 + 3 + 4 + 5 = 15

  • $\sum Y$ = 2 + 4 + 6 + 8 + 10 = 30

  • $\sum(XY)$ = 2 + 8 + 18 + 32 + 50 = 110

  • $\sum(X^2)$ = 1 + 4 + 9 + 16 + 25 = 55

  • $\sum(Y^2)$ = 4 + 16 + 36 + 64 + 100 = 220

  • n = 5 (number of data points)


3. Apply the formula:

$r = \frac{n\sum(XY) - \sum X \sum Y}{\sqrt{\left[n\sum(X^2) - (\sum X)^2\right]\left[n\sum(Y^2) - (\sum Y)^2\right]}}$

Substitute values:

r =$ \frac{5(110) - (15)(30)}{\sqrt{\left[5(55) - (15)^2\right]\left[5(220) - (30)^2\right]}}$


4. Simplify:

$r = \frac{550 - 450}{\sqrt{\left[275 - 225\right]\left[1100 - 900\right]}} $

r = $\frac{100}{\sqrt{50 \cdot 200}}$

r = $\frac{100}{\sqrt{10000}}$

r = $\frac{100}{100}$

r = 1


Pearson's coefficient of correlation (r) is 1, indicating a perfect positive linear relationship between X and Y.
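
The same calculation can be reproduced with a few lines of Python as a cross-check. This is only a sketch; the variable names are chosen for illustration.

```python
from math import sqrt

# Data from the solved example: hours studied (X) and marks obtained (Y).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 6, 8, 10]

n = len(X)
sum_x = sum(X)                                # 15
sum_y = sum(Y)                                # 30
sum_xy = sum(x * y for x, y in zip(X, Y))     # 110
sum_x2 = sum(x * x for x in X)                # 55
sum_y2 = sum(y * y for y in Y)                # 220

numerator = n * sum_xy - sum_x * sum_y        # 550 - 450 = 100
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))  # sqrt(50 * 200) = 100
r = numerator / denominator

print(r)  # 1.0
```

The result is exactly 1 because every Y value is exactly twice the corresponding X value, so the points lie on a single straight line.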

8. When did Karl Pearson invent the formula for Coefficient of Correlation?

Karl Pearson introduced the formula for measuring the relationship between two variables in the mid-1890s. This method, known as Karl Pearson's Coefficient of Correlation, helps determine how strongly two things are linearly related.

9. What are the properties of Karl Pearson's coefficient of correlation?

Properties of Karl Pearson's Coefficient of Correlation:

1. It measures the strength and direction of the linear relationship between two variables (x and y).

2. The value of the correlation coefficient (r) always lies between -1 and 1:

  • r = 1: Perfect positive relationship.

  • r = −1: Perfect negative relationship.

  • r = 0: No linear relationship.

3. It does not distinguish between dependent and independent variables.

4. It is sensitive to outliers (extreme values can affect the result).

10. How many types of correlation are there, and what do they mean?

Correlation is a way to measure how two things are related or connected. There are three main types of correlation:

  • Positive Correlation: When one variable increases, the other also increases. They move in the same direction.

Example: As the temperature rises, ice cream sales also go up.

  • Negative Correlation: When one variable increases, the other decreases. They move in opposite directions.

Example: As you exercise more, your body weight may go down.

  • No Correlation: There is no connection between the two variables.

Example: The number of books you read and the color of your car.