Definition of Correlation
Correlation is a statistical measure of the relationship between two variables. To check whether a relationship exists between two variables, you can plot the paired values as points on a scatter plot. A relationship can be measured for data at the ordinal level of measurement or at a higher (interval or ratio) level, but the most commonly used measure is the correlation coefficient.
Correlation in Statistics
In this section, you will learn how to interpret correlation coefficients and how to calculate them for interval-level scales as well as ordinal-level scales. A correlation coefficient is a single number that summarizes the relationship between two variables. The coefficient is scaled so that it always lies between -1 and +1. If the coefficient is close to 0, the linear relationship between the two variables is weak; the farther it is from 0 (that is, the closer to +1 or -1), the stronger the relationship.
The two variables are usually denoted X and Y. To show how they are related, each pair of values (X, Y) is plotted as a point on a scatter diagram. First, the scatter diagram is drawn. Next, Pearson’s r is calculated. The method is illustrated first with small samples and then with larger samples.
Types of correlation
Now that we know that scatter plots are used to show the correlation between two variables, let us study the types of correlation. The relationship between two variables falls into one of three types: positive correlation, negative correlation, or no correlation.
Positive Correlation: This occurs when, as the value of one variable increases, the value of the other variable also increases
Negative Correlation: This occurs when, as the value of one variable increases, the value of the other variable decreases
No Correlation: In this situation, the variables show no linear relationship with each other
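The three types can be illustrated with a short Python sketch. The function name `direction`, the tolerance `tol`, and the data are made up for illustration and are not part of the original article. The sketch only looks at the sign of the summed products of deviations from the mean, which tells you the direction of a linear relationship but not its strength.

```python
def direction(xs, ys, tol=1e-9):
    """Classify the direction of the linear relationship in paired data.

    Hypothetical helper for illustration; `tol` is an assumed threshold
    below which the relationship is treated as "none".
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: positive when x and y tend to rise
    # together, negative when one rises as the other falls.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_dev > tol:
        return "positive"
    if co_dev < -tol:
        return "negative"
    return "none"


# Made-up data, chosen only to show each case.
x = [1, 2, 3, 4, 5]
print(direction(x, [2, 4, 5, 7, 9]))  # positive
print(direction(x, [9, 7, 6, 4, 1]))  # negative
print(direction(x, [2, 5, 1, 5, 2]))  # none
```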
Pearson’s Correlation Coefficient Formula
The most commonly used formula for measuring the linear dependence between two sets of data is Pearson’s Correlation Coefficient Formula. The value of Pearson’s correlation coefficient lies between -1 and +1. When the coefficient is close to 0, the data sets are considered to have no linear relationship. The data sets are positively correlated if the coefficient is greater than 0 (perfectly so at +1) and negatively correlated if it is less than 0 (perfectly so at -1).
\[r = \frac{n(\sum{xy})-(\sum{x})(\sum{y})}{\sqrt{[n\sum{x^{2}}-(\sum{x})^{2}][n\sum{y^{2}}-(\sum{y})^{2}]}}\]
Here,
n = The number of paired observations
Σx = The sum of the values of the first variable
Σy = The sum of the values of the second variable
Σxy = The sum of the products of the paired first and second values
Σx² = The sum of the squares of the values of the first variable
Σy² = The sum of the squares of the values of the second variable
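As a rough sketch, the formula can be computed directly from these sums in Python. The function name `pearson_r` and the sample data are assumptions made for illustration; they do not come from the original article.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from raw sums (hypothetical helper for illustration)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of products of pairs
    sum_x2 = sum(x * x for x in xs)              # sum of squares of x
    sum_y2 = sum(y * y for y in ys)              # sum of squares of y

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Made-up data: n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, Σy² = 86
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 4))  # 0.7746
```

For this data the numerator is 5(66) - (15)(20) = 30 and the denominator is √(50 × 30) ≈ 38.73, which gives r ≈ 0.77, a fairly strong positive correlation.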
Linear Correlation Coefficient Formula
The formula for the linear correlation coefficient r is the same as Pearson’s formula, written here with explicit summation indices:
\[\frac{n\sum_{i=1}^{n}{x_i}{y_i}-\sum_{i=1}^{n}{x_i}\sum_{i=1}^{n}{y_i}}{\sqrt{n\sum_{i=1}^{n}{x_i}^{2}-(\sum_{i=1}^{n}{x_i})^{2}}\sqrt{n\sum_{i=1}^{n}{y_i}^{2}-(\sum_{i=1}^{n}{y_i})^{2}}}\]
Sample Correlation Coefficient Formula
The sample correlation coefficient formula is: rab = Sab / (Sa Sb)
Here,
Sa = The sample standard deviation of a
Sb = The sample standard deviation of b
Sab = The sample covariance of a and b
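A minimal Python sketch of this ratio is shown here, assuming the usual n - 1 divisor for the sample covariance and sample standard deviations. The function name `sample_correlation` and the data are illustrative assumptions; `statistics.mean` and `statistics.stdev` are from the Python standard library.

```python
import statistics


def sample_correlation(a, b):
    """Sample correlation as covariance over the product of standard deviations.

    Hypothetical helper for illustration only.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(a)
    mean_a = statistics.mean(a)
    mean_b = statistics.mean(b)
    # Sample covariance S_ab: divide by n - 1, not n.
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)
    # statistics.stdev also uses the n - 1 divisor.
    s_a = statistics.stdev(a)
    s_b = statistics.stdev(b)
    return s_ab / (s_a * s_b)


# Made-up data for illustration.
a = [1, 2, 3, 4, 5]
b = [2, 4, 5, 4, 5]
print(round(sample_correlation(a, b), 4))  # 0.7746
```

Here Sab = 6 / 4 = 1.5, Sa = √2.5 and Sb = √1.5, so rab = 1.5 / √3.75 ≈ 0.77.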
Population Correlation Coefficient Formula
The formula for the population correlation coefficient, usually written with the Greek letter ρ, is:
ρab = σab / (σa σb)
Here,
σa = The population standard deviation of a
σb = The population standard deviation of b
σab = The population covariance of a and b
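To see what this ratio looks like in terms of the data, it can help to write it out for a finite population of N pairs. The symbols N, μa and μb (the population size and the two population means) are introduced here for illustration and do not appear in the original text. With the standard definitions

\[\sigma_{ab} = \frac{1}{N}\sum_{i=1}^{N}(a_i-\mu_a)(b_i-\mu_b), \qquad \sigma_a = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(a_i-\mu_a)^{2}}, \qquad \sigma_b = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(b_i-\mu_b)^{2}}\]

the factor 1/N cancels between the numerator and the denominator:

\[\frac{\sigma_{ab}}{\sigma_a\sigma_b} = \frac{\sum_{i=1}^{N}(a_i-\mu_a)(b_i-\mu_b)}{\sqrt{\sum_{i=1}^{N}(a_i-\mu_a)^{2}}\sqrt{\sum_{i=1}^{N}(b_i-\mu_b)^{2}}}\]

The same cancellation happens with the n - 1 divisor in the sample version, so both formulas have the same form and differ only in whether the data are treated as a whole population or as a sample.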
Solved Problem
This problem looks at the relationship between the number of years of formal education a person has received and the age at which they entered the workforce. In the table below, you’ll see the years of education (A) each person received and the age at which they entered the workforce (B). The survey was done among 12 people, all of whom were aged 30 or above.
Here, you can see that people started their formal education early, and you can also see the relationship between the number of years of schooling and the age at which they entered the workforce. For example, see Person 11: they had just 8 years of formal education and entered the workforce at the age of 18. The scatter diagram helps you understand this relationship.
FAQs on Correlation
1. What is Correlation?
Correlation is a statistical measure of the relationship between two variables. To check whether a relationship exists between two variables, you can plot the paired values as points on a scatter plot. A relationship can be measured for data at the ordinal level of measurement or at a higher (interval or ratio) level, but the most commonly used measure is the correlation coefficient.
2. What are the Types of Correlation?
Scatter plots are used to show the different types of correlation between two variables. Between two data sets, the types of correlation are positive correlation, negative correlation, and no correlation.
Positive Correlation: This occurs when, as the value of one variable increases, the value of the other variable also increases
Negative Correlation: This occurs when, as the value of one variable increases, the value of the other variable decreases
No Correlation: In this situation, the variables show no linear relationship with each other
3. What is the Formula for Correlation Coefficient?
The most commonly used formula for measuring the linear dependence between two sets of data is Pearson’s Correlation Coefficient Formula. Its value lies between -1 and +1, and if the coefficient is 0 then there is no linear relationship between the two data sets.
\[r = \frac{n(\sum{xy})-(\sum{x})(\sum{y})}{\sqrt{[n\sum{x^{2}}-(\sum{x})^{2}][n\sum{y^{2}}-(\sum{y})^{2}]}}\]
Here,
n = The number of paired observations
Σx = The sum of the values of the first variable
Σy = The sum of the values of the second variable
Σxy = The sum of the products of the paired first and second values
Σx² = The sum of the squares of the values of the first variable
Σy² = The sum of the squares of the values of the second variable