Introduction to Numerical Measures
Data can be organized and summarized either graphically or numerically. Graphical descriptions are common, but when a data set is large, constructing a graph becomes tedious. Moreover, although a histogram lets us see the shape, centre, and spread of a distribution, it does not quantify them. For that, we need numerical measures that describe the data.
A statistic is a numerical descriptive measure calculated from sample data, whereas a parameter is a numerical descriptive measure of a population. The values of parameters are generally unknown, so we calculate statistics from sample data and use them to draw conclusions about the corresponding population parameters.
Here, we will illustrate the numerical descriptive measures for samples and populations: how they are computed, what they mean, and how they are used.
Numerical Descriptive Measures For Sample
There are three types of numerical descriptive measures in statistics.
Measure of Central Tendency
Variability
Shape
Let's discuss each of them.
Measures of Central Tendency
A measure of central tendency is a single number that represents the centre, or typical value, of a set of numerical data. The common measures of central tendency are the mean, median, mode, and geometric mean.
Mean
The mean, also known as the average, is a measure of the central tendency of a group of values. "Mean" generally refers to the arithmetic mean, as opposed to the harmonic mean or the geometric mean. The value of the mean is strongly affected by outliers.
To calculate the mean, we take the sum of all the values and divide it by the number of values as shown below.
\[\bar{x} = \frac{x_{1} + x_{2} + x_{3} + \cdots + x_{n}}{n}\]
If the mean is calculated from a sample of a population, it is known as the sample mean and is represented as \[\bar{x}\], whereas the population mean is represented as \[\mu\].
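The formula above translates directly into code. This is a minimal Python sketch; the function name `arithmetic_mean` is our own, and the data are taken from Solved Example 1 later in this article. The second call adds one hypothetical outlier (120) to show how strongly a single extreme value moves the mean.

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


data = [12, 17, 12, 13, 12, 14, 13, 21, 12]  # data from Solved Example 1
print(arithmetic_mean(data))  # 14.0

# A single outlier pulls the mean up noticeably: 246 / 10.
print(arithmetic_mean(data + [120]))  # 24.6
```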
Median
The median is a measure of central tendency that divides the ordered data into two equal halves: it is the value separating the upper half of the data from the lower half. Unlike the mean, the median is not affected by extreme values.
Locating The Median
If the data are arranged in order, the median is located at position \[\frac{n+1}{2}\] in the ordered data.
If the number of values is odd, the median is the middle value, whereas if the number of values is even, the median is the average of the two middle values.
Note: \[\frac{n+1}{2}\] is not the value of the median, only the position of the median in the ranked data.
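The position rule can be sketched in Python as follows (the function name `median` is our own, for illustration). For odd n the position (n + 1) / 2 is a whole number; for even n it falls halfway between the two middle values, which are averaged. The last call replaces 21 with a hypothetical extreme value, 2100, and the median stays at 13.

```python
def median(values):
    """Median located at position (n + 1) / 2 of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    position = (n + 1) / 2  # 1-based position, not the median itself
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that element.
        return ordered[int(position) - 1]
    # Even n: the position lies halfway between the two middle values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median([12, 17, 12, 13, 12, 14, 13, 21, 12]))    # 13
print(median([3, 1, 4, 2]))                            # 2.5
print(median([12, 17, 12, 13, 12, 14, 13, 2100, 12]))  # 13
```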
Mode
The mode is the value that occurs most frequently. It is not affected by extreme values and can be used for both categorical and numerical data. A data set may have several modes or no mode at all.
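Because a data set may have several modes or none, a small sketch can return a list rather than a single value. This Python example uses `collections.Counter` from the standard library; the function name `modes` is hypothetical, and treating "every value occurs once" as "no mode" is an assumed convention (the standard library's `statistics.multimode` instead returns every value in that case).

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often; empty list if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: all values occur once means "no mode"
    return [value for value, count in counts.items() if count == highest]


print(modes([12, 17, 12, 13, 12, 14, 13, 21, 12]))    # [12]
print(modes(["red", "blue", "red", "blue", "green"]))  # ['red', 'blue']
print(modes([1, 2, 3]))                                # []
```

The second call shows that the mode also works for categorical data, where a mean or median would not make sense.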
Geometric Mean
In mathematics, the geometric mean is a mean that represents the central tendency, or typical value, of a set of numbers by using the product of their values (as opposed to the arithmetic mean, which uses their sum). For a collection {x₁, x₂, ..., xₙ} of positive real numbers, the geometric mean is defined as:
GM {x₁, x₂, ..., xₙ} = \[\sqrt[n]{x_{1} \times x_{2} \times \cdots \times x_{n}}\]
Example:
Find the geometric mean of 2 and 32.
GM (2, 32) = \[\sqrt{2 \times 32}\] = \[\sqrt{64}\] = 8
Therefore, the geometric mean of 2 and 32 is 8.
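A minimal Python sketch of the definition: multiply the n values and take the n-th root. It uses `math.prod` (Python 3.8+); the function name `geometric_mean` is our own. The result is rounded only to hide possible floating-point noise from the fractional power.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, and all values must be positive")
    product = math.prod(values)
    return product ** (1 / len(values))


print(round(geometric_mean([2, 32]), 10))  # 8.0
```

For long lists the product can overflow; the standard library's `statistics.geometric_mean` avoids this by working with logarithms instead.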
Measures of Variation
Measures of variation describe the spread, or dispersion, of the values in a data set. The common measures of variation are:
Range
Interquartile range
Variance
Standard Deviation
Coefficient of Variation
Let us discuss each of these measures of variation.
Range
The difference between the largest value and the smallest value of a data set is called the range. It is the simplest measure of variation.
Range = Largest Value - Smallest Value
It ignores how the data are distributed between these two extremes, and it is sensitive to outliers.
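A short Python sketch of the range formula, using the data from Solved Example 1. The function is named `data_range` (our own choice) to avoid shadowing Python's built-in `range`. Adding one hypothetical outlier (90) shows how sensitive the range is to extreme values.

```python
def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


data = [12, 17, 12, 13, 12, 14, 13, 21, 12]
print(data_range(data))  # 9

# One outlier changes the range dramatically: 90 - 12.
print(data_range(data + [90]))  # 78
```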
Interquartile Range
The interquartile range is also a measure of spread, based on splitting an ordered data set into quartiles. Quartiles divide the rank-ordered data into 4 equal parts. The three values that make these divisions are called the first, second, and third quartiles and are represented as Q1, Q2, and Q3.
First Quartile (Q1) - Also known as the lower quartile, Q1 divides the series so that 25% of the observations are below it and the remaining 75% are above it.
Second Quartile (Q2) - Also known as the median, Q2 divides the series into two equal halves: 50% of the observations are below it and the other 50% are above it.
Third Quartile (Q3) - Also known as the upper quartile, Q3 divides the series so that 75% of the observations are below it and the remaining 25% are above it.
Interquartile Range = Q3 - Q1
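Several conventions exist for computing quartiles, and textbooks and software do not always agree. This Python sketch assumes the rule that places Qk at position k(n + 1)/4 in the ordered data, interpolating linearly between neighbours; it is consistent with the (n + 1)/2 median rule above. The helper names `quartile` and `interquartile_range` are hypothetical. For this data set, the result agrees with the standard library's `statistics.quantiles`, whose default "exclusive" method uses the same position rule (the two can differ at the extremes of very small data sets, where this sketch clamps instead of extrapolating).

```python
import math
import statistics


def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) at position k * (n + 1) / 4, linearly interpolated."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    position = k * (n + 1) / 4            # 1-based, may be fractional
    position = min(max(position, 1), n)   # clamp to the available data
    lower = math.floor(position)
    fraction = position - lower
    if lower == n:
        return float(ordered[-1])
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def interquartile_range(values):
    return quartile(values, 3) - quartile(values, 1)


data = [12, 17, 12, 13, 12, 14, 13, 21, 12]
print(quartile(data, 1), quartile(data, 2), quartile(data, 3))  # 12.0 13.0 15.5
print(interquartile_range(data))                                # 3.5
print(statistics.quantiles(data, n=4))                          # [12.0, 13.0, 15.5]
```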
Variance
Variance is (approximately) the average squared deviation of the values from the mean. For a sample, the sum of squared deviations is divided by n - 1 rather than n.
Sample Variance: S² = \[\frac{\sum_{i=1}^{n}(X_{i} - \bar{X})^{2}}{n - 1}\]
Here,
\[\bar{X}\] - Arithmetic mean
n - Sample size
Xi - ith value of the variable X
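As a worked example with the data from Solved Example 1 (mean 14), the deviations are -2, 3, -2, -1, -2, 0, -1, 7, -2; their squares sum to 76, and 76 / (9 - 1) = 9.5. A minimal Python sketch of the same computation follows; the function name `sample_variance` is our own.

```python
def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


data = [12, 17, 12, 13, 12, 14, 13, 21, 12]
print(sample_variance(data))  # 9.5
```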
Standard Deviation
Standard deviation is the most commonly used measure of variation for samples. It shows the variation about the mean and has the same units as the original data.
Sample Standard Deviation: S = \[\sqrt{\frac{\sum_{i=1}^{n}(X_{i} - \bar{X})^{2}}{n - 1}}\]
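The sample standard deviation is the square root of the sample variance. This Python sketch (the name `sample_std_dev` is hypothetical) computes it directly from the formula and cross-checks it against `statistics.stdev`, which also uses the n - 1 denominator. For the Solved Example 1 data, it gives the square root of 9.5.

```python
import math
import statistics


def sample_std_dev(values):
    """Square root of the sample variance (denominator n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


data = [12, 17, 12, 13, 12, 14, 13, 21, 12]
print(round(sample_std_dev(data), 4))    # 3.0822
print(round(statistics.stdev(data), 4))  # 3.0822
```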
Coefficient of Variation
The coefficient of variation is defined as the standard deviation divided by the mean and multiplied by 100. It is expressed as a percentage and shows the variation relative to the mean.
The coefficient of variation can be used to compare two or more data sets measured in different units.
CV = \[\left(\frac{S}{\bar{x}}\right) \times 100\%\]
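A minimal Python sketch of the CV formula computed from raw sample data, using `statistics.stdev` (sample standard deviation) and `statistics.mean`; the function name `coefficient_of_variation` is our own. For the Solved Example 1 data, S is the square root of 9.5 and the mean is 14, so the CV is about 22%.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, expressed as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    return statistics.stdev(values) / mean * 100


data = [12, 17, 12, 13, 12, 14, 13, 21, 12]
print(round(coefficient_of_variation(data), 1))  # 22.0
```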
Measure of Variation Summary
The more the data are spread out, the larger the range, interquartile range, variance, and standard deviation.
The less the data are spread out, the smaller the range, interquartile range, variance, and standard deviation.
If there is no variation (all values are the same), then all these measures will be 0.
None of these measures will ever be negative.
Measure of Shape
The shape of a distribution describes how the data are distributed. A distribution can be symmetric or skewed, and comparing the mean with the median indicates which:
Left-Skewed
Mean < Median
Symmetric
Mean = Median
Right-Skewed
Mean > Median
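The mean-versus-median comparison above can be sketched in Python. The function name `shape_from_mean_median` and the tolerance used to decide "equal" are our own assumptions. Keep in mind this is only a rule of thumb: it can give misleading answers for some distributions, particularly discrete or multimodal ones.

```python
import math
import statistics


def shape_from_mean_median(values, rel_tol=1e-9):
    """Rough shape label from comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if math.isclose(mean, median, rel_tol=rel_tol):
        return "symmetric"
    return "right-skewed" if mean > median else "left-skewed"


print(shape_from_mean_median([12, 17, 12, 13, 12, 14, 13, 21, 12]))  # right-skewed
print(shape_from_mean_median([1, 2, 3, 4, 5]))                        # symmetric
print(shape_from_mean_median([1, 8, 9, 9, 10]))                       # left-skewed
```

In the first call the mean (14) exceeds the median (13), pulled up by the large value 21.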
Numerical Descriptive Measures For Population
The numerical descriptive measures described so far are computed from samples, not from populations.
Numerical descriptive measures that describe a population are known as parameters and are represented by Greek letters.
Important population parameters are population mean, population variance, and population standard deviation.
Population Mean - The population mean is the sum of all the values in the population divided by the size of the population, N.
\[\mu = \frac{\sum_{i=1}^{N}X_{i}}{N} = \frac{X_{1} + X_{2} + X_{3} + \cdots + X_{N}}{N}\]
Where,
\[\mu\] - Population Mean
N - Population Size
Xi - ith value of the variable X
Population Variance - The population variance is the average of the squared deviations of the values from the population mean.
\[\sigma^{2} = \frac{\sum_{i=1}^{N}(X_{i} - \mu)^{2}}{N}\]
Where,
\[\mu\] - Population Mean
N - Population Size
Xi - ith value of the variable X
Population Standard Deviation - It is the square root of the population variance, the most commonly used measure of variation, and has the same units as the original data.
Population Standard Deviation: \[\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_{i} - \mu)^{2}}{N}}\]
Where,
\[\mu\] - Population Mean
N- Population size
Xi - ith value of the variable X
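The population formulas differ from the sample formulas mainly in the denominator: N instead of n - 1. The Python sketch below applies them to a small hypothetical population of N = 8 values; the helper names are our own. The standard library provides `statistics.pvariance` and `statistics.pstdev` for the population versions and `statistics.variance` for the sample version, which gives a larger value here because it divides by N - 1.

```python
import math
import statistics


def population_mean(values):
    return sum(values) / len(values)


def population_variance(values):
    """Average squared deviation from mu: divide by N, not N - 1."""
    mu = population_mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def population_std_dev(values):
    return math.sqrt(population_variance(values))


population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population, N = 8
print(population_mean(population))      # 5.0
print(population_variance(population))  # 4.0
print(population_std_dev(population))   # 2.0

# Standard-library cross-check, and the sample variance for contrast.
print(statistics.pvariance(population), statistics.pstdev(population))
print(statistics.variance(population))
```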
Solved Examples
1. Find the mean, median, mode, and range for the data given below.
12, 17, 12, 13, 12, 14, 13, 21, 12
Solution:
Mean - It is the sum of all the values divided by the number of values as shown below.
Mean = \[\frac{12+17+12+13+12+14+13+21+12}{9}\] = 14
Median - The median is the middle or central value of the data set. To calculate the median, we will arrange the data in ascending order as 12, 12, 12, 12, 13, 13, 14, 17, 21.
There are 9 numbers, so the median is located at position
\[\frac{9+1}{2}\] = 5,
that is, at the 5th number.
Therefore, the median is 13
Mode - The value that occurs most frequently in a data set is called the mode. Here, 12 occurs four times, more often than any other value, so the mode is 12.
Range - The range is the largest value minus the smallest value.
Largest Value = 21
Smallest Value = 12
Range = 21 - 12 = 9
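These hand calculations can be cross-checked with Python's standard `statistics` module. This is a quick verification sketch rather than part of the method; note that `statistics.mode` returns a single value, so it is only suitable here because the data have one clear mode.

```python
import statistics

data = [12, 17, 12, 13, 12, 14, 13, 21, 12]

mean = statistics.mean(data)        # 14
median = statistics.median(data)    # 13
mode = statistics.mode(data)        # 12
data_range = max(data) - min(data)  # 9

print(mean, median, mode, data_range)
```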
2. Find the coefficient of variation for the data given below.
Stock A
Average Price of Last Year = 60
Standard Deviation = 6
Stock B
Average Price of Last Year = 100
Standard Deviation = 6
Solution:
Stock A :
Average Price of Last Year = 60
Standard Deviation = 6
CV of stock A = \[(\frac{S}{\bar{X}})\] \[\times\] 100% = \[\frac{6}{60}\] \[\times\] 100% = 10%
CV of stock B = \[(\frac{S}{\bar{X}})\] \[\times\] 100% = \[\frac{6}{100}\] \[\times\] 100% = 6%
Stock A and stock B have the same standard deviation, but stock B is less variable relative to its price.
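When only summary statistics are available, as in this example, the CV can be computed directly from the standard deviation and the mean. A minimal Python sketch (the function name `cv_percent` is hypothetical) reproducing the two results above:

```python
def cv_percent(std_dev, mean):
    """Coefficient of variation from summary statistics, in percent."""
    return std_dev / mean * 100


stocks = {"Stock A": (6, 60), "Stock B": (6, 100)}  # (standard deviation, average price)
for name, (sd, avg) in stocks.items():
    print(f"{name}: CV = {cv_percent(sd, avg):.1f}%")
# Stock A: CV = 10.0%
# Stock B: CV = 6.0%
```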
FAQs on Statistics Numerical Measures
1. Which measure of central tendency is most preferred?
The mean is often considered the best measure of central tendency because it uses all the values in the data set, so a change in any score affects the mean. However, in some situations other measures of central tendency are preferred. The median, for example, is usually preferred when:
Some scores in the distribution have undetermined values.
There are a few extreme scores in the distribution.
The data are measured on an ordinal scale.
The distribution is open-ended.
Mode is generally preferred when data is measured on a nominal scale whereas geometric mean is preferred when data is measured on a logarithmic scale.
2. What are descriptive statistics?
Descriptive statistics, as the name suggests, refers to the analysis, summary, and presentation of findings related to a data set derived from a sample or an entire population.
3. What is the measure of variation or spread?
A measure of spread, or variation, is a way of summarizing a group of data by describing how spread out the scores are. For example, the mean score of 100 students may be 75 out of 100. However, not every student scores 75: some have lower marks and others have higher marks. A measure of spread summarizes how spread out these scores are. Several statistics describe spread precisely, including the range, interquartile range, variance, and standard deviation.