Measure of Central tendency

Statistics is the branch of mathematics that studies data. Data are raw collected resources that must be processed into meaningful information. Overall, data are measured values, and they can be classified from four different perspectives.

  1. Based on collection (Primary and Secondary)
  2. Based on scale/measurement (Nominal, Ordinal, Interval, Ratio)
  3. Based on nature (Qualitative and Quantitative)
  4. Based on distribution (Individual, Discrete, Continuous)
Based on these data, there are two common types of statistics.
  1. Descriptive statistics
  2. Inferential statistics
Descriptive statistics
Statistics that collects, organizes, and summarizes information is called descriptive statistics. Examples include a bar graph and the mean.
Inferential statistics
Statistics that uses current data to make predictions for future reference is called inferential statistics. Examples include a hypothesis test and regression analysis.


Measure of Central tendency

The measure of central tendency is a statistic that summarizes an entire quantitative data set in a single value (a representative value of the data set) around which the data tend to concentrate. The tendency of the observations to cluster in the central part of the data is called the central tendency. Because it measures the central location (or position) of the data set, it is also called a measure of location or position, and it is also known as an average.

NOTE
  1. The measure of central tendency always lies within the range of the data set.
  2. The measure of central tendency remains unchanged when the data set is rearranged in a different order.
The most common types of such central tendencies are:
  1. Mean
  2. Median, Quartile, Decile, Percentile
  3. Mode



Mean

The mean is a measure of central tendency that uses each and every data value to give a single best value. The arithmetic mean, or simply the mean, is also known as the average; it is obtained by dividing the sum of all the observations by the total number of observations.
It is denoted by \bar{X} and defined as follows.

Type of mean | Individual data | Discrete data | Continuous data
Arithmetic Mean | \bar{X}=\frac{\sum x}{n} | \bar{X}=\frac{\sum fx}{n} | \bar{X}=\frac{\sum fm}{n}
Geometric Mean | \bar{X}=\left(\prod x\right)^{\frac{1}{n}} | \bar{X}=\left(\prod x^{f}\right)^{\frac{1}{n}} | \bar{X}=\left(\prod m^{f}\right)^{\frac{1}{n}}
Harmonic Mean | \bar{X}=\frac{n}{\sum \left(\frac{1}{x}\right)} | \bar{X}=\frac{n}{\sum \left(\frac{f}{x}\right)} | \bar{X}=\frac{n}{\sum \left(\frac{f}{m}\right)}
Weighted Mean | \bar{X}=\frac{\sum wx}{\sum w} | \bar{X}=\frac{\sum wx}{\sum w} | \bar{X}=\frac{\sum wx}{\sum w}
Here f is the frequency, m is the mid value of a class, w is the weight, and for discrete and continuous data n=\sum f.
The common types of mean are
  1. Arithmetic Mean (AM)
    The arithmetic mean answers the question, “if all the quantities had the same value, what would that value be to achieve the same total?” For example, if Ram has Rs 100 and Shyam has Rs 120, then the average amount is the AM, given by
    AM =\frac{a+b}{2} =\frac{100+120}{2} =110, that is, Rs 110
    Here a+b is the same as AM+AM, since 100+120=110+110.
  2. Geometric Mean (GM)
    The geometric mean answers the question, “if all the quantities had the same value, what would that value be to achieve the same product?” The geometric mean is useful when the data change by percentages, rates of change, or ratios. It is used in finance to determine average growth rates, also known as the compounded annual growth rate. For example, if Ram deposits Rs 100 in a bank and it grows by 80% in the first year and by 25% in the second year, then the average growth factor is the GM, given by
    GM =\sqrt{1.80 \times 1.25}=1.50, so the average growth is 50% per year
    Please note that this situation can NOT be explained by the arithmetic mean \frac{80+25}{2} =52.5\%, because the deposit grows to 100 \times 1.80 \times 1.25 = 225, which equals 100 \times 1.50 \times 1.50.
    Here a \times b is the same as GM \times GM, since 1.80 \times 1.25 = 1.50 \times 1.50 = 2.25.
  3. Harmonic Mean (HM)
    The harmonic mean is used to average rates, such as speeds or fuel efficiencies over equal distances. For example, if Ram travels 100 km with a fuel efficiency of 25 km per liter and the next 100 km with a fuel efficiency of 16 km per liter, then the average fuel efficiency is the HM, given by
    HM =\frac{2 \times 25 \times 16}{25+16}=19.51 km per liter
    Please note that this situation can NOT be explained by the AM or GM, because
    The fuel used for the first 100 km is \frac{100}{25}=4 liters
    The fuel used for the second 100 km is \frac{100}{16}=6.25 liters
    The overall fuel efficiency is
    \frac{200}{4+6.25}=19.51 km per liter
  4. Weighted Mean (WM)
    A weighted mean is a kind of average where some data points contribute more “weight” than others. If all the weights are equal, then the weighted mean equals the arithmetic mean.
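The four means above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the helper `weighted_mean` and the variable names are illustrative assumptions, not part of the original text.

```python
from statistics import mean, geometric_mean, harmonic_mean

def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w). Hypothetical helper for illustration."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Ram has Rs 100 and Shyam has Rs 120: the same total is kept by AM + AM.
am = mean([100, 120])

# Growth factors 1.80 and 1.25: the same product is kept by GM * GM.
gm = geometric_mean([1.80, 1.25])

# 25 km/liter and 16 km/liter over two equal distances.
hm = harmonic_mean([25, 16])

# With equal weights the weighted mean reduces to the arithmetic mean.
wm = weighted_mean([100, 120], [1, 1])

print(am, round(gm, 2), round(hm, 2), wm)
```

Giving Shyam's amount a larger weight, for example `weighted_mean([100, 120], [1, 3])`, pulls the result toward 120.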



Application of Mean
The mean is calculated from all data values, so it is affected by each and every value of the data set. It is applicable if the data distribution represents
  1. Quantitative data
  2. Closed ended
  3. Normally distributed data



Relation between AM, GM and HM

Let a and b be two positive numbers. Then

  1. GM^2=AM \times HM
  2. AM \ge GM \ge HM, that is, the arithmetic mean is at least as large as the geometric mean, which in turn is at least as large as the harmonic mean. Equality holds when a=b.

For two positive numbers a and b,
AM=\frac{a+b}{2}, GM=\sqrt{ab}, HM=\frac{2ab}{a+b}
The proofs are as follows:

  1. Now, we have
    GM^2=ab
    or GM^2=\frac{a+b}{2} \times \frac{2ab}{a+b}
    or GM^2=AM \times HM
  2. Now, we have
    AM-GM=\frac{a+b}{2}-\sqrt{ab}
    or AM-GM=\frac{a+b-2\sqrt{ab}}{2}
    or AM-GM=\frac{{{\sqrt{a}}^{2}}+{{\sqrt{b}}^{2}}-2\sqrt{a}\sqrt{b}}{2}
    or AM-GM=\frac{{{( \sqrt{a}-\sqrt{b} )}^{2}}}{2}
    Since a square is never negative, AM-GM\ge 0
    or AM\ge GM (1)
    Similarly,
    GM-HM=\sqrt{ab}-\frac{2ab}{a+b}
    or GM-HM=\frac{\sqrt{ab}( a+b )-2ab}{a+b}
    or GM-HM=\frac{\sqrt{ab}( a+b )-2\sqrt{ab}\sqrt{ab}}{a+b}
    or GM-HM=\frac{\sqrt{ab}}{a+b}( a+b-2\sqrt{ab} )
    or GM-HM=\frac{\sqrt{ab}}{a+b}{{( \sqrt{a}-\sqrt{b} )}^{2}}
    Since \sqrt{ab}, a+b and the square are all non-negative, GM-HM\ge 0
    or GM\ge HM (2)
    Combining (1) and (2), we get
    AM\ge GM\ge HM
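Both relations can also be spot-checked numerically. The sketch below is only an illustration of the result for a few sample pairs (it is not a proof); the function name `two_number_means` and the sample pairs are assumptions chosen for the example.

```python
from math import isclose, sqrt

def two_number_means(a, b):
    """Return (AM, GM, HM) of two positive numbers a and b."""
    am = (a + b) / 2
    gm = sqrt(a * b)
    hm = 2 * a * b / (a + b)
    return am, gm, hm

for a, b in [(4, 9), (25, 16), (7, 7)]:
    am, gm, hm = two_number_means(a, b)
    # Relation 1: GM^2 = AM x HM (compared up to floating-point rounding).
    assert isclose(gm ** 2, am * hm)
    # Relation 2: AM >= GM >= HM, with equality when a == b.
    assert am >= gm >= hm
    print(a, b, am, gm, round(hm, 3))
```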
Visualization of the proof

Let us suppose that a and b are two given numbers. Now, draw a semi circle with diameter a+b.

  1. Visualization of AM
    By the property of radius and diameter, we get that
    AM =\frac{a+b}{2}

  2. Visualization of GM
    By the mean proportionality property (squaring a rectangle), DQ is the geometric mean. Triangles ADQ and QDB are similar with AD=a, DB=b and DQ=GM, so we have
    \frac{GM}{a}=\frac{b}{GM}
    or GM= \sqrt{ab}

  3. Visualization of HM
    Again, by using the property of similarity on OCDE, QR is the harmonic mean. Triangles DRQ and ODQ are similar with QR=HM, QD=\sqrt{ab}, and the radius OQ=\frac{a+b}{2}, so we have
    \frac{HM}{\sqrt{ab}}=\frac{\sqrt{ab}}{\frac{a+b}{2}}
    or HM= \frac{2ab}{a+b}




Example 1

Find the mean of the numbers 3, −7, 5, 13, −2
The sum of the numbers is
\sum X= 3 − 7 + 5 + 13 − 2 = 12
There are 5 numbers, so n=5.
Hence, the mean of the numbers is
\bar{X}=\frac{\sum X}{n}=\frac{12}{5}=2.4




Example 2
Find the mean of the wages from the following data
Wages 50 70 90 110 130 150
Number of Workers 2 4 5 6 2 1
Based on the data given above, the frequency table is prepared as below.
Wages (X) Number of workers (f) f.x
50 2 100
70 4 280
90 5 450
110 6 660
130 2 260
150 1 150
\sum f=n=20 \sum f x=1900
Based on the formula, the mean wages is
\bar{X}=\frac{\sum fx}{n}=\frac{1900}{20}=95
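The same calculation can be sketched in a few lines of Python. The function name `frequency_mean` is an illustrative assumption; it simply applies \bar{X}=\frac{\sum fx}{n} with n=\sum f.

```python
def frequency_mean(values, freqs):
    """Mean of a discrete frequency distribution: sum(f * x) / sum(f)."""
    n = sum(freqs)                                     # total number of observations
    total = sum(f * x for x, f in zip(values, freqs))  # sum of f.x
    return total / n

wages = [50, 70, 90, 110, 130, 150]
workers = [2, 4, 5, 6, 2, 1]

mean_wage = frequency_mean(wages, workers)  # 1900 / 20
print(mean_wage)
```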




Example 3
Find the average marks from the following data
Marks of the Students 0-20 20-40 40-60 60-80 80-100
Number of Students 20 50 55 40 15
Based on the data given above, the frequency table is prepared as below.
Marks of students (X) Mid value of marks m Number of students (f) f.m
0-20 10 20 200
20-40 30 50 1500
40-60 50 55 2750
60-80 70 40 2800
80-100 90 15 1350
\sum f=n=180 \sum fm=8600
Based on the formula, the average marks is
\bar{X}=\frac{\sum fm}{n}=\frac{8600}{180}=47.78
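For continuous (grouped) data the only extra step is replacing each class by its mid value m. A minimal sketch, assuming each class is given as a hypothetical (lower, upper) pair:

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data: sum(f * m) / sum(f), where m is the class mid value."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    return sum(f * m for m, f in zip(mids, freqs)) / sum(freqs)

classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
students = [20, 50, 55, 40, 15]

average_marks = grouped_mean(classes, students)  # 8600 / 180
print(round(average_marks, 2))
```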




Median

The median is a measure of central tendency that uses the middle portion of the data to give a single best value. It is the middle value of the data series when the values are placed in order of magnitude (ascending or descending). Therefore, the median is not affected by extreme values. It is denoted by M_d and defined as follows.

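Measure | Individual data | Discrete data | Continuous data
Median | M_d=\left(\frac{n+1}{2}\right)^{th} \text{ item} | M_d=\left(\frac{n+1}{2}\right)^{th} \text{ item} | M_d \text{ class}=\left(\frac{n}{2}\right)^{th} \text{ item, with } M_d=L+\frac{\frac{n}{2}-cf}{f} \times i
Here L is the lower limit of the median class, cf is the cumulative frequency of the class before the median class, f is the frequency of the median class, and i is the class width.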
Calculating the median is also very simple. Here are the steps:
  1. Sort the data in an ascending order.
  2. Find the middle number of the sorted data.
  3. If there’s an odd number of data, get the value exactly in the middle.
  4. If there’s an even number of data, get the mean of the two middle values.
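The steps above can be sketched as follows. This is an illustrative implementation (the standard library's `statistics.median` gives the same result); the wage list is taken from Example 1 of this section.

```python
def median(data):
    """Median of an individual series, following the four steps above."""
    ordered = sorted(data)      # step 1: ascending order
    n = len(ordered)
    mid = n // 2                # step 2: locate the middle
    if n % 2 == 1:
        return ordered[mid]     # step 3: odd count, exact middle value
    # step 4: even count, mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

wages = [40, 30, 35, 42, 32, 45, 48]
print(median(wages))
```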
Application of Median: The median does not take into account how far the values lie from the centre. It only splits the ordered data into two equal parts. It is applicable if the distribution represents
  1. Qualitative data
  2. Open ended or Skewed data



Example 1

Find the median of the following wages(in hundreds): 40,30,35,42,32,45,48
Given wages (in hundreds) are
40,30,35,42,32,45,48
Arranging the wages in ascending order, we get

Wages 30 32 35 40 42 45 48
Position 1st 2nd 3rd 4th 5th 6th 7th

Here, the number of observations is n=7, thus, based on the formula, the Median is
M_d= \left (\frac{n+1}{2} \right )^{th} item
or M_d= \left (\frac{7+1}{2} \right )^{th} item
or M_d= 4^{th} item
or M_d= 40 (in hundreds)

Example 2
Find the median of the wages from the following data
Wages 50 70 90 110 130 150
Number of Workers 2 4 5 6 2 1
Based on the data given above, the frequency table is prepared as below.
Wages X Number of Workers f Cumulative frequency cf
50 2 2
70 4 6
90 5 11
110 6 17
130 2 19
150 1 20
Here, the number of observations is n=20, thus, based on the formula, the Median is
M_d= \left (\frac{n+1}{2} \right )^{th} item
or M_d= \left (\frac{20+1}{2} \right )^{th} item
or M_d= 10.5^{th} item
The 10.5^{th} item lies in the cumulative frequency 11, whose wage is 90, so
M_d= 90
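The lookup through the cumulative frequency column can be sketched as below. The function names are illustrative, the values are assumed to be sorted in ascending order, and averaging the two neighbouring items for a fractional position such as 10.5 is an assumption of this sketch (here both neighbours are 90, so the result is unaffected).

```python
from itertools import accumulate
from math import ceil, floor

def item_value(values, cum_freqs, position):
    """Value of the position-th item (1-based) of a sorted frequency table."""
    for x, cf in zip(values, cum_freqs):
        if position <= cf:
            return x
    raise ValueError("position exceeds the number of observations")

def discrete_median(values, freqs):
    """Median of a discrete series: the ((n + 1) / 2)-th item."""
    cum_freqs = list(accumulate(freqs))   # cumulative frequency column
    n = cum_freqs[-1]
    position = (n + 1) / 2
    lower = item_value(values, cum_freqs, floor(position))
    upper = item_value(values, cum_freqs, ceil(position))
    return (lower + upper) / 2

wages = [50, 70, 90, 110, 130, 150]
workers = [2, 4, 5, 6, 2, 1]
print(discrete_median(wages, workers))
```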
Example 3
Find the median marks from the following data
Marks of the Students 0-20 20-40 40-60 60-80 80-100
Number of Students 20 50 55 40 15
Based on the data given above, the frequency table is prepared as below.
Marks of the Students X Number of Students f Cumulative frequency cf
0-20 20 20
20-40 50 70
40-60 55 125
60-80 40 165
80-100 15 180

Here, the number of observations is n=180, thus, based on the formula, the median class is given by
Md Class= \left (\frac{n}{2} \right )^{th} item
or Md Class= \left (\frac{180}{2} \right )^{th} item
or Md Class= 90^{th} item
Here, the 90^{th} item lies in the cf of 125, that is, in the class 40-60, thus
L=40,f=55, cf=70,i=20
Hence, the Median is
M_d=L+\frac{\frac{n}{2}-cf}{f} \times i
or M_d=40+\frac{\frac{180}{2}-70}{55} \times 20=47.27
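The grouped-median formula can be sketched in Python as follows. The function name `grouped_median` and the (lower, upper) representation of classes are assumptions made for the example; the loop finds the first class whose cumulative frequency reaches n/2 and then interpolates inside it.

```python
def grouped_median(classes, freqs):
    """Median of grouped data: L + (n/2 - cf) / f * i."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    target = n / 2
    cf_before = 0                       # cf of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf_before + f >= target:     # this is the median class
            width = upper - lower       # class width i
            return lower + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("median class not found")

classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
print(round(grouped_median(classes, [20, 50, 55, 40, 15]), 2))
```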

Example 4
Find the median marks from the following data
Marks of the Students 0-20 20-40 40-60 60-80 80-100
Number of Students 2 3 5 4 6
Based on the data given above, the frequency table is prepared as below.
Marks of the Students X Number of Students f Cumulative frequency cf
0-20 2 2
20-40 3 5
40-60 5 10
60-80 4 14
80-100 6 20

Here, the number of observations is n=20, thus, based on the formula, the median class is given by
Md Class= \left (\frac{n}{2} \right )^{th} item
or Md Class = \left (\frac{20}{2} \right )^{th} item
or Md Class = 10^{th} item
Here, the 10^{th} item lies in the cf of 10, that is, in the class 40-60, thus
L=40,f=5, cf=5,i=20
Hence, the Median is
M_d=L+\frac{\frac{n}{2}-cf}{f} \times i
or M_d=40+\frac{\frac{20}{2}-5}{5} \times 20=60
NOTE
In the example above, students may ask why the median, 60, falls on the upper boundary of the class 40-60 rather than inside it, contrary to what they have been taught about data groupings. Teachers should encourage them to follow the usual rule of computation.




Quartile, Decile and Percentile

The formulas for the Quartile, Decile and Percentile are similar to that of the Median; only the divisor changes (4, 10 and 100 respectively).
Measure | Individual data | Discrete data | Continuous data
Quartile, k=1,2,3 | Q_k=\left(\frac{k(n+1)}{4}\right)^{th} \text{ item} | Q_k=\left(\frac{k(n+1)}{4}\right)^{th} \text{ item} | Q_k \text{ class}=\left(\frac{kn}{4}\right)^{th} \text{ item, with } Q_k=L+\frac{\frac{kn}{4}-cf}{f} \times i
Decile, k=1,2,\cdots 9 | D_k=\left(\frac{k(n+1)}{10}\right)^{th} \text{ item} | D_k=\left(\frac{k(n+1)}{10}\right)^{th} \text{ item} | D_k \text{ class}=\left(\frac{kn}{10}\right)^{th} \text{ item, with } D_k=L+\frac{\frac{kn}{10}-cf}{f} \times i
Percentile, k=1,2,\cdots 99 | P_k=\left(\frac{k(n+1)}{100}\right)^{th} \text{ item} | P_k=\left(\frac{k(n+1)}{100}\right)^{th} \text{ item} | P_k \text{ class}=\left(\frac{kn}{100}\right)^{th} \text{ item, with } P_k=L+\frac{\frac{kn}{100}-cf}{f} \times i
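Because quartiles, deciles and percentiles of grouped data differ only in the divisor (4, 10 or 100), one function can cover all three. The sketch below is illustrative: the name `grouped_partition`, its `parts` parameter, and the reuse of the marks data from Median Example 3 are assumptions made for the example.

```python
def grouped_partition(classes, freqs, k, parts):
    """k-th partition value of grouped data: L + (k*n/parts - cf) / f * i.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    n = sum(freqs)
    target = k * n / parts
    cf_before = 0                       # cf of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cf_before + f >= target:
            return lower + (target - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("partition class not found")

classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
students = [20, 50, 55, 40, 15]         # n = 180

q1 = grouped_partition(classes, students, 1, 4)      # 45th item, class 20-40
q3 = grouped_partition(classes, students, 3, 4)      # 135th item, class 60-80
d5 = grouped_partition(classes, students, 5, 10)     # same item as the median
p50 = grouped_partition(classes, students, 50, 100)  # same item as the median
print(q1, q3, round(d5, 2), round(p50, 2))
```

The second quartile, fifth decile and fiftieth percentile all coincide with the median, which is a quick consistency check on the three divisors.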



Mode

The concept of mode, as a measure of central tendency, is preferable when it is desired to know the most typical value, e.g., the most common size of shoes, the most common size of a ready-made garment, the most common size of income, the most common size of pocket expenditure of a college student, the most common size of a family in a locality, the most common duration of cure of viral-fever, the most popular candidate in an election, etc.
Thus, the mode is a measure of central tendency that uses the most fashionable (most repeated) data to give a single best value. The mode is the value that occurs most frequently in a set of data, i.e. it indicates the most frequent (common) result. Unlike the mean, it is not affected by every value. It is denoted by M_0 and defined as follows.

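Measure | Individual data | Discrete data | Continuous data
Mode | Most repeated data | Most repeated data / table analysis | M_0=L+\frac{f_1-f_0}{2f_1-f_0-f_2} \times i \text{ or } M_0=L+\frac{f_2}{f_0+f_2} \times i
Here L is the lower limit of the modal class, f_1 is the frequency of the modal class, f_0 and f_2 are the frequencies of the classes before and after it, and i is the class width.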
Application of Mode: The mode ignores every value in the collection except the one that appears most frequently. It is most applicable when dealing with
  1. Frequency related data
  2. Fashionable data
Example 1
Find the mode value of the following data: 3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29
The given data set is
3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29
As a frequency table, the data set becomes
X 3 5 7 12 13 14 20 23 29 39 40 56
f 1 1 1 1 1 1 1 4 1 1 1 1
Since the highest frequency is 4, the mode value is 23.
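Building the frequency table and picking the highest frequency can be sketched with `collections.Counter` from the Python standard library. The function name `modes` is an illustrative assumption; it returns every value that shares the highest frequency, so a bi-modal series would yield two values.

```python
from collections import Counter

def modes(data):
    """Return (sorted list of most frequent values, their frequency)."""
    counts = Counter(data)              # frequency table: value -> frequency
    top = max(counts.values())          # highest frequency
    return sorted(x for x, c in counts.items() if c == top), top

data = [3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29]
values, freq = modes(data)
print(values, freq)
```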
Example 2
Find the Mode of the wages from the following data
Wages 50 70 90 110 130 150
Number of Workers 2 4 5 6 2 1
Since the highest frequency is 6, the mode value is 110.
Example 3
Find the Mode from the following data
Wages 0-10 10-20 20-30 30-40 40-50 50-60 60-70
Number of Workers 4 12 15 18 32 14 13

Since the highest frequency is 32, the modal class is 40-50. Thus,
L=40,f_0=18,f_1=32,f_2=14,i=10
Hence, using the formula, the Mode is
M_0=L+\frac{f_1-f_0}{2f_1-f_0-f_2} \times i
or M_0=40+\frac{32-18}{2 \times 32-18-14} \times 10=40+\frac{14}{32} \times 10=44.38
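The modal-class formula can be sketched as below. The function name `grouped_mode` is illustrative, and treating a missing neighbouring class as frequency 0 when the modal class is first or last is an assumption of this sketch, not something stated in the text. The sketch also assumes a regular distribution with a single clear modal class.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data: L + (f1 - f0) / (2*f1 - f0 - f2) * i."""
    j = freqs.index(max(freqs))         # first class with the highest frequency
    lower, upper = classes[j]
    f1 = freqs[j]                                     # modal class frequency
    f0 = freqs[j - 1] if j > 0 else 0                 # class before (assumed 0 if none)
    f2 = freqs[j + 1] if j + 1 < len(freqs) else 0    # class after (assumed 0 if none)
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
workers = [4, 12, 15, 18, 32, 14, 13]

mode_wage = grouped_mode(classes, workers)   # 40 + 14/32 * 10
print(mode_wage)
```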

Analytical method to find the Mode

If the frequency distribution is regular, the mode is the value corresponding to the maximum frequency. However, the frequency distribution may NOT be regular, meaning that the concentration of observations around the value with the maximum frequency is less than the concentration around some other value. In such a situation, the mode cannot be determined by the maximum frequency criterion. Further, observations may concentrate around more than one value of the variable, in which case the distribution is said to be bi-modal or multi-modal, depending on whether there are two or more than two such values. In such cases, we use the analytical method (also called the tabular or grouping method) to find the Mode.
This method is used when the mode of the given series is unclear, or in the following situations.

  1. When more than one value has the highest frequency
  2. When the highest frequency occurs at the beginning or towards the end of the data
  3. When there are large frequencies around the highest frequency
  4. When the frequencies rise and fall irregularly
In these situations, the Mode can be found by the empirical method (Mode = 3 Median - 2 Mean) or by the analytical method. Of the two, the analytical method is considered more reliable.
Example 4
Find the Mode from the following data
Wages 10 20 30 40 50 60 70 80 90
Number of Workers 1 5 17 22 21 20 9 3 4

Here, the maximum frequency is 22; however, there are large frequencies around 22, thus we use the analytical method to find the Mode.
Hence, based on the rule, the analytic table is given as below.

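Wages | f | 1st+2nd | 2nd+3rd | 1st+2nd+3rd | 2nd+3rd+4th | 3rd+4th+5th
10 | 1 | 6 | - | 23 | - | -
20 | 5 | - | 22 | - | 44 | -
30 | 17 | 39 | - | - | - | 60
40 | 22 | - | 43 | 63 | - | -
50 | 21 | 41 | - | - | 50 | -
60 | 20 | - | 29 | - | - | 32
70 | 9 | 12 | - | 16 | - | -
80 | 3 | - | 7 | - | - | -
90 | 4 | - | - | - | - | -
Each group total is written on the row of the first value in its group.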
  1. Prepare a table consisting of 7 columns: the 1st column for X and the 2nd column for the frequencies of X.
  2. In the third column, add the frequencies in groups of two, starting from the top.
  3. In the fourth column, add the frequencies in groups of two, starting from the second frequency.
  4. In the fifth column, add the frequencies in groups of three, starting from the top.
  5. In the sixth column, add the frequencies in groups of three, starting from the second frequency.
  6. In the seventh column, add the frequencies in groups of three, starting from the third frequency.
  7. Finally, prepare a frequency chart based on the analytic table by marking, for each column, the values that make up its largest total.
Based on the analytic table, the frequency chart is prepared as below.
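Column | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90
1 | - | - | - | 1 | - | - | - | - | -
2 | - | - | - | - | 1 | 1 | - | - | -
3 | - | - | - | 1 | 1 | - | - | - | -
4 | - | - | - | 1 | 1 | 1 | - | - | -
5 | - | - | 1 | 1 | 1 | - | - | - | -
6 | - | - | - | - | 1 | 1 | 1 | - | -
Total | 0 | 0 | 1 | 4 | 5 | 3 | 1 | 0 | 0
Chart columns 1 to 6 are, in order: the frequency column, the two groupings in twos, the grouping in threes from the top, the grouping in threes from the third frequency (largest total 60), and the grouping in threes from the second frequency (largest total 50).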
Here, the highest frequency is aligned with 50, therefore, Mode=50.
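The grouping procedure can be sketched in Python as a cross-check of the tally. The function name `grouping_tally` is an illustrative assumption, as is the choice to mark every tied group when a column has more than one largest total. Each pair in the list is (group size, starting offset) for one of the six analysis columns.

```python
from collections import Counter

def grouping_tally(values, freqs):
    """Count how often each value belongs to a column's largest group total."""
    tally = Counter()
    # (group size, starting offset): the frequency column, two pair columns,
    # and three columns of groups of three.
    for size, start in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        groups = [
            (sum(freqs[i:i + size]), values[i:i + size])
            for i in range(start, len(freqs) - size + 1, size)
        ]
        best = max(total for total, _ in groups)
        for total, members in groups:
            if total == best:           # mark every value of the largest group
                tally.update(members)
    return tally

wages = [10, 20, 30, 40, 50, 60, 70, 80, 90]
workers = [1, 5, 17, 22, 21, 20, 9, 3, 4]

tally = grouping_tally(wages, workers)
mode = tally.most_common(1)[0][0]       # value with the most marks
print(dict(tally), mode)
```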


Relation between Mean Median and Mode

A distribution in which the values of mean, median and mode coincide (i.e. mean = median = mode) is known as a symmetrical distribution.
Conversely, when values of mean, median and mode are not equal the distribution is known as asymmetrical or skewed distribution. In moderately skewed or asymmetrical distribution, a very important relationship exists among these three measures of central tendency. In such distributions
Mode = 3 Median – 2 Mean
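As a rough illustration of this empirical relation, the sketch below plugs in the mean and median computed earlier for the marks data (mean 8600/180 and median 47.27). Whether that distribution is only moderately skewed is an assumption here, so the result should be read as an estimate rather than an exact mode; the function name `empirical_mode` is also illustrative.

```python
def empirical_mode(mean, median):
    """Empirical estimate of the mode: 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean

mean_marks = 8600 / 180                          # from the grouped mean example
median_marks = 40 + (180 / 2 - 70) / 55 * 20     # from the grouped median example

estimate = empirical_mode(mean_marks, median_marks)
print(round(estimate, 2))
```

For a symmetrical distribution, where mean and median coincide, the same function returns that common value as the mode.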

Symmetrical Distribution
