Measures of the Location of the Data
Learning Outcomes
- Recognize, describe, and calculate the measures of location of data: quartiles and percentiles.
The common measures of location are quartiles and percentiles.
Quartiles are special percentiles. The first quartile, [latex]Q_1[/latex], is the same as the [latex]25[/latex]th percentile, and the third quartile, [latex]Q_3[/latex], is the same as the [latex]75[/latex]th percentile. The median, [latex]M[/latex], is called both the second quartile and the [latex]50[/latex]th percentile.
To calculate quartiles and percentiles, the data must be ordered from smallest to largest. Quartiles divide ordered data into quarters. Percentiles divide ordered data into hundredths. To score in the [latex]90[/latex]th percentile of an exam does not mean, necessarily, that you received [latex]90[/latex]% on a test. It means that [latex]90[/latex]% of test scores are the same as or less than your score and [latex]10[/latex]% of the test scores are the same as or greater than your test score.
Percentiles are useful for comparing values. For this reason, universities and colleges use percentiles extensively. One instance in which colleges and universities use percentiles is when SAT results are used to determine a minimum testing score that will be used as an acceptance factor. For example, suppose Duke accepts SAT scores at or above the [latex]75[/latex]th percentile. That translates into a score of at least [latex]1220[/latex].
Percentiles are mostly used with very large populations. Therefore, if you were to say that [latex]90[/latex]% of the test scores are less (and not the same or less) than your score, it would be acceptable because removing one particular data value is not significant.
The median is a number that measures the “center” of the data. You can think of the median as the “middle value,” but it does not actually have to be one of the observed values. It is a number that separates ordered data into halves. Half the values are the same as or smaller than the median, and half the values are the same as or larger. For example, consider the following data.
[latex]1[/latex]; [latex]11.5[/latex]; [latex]6[/latex]; [latex]7.2[/latex]; [latex]4[/latex]; [latex]8[/latex]; [latex]9[/latex]; [latex]10[/latex]; [latex]6.8[/latex]; [latex]8.3[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]10[/latex]; [latex]1[/latex]
Ordered from smallest to largest:
[latex]1[/latex]; [latex]1[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]4[/latex]; [latex]6[/latex]; [latex]6.8[/latex]; [latex]7.2[/latex]; [latex]8[/latex]; [latex]8.3[/latex]; [latex]9[/latex]; [latex]10[/latex]; [latex]10[/latex]; [latex]11.5[/latex]
Since there are [latex]14[/latex] observations, the median is between the seventh value, [latex]6.8[/latex], and the eighth value, [latex]7.2[/latex]. To find the median, add the two values together and divide by two.
[latex]\displaystyle\frac{6.8 + 7.2}{2} = 7[/latex]
The median is seven. Half of the values are smaller than seven and half of the values are larger than seven.
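The median calculation above can be sketched in a few lines of Python. This is only an illustration of the "sort, then take the middle" idea; the function name `median` is a choice made here, not something the text defines.

```python
# The 14 observations from the text, in their original (unsorted) order.
data = [1, 11.5, 6, 7.2, 4, 8, 9, 10, 6.8, 8.3, 2, 2, 10, 1]

def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even n: average the values just below and just above the midpoint.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median(data))
```

With an even number of observations, as here, the result is the average of the seventh and eighth ordered values and need not be one of the data values.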
Quartiles are numbers that separate the data into quarters. Quartiles may or may not be part of the data. To find the quartiles, first find the median or second quartile. The first quartile, [latex]Q_1[/latex], is the middle value of the lower half of the data, and the third quartile, [latex]Q_3[/latex], is the middle value, or median, of the upper half of the data. To get the idea, consider the same data set:
[latex]1[/latex]; [latex]1[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]4[/latex]; [latex]6[/latex]; [latex]6.8[/latex]; [latex]7.2[/latex]; [latex]8[/latex]; [latex]8.3[/latex]; [latex]9[/latex]; [latex]10[/latex]; [latex]10[/latex]; [latex]11.5[/latex]
The median or second quartile is seven. The lower half of the data are [latex]1[/latex], [latex]1[/latex], [latex]2[/latex], [latex]2[/latex], [latex]4[/latex], [latex]6[/latex], [latex]6.8[/latex]. The middle value of the lower half is two.
[latex]1[/latex]; [latex]1[/latex]; [latex]2[/latex]; [latex]2[/latex]; [latex]4[/latex]; [latex]6[/latex]; [latex]6.8[/latex]
The number two, which is part of the data, is the first quartile. One-fourth of the entire set of values are the same as or less than two and three-fourths of the values are more than two.
The upper half of the data is [latex]7.2[/latex], [latex]8[/latex], [latex]8.3[/latex], [latex]9[/latex], [latex]10[/latex], [latex]10[/latex], [latex]11.5[/latex]. The middle value of the upper half is nine.
The third quartile, [latex]Q_3[/latex], is nine. Three-fourths ([latex]75[/latex]%) of the ordered data set are less than nine. One-fourth ([latex]25[/latex]%) of the ordered data set are greater than nine. The third quartile is part of the data set in this example.
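The "median of each half" procedure can be sketched as follows. One assumption is made explicit here: when the number of observations is odd, the median itself is left out of both halves, which is what the worked examples in this section appear to do. The names `median` and `quartiles` are illustrative.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumption: for odd n the median is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

data = [1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5]
q1, med, q3 = quartiles(data)
print(q1, med, q3)
```

Other textbooks and software packages use different quartile conventions, so results from a calculator or library may differ slightly from this method.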
The interquartile range is a number that indicates the spread of the middle half or the middle [latex]50[/latex]% of the data. It is the difference between the third quartile ([latex]Q_3[/latex]) and the first quartile ([latex]Q_1[/latex]).
[latex]IQR[/latex] = [latex]Q_3[/latex] – [latex]Q_1[/latex]
The IQR can help to determine potential outliers. A value is suspected to be a potential outlier if it is more than (1.5)(IQR) below the first quartile or more than (1.5)(IQR) above the third quartile. Potential outliers always require further investigation.
NOTE
A potential outlier is a data point that is significantly different from the other data points. These special data points may be errors or some kind of abnormality or they may be a key to understanding the data.
Example
For the following [latex]13[/latex] real estate prices, calculate the [latex]IQR[/latex] and determine if any prices are potential outliers. Prices are in dollars.
[latex]389,950[/latex]; [latex]230,500[/latex]; [latex]158,000[/latex]; [latex]479,000[/latex]; [latex]639,000[/latex]; [latex]114,950[/latex]; [latex]5,500,000[/latex]; [latex]387,000[/latex]; [latex]659,000[/latex]; [latex]529,000[/latex]; [latex]575,000[/latex]; [latex]488,800[/latex]; [latex]1,095,000[/latex]
[reveal-answer q=”283390″]Show Solution[/reveal-answer]
[hidden-answer a=”283390″]
Order the data from smallest to largest.
[latex]114,950[/latex]; [latex]158,000[/latex]; [latex]230,500[/latex]; [latex]387,000[/latex]; [latex]389,950[/latex]; [latex]479,000[/latex]; [latex]488,800[/latex]; [latex]529,000[/latex]; [latex]575,000[/latex]; [latex]639,000[/latex]; [latex]659,000[/latex]; [latex]1,095,000[/latex]; [latex]5,500,000[/latex]
[latex]M = 488,800[/latex]
[latex]Q_1 = \frac{230,500 + 387,000}{2} = 308,750[/latex]
[latex]Q_3 = \frac{639,000 + 659,000}{2} = 649,000[/latex]
[latex]IQR = 649,000 – 308,750 = 340,250[/latex]
[latex](1.5)(IQR) = (1.5)(340,250) = 510,375[/latex]
[latex]Q_1 – (1.5)(IQR) = 308,750 – 510,375 = –201,625[/latex]
[latex]Q_3 + (1.5)(IQR) = 649,000 + 510,375 = 1,159,375[/latex]
No house price is less than [latex]–201,625[/latex]. However, [latex]5,500,000[/latex] is more than [latex]1,159,375[/latex]. Therefore, [latex]5,500,000[/latex] is a potential outlier.
[/hidden-answer]
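The outlier check in this example can be sketched in Python. The quartile values are taken directly from the solution above; the function name `outlier_fences` and the variable names are illustrative choices, not terms used in the text.

```python
prices = [389950, 230500, 158000, 479000, 639000, 114950, 5500000,
          387000, 659000, 529000, 575000, 488800, 1095000]

# Quartiles as computed in the worked solution.
q1, q3 = 308750, 649000

def outlier_fences(q1, q3, factor=1.5):
    """Return (lower, upper) cutoffs: Q1 - factor*IQR and Q3 + factor*IQR."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

iqr = q3 - q1
lower_fence, upper_fence = outlier_fences(q1, q3)

# A price is a potential outlier if it falls outside the two cutoffs.
flagged = [p for p in prices if p < lower_fence or p > upper_fence]

print(iqr)                       # 340250
print(lower_fence, upper_fence)  # -201625.0 1159375.0
print(flagged)                   # [5500000]
```

A flagged value is only a candidate for further investigation, not automatically an error.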
Try It
For the following [latex]11[/latex] salaries, calculate the [latex]IQR[/latex] and determine if any salaries are outliers. The salaries are in dollars.
[latex]$33,000[/latex] [latex]$64,500[/latex] [latex]$28,000[/latex] [latex]$54,000[/latex] [latex]$72,000[/latex] [latex]$68,500[/latex] [latex]$69,000[/latex] [latex]$42,000[/latex] [latex]$54,000[/latex] [latex]$120,000[/latex] [latex]$40,500[/latex]
[reveal-answer q=”283391″]Show Solution[/reveal-answer]
[hidden-answer a=”283391″]
Order the data from smallest to largest.
[latex]$28,000[/latex] [latex]$33,000[/latex] [latex]$40,500[/latex] [latex]$42,000[/latex] [latex]$54,000[/latex] [latex]$54,000[/latex] [latex]$64,500[/latex] [latex]$68,500[/latex] [latex]$69,000[/latex] [latex]$72,000[/latex] [latex]$120,000[/latex]
Median = [latex]$54,000[/latex]
[latex]Q_1[/latex] = [latex]$40,500[/latex]
[latex]Q_3[/latex] = [latex]$69,000[/latex]
[latex]IQR[/latex] = [latex]$69,000[/latex] – [latex]$40,500[/latex] = [latex]$28,500[/latex]
(1.5)([latex]IQR[/latex]) = (1.5)([latex]$28,500[/latex]) = [latex]$42,750[/latex]
[latex]Q_1[/latex] – (1.5)([latex]IQR[/latex]) = [latex]$40,500[/latex] – [latex]$42,750[/latex] = [latex]–$2,250[/latex]
[latex]Q_3[/latex] + (1.5)([latex]IQR[/latex]) = [latex]$69,000[/latex] + [latex]$42,750[/latex] = [latex]$111,750[/latex]
No salary is less than [latex]–$2,250[/latex]. However, [latex]$120,000[/latex] is more than [latex]$111,750[/latex], so [latex]$120,000[/latex] is a potential outlier.
[/hidden-answer]
Try It
Find the interquartile range for the following two data sets and compare them.
Test Scores for Class A
[latex]69[/latex]; [latex]96[/latex]; [latex]81[/latex]; [latex]79[/latex]; [latex]65[/latex]; [latex]76[/latex]; [latex]83[/latex]; [latex]99[/latex]; [latex]89[/latex]; [latex]67[/latex]; [latex]90[/latex]; [latex]77[/latex]; [latex]85[/latex]; [latex]98[/latex]; [latex]66[/latex]; [latex]91[/latex]; [latex]77[/latex]; [latex]69[/latex]; [latex]80[/latex]; [latex]94[/latex]
Test Scores for Class B
[latex]90[/latex]; [latex]72[/latex]; [latex]80[/latex]; [latex]92[/latex]; [latex]90[/latex]; [latex]97[/latex]; [latex]92[/latex]; [latex]75[/latex]; [latex]79[/latex]; [latex]68[/latex]; [latex]70[/latex]; [latex]80[/latex]; [latex]99[/latex]; [latex]95[/latex]; [latex]78[/latex]; [latex]73[/latex]; [latex]71[/latex]; [latex]68[/latex]; [latex]95[/latex]; [latex]100[/latex]
[reveal-answer q=”283392″]Show Solution[/reveal-answer]
[hidden-answer a=”283392″]
Class A
Order the data from smallest to largest.
[latex]65[/latex]; [latex]66[/latex]; [latex]67[/latex]; [latex]69[/latex]; [latex]69[/latex]; [latex]76[/latex]; [latex]77[/latex]; [latex]77[/latex]; [latex]79[/latex]; [latex]80[/latex]; [latex]81[/latex]; [latex]83[/latex]; [latex]85[/latex]; [latex]89[/latex]; [latex]90[/latex]; [latex]91[/latex]; [latex]94[/latex]; [latex]96[/latex]; [latex]98[/latex]; [latex]99[/latex]
[latex]\displaystyle {Median}=\frac{{{80}+{81}}}{{2}}={80.5}[/latex]
[latex]{Q}_{{1}}=\frac{{{69}+{76}}}{{2}}={72.5}[/latex]
[latex]{Q}_{{3}}=\frac{{{90}+{91}}}{{2}}={90.5}[/latex]
[latex]IQR[/latex] = [latex]90.5[/latex] – [latex]72.5[/latex] = [latex]18[/latex]
Class B
Order the data from smallest to largest.
[latex]68[/latex]; [latex]68[/latex]; [latex]70[/latex]; [latex]71[/latex]; [latex]72[/latex]; [latex]73[/latex]; [latex]75[/latex]; [latex]78[/latex]; [latex]79[/latex]; [latex]80[/latex]; [latex]80[/latex]; [latex]90[/latex]; [latex]90[/latex]; [latex]92[/latex]; [latex]92[/latex]; [latex]95[/latex]; [latex]95[/latex]; [latex]97[/latex]; [latex]99[/latex]; [latex]100[/latex]
[latex]\displaystyle{Median}=\frac{{{80}+{80}}}{{2}}={80}[/latex]
[latex]{Q}_{{1}}=\frac{{{72}+{73}}}{{2}}={72.5}[/latex]
[latex]{Q}_{{3}}=\frac{{{92}+{95}}}{{2}}={93.5}[/latex]
[latex]IQR[/latex] = [latex]93.5[/latex] – [latex]72.5[/latex] = [latex]21[/latex]
The data for Class B has a larger [latex]IQR[/latex], so the scores between [latex]Q_3[/latex] and [latex]Q_1[/latex] (middle [latex]50[/latex]%) for the data for Class B are more spread out and not clustered about the median.
[/hidden-answer]
Example
Fifty statistics students were asked how much sleep they get per school night (rounded to the nearest hour). The results were:
| Amount of Sleep per School Night (Hours) | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| [latex]4[/latex] | [latex]2[/latex] | [latex]0.04[/latex] | [latex]0.04[/latex] |
| [latex]5[/latex] | [latex]5[/latex] | [latex]0.10[/latex] | [latex]0.14[/latex] |
| [latex]6[/latex] | [latex]7[/latex] | [latex]0.14[/latex] | [latex]0.28[/latex] |
| [latex]7[/latex] | [latex]12[/latex] | [latex]0.24[/latex] | [latex]0.52[/latex] |
| [latex]8[/latex] | [latex]14[/latex] | [latex]0.28[/latex] | [latex]0.80[/latex] |
| [latex]9[/latex] | [latex]7[/latex] | [latex]0.14[/latex] | [latex]0.94[/latex] |
| [latex]10[/latex] | [latex]3[/latex] | [latex]0.06[/latex] | [latex]1.00[/latex] |
Find the [latex]28[/latex]th percentile. Notice the [latex]0.28[/latex] in the “cumulative relative frequency” column. Twenty-eight percent of [latex]50[/latex] data values is [latex]14[/latex] values. There are [latex]14[/latex] values less than the [latex]28[/latex]th percentile. They include the two [latex]4[/latex]s, the five [latex]5[/latex]s, and the seven [latex]6[/latex]s. The [latex]28[/latex]th percentile is between the last six and the first seven. The [latex]28[/latex]th percentile is [latex]6.5[/latex].
Find the median. Look again at the “cumulative relative frequency” column and find [latex]0.52[/latex]. The median is the [latex]50[/latex]th percentile or the second quartile. [latex]50[/latex]% of [latex]50[/latex] is [latex]25[/latex]. There are [latex]25[/latex] values less than the median. They include the two [latex]4[/latex]s, the five [latex]5[/latex]s, the seven [latex]6[/latex]s, and eleven of the [latex]7[/latex]s. The median or [latex]50[/latex]th percentile is between the [latex]25[/latex]th, or seven, and [latex]26[/latex]th, or seven, values. The median is seven.
Find the third quartile. The third quartile is the same as the [latex]75[/latex]th percentile. You can “eyeball” this answer. If you look at the “cumulative relative frequency” column, you find [latex]0.52[/latex] and [latex]0.80[/latex]. When you have all the fours, fives, sixes and sevens, you have [latex]52[/latex]% of the data. When you include all the [latex]8[/latex]s, you have [latex]80[/latex]% of the data. The [latex]75[/latex]th percentile, then, must be an eight. Another way to look at the problem is to find [latex]75[/latex]% of [latex]50[/latex], which is [latex]37.5[/latex], and round up to [latex]38[/latex]. The third quartile, [latex]Q_3[/latex], is the [latex]38[/latex]th value, which is an eight. You can check this answer by counting the values. (There are [latex]37[/latex] values below the third quartile and [latex]12[/latex] values above.)
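The counting in this example can be checked with a short Python sketch. It rebuilds the relative and cumulative relative frequency columns from the frequencies, then expands the table into the full ordered list of [latex]50[/latex] values so that any position can be looked up. The helper name `value_at` is an illustrative choice.

```python
from itertools import accumulate

hours = [4, 5, 6, 7, 8, 9, 10]
freq = [2, 5, 7, 12, 14, 7, 3]

n = sum(freq)                            # total number of students
cum_freq = list(accumulate(freq))        # running count of students
rel_freq = [f / n for f in freq]         # relative frequency column
cum_rel_freq = [c / n for c in cum_freq] # cumulative relative frequency column

# Expand the table into the ordered data: two 4s, five 5s, seven 6s, ...
ordered = [h for h, f in zip(hours, freq) for _ in range(f)]

def value_at(position):
    """Data value at a 1-based position in the ordered list."""
    return ordered[position - 1]

print(value_at(14), value_at(15))  # 6 7  (28th percentile lies between them)
print(value_at(25), value_at(26))  # 7 7  (median)
print(value_at(38))                # 8    (third quartile)
```

Expanding the table is practical only for small data sets, but it makes the "count up to the position" reasoning easy to verify.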
Try It
Forty bus drivers were asked how many hours they spend each day running their routes (rounded to the nearest hour). Find the [latex]65[/latex]th percentile.
| Amount of time spent on route (hours) | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| [latex]2[/latex] | [latex]12[/latex] | [latex]0.30[/latex] | [latex]0.30[/latex] |
| [latex]3[/latex] | [latex]14[/latex] | [latex]0.35[/latex] | [latex]0.65[/latex] |
| [latex]4[/latex] | [latex]10[/latex] | [latex]0.25[/latex] | [latex]0.90[/latex] |
| [latex]5[/latex] | [latex]4[/latex] | [latex]0.10[/latex] | [latex]1.00[/latex] |
[reveal-answer q=”283393″]Show Solution[/reveal-answer]
[hidden-answer a=”283393″]
The [latex]65[/latex]th percentile is between the last three and the first four.
The [latex]65[/latex]th percentile is [latex]3.5[/latex].
[/hidden-answer]
Example
| Amount of Sleep per School Night (Hours) | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| [latex]4[/latex] | [latex]2[/latex] | [latex]0.04[/latex] | [latex]0.04[/latex] |
| [latex]5[/latex] | [latex]5[/latex] | [latex]0.10[/latex] | [latex]0.14[/latex] |
| [latex]6[/latex] | [latex]7[/latex] | [latex]0.14[/latex] | [latex]0.28[/latex] |
| [latex]7[/latex] | [latex]12[/latex] | [latex]0.24[/latex] | [latex]0.52[/latex] |
| [latex]8[/latex] | [latex]14[/latex] | [latex]0.28[/latex] | [latex]0.80[/latex] |
| [latex]9[/latex] | [latex]7[/latex] | [latex]0.14[/latex] | [latex]0.94[/latex] |
| [latex]10[/latex] | [latex]3[/latex] | [latex]0.06[/latex] | [latex]1.00[/latex] |
- Find the [latex]80[/latex]th percentile.
- Find the [latex]90[/latex]th percentile.
- Find the first quartile. What is another name for the first quartile?
[reveal-answer q=”283394″]Show Solution[/reveal-answer]
[hidden-answer a=”283394″]
Using the data from the frequency table, we have:
- The [latex]80[/latex]th percentile is between the last eight and the first nine in the table (between the [latex]40[/latex]th and [latex]41[/latex]st values). Therefore, we need to take the mean of the [latex]40[/latex]th and [latex]41[/latex]st values. The [latex]80[/latex]th percentile is [latex]\displaystyle\frac{{{8}+{9}}}{{2}}={8.5}[/latex].
- The [latex]90[/latex]th percentile will be the [latex]45[/latex]th data value (location is [latex]0.90(50) = 45[/latex]) and the [latex]45[/latex]th data value is nine.
- [latex]Q_1[/latex] is also the [latex]25[/latex]th percentile. The [latex]25[/latex]th percentile location calculation: [latex]P_{25}[/latex] = [latex]0.25(50) = 12.5 ≈ 13[/latex], so use the [latex]13[/latex]th data value. Thus, the [latex]25[/latex]th percentile is six.
[/hidden-answer]
Try It
| Amount of time spent on route (hours) | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| [latex]2[/latex] | [latex]12[/latex] | [latex]0.30[/latex] | [latex]0.30[/latex] |
| [latex]3[/latex] | [latex]14[/latex] | [latex]0.35[/latex] | [latex]0.65[/latex] |
| [latex]4[/latex] | [latex]10[/latex] | [latex]0.25[/latex] | [latex]0.90[/latex] |
| [latex]5[/latex] | [latex]4[/latex] | [latex]0.10[/latex] | [latex]1.00[/latex] |
Find the third quartile. What is another name for the third quartile?
[reveal-answer q=”283395″]Show Solution[/reveal-answer]
[hidden-answer a=”283395″]
The third quartile is the [latex]75[/latex]th percentile, which is four. The [latex]65[/latex]th percentile is between three and four, and the [latex]90[/latex]th percentile is between four and five. The [latex]75[/latex]th percentile falls between the [latex]65[/latex]th and [latex]90[/latex]th percentiles, so it must be four.
[/hidden-answer]
Collaborative Exercise
Your instructor or a member of the class will ask everyone in class how many sweaters they own. Answer the following questions:
- How many students were surveyed?
- What kind of sampling did you do?
- Construct two different histograms. For each, starting value = _____ ending value = ____.
- Find the median, first quartile, and third quartile.
- Construct a table of the data to find the following:
  - the 10th percentile
  - the 70th percentile
  - the percent of students who own fewer than four sweaters
A Formula for Finding the [latex]k[/latex]th Percentile
If you were to do a little research, you would find several formulas for calculating the [latex]k[/latex]th percentile. Here is one of them.
[latex]k[/latex] = the [latex]k[/latex]th percentile. It may or may not be part of the data.
[latex]i[/latex] = the index (ranking or position of a data value)
[latex]n[/latex] = the total number of data
- Order the data from smallest to largest.
- Calculate [latex]\displaystyle{i}=\frac{{k}}{{100}}{({n}+{1})}[/latex]
- If [latex]i[/latex] is an integer, then the [latex]k[/latex]th percentile is the data value in the [latex]i[/latex]th position in the ordered set of data.
- If [latex]i[/latex] is not an integer, then round [latex]i[/latex] up and round [latex]i[/latex] down to the nearest integers. Average the two data values in these two positions in the ordered data set. This is easier to understand in an example.
Example
Listed are twenty-nine ages for trees found in the Saint Louis Botanical Garden in order from smallest to largest.
[latex]18[/latex]; [latex]21[/latex]; [latex]22[/latex]; [latex]25[/latex]; [latex]26[/latex]; [latex]27[/latex]; [latex]29[/latex]; [latex]30[/latex]; [latex]31[/latex]; [latex]33[/latex]; [latex]36[/latex]; [latex]37[/latex]; [latex]41[/latex]; [latex]42[/latex]; [latex]47[/latex]; [latex]52[/latex]; [latex]55[/latex]; [latex]57[/latex]; [latex]58[/latex]; [latex]62[/latex]; [latex]64[/latex]; [latex]67[/latex]; [latex]69[/latex]; [latex]71[/latex]; [latex]72[/latex]; [latex]73[/latex]; [latex]74[/latex]; [latex]76[/latex]; [latex]77[/latex]
- Find the [latex]70[/latex]th percentile.
- Find the [latex]83[/latex]rd percentile.
[reveal-answer q=”283396″]Show Solution[/reveal-answer]
[hidden-answer a=”283396″]
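- [latex]k[/latex] = [latex]70[/latex], [latex]i[/latex] = the index, [latex]n[/latex] = [latex]29[/latex]. [latex]\displaystyle{i}=\frac{{k}}{{100}}{({n}+{1})}=\frac{{70}}{{100}}{({29}+{1})}={21}[/latex]. Twenty-one is an integer, and the data value in the [latex]21[/latex]st position in the ordered data set is [latex]64[/latex]. The [latex]70[/latex]th percentile is [latex]64[/latex] years.
- [latex]k[/latex] = [latex]83[/latex], [latex]i[/latex] = the index, [latex]n[/latex] = [latex]29[/latex]. [latex]\displaystyle{i}=\frac{{k}}{{100}}{({n}+{1})}=\frac{{83}}{{100}}{({29}+{1})}={24.9}[/latex], which is not an integer. Round down to [latex]24[/latex] and up to [latex]25[/latex]. The age in the [latex]24[/latex]th position is [latex]71[/latex] and the age in the [latex]25[/latex]th position is [latex]72[/latex]. The average of [latex]71[/latex] and [latex]72[/latex] is [latex]71.5[/latex]. The [latex]83[/latex]rd percentile is [latex]71.5[/latex] years.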
[/hidden-answer]
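The index formula [latex]i = \frac{k}{100}(n+1)[/latex] and its two cases can be sketched in Python. This is a minimal sketch under two assumptions that the text does not state: [latex]k[/latex] is a whole number (so integer arithmetic can decide exactly whether [latex]i[/latex] is an integer), and an error is raised when the index falls outside positions [latex]1[/latex] to [latex]n[/latex]. The function name `kth_percentile` is illustrative.

```python
def kth_percentile(ordered, k):
    """k-th percentile of already-sorted data using i = (k / 100) * (n + 1).

    Assumes k is a whole number. If i is an integer, the value in position i
    is returned; otherwise the values in the positions just below and just
    above i are averaged.
    """
    n = len(ordered)
    lo, remainder = divmod(k * (n + 1), 100)  # lo is i rounded down
    hi = lo if remainder == 0 else lo + 1     # i rounded up (same as lo if i is an integer)
    if lo < 1 or hi > n:
        raise ValueError("index i falls outside positions 1..n")
    # Positions in the text are 1-based; Python lists are 0-based.
    return (ordered[lo - 1] + ordered[hi - 1]) / 2

ages = [18, 21, 22, 25, 26, 27, 29, 30, 31, 33, 36, 37, 41, 42, 47,
        52, 55, 57, 58, 62, 64, 67, 69, 71, 72, 73, 74, 76, 77]

print(kth_percentile(ages, 70))  # 64.0
print(kth_percentile(ages, 83))  # 71.5
```

Because several percentile formulas are in common use, a library function may return a slightly different value for the same data.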
Try It
Listed are [latex]29[/latex] ages for Academy Award winning best actors in order from smallest to largest.
[latex]18[/latex]; [latex]21[/latex]; [latex]22[/latex]; [latex]25[/latex]; [latex]26[/latex]; [latex]27[/latex]; [latex]29[/latex]; [latex]30[/latex]; [latex]31[/latex]; [latex]33[/latex]; [latex]36[/latex]; [latex]37[/latex]; [latex]41[/latex]; [latex]42[/latex]; [latex]47[/latex]; [latex]52[/latex]; [latex]55[/latex]; [latex]57[/latex]; [latex]58[/latex]; [latex]62[/latex]; [latex]64[/latex]; [latex]67[/latex]; [latex]69[/latex]; [latex]71[/latex]; [latex]72[/latex]; [latex]73[/latex]; [latex]74[/latex]; [latex]76[/latex]; [latex]77[/latex]
Calculate the [latex]20[/latex]th percentile and the [latex]55[/latex]th percentile.
[reveal-answer q=”283397″]Show Solution[/reveal-answer]
[hidden-answer a=”283397″]
[latex]k[/latex] = [latex]20[/latex]. Index = [latex]\displaystyle{i}=\frac{{k}}{{100}}{({n}+{1})}=\frac{{20}}{{100}}{({29}+{1})}={6}[/latex]. The age in the sixth position is [latex]27[/latex]. The [latex]20[/latex]th percentile is [latex]27[/latex] years.
[latex]k[/latex] = [latex]55[/latex]. Index = [latex]\displaystyle{i}=\frac{{k}}{{100}}{({n}+{1})}=\frac{{55}}{{100}}{({29}+{1})}={16.5}[/latex]. Round down to [latex]16[/latex] and up to [latex]17[/latex]. The age in the [latex]16[/latex]th position is [latex]52[/latex] and the age in the [latex]17[/latex]th position is [latex]55[/latex]. The average of [latex]52[/latex] and [latex]55[/latex] is [latex]53.5[/latex]. The [latex]55[/latex]th percentile is [latex]53.5[/latex] years.
[/hidden-answer]
NOTE
You can calculate percentiles using calculators and computers. There are a variety of online calculators.
A Formula for Finding the Percentile of a Value in a Data Set
- Order the data from smallest to largest.
- [latex]x[/latex] = the number of data values counting from the bottom of the data list up to but not including the data value for which you want to find the percentile.
- [latex]y[/latex] = the number of data values equal to the data value for which you want to find the percentile.
- [latex]n[/latex] = the total number of data.
- Calculate [latex]\displaystyle\frac{{{x}+{0.5}{y}}}{{n}}{({100})}[/latex]. Then round to the nearest integer.
Example
Listed are [latex]29[/latex] ages for Academy Award winning best actors in order from smallest to largest.
[latex]18[/latex]; [latex]21[/latex]; [latex]22[/latex]; [latex]25[/latex]; [latex]26[/latex]; [latex]27[/latex]; [latex]29[/latex]; [latex]30[/latex]; [latex]31[/latex]; [latex]33[/latex]; [latex]36[/latex]; [latex]37[/latex]; [latex]41[/latex]; [latex]42[/latex]; [latex]47[/latex]; [latex]52[/latex]; [latex]55[/latex]; [latex]57[/latex]; [latex]58[/latex]; [latex]62[/latex]; [latex]64[/latex]; [latex]67[/latex]; [latex]69[/latex]; [latex]71[/latex]; [latex]72[/latex]; [latex]73[/latex]; [latex]74[/latex]; [latex]76[/latex]; [latex]77[/latex]
- Find the percentile for [latex]58[/latex].
- Find the percentile for [latex]25[/latex].
[reveal-answer q=”283398″]Show Solution[/reveal-answer]
[hidden-answer a=”283398″]
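- Counting from the bottom of the list, there are [latex]18[/latex] data values less than [latex]58[/latex]. There is one value of [latex]58[/latex]. With [latex]x = 18[/latex] and [latex]y = 1[/latex], [latex]\dfrac{x+0.5y}{n}(100)=\dfrac{18+0.5(1)}{29}(100)=63.79[/latex]. [latex]58[/latex] is the [latex]64[/latex]th percentile.
- Counting from the bottom of the list, there are three data values less than [latex]25[/latex]. There is one value of [latex]25[/latex]. With [latex]x = 3[/latex] and [latex]y = 1[/latex], [latex]\dfrac{x+0.5y}{n}(100)=\dfrac{3+0.5(1)}{29}(100)=12.07[/latex]. [latex]25[/latex] is the [latex]12[/latex]th percentile.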
[/hidden-answer]
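The percentile-of-a-value formula [latex]\frac{x+0.5y}{n}(100)[/latex] can be sketched in Python as well. The function name `percentile_of` is illustrative. One caveat is an assumption about rounding: Python's built-in `round` sends exact halves to the nearest even integer, while the text only says to round to the nearest integer.

```python
def percentile_of(ordered, value):
    """Percentile of `value` within the data, using (x + 0.5*y) / n * 100.

    x = number of data values strictly less than `value`
    y = number of data values equal to `value`
    n = total number of data values
    """
    n = len(ordered)
    x = sum(1 for v in ordered if v < value)
    y = sum(1 for v in ordered if v == value)
    # Multiply before dividing to limit floating-point error.
    return round((x + 0.5 * y) * 100 / n)

ages = [18, 21, 22, 25, 26, 27, 29, 30, 31, 33, 36, 37, 41, 42, 47,
        52, 55, 57, 58, 62, 64, 67, 69, 71, 72, 73, 74, 76, 77]

print(percentile_of(ages, 58))  # 64
print(percentile_of(ages, 25))  # 12
```

The [latex]0.5y[/latex] term counts half of the values tied with the one of interest, so repeated values share a single percentile.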
Try It
Listed are [latex]30[/latex] ages for New York Times published columnists in order from smallest to largest.
[latex]18[/latex]; [latex]21[/latex]; [latex]22[/latex]; [latex]25[/latex]; [latex]26[/latex]; [latex]27[/latex]; [latex]29[/latex]; [latex]30[/latex]; [latex]31[/latex]; [latex]31[/latex]; [latex]33[/latex]; [latex]36[/latex]; [latex]37[/latex]; [latex]41[/latex]; [latex]42[/latex]; [latex]47[/latex]; [latex]52[/latex]; [latex]55[/latex]; [latex]57[/latex]; [latex]58[/latex]; [latex]62[/latex]; [latex]64[/latex]; [latex]67[/latex]; [latex]69[/latex]; [latex]71[/latex]; [latex]72[/latex]; [latex]73[/latex]; [latex]74[/latex]; [latex]76[/latex]; [latex]77[/latex]
Find the percentiles for [latex]47[/latex] and [latex]31[/latex].
[reveal-answer q=”283399″]Show Solution[/reveal-answer]
[hidden-answer a=”283399″]
Percentile for [latex]47[/latex]: Counting from the bottom of the list, there are [latex]15[/latex] data values less than [latex]47[/latex]. There is one value of [latex]47[/latex].
[latex]x=15\quad\text{and}\quad{y=1}[/latex]
[latex]\dfrac{x+0.5y}{n}(100)=\dfrac{15+0.5(1)}{30}(100)=51.67[/latex]
[latex]47[/latex] is the [latex]52[/latex]nd percentile.
Percentile for [latex]31[/latex]: Counting from the bottom of the list, there are eight data values less than [latex]31[/latex]. There are two values of [latex]31[/latex].
[latex]x=8\quad\text{and}\quad{y=2}[/latex]
[latex]\dfrac{x+0.5y}{n}(100)=\dfrac{8+0.5(2)}{30}(100)=30[/latex]
[latex]31[/latex] is the [latex]30[/latex]th percentile.
[/hidden-answer]
Interpreting Percentiles, Quartiles, and Median
A percentile indicates the relative standing of a data value when data are sorted into numerical order from smallest to largest. In general, [latex]p[/latex] percent of data values are less than or equal to the [latex]p[/latex]th percentile. For example, [latex]15[/latex]% of data values are less than or equal to the [latex]15[/latex]th percentile.
- Low percentiles always correspond to lower data values.
- High percentiles always correspond to higher data values.
A percentile may or may not correspond to a value judgment about whether it is “good” or “bad.” The interpretation of whether a certain percentile is “good” or “bad” depends on the context of the situation to which the data applies. In some situations, a low percentile would be considered “good;” in other contexts a high percentile might be considered “good”. In many situations, there is no value judgment that applies.
Understanding how to interpret percentiles properly is important not only when describing data, but also when calculating probabilities in later chapters of this text.
Guideline
When writing the interpretation of a percentile in the context of the given data, the sentence should contain the following information.
- information about the context of the situation being considered
- the data value (value of the variable) that represents the percentile
- the percent of individuals or items with data values below the percentile
- the percent of individuals or items with data values above the percentile.
Example
On a timed math test, the first quartile for time it took to finish the exam was [latex]35[/latex] minutes. Interpret the first quartile in the context of this situation.
[reveal-answer q=”283400″]Show Solution[/reveal-answer]
[hidden-answer a=”283400″]
- Twenty-five percent of students finished the exam in [latex]35[/latex] minutes or less.
- Seventy-five percent of students finished the exam in [latex]35[/latex] minutes or more.
- A low percentile could be considered good, as finishing more quickly on a timed exam is desirable. (If you take too long, you might not be able to finish.)
[/hidden-answer]
Try It
For the [latex]100[/latex]-meter dash, the third quartile for times for finishing the race was [latex]11.5[/latex] seconds. Interpret the third quartile in the context of the situation.
[reveal-answer q=”283401″]Show Solution[/reveal-answer]
[hidden-answer a=”283401″]
Twenty-five percent of runners finished the race in [latex]11.5[/latex] seconds or more. Seventy-five percent of runners finished the race in [latex]11.5[/latex] seconds or less. A lower percentile is good because finishing a race more quickly is desirable.
[/hidden-answer]
Example
On a [latex]20[/latex] question math test, the [latex]70[/latex]th percentile for number of correct answers was [latex]16[/latex]. Interpret the [latex]70[/latex]th percentile in the context of this situation.
[reveal-answer q=”283402″]Show Solution[/reveal-answer]
[hidden-answer a=”283402″]
- Seventy percent of students answered [latex]16[/latex] or fewer questions correctly.
- Thirty percent of students answered [latex]16[/latex] or more questions correctly.
- A higher percentile could be considered good, as answering more questions correctly is desirable.
[/hidden-answer]
Try It
On a [latex]60[/latex] point written assignment, the [latex]80[/latex]th percentile for the number of points earned was [latex]49[/latex]. Interpret the [latex]80[/latex]th percentile in the context of this situation.
[reveal-answer q=”283403″]Show Solution[/reveal-answer]
[hidden-answer a=”283403″]
Eighty percent of students earned [latex]49[/latex] points or fewer. Twenty percent of students earned [latex]49[/latex] or more points. A higher percentile is good because getting more points on an assignment is desirable.
[/hidden-answer]
Example
At a community college, it was found that the [latex]30[/latex]th percentile of credit units that students are enrolled for is seven units. Interpret the [latex]30[/latex]th percentile in the context of this situation.
[reveal-answer q=”283404″]Show Solution[/reveal-answer]
[hidden-answer a=”283404″]
- Thirty percent of students are enrolled in seven or fewer credit units.
- Seventy percent of students are enrolled in seven or more credit units.
- In this example, there is no “good” or “bad” value judgment associated with a higher or lower percentile. Students attend community college for varied reasons and needs, and their course load varies according to their needs.
[/hidden-answer]
Try It
During a season, the [latex]40[/latex]th percentile for points scored per player in a game is eight. Interpret the [latex]40[/latex]th percentile in the context of this situation.
[reveal-answer q=”283405″]Show Solution[/reveal-answer]
[hidden-answer a=”283405″]
Forty percent of players scored eight points or fewer. Sixty percent of players scored eight points or more. A higher percentile is good because getting more points in a basketball game is desirable.
[/hidden-answer]
Example
Sharpe Middle School is applying for a grant that will be used to add fitness equipment to the gym. The principal surveyed [latex]15[/latex] anonymous students to determine how many minutes a day the students spend exercising. The results from the [latex]15[/latex] anonymous students are shown.
[latex]0[/latex] minutes; [latex]40[/latex] minutes; [latex]60[/latex] minutes; [latex]30[/latex] minutes; [latex]60[/latex] minutes
[latex]10[/latex] minutes; [latex]45[/latex] minutes; [latex]30[/latex] minutes; [latex]300[/latex] minutes; [latex]90[/latex] minutes;
[latex]30[/latex] minutes; [latex]120[/latex] minutes; [latex]60[/latex] minutes; [latex]0[/latex] minutes; [latex]20[/latex] minutes
Determine the following five values.
- Min = [latex]0[/latex]
- [latex]Q_1[/latex] = [latex]20[/latex]
- Med = [latex]40[/latex]
- [latex]Q_3[/latex] = [latex]60[/latex]
- Max = [latex]300[/latex]
[reveal-answer q=”283406″]Show Solution[/reveal-answer]
[hidden-answer a=”283406″]
If you were the principal, would you be justified in purchasing new fitness equipment? Since [latex]75[/latex]% of the students exercise for [latex]60[/latex] minutes or less daily, and since the [latex]IQR[/latex] is [latex]40[/latex] minutes [latex](60 – 20 = 40)[/latex], we know that half of the students surveyed exercise between [latex]20[/latex] minutes and [latex]60[/latex] minutes daily. This seems a reasonable amount of time spent exercising, so the principal would be justified in purchasing the new equipment.
However, the principal needs to be careful. The value [latex]300[/latex] appears to be a potential outlier.
[latex]Q_3[/latex] + [latex]1.5[/latex]([latex]IQR[/latex]) = [latex]60 + (1.5)(40) = 120[/latex].
The value [latex]300[/latex] is greater than [latex]120[/latex] so it is a potential outlier. If we delete it and calculate the five values, we get the following values:
- Min = [latex]0[/latex]
- [latex]Q_1[/latex] = [latex]20[/latex]
- Med = [latex]35[/latex]
- [latex]Q_3[/latex] = [latex]60[/latex]
- Max = [latex]120[/latex]
We still have [latex]75[/latex]% of the students exercising for [latex]60[/latex] minutes or less daily and half of the students exercising between [latex]20[/latex] and [latex]60[/latex] minutes a day. However, [latex]15[/latex] students is a small sample, and the principal should survey more students to be sure of the survey results.
[/hidden-answer]
Concept Review
The values that divide a rank-ordered set of data into [latex]100[/latex] equal parts are called percentiles. Percentiles are used to compare and interpret data. For example, an observation at the [latex]50[/latex]th percentile would be greater than [latex]50[/latex] percent of the other observations in the set. Quartiles divide data into quarters. The first quartile ([latex]Q_1[/latex]) is the [latex]25[/latex]th percentile, the second quartile ([latex]Q_2[/latex] or median) is the [latex]50[/latex]th percentile, and the third quartile ([latex]Q_3[/latex]) is the [latex]75[/latex]th percentile. The interquartile range, or [latex]IQR[/latex], is the range of the middle [latex]50[/latex] percent of the data values. The [latex]IQR[/latex] is found by subtracting [latex]Q_1[/latex] from [latex]Q_3[/latex], and can help determine outliers by using the following two expressions.
- [latex]Q_3[/latex] + [latex]IQR[/latex]([latex]1.5[/latex])
- [latex]Q_1[/latex] – [latex]IQR[/latex]([latex]1.5[/latex])
Formula Review
[latex]\displaystyle{i}={(\frac{{k}}{{100}})}{({n}+{1})}[/latex] where
[latex]i[/latex] = the ranking or position of a data value,
[latex]k[/latex] = the kth percentile,
[latex]n[/latex] = total number of data.
Expression for finding the percentile of a data value:
[latex]\displaystyle{(\frac{{{x}+{0.5}{y}}}{{n}})}{({100})}[/latex]
where
[latex]x[/latex] = the number of values counting from the bottom of the data list up to but not including the data value for which you want to find the percentile,
[latex]y[/latex]= the number of data values equal to the data value for which you want to find the percentile,
[latex]n[/latex] = total number of data