Box Plots
Learning Outcomes
- Display data graphically and interpret graphs: stemplots, histograms, and box plots.
- Recognize, describe, and calculate the measures of location of data: quartiles and percentiles.
Box plots (also called box-and-whisker plots or box-whisker plots) give a good graphical image of the concentration of the data. They also show how far the extreme values are from most of the data. A box plot is constructed from five values: the minimum value, the first quartile, the median, the third quartile, and the maximum value. We use these values to compare how close other data values are to them.
To construct a box plot, use a horizontal or vertical number line and a rectangular box. The smallest and largest data values label the endpoints of the axis. The first quartile marks one end of the box and the third quartile marks the other end of the box. Approximately the middle [latex]50[/latex] percent of the data fall inside the box. The “whiskers” extend from the ends of the box to the smallest and largest data values. The median, or second quartile, can fall strictly between the first and third quartiles, or it can coincide with one of them or with both. The box plot gives a good, quick picture of the data.
Note
You may encounter box-and-whisker plots that have dots marking outlier values. In those cases, the whiskers do not extend to the minimum and maximum values.
Consider, again, this dataset.
[latex]1[/latex], [latex]1[/latex], [latex]2[/latex], [latex]2[/latex], [latex]4[/latex], [latex]6[/latex], [latex]6.8[/latex], [latex]7.2[/latex], [latex]8[/latex], [latex]8.3[/latex], [latex]9[/latex], [latex]10[/latex], [latex]10[/latex], [latex]11.5[/latex]
The first quartile is two, the median is seven, and the third quartile is nine. The smallest value is one, and the largest value is [latex]11.5[/latex]. The following image shows the constructed box plot.
Note
See the calculator instructions on the TI web site.
The two whiskers extend from the first quartile to the smallest value and from the third quartile to the largest value. The median is shown with a dashed line.
Note
It is important to start a box plot with a scaled number line. Otherwise the box plot may not be useful.
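The five values for the 14-value dataset above can be checked with a short script. This is a minimal sketch, assuming the median-of-halves convention (each quartile is the median of the lower or upper half of the sorted data, with the middle value left out of both halves when the count is odd). The function name `five_number_summary` is illustrative and does not come from the text.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumption: when the number of values is odd, the middle value is
    excluded from both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

data = [1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5]
summary = five_number_summary(data)
# Should agree with the text: min 1, Q1 2, median 7, Q3 9, max 11.5.
print(summary)
```

Other software may use a different quartile convention and report slightly different values for the same data.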
Example
The following data are the heights of [latex]40[/latex] students in a statistics class.
[latex]59[/latex]; [latex]60[/latex]; [latex]61[/latex]; [latex]62[/latex]; [latex]62[/latex]; [latex]63[/latex]; [latex]63[/latex]; [latex]64[/latex]; [latex]64[/latex]; [latex]64[/latex]; [latex]65[/latex]; [latex]65[/latex]; [latex]65[/latex]; [latex]65[/latex]; [latex]65[/latex]; [latex]65[/latex]; [latex]65[/latex]; [latex]65[/latex]; [latex]65[/latex]; [latex]66[/latex]; [latex]66[/latex]; [latex]67[/latex]; [latex]67[/latex]; [latex]68[/latex]; [latex]68[/latex]; [latex]69[/latex]; [latex]70[/latex]; [latex]70[/latex]; [latex]70[/latex]; [latex]70[/latex]; [latex]70[/latex]; [latex]71[/latex]; [latex]71[/latex]; [latex]72[/latex]; [latex]72[/latex]; [latex]73[/latex]; [latex]74[/latex]; [latex]74[/latex]; [latex]75[/latex]; [latex]77[/latex]
Construct a box plot with the following properties; the calculator instructions for the minimum and maximum values as well as the quartiles follow the example.
- Minimum value = [latex]59[/latex]
- Maximum value = [latex]77[/latex]
- Q1: First quartile = [latex]64.5[/latex]
- Q2: Second quartile or median = [latex]66[/latex]
- Q3: Third quartile = [latex]70[/latex]
- Each quarter has approximately [latex]25[/latex]% of the data.
- The spreads of the four quarters are [latex]64.5 - 59 = 5.5[/latex] (first quarter), [latex]66 - 64.5 = 1.5[/latex] (second quarter), [latex]70 - 66 = 4[/latex] (third quarter), and [latex]77 - 70 = 7[/latex] (fourth quarter). So, the second quarter has the smallest spread and the fourth quarter has the largest spread.
- Range = maximum value – minimum value = [latex]77 - 59 = 18[/latex]
- Interquartile Range: [latex]IQR[/latex] = [latex]Q_3 - Q_1[/latex] = [latex]70 - 64.5 = 5.5[/latex].
- The interval [latex]59[/latex] through [latex]65[/latex] contains [latex]19[/latex] of the [latex]40[/latex] values (more than [latex]25[/latex]% of the data), so it holds more data than the interval [latex]66[/latex] through [latex]70[/latex], which contains [latex]12[/latex] values.
- The middle [latex]50[/latex]% (middle half) of the data has a range of [latex]5.5[/latex] inches.
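The arithmetic in the list above can be reproduced in a few lines. This sketch takes the five values stated in the example as given and counts data values per interval. The helper `count_between` is a hypothetical name, and counting both endpoints as inside the interval is an assumption.

```python
heights = [59, 60, 61, 62, 62, 63, 63, 64, 64, 64,
           65, 65, 65, 65, 65, 65, 65, 65, 65, 66,
           66, 67, 67, 68, 68, 69, 70, 70, 70, 70,
           70, 71, 71, 72, 72, 73, 74, 74, 75, 77]

# Five-number summary as stated in the example.
minimum, q1, med, q3, maximum = 59, 64.5, 66, 70, 77

value_range = maximum - minimum          # spread of all the data
iqr = q3 - q1                            # spread of the middle 50%
quarter_spreads = [q1 - minimum, med - q1, q3 - med, maximum - q3]

def count_between(data, low, high):
    """Count values with low <= value <= high (both ends included)."""
    return sum(1 for v in data if low <= v <= high)

print("range:", value_range, "IQR:", iqr)
print("quarter spreads:", quarter_spreads)
print("values in 59..65:", count_between(heights, 59, 65))
print("values in 66..70:", count_between(heights, 66, 70))
```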
USING THE TI-83, 83+, 84, 84+ CALCULATOR
To find the minimum, maximum, and quartiles:
Enter data into the list editor (Press STAT 1:EDIT). If you need to clear the list, arrow up to the name L1, press CLEAR, and then arrow down.
Put the data values into the list L1.
Press STAT and arrow to CALC. Press 1:1-Var Stats. Enter L1.
Press ENTER.
Use the down and up arrow keys to scroll.
Smallest value = [latex]59[/latex].
Largest value = [latex]77[/latex].
[latex]Q_1[/latex]: First quartile = [latex]64.5[/latex].
[latex]Q_2[/latex]: Second quartile or median = [latex]66[/latex].
[latex]Q_3[/latex]: Third quartile = [latex]70[/latex].
To construct the box plot:
Press 2nd Y= (STAT PLOT). Press 4:PlotsOff. Press ENTER.
Press 2nd Y= again, press 1:Plot1, and press ENTER to turn the plot On. Arrow down and then use the right arrow key to go to the fifth picture, which is the box plot. Press ENTER.
Arrow down to Xlist: Press 2nd 1 for L1.
Arrow down to Freq: Press ALPHA. Press 1.
Press Zoom. Press 9: ZoomStat.
Press TRACE, and use the arrow keys to examine the box plot.
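Without a calculator, a rough text rendering can show the same layout on a scaled number line. The sketch below is illustrative only: the function name `text_box_plot`, the axis limits of 55 and 80 inches, and the 51-character width (one character per half inch) are all assumptions chosen for this example, not part of the calculator procedure.

```python
def text_box_plot(minimum, q1, median, q3, maximum, axis_lo, axis_hi, width):
    """Return a one-line text box plot scaled to the axis [axis_lo, axis_hi]."""
    def column(x):
        # Map a data value to a character column on the scaled number line.
        return round((x - axis_lo) * (width - 1) / (axis_hi - axis_lo))

    row = [" "] * width
    for c in range(column(minimum), column(maximum) + 1):
        row[c] = "-"                  # whiskers (the box overwrites the middle)
    for c in range(column(q1), column(q3) + 1):
        row[c] = "="                  # interior of the box
    row[column(minimum)] = "|"        # end of the left whisker
    row[column(maximum)] = "|"        # end of the right whisker
    row[column(q1)] = "["             # first quartile
    row[column(q3)] = "]"             # third quartile
    row[column(median)] = ":"         # median marker, drawn last
    return "".join(row)

# Five-number summary of the student heights from the example.
plot = text_box_plot(59, 64.5, 66, 70, 77, axis_lo=55, axis_hi=80, width=51)
print(plot)
print("55" + " " * 47 + "80")         # axis labels at both ends
```

Because the median marker is drawn last, it replaces a box edge whenever the median coincides with a quartile.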
Try It
The following data are the number of pages in [latex]40[/latex] books on a shelf. Construct a box plot using a graphing calculator, and state the interquartile range.
[latex]136[/latex]; [latex]140[/latex]; [latex]178[/latex]; [latex]190[/latex]; [latex]205[/latex]; [latex]215[/latex]; [latex]217[/latex]; [latex]218[/latex]; [latex]232[/latex]; [latex]234[/latex]; [latex]240[/latex]; [latex]255[/latex]; [latex]270[/latex]; [latex]275[/latex]; [latex]290[/latex]; [latex]301[/latex]; [latex]303[/latex]; [latex]315[/latex]; [latex]317[/latex]; [latex]318[/latex]; [latex]326[/latex]; [latex]333[/latex]; [latex]343[/latex]; [latex]349[/latex]; [latex]360[/latex]; [latex]369[/latex]; [latex]377[/latex]; [latex]388[/latex]; [latex]391[/latex]; [latex]392[/latex]; [latex]398[/latex]; [latex]400[/latex]; [latex]402[/latex]; [latex]405[/latex]; [latex]408[/latex]; [latex]422[/latex]; [latex]429[/latex]; [latex]450[/latex]; [latex]475[/latex]; [latex]512[/latex]
[reveal-answer q=”124075″]Show Solution[/reveal-answer]
[hidden-answer a=”124075″]
[latex]IQR[/latex] = [latex]158[/latex]
[/hidden-answer]
This video explains what descriptive statistics are needed to create a box and whisker plot.
For some sets of data, two or more of the five values (smallest value, first quartile, median, third quartile, and largest value) may be the same. For instance, you might have a data set in which the median and the third quartile are the same. In this case, the diagram would not have a dashed line inside the box displaying the median. The right side of the box would display both the third quartile and the median. For example, if the smallest value and the first quartile were both one, the median and the third quartile were both five, and the largest value was seven, the box plot would look like:
In this case, at least [latex]25[/latex]% of the values are equal to one. Twenty-five percent of the values are between one and five, inclusive. At least [latex]25[/latex]% of the values are equal to five. The top [latex]25[/latex]% of the values fall between five and seven, inclusive.
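A small made-up dataset shows how this can happen. The eight values below are hypothetical (they do not come from the text) and were chosen so that the smallest value equals the first quartile and the median equals the third quartile. The quartiles are computed with the median-of-halves convention, which is an assumption.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, the middle value is left out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

sample = [1, 1, 1, 5, 5, 5, 5, 7]     # hypothetical data for illustration
smallest, q1, med, q3, largest = five_number_summary(sample)

print(smallest == q1)                 # left whisker has zero length
print(med == q3)                      # median sits on the right edge of the box
share_of_ones = sample.count(1) / len(sample)
share_of_fives = sample.count(5) / len(sample)
print(share_of_ones, share_of_fives)  # each is at least 0.25
```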
Example
Test scores for a college statistics class held during the day are:
[latex]99[/latex]; [latex]56[/latex]; [latex]78[/latex]; [latex]55.5[/latex]; [latex]32[/latex]; [latex]90[/latex]; [latex]80[/latex]; [latex]81[/latex]; [latex]56[/latex]; [latex]59[/latex]; [latex]45[/latex]; [latex]77[/latex]; [latex]84.5[/latex]; [latex]84[/latex]; [latex]70[/latex]; [latex]72[/latex]; [latex]68[/latex]; [latex]32[/latex]; [latex]79[/latex]; [latex]90[/latex]
Test scores for a college statistics class held during the evening are:
[latex]98[/latex]; [latex]78[/latex]; [latex]68[/latex]; [latex]83[/latex]; [latex]81[/latex]; [latex]89[/latex]; [latex]88[/latex]; [latex]76[/latex]; [latex]65[/latex]; [latex]45[/latex]; [latex]98[/latex]; [latex]90[/latex]; [latex]80[/latex]; [latex]84.5[/latex]; [latex]85[/latex]; [latex]79[/latex]; [latex]78[/latex]; [latex]98[/latex]; [latex]90[/latex]; [latex]79[/latex]; [latex]81[/latex]; [latex]25.5[/latex]
- Find the smallest and largest values, the median, and the first and third quartiles for the day class.
- Find the smallest and largest values, the median, and the first and third quartiles for the night class.
- For each data set, what percentage of the data is between the smallest value and the first quartile? the first quartile and the median? the median and the third quartile? the third quartile and the largest value? What percentage of the data is between the first quartile and the largest value?
- Create a box plot for each set of data. Use one number line for both box plots.
- Which box plot has the widest spread for the middle [latex]50[/latex]% of the data (the data between the first and third quartiles)? What does this mean for that set of data in comparison to the other set of data?
[reveal-answer q=”124076″]Show Solution[/reveal-answer]
[hidden-answer a=”124076″]
Solution:
- Day class: Min = [latex]32[/latex], [latex]Q_1[/latex] = [latex]56[/latex], [latex]M[/latex] = [latex]74.5[/latex], [latex]Q_3[/latex] = [latex]82.5[/latex], Max = [latex]99[/latex]
- Night class: Min = [latex]25.5[/latex], [latex]Q_1[/latex] = [latex]78[/latex], [latex]M[/latex] = [latex]81[/latex], [latex]Q_3[/latex] = [latex]89[/latex], Max = [latex]98[/latex]
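- Day class: There are six data values ranging from [latex]32[/latex] to [latex]56[/latex]: [latex]30[/latex]%. There are six data values ranging from [latex]56[/latex] to [latex]74.5[/latex]: [latex]30[/latex]%. There are five data values ranging from [latex]74.5[/latex] to [latex]82.5[/latex]: [latex]25[/latex]%. There are five data values ranging from [latex]82.5[/latex] to [latex]99[/latex]: [latex]25[/latex]%. There are [latex]16[/latex] data values between the first quartile, [latex]56[/latex], and the largest value, [latex]99[/latex]: [latex]80[/latex]%. Night class: Counting the same way, there are seven data values ranging from [latex]25.5[/latex] to [latex]78[/latex], seven from [latex]78[/latex] to [latex]81[/latex], and seven from [latex]81[/latex] to [latex]89[/latex]: about [latex]32[/latex]% each. There are six data values ranging from [latex]89[/latex] to [latex]98[/latex]: about [latex]27[/latex]%. There are [latex]17[/latex] data values between the first quartile, [latex]78[/latex], and the largest value, [latex]98[/latex]: about [latex]77[/latex]%. Because values equal to a quartile are counted in both neighboring intervals, these percentages only approximate the nominal [latex]25[/latex]% per quarter and [latex]75[/latex]% above the first quartile.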
- The first data set has the wider spread for the middle [latex]50[/latex]% of the data. The [latex]IQR[/latex] for the first data set is greater than the [latex]IQR[/latex] for the second set. This means that there is more variability in the middle [latex]50[/latex]% of the first data set.
[/hidden-answer]
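The comparison of the two classes can be reproduced with a short script. This is a sketch under the median-of-halves convention for quartiles (an assumption; other conventions can give slightly different quartiles), and `five_number_summary` is an illustrative name.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, the middle value is left out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

day = [99, 56, 78, 55.5, 32, 90, 80, 81, 56, 59,
       45, 77, 84.5, 84, 70, 72, 68, 32, 79, 90]
night = [98, 78, 68, 83, 81, 89, 88, 76, 65, 45, 98,
         90, 80, 84.5, 85, 79, 78, 98, 90, 79, 81, 25.5]

day_summary = five_number_summary(day)
night_summary = five_number_summary(night)

# IQR = Q3 - Q1 measures the spread of the middle 50% of each class.
day_iqr = day_summary[3] - day_summary[1]
night_iqr = night_summary[3] - night_summary[1]

print("day:", day_summary, "IQR:", day_iqr)
print("night:", night_summary, "IQR:", night_iqr)
print("wider middle 50%:", "day" if day_iqr > night_iqr else "night")
```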
Try It
The following data set shows the heights in inches for the boys in a class of [latex]40[/latex] students.
[latex]66[/latex]; [latex]66[/latex]; [latex]67[/latex]; [latex]67[/latex]; [latex]68[/latex]; [latex]68[/latex]; [latex]68[/latex]; [latex]68[/latex]; [latex]68[/latex]; [latex]69[/latex]; [latex]69[/latex]; [latex]69[/latex]; [latex]70[/latex]; [latex]71[/latex]; [latex]72[/latex]; [latex]72[/latex]; [latex]72[/latex]; [latex]73[/latex]; [latex]73[/latex]; [latex]74[/latex]
The following data set shows the heights in inches for the girls in a class of [latex]40[/latex] students.
[latex]61[/latex]; [latex]61[/latex]; [latex]62[/latex]; [latex]62[/latex]; [latex]63[/latex]; [latex]63[/latex]; [latex]63[/latex]; [latex]65[/latex]; [latex]65[/latex]; [latex]65[/latex]; [latex]66[/latex]; [latex]66[/latex]; [latex]66[/latex]; [latex]67[/latex]; [latex]68[/latex]; [latex]68[/latex]; [latex]68[/latex]; [latex]69[/latex]; [latex]69[/latex]; [latex]69[/latex]
Construct a box plot using a graphing calculator for each data set, and state which box plot has the wider spread for the middle [latex]50[/latex]% of the data.
[reveal-answer q=”124077″]Show Solution[/reveal-answer]
[hidden-answer a=”124077″]
[latex]IQR[/latex] for the boys = [latex]4[/latex]
[latex]IQR[/latex] for the girls = [latex]5[/latex]
The box plot for the heights of the girls has the wider spread for the middle [latex]50[/latex]% of the data.
[/hidden-answer]
Example
Graph a box-and-whisker plot for the data values shown.
[latex]10[/latex]; [latex]10[/latex]; [latex]10[/latex]; [latex]15[/latex]; [latex]35[/latex]; [latex]75[/latex]; [latex]90[/latex]; [latex]95[/latex]; [latex]100[/latex]; [latex]175[/latex]; [latex]420[/latex]; [latex]490[/latex]; [latex]515[/latex]; [latex]515[/latex]; [latex]790[/latex]
The five numbers used to create a box-and-whisker plot are:
- Min: [latex]10[/latex]
- [latex]Q_1[/latex]: [latex]15[/latex]
- Med: [latex]95[/latex]
- [latex]Q_3[/latex]: [latex]490[/latex]
- Max: [latex]790[/latex]
The following graph shows the box-and-whisker plot.
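With an odd number of values, the median is a single data value, and the remaining values split into two equal halves. The sketch below shows that split for the 15 values above. It assumes the data are already sorted and that the median itself is excluded from both halves. Reading each quartile as the middle element of its half works here because each half has an odd count of seven.

```python
data = [10, 10, 10, 15, 35, 75, 90, 95, 100, 175, 420, 490, 515, 515, 790]

n = len(data)                 # 15 values, already in ascending order
mid = n // 2                  # index 7, which is the eighth value
median_value = data[mid]

lower = data[:mid]            # the seven values below the median
upper = data[mid + 1:]        # the seven values above the median

# Each half has an odd count, so its median is its middle element.
q1 = lower[len(lower) // 2]
q3 = upper[len(upper) // 2]

print(data[0], q1, median_value, q3, data[-1])
```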
Try It
Following the same steps, graph a box-and-whisker plot for the data values shown.
[latex]0[/latex]; [latex]5[/latex]; [latex]5[/latex]; [latex]15[/latex]; [latex]30[/latex]; [latex]30[/latex]; [latex]45[/latex]; [latex]50[/latex]; [latex]50[/latex]; [latex]60[/latex]; [latex]75[/latex]; [latex]110[/latex]; [latex]140[/latex]; [latex]240[/latex]; [latex]330[/latex]
[reveal-answer q=”124078″]Show Solution[/reveal-answer]
[hidden-answer a=”124078″]
The data are in order from least to greatest. There are [latex]15[/latex] values, so the eighth number in order is the median: [latex]50[/latex]. There are seven data values to the left of the median and seven values to the right. The first quartile is the median of the lower seven values, and the third quartile is the median of the upper seven values. The five values that are used to create the box plot are:
- Min: [latex]0[/latex]
- [latex]Q_1[/latex]: [latex]15[/latex]
- Med: [latex]50[/latex]
- [latex]Q_3[/latex]: [latex]110[/latex]
- Max: [latex]330[/latex]
[/hidden-answer]
Concept Review
Box plots are a type of graph that can help visually organize data. To graph a box plot, the following five values must be found: the minimum value, the first quartile, the median, the third quartile, and the maximum value. Once the box plot is graphed, you can display and compare distributions of data.
Additional Resources
Use the online imathAS box plot tool to create box and whisker plots.