
State Names

  • Lesson
Data Analysis and Probability
Samuel E. Zordak

Using multiple representations, students analyze the frequency of letters that occur in the names of all 50 states.

Say to students, "Spell the names of all 50 states." Before they get too far in writing all the state names, ask the following questions as an introduction to this lesson:

  • Which letter will you use most? Which letter will you use least?
  • Will you use every letter of the alphabet? Are there any letters that you will not use at all?
  • Which state name has the most letters?

Allow students to speculate about the answers to each of these questions, and have them justify their guesses.

Display the names of all 50 states. (You can display the names as an alphabetical list, or you can simply display a map of the United States that shows the state names.) Ask, "Could you answer those questions just by looking at all of the names like this?" The point in asking this question is to make students realize that the data needs to be organized in a better way.

Then, have students use the State Names Activity Sheet to identify the frequency of each letter.

State Names Activity Sheet (PDF)

Circulate as students work, and observe their process. If students are not using a systematic approach, ask questions such as, "How will your method guarantee that each letter is counted exactly once?" To check student work, the following are the frequencies of each letter when all 50 state names are written:

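Letter     A   B   C   D   E   F   G   H   I   J   K   L   M
Frequency  61  2   12  11  28  2   8   15  44  1   10  15  14

Letter     N   O   P   Q   R   S   T   U   V   W   X   Y   Z
Frequency  43  36  4   0   22  32  19  8   5   11  2   6   1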

Allowing students to determine the frequency by hand is valuable for two reasons. First, it gives students practice using a systematic process for organizing information. Second, and more importantly, the tick marks used to keep track of letter frequency will form a representation when the tally is complete; the number of tick marks indicates how often each letter occurs, and the amount of space required to record all of the tick marks gives a visual representation of the relative frequency.
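For teachers who want to double-check a hand tally, the count can be reproduced in a few lines of Python. This is only a sketch for verification, not part of the student activity; the names `STATE_NAMES` and `letter_frequencies` are illustrative choices, and the code assumes that spaces are ignored and that uppercase and lowercase letters are counted together.

```python
from collections import Counter
import string

# The 50 state names, typed in alphabetical order.
STATE_NAMES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]


def letter_frequencies(names):
    """Count how often each letter A-Z occurs, ignoring case and spaces."""
    counts = Counter(
        ch for name in names for ch in name.upper() if ch.isalpha()
    )
    # Include letters that never occur, so that they show up with a count of 0.
    return {letter: counts.get(letter, 0) for letter in string.ascii_uppercase}


frequencies = letter_frequencies(STATE_NAMES)
for letter, count in frequencies.items():
    print(f"{letter}: {count}")
```

Listing every letter, including those with a count of zero, mirrors the activity sheet: a letter that is never used is itself a piece of data.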

Once the frequency analysis is completed, explain to students that the data can be represented in various ways.

Bar Grapher Tool

Have students create a bar graph of their data using the Bar Grapher Tool. Students can enter the data they collected into the data box.

Alternatively, depending on the availability of technology and the amount of time you want students to spend entering data, you can copy-and-paste the frequency data below into the Bar Grapher Tool, and display it using a projection device.

61, A
2, B
12, C
11, D
28, E
2, F
8, G
15, H
44, I
1, J
10, K
15, L
14, M
43, N
36, O
4, P
0, Q
22, R
32, S
19, T
8, U
5, V
11, W
2, X
6, Y
1, Z

To display the data in the Bar Grapher, highlight the data above, right-click, and choose "Copy". In the Bar Grapher, press the Clear Data button. Click anywhere inside the data box to activate the cursor, and remove the words "(Enter Text Here)". Then right-click in the data box, and choose "Paste" to place the data. Finally, press Graph Data to display a bar graph. (To get the data to display, you may need to change the maximum and minimum values.)

A correct display of the data will appear as follows:

[Figure: bar graph of the letter frequencies, as displayed in the Bar Grapher Tool]

The benefit of using the Bar Grapher Tool is that it minimizes the possibility of human error. Students will not be overwhelmed by the mechanics of constructing a bar graph; instead, they will be able to see how a bar graph organizes the data, and they can interpret the data once it is in the proper form. In addition, the data can be easily manipulated. For instance, if students wish to create a bar graph of the letter frequencies in the state postal codes, they would just need to change the values associated with each letter and hit Graph Data.
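If the Bar Grapher Tool is unavailable, a rough text version of the same graph can be produced with a short Python sketch. The function names `bar_grapher_lines` and `text_bar_chart` are hypothetical, and the "count, letter" output simply imitates the frequency list given in this lesson; it is an assumption that the tool accepts exactly this layout.

```python
# Letter frequencies copied from the "count, letter" list in this lesson.
FREQUENCIES = {
    "A": 61, "B": 2, "C": 12, "D": 11, "E": 28, "F": 2, "G": 8,
    "H": 15, "I": 44, "J": 1, "K": 10, "L": 15, "M": 14, "N": 43,
    "O": 36, "P": 4, "Q": 0, "R": 22, "S": 32, "T": 19, "U": 8,
    "V": 5, "W": 11, "X": 2, "Y": 6, "Z": 1,
}


def bar_grapher_lines(freqs):
    """Format the data as 'count, letter' lines, one line per letter."""
    return [f"{count}, {letter}" for letter, count in freqs.items()]


def text_bar_chart(freqs, symbol="#"):
    """Draw one horizontal bar per letter; the bar length equals the count."""
    return [f"{letter} {symbol * count} {count}" for letter, count in freqs.items()]


print("\n".join(text_bar_chart(FREQUENCIES)))

# The longest and shortest bars identify the most and least used letters.
most_used = max(FREQUENCIES, key=FREQUENCIES.get)
least_used = min(FREQUENCIES, key=FREQUENCIES.get)
print("Most used:", most_used, "Least used:", least_used)
```

To graph a different data set, such as the letters in the state postal codes, only the values in `FREQUENCIES` would need to change.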

Once the bar graph is created, ask students, "Think about the questions that I asked you earlier. Which questions could you answer easily by seeing the data in this form?" Students should realize that the letters used most and least often are easy to identify, by the heights of the bars.

Explain to students that another way to represent data is using a stem-and-leaf plot. This type of graph divides each piece of data into a stem and a leaf. With a two-digit number, the tens digit is the stem, and the units digit is the leaf; for instance, the stem of 36 is 3, and the leaf is 6. (For larger numbers, the stems and leaves may change. In fact, it is unusual to use the units digit as the leaf if the range of the numbers is more than 100.) All numbers with the same stem are then grouped together.

Work with the class to create a stem-and-leaf plot. One way to do this is to assign each letter A–Z to a different student. Each student is responsible for showing how the frequency of his or her letter is transferred to the stem-and-leaf plot. To model the process, you might assign a letter to yourself; or, to allow students to help one another, you could assign several letters to each group of students.

The data from the frequency analysis above would be represented in the stem-and-leaf plot shown below:

0 | 0 1 1 2 2 2 4 5 6 8 8
1 | 0 1 1 2 4 5 5 9
2 | 2 8
3 | 2 6
4 | 3 4
5 |
6 | 1

Key: 3 | 6 means 36

Again ask the students, "Think about the questions that I asked you earlier. Which questions could you answer easily by seeing the data in a stem-and-leaf plot?" Students should realize that the letters used most and least often are easy to identify. The letter associated with the highest number (61) is A, and the letter associated with the lowest number (0) is Q.
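The grouping step can also be checked with a short Python sketch. It assumes non-negative whole numbers with the tens digit as the stem and the units digit as the leaf, as described above; the function name `stem_and_leaf` is an illustrative choice.

```python
from collections import defaultdict

# Frequencies for A through Z, in alphabetical order, from the list in this lesson.
FREQUENCIES = [61, 2, 12, 11, 28, 2, 8, 15, 44, 1, 10, 15, 14,
               43, 36, 4, 0, 22, 32, 19, 8, 5, 11, 2, 6, 1]


def stem_and_leaf(values):
    """Group values by tens digit (stem); the units digits (leaves) end up sorted."""
    leaves = defaultdict(list)
    for value in sorted(values):
        leaves[value // 10].append(value % 10)
    # Keep stems that have no leaves, so gaps in the data stay visible.
    stems = range(min(leaves), max(leaves) + 1)
    return {stem: leaves.get(stem, []) for stem in stems}


plot = stem_and_leaf(FREQUENCIES)
for stem, row in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in row)}")
```

Because each of the 26 letters contributes exactly one leaf, counting the leaves is a quick way to confirm that no value was skipped.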

For older students, the data can also be represented as a box-and-whisker plot. Work with the class to identify the five-number summary of the data set. A box-and-whisker plot is a visual representation of these results.

Five-Number Summary 

The five-number summary consists of the upper and lower extremes, the median, and the upper and lower quartiles.

  • The upper and lower extremes are the greatest and least numbers that occur in the set.
  • The median is the middle term when the data is arranged from least to greatest.
  • The upper and lower quartiles are the medians of the upper and lower halves of the data, respectively. Note that there are various methods used for determining the upper and lower quartiles of a set of data; refer to your local curriculum for the method that should be used with your students. If there is no recommended method for your district or state, then you can use the following process for identifying the upper and lower quartiles:
    • Arrange the data in order from least to greatest, and identify the median.
    • Identify the middle term of each half of the data on either side of the median. These values are the upper and lower quartiles.

Example: Consider the set {1, 3, 4, 5, 6, 7, 9}.

  • The lower extreme is 1.
  • The lower half is {1, 3, 4}, and the middle term of that half is 3. Therefore, the lower quartile is 3.
  • The median is the middle term, 5.
  • The upper half is {6, 7, 9}, and the middle term of that half is 7. Therefore, the upper quartile is 7.
  • The upper extreme is 9.
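The process in this example can be sketched in Python as follows. This is one possible reading of the method, not the only accepted one: when the number of values is odd, the median is left out of both halves (as in the example), and when it is even, the data is simply split into two equal halves, which is an assumption the lesson does not spell out. The function name `five_number_summary` is illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return (lower extreme, lower quartile, median, upper quartile, upper extreme)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # With an odd count, skip the median itself; with an even count, split evenly.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower_half), median(data), median(upper_half), data[-1])


print(five_number_summary([1, 3, 4, 5, 6, 7, 9]))  # (1, 3, 5, 7, 9)
```

Other quartile conventions can give slightly different values for the same data, so results from a calculator or spreadsheet may not match this sketch exactly.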

Box Plotter Tool

To help your students best understand how the five-number summary is converted to a box-and-whisker plot, copy-and-paste the following into the Box Plotter Tool.


To display the data in the Box Plotter, highlight the data above, right-click, and choose "Copy". In the Box Plotter, select "My Data" as the data set. In the data box, remove all of the data that currently appears. Then, right-click in the data-box, and choose "Paste" to place the data. Finally, press Update Boxplot to display the data in a box-and-whisker plot.

[Figure: box-and-whisker plot of the letter frequencies, as displayed in the Box Plotter Tool]

Return to the questions posed at the beginning of the lesson, and let students use their graphs to answer them:

  • Which letter is used most often? [A.]
  • Are there any letters that are not used at all? [Q.]
  • What state name contains the most letters? [North Carolina, South Carolina, and Massachusetts all contain 13 letters. The state name with the most different letters is South Carolina, with 11 different letters.]

Note that students will not be able to answer the third question from the representations of data that were created during the lesson. This can lead to a nice discussion about how data is used, and what data is necessary to answer different questions. To answer the question about the state with the most letters, students can use their results from the second assessment option.
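The data set needed for the third question is the number of letters in each state name, which is different from the letter frequency data. A minimal Python sketch of that count is shown here; the names `STATE_NAMES` and `letter_count` are illustrative, and spaces are assumed not to count as letters.

```python
# The 50 state names, typed in alphabetical order.
STATE_NAMES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]


def letter_count(name):
    """Number of letters in a name; spaces are not counted."""
    return sum(ch.isalpha() for ch in name)


lengths = {name: letter_count(name) for name in STATE_NAMES}
longest = max(lengths.values())
longest_names = sorted(name for name, size in lengths.items() if size == longest)
print(longest, longest_names)
```

The `lengths` values form a new set of 50 numbers that students could display with any of the representations used in this lesson.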

Assessment Options

  1. Have students write a paragraph in their math journals about the advantage of organizing data in bar graphs, stem-and-leaf plots, and box-and-whisker plots, as opposed to interpreting raw data. In addition, have them compare bar graphs, stem-and-leaf plots, and box-and-whisker plots, and indicate which representation is most useful.
  2. Allow students to use the number of letters in the state names, state postal codes, or other sets of data to create various representations. (Using the data for the number of letters in each state name will allow students to answer the question, "What state requires the most letters to spell?", which was asked during the lesson but could not be answered using the letter frequency data.)

Extensions

  1. Allow students to do a frequency analysis by including the Canadian provinces or the Mexican states when collecting data, or repeat the activity for these other countries in North America.
  2. Students can use the State Data Map to investigate other data about states. In addition to considering the number of letters in state names, students can also explore population, number of senators, gasoline usage, and other data sets.
    State Data Map

Questions for Students 

1. Is it possible to determine which letter is used most in the names of all 50 states just by looking at a list of names? What is a better way to determine what letter is used most?

[Count the number of times each letter is used, and organize the data in a table or graph.]

2. What are some of the advantages of using bar graphs, stem-and-leaf plots, and box-and-whisker plots?

[Bar graphs show the relative amounts of items in the data. Just by looking at the heights of the bars, you can easily determine which items are more common. A stem-and-leaf plot allows you to easily determine where most of the data occurs—at the upper end, the lower end, or in the middle. A box-and-whisker plot allows you to easily see how the data is divided.]

Teacher Reflection 

  • How did technology help or hinder student learning?
  • Did students attain the objectives for this lesson? That is, did students understand that various representations can be used to organize data?
  • Were students enthusiastic about this lesson? If so, what contributed to their enthusiasm? If not, what can be done to get them more enthused the next time this lesson is taught?
  • Was it necessary to adjust the lesson plan while teaching? Why were adjustments necessary?

Learning Objectives

Students will:

  • Determine the number of times that each letter of the alphabet is used when writing the names of all 50 states.
  • Understand how various representations, including stem-and-leaf plots, box-and-whisker plots, and histograms, can be used to organize the data.

NCTM Standards and Expectations

  • Select, create, and use appropriate graphical representations of data, including histograms, box plots, and scatterplots.
  • Collect data using observations, surveys, and experiments.
  • Represent data using tables and graphs such as line plots, bar graphs, and line graphs.
  • Describe the shape and important features of a set of data and compare related data sets, with an emphasis on how the data are distributed.
  • Use measures of center, focusing on the median, and understand what each does and does not indicate about the data set.
  • Compare different representations of the same data and evaluate how well each representation shows important aspects of the data.

Common Core State Standards – Mathematics

Grade 3, Measurement & Data

  • CCSS.Math.Content.3.MD.B.3
    Draw a scaled picture graph and a scaled bar graph to represent a data set with several categories. Solve one- and two-step "how many more" and "how many less" problems using information presented in scaled bar graphs. For example, draw a bar graph in which each square in the bar graph might represent 5 pets.

Grade 6, Stats & Probability

  • CCSS.Math.Content.6.SP.A.3
    Recognize that a measure of center for a numerical data set summarizes all of its values with a single number, while a measure of variation describes how its values vary with a single number.

Grade 6, Stats & Probability

  • CCSS.Math.Content.6.SP.B.4
    Display numerical data in plots on a number line, including dot plots, histograms, and box plots.

Grade 7, Stats & Probability

  • CCSS.Math.Content.7.SP.C.6
    Approximate the probability of a chance event by collecting data on the chance process that produces it and observing its long-run relative frequency, and predict the approximate relative frequency given the probability. For example, when rolling a number cube 600 times, predict that a 3 or 6 would be rolled roughly 200 times, but probably not exactly 200 times.

Grade 6, Stats & Probability

  • CCSS.Math.Content.6.SP.A.2
    Understand that a set of data collected to answer a statistical question has a distribution which can be described by its center, spread, and overall shape.

Grade 8, Stats & Probability

  • CCSS.Math.Content.8.SP.A.4
    Understand that patterns of association can also be seen in bivariate categorical data by displaying frequencies and relative frequencies in a two-way table. Construct and interpret a two-way table summarizing data on two categorical variables collected from the same subjects. Use relative frequencies calculated for rows or columns to describe possible association between the two variables. For example, collect data from students in your class on whether or not they have a curfew on school nights and whether or not they have assigned chores at home. Is there evidence that those who have a curfew also tend to have chores?