In part 4 you will investigate the centroid of a data set and its significance for the line fitted to the data.
Recall that the idea of a centroid of a set of points comes from the idea of center of mass in physics, where each point has the same mass. In physics, the center of mass is a point whose motion represents the motion of the entire set of points. In data analysis, we don't have masses and motion, but we can think of the centroid as an "average" point that represents the entire set of points. It may not be one of the data points, but it is somewhere in the "middle" of them. It is a "center" about which the data points are scattered.
How can we find such a point? Remember that each data point (x, y) has a value of the x-variable as its first coordinate and the corresponding value of the y-variable as its second coordinate. We can find such a point for a given data set by separating the x-values from the y-values, finding a "central" value for the x-values and, separately, a "central" value for the y-values, and then putting these two central values together as the coordinates of a single point. The mean is a "central" value for a set of univariate data, so we use the mean of the x-values and the mean of the y-values.
For example, at the beginning of this i-Math we were interested in the relationship between a person's height and weight. The data points were pairs (x, y) where x was height and y was weight. But we can think of the height measurements as a univariate data set in their own right, and the weight measurements as a univariate data set in their own right.
In the graph below, the x-values of the data points are points on the x-axis and the y-values of the data points are points on the y-axis. You can compute the mean of each of these univariate sets of data, that is, the mean of the x-values (x-bar) (for example, the mean height) and the mean of the y-values (y-bar) (for example, the mean weight) and locate these values on the x-axis and the y-axis, respectively. Next plot the point (x-bar, y-bar). This is the point we call the centroid of the bivariate data set. Go to Questions.
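If you would like to check a centroid without the applet, the recipe described above (average the x-values, average the y-values, pair the two means) can be sketched in a few lines of Python. The function name `centroid` and the height and weight numbers are made up for illustration; they are not the data used in this i-Math.

```python
from statistics import mean


def centroid(points):
    """Return (x-bar, y-bar) for a list of (x, y) pairs."""
    xs = [x for x, _ in points]  # the x-values as a univariate data set
    ys = [y for _, y in points]  # the y-values as a univariate data set
    return (mean(xs), mean(ys))


# Hypothetical (height, weight) pairs, invented for this sketch only.
data = [(60, 110), (64, 130), (68, 150), (72, 170)]
print(centroid(data))
```

Notice that the centroid printed for this sample is not one of the four data points, which matches the remark that the centroid need not belong to the data set.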
Finding the Relationship Between the Regression Line and the Centroid
- CLEAR the graph and plot two points.
  - (a) On your own paper, determine the centroid of these two points.
  - (b) On your own paper, compute the midpoint of the two points. Compare this midpoint to the centroid you computed. Explain the connection.
  - (c) Click on SHOW CENTROID. Compare the coordinates of the centroid shown to the coordinates you computed in parts (a) and (b). Explain and resolve any differences.
- CLEAR the graph and plot three points.
  - (a) On your own paper, determine the centroid of these three points.
  - (b) Click on SHOW CENTROID. Confirm that you get the same point as in part (a).
  - (c) Click on SHOW LINE. What happens?
- Experiment with several scatterplots to find the relationship between the centroid and the least-squares regression line. Describe this relationship.
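After you have experimented with the applet, you may want to test your description numerically. The sketch below is one possible check, not the applet's own method: it fits a least-squares line using the standard sum formulas for slope and intercept, then prints the height of that line at x-bar next to y-bar. The function name `fit_line` and the four sample points are assumptions made for illustration.

```python
def fit_line(points):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    denom = n * sum_xx - sum_x ** 2  # zero if all x-values are equal
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y * sum_xx - sum_x * sum_xy) / denom
    return slope, intercept


# Hypothetical scatterplot, invented for this sketch only.
points = [(1, 2), (2, 3), (4, 7), (5, 6)]
slope, intercept = fit_line(points)

x_bar = sum(x for x, _ in points) / len(points)
y_bar = sum(y for _, y in points) / len(points)

print("height of the line at x-bar:", slope * x_bar + intercept)
print("y-bar:", y_bar)
```

Compare the two printed values, then change the list of points and run the sketch again to see whether the pattern you described still holds.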
Note: use the link below to go to the applet, rather than scrolling down. Go to the Regression Line Applet