itomath.com

Regression Line $y=mx+b$

Let's revisit the example where we looked at the correlation between time spent studying and grades.
Some of the data points were removed to simplify this example.

Use your GDC to find Pearson's correlation coefficient (to 3 significant figures). $r=$

The mean hours of study $\left(\bar{x}\right)$ for the 10 students is ____ hours.

The mean average test score $\left(\bar{y}\right)$ for the 10 students is ____ %.

Since $r=0.916$, there is a strong positive linear correlation between hours of study per week and average test score. Therefore, it is acceptable to draw a line that approximates the relationship between the two variables.
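To check a GDC result by hand, Pearson's correlation coefficient can be computed directly from its definition. The following is a minimal Python sketch of that calculation; the function name `pearson_r` and the sample lists are hypothetical and are not the data set used in this lesson.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n  # mean of the x-values
    y_bar = sum(ys) / n  # mean of the y-values
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up illustrative data (NOT the lesson's data set).
hours = [2, 4, 6, 8, 10]
scores = [48, 55, 53, 62, 66]

# The .3g format rounds to 3 significant figures, as the question asks.
print(f"r = {pearson_r(hours, scores):.3g}")
```

A value of $r$ close to $1$ indicates a strong positive linear correlation, which is the situation described above.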

When we drew the line of best fit by eye, we found that different lines could be drawn.

In this section, we will use technology to find the best line through the data.

In the window above, move the two black points to adjust the line.
Notice how the distances from the points to the line change.
Using technology, we can find the line that minimizes the sum of the squares of these distances. This is called the regression line.
Click here (password: b7$%E%O^) to see how to find the regression line using your GDC.
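The idea of a "best" line can also be sketched in code. The example below assumes the usual least-squares definition used by GDC linear regression: the regression line minimizes the sum of the squared vertical distances from the points to the line. The function names, the sample data, and the "by eye" guess are all hypothetical and are not taken from this lesson.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the line y = m*x + b that minimizes
    the sum of squared vertical distances to the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx          # gradient
    b = y_bar - m * x_bar    # y-intercept
    return m, b

def sum_squared_residuals(xs, ys, m, b):
    """Total of the squared vertical distances from each point to the line."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data (NOT the lesson's data set).
hours = [2, 4, 6, 8, 10]
scores = [48, 55, 53, 62, 66]

m, b = least_squares_line(hours, scores)
best = sum_squared_residuals(hours, scores, m, b)

# A line drawn "by eye" (hypothetical guess: y = 2x + 45) for comparison.
guess = sum_squared_residuals(hours, scores, 2.0, 45.0)

print(f"m = {m:.3g}, b = {b:.3g}")
print(f"regression line: {best:.3g}, by-eye line: {guess:.3g}")
```

Whatever line is drawn by eye, its sum of squared distances cannot be smaller than that of the regression line, which is why technology gives a single answer where drawing by eye does not.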

Use your GDC to find the regression line $y=mx+b$. (to 3 significant figures) $m=$

(to 3 significant figures) $b=$

So, the equation of the regression line (to 3 significant figures) is $y=$


Graph this line in the Desmos window above to see how close you were.
The gradient $m=2.39$ can be interpreted as meaning:
"for every additional hour of study per week, a student can expect their grade to increase by ____ %".
The $y$-intercept $b=41.4$ can be interpreted as meaning:
"a student who studies ____ hours per week can expect their grade to be ____ %".
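Both interpretations can be checked directly from the equation $y=2.39x+41.4$. As a short worked step, where $x$ stands for any number of study hours per week:

$$
\begin{aligned}
\left(2.39(x+1)+41.4\right)-\left(2.39x+41.4\right) &= 2.39 \\
2.39(0)+41.4 &= 41.4
\end{aligned}
$$

The first line shows that one extra hour of study changes the predicted score by the gradient, whatever the starting value of $x$. The second line shows that the $y$-intercept is the predicted score when $x=0$.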


Since there is a strong correlation, it is reasonable to use this equation to estimate other values.
For example, using your equation $y=2.39x+41.4$, estimate the average test score for a student who:

studies 12 hours per week. (to 3 significant figures) ____ %

studies 18 hours per week. (to 3 significant figures) ____ %

studies 40 hours per week. (to 3 significant figures) ____ %

Clearly, the last estimate is unrealistic (a test score cannot exceed 100%) and unreliable because $x=40$ lies far outside the range of the data.
This is called ____.
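The three estimates can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, and the range of observed study times (2 to 20 hours) is an assumption made for illustration, since the data set itself is not listed here. Predicting outside the observed range is known as extrapolation.

```python
def predict_score(hours, m=2.39, b=41.4):
    """Estimated average test score (%) from the regression line y = m*x + b."""
    return m * hours + b

def is_extrapolation(hours, min_hours, max_hours):
    """True when `hours` falls outside the range of the observed data."""
    return hours < min_hours or hours > max_hours

# ASSUMPTION: observed study times run from 2 to 20 hours per week.
# These bounds are hypothetical; the lesson's actual data range is not given here.
MIN_HOURS, MAX_HOURS = 2, 20

for h in (12, 18, 40):
    estimate = predict_score(h)
    if is_extrapolation(h, MIN_HOURS, MAX_HOURS):
        note = "extrapolation, unreliable"
    else:
        note = "within data range"
    # The .3g format rounds to 3 significant figures.
    print(f"{h} hours -> {estimate:.3g}% ({note})")
```

Under these assumptions the estimates for 12 and 18 hours fall inside the data range, while the estimate for 40 hours is above 100% and is flagged as an extrapolation.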