How to Calculate Regression Analysis
Learn how to perform simple linear regression analysis. This guide covers the least squares method, calculating slope and intercept, interpreting regression output, and assessing model fit with R².
What Is Linear Regression?
Simple linear regression models the relationship between a single predictor variable x and a continuous response variable y using the equation ŷ = β₀ + β₁x, where β₀ is the y-intercept and β₁ is the slope. The goal is to find the line that best fits the data by minimizing the sum of squared residuals (the vertical distances between observed and predicted values). This method is called ordinary least squares (OLS).
Formulas for Slope and Intercept
The least-squares slope is: β₁ = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / Σ(xᵢ − x̄)², which is the covariance of x and y divided by the variance of x. The intercept is: β₀ = ȳ − β₁ · x̄. These two formulas guarantee that the regression line always passes through the point (x̄, ȳ), the centroid of the data.
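The two formulas above can be sketched in plain Python (no libraries needed); this is a minimal illustration, not a production implementation:

```python
def least_squares(xs, ys):
    """Return (intercept b0, slope b1) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    b1 = num / den
    b0 = y_bar - b1 * x_bar  # guarantees the line passes through (x_bar, y_bar)
    return b0, b1
```

Because the intercept is defined from the means and the slope, the fitted line passes through (x̄, ȳ) by construction.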
Step-by-Step Calculation
Step 1: Compute x̄ and ȳ. Step 2: For each pair (xᵢ, yᵢ), compute (xᵢ − x̄), (yᵢ − ȳ), their product, and (xᵢ − x̄)². Step 3: Sum the products to get the numerator of β₁ and sum the squared x-deviations for the denominator. Step 4: Compute β₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)². Step 5: Compute β₀ = ȳ − β₁ · x̄. Step 6: Write the regression equation ŷ = β₀ + β₁x.
Worked Example
Using the data pairs (x, y): (1,3), (2,5), (3,4), (4,7), (5,8): x̄ = 3, ȳ = 5.4. The numerator Σ(xᵢ − x̄)(yᵢ − ȳ) = (−2)(−2.4) + (−1)(−0.4) + (0)(−1.4) + (1)(1.6) + (2)(2.6) = 4.8 + 0.4 + 0 + 1.6 + 5.2 = 12. Σ(xᵢ − x̄)² = 4+1+0+1+4 = 10. So β₁ = 12/10 = 1.2 and β₀ = 5.4 − 1.2(3) = 1.8. The regression equation is ŷ = 1.8 + 1.2x.
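The six steps, applied to this worked example, can be scripted line by line; the numbers in the comments are the hand-calculated values above (≈ marks floating-point results):

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 7, 8]

# Step 1: means
x_bar = sum(xs) / len(xs)  # 3.0
y_bar = sum(ys) / len(ys)  # 5.4

# Steps 2-3: sum of deviation products and sum of squared x-deviations
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # ≈ 12
den = sum((x - x_bar) ** 2 for x in xs)                       # ≈ 10

# Steps 4-5: slope and intercept
b1 = num / den           # ≈ 1.2
b0 = y_bar - b1 * x_bar  # ≈ 1.8

# Step 6: the regression equation as a function
predict = lambda x: b0 + b1 * x
```

For example, the predicted response at x = 6 is 1.8 + 1.2(6) = 9.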
Measuring Model Fit with R²
The coefficient of determination R² measures how well the regression line explains the variability in y. It is computed as R² = 1 − (SS_res / SS_tot), where SS_res = Σ(yᵢ − ŷᵢ)² (sum of squared residuals) and SS_tot = Σ(yᵢ − ȳ)² (total sum of squares). R² ranges from 0 to 1; an R² of 0.85 means 85% of the variation in y is explained by the linear model. For simple linear regression, R² equals the square of the Pearson correlation r.
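As a sketch of the formula, here is R² computed for the worked example's data and fitted line (β₀ = 1.8, β₁ = 1.2):

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 7, 8]
b0, b1 = 1.8, 1.2  # fit from the worked example

y_bar = sum(ys) / len(ys)
y_hat = [b0 + b1 * x for x in xs]  # predicted values

ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # sum of squared residuals
ss_tot = sum((y - y_bar) ** 2 for y in ys)               # total sum of squares
r_squared = 1 - ss_res / ss_tot
```

Here SS_res = 2.8 and SS_tot = 17.2, so R² = 1 − 2.8/17.2 ≈ 0.837: about 84% of the variation in y is explained by the line.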
Checking Regression Assumptions
OLS regression requires linearity (the true relationship is linear), independence of observations, homoscedasticity (constant variance of residuals across all levels of x), and normally distributed residuals. Diagnostic plots — residuals vs. fitted values, a Q-Q plot of residuals, and a scale-location plot — help you assess these assumptions. Violations may require transformations (e.g., log of y) or alternative methods like weighted least squares.
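The quantities behind those diagnostic plots are just the fitted values and residuals; this sketch computes them for the worked example (plotting itself, e.g. with matplotlib, is omitted):

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 7, 8]
b0, b1 = 1.8, 1.2  # fit from the worked example

fitted = [b0 + b1 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# With an intercept in the model, OLS residuals always sum to (essentially) zero.
# In a residuals-vs-fitted plot you want no trend and roughly constant spread;
# in a Q-Q plot you want the points close to the reference line.
```

Note that a residual sum near zero is automatic for OLS with an intercept, so it does not by itself confirm any assumption; the plots are what reveal curvature or changing variance.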
Multiple Regression Overview
Multiple linear regression extends the model to include two or more predictors: ŷ = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ. The coefficients are estimated simultaneously using matrix algebra: β = (XᵀX)⁻¹Xᵀy, where X is the design matrix (a column of ones for the intercept plus one column per predictor). Adjusted R² is preferred over R² for multiple regression because it penalizes the addition of predictors that do not improve fit. Multicollinearity — high correlation among predictors — can inflate standard errors and make coefficient estimates unstable.
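The matrix form β = (XᵀX)⁻¹Xᵀy can be sketched with NumPy. The two-predictor data below is hypothetical, built from an exact linear relationship so the fit recovers the known coefficients:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Design matrix: a leading column of ones provides the intercept beta_0
X = np.column_stack([np.ones_like(x1), x1, x2])
y = 1.0 + 2.0 * x1 + 3.0 * x2  # exact linear data; the fit should recover 1, 2, 3

# Solve the normal equations X^T X beta = X^T y
# (numerically more stable than forming the inverse explicitly)
beta = np.linalg.solve(X.T @ X, X.T @ y)
```

Given R², adjusted R² is 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the sample size and k is the number of predictors.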
Related Guides
How to Calculate Correlation Coefficient
Understand how to calculate the Pearson correlation coefficient r from scratch. Learn the formula, step-by-step process, and how to interpret the strength and direction of a linear relationship.
How to Calculate Variance
Learn how to calculate population and sample variance step by step. Understand the variance formula, why we divide by n−1 for samples, and how variance relates to standard deviation.
How to Calculate Confidence Intervals
Step-by-step guide to calculating confidence intervals. Learn when to use z-intervals vs. t-intervals, how to choose a confidence level, and how to interpret the results.