How to Calculate Correlation Coefficient
Understand how to calculate the Pearson correlation coefficient r from scratch. Learn the formula, step-by-step process, and how to interpret the strength and direction of a linear relationship.
What Is the Correlation Coefficient?
The Pearson correlation coefficient r quantifies the strength and direction of a linear relationship between two continuous variables. It ranges from −1 to +1: a value of +1 indicates a perfect positive linear relationship, −1 indicates a perfect negative relationship, and 0 indicates no linear relationship. It is one of the most widely used statistics in data analysis, research, and machine learning.
The Pearson r Formula
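The formula is: r = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]. The numerator is the sum of the products of each pair's deviations from their respective means, which is the covariance before it is divided by n − 1. The denominator is the square root of the product of the two sums of squared deviations, which is the product of the two standard deviations before the same division. Because the n − 1 factors cancel, r equals the covariance divided by the product of the standard deviations, which makes it unit-free and bounded between −1 and +1.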
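The link between this formula and the covariance can be written out explicitly. The block below is a short derivation sketch; the symbols s_xy (sample covariance) and s_x, s_y (sample standard deviations) are notation introduced here for illustration and do not appear in the formula above.

```latex
\[
r
= \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
       {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \,\sum_{i=1}^{n}(y_i-\bar{y})^2}}
= \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
       {\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}\;
        \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2}}
= \frac{s_{xy}}{s_x\, s_y}
\]
```

Dividing the numerator and the denominator by n − 1 leaves the ratio unchanged, so the same r results whether the sample (n − 1) or the population (n) convention is used.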
Step-by-Step Calculation
Step 1: Calculate the mean x̄ of the x-values and ȳ of the y-values.
Step 2: For each data pair (xᵢ, yᵢ), compute the deviations from the means: (xᵢ − x̄) and (yᵢ − ȳ).
Step 3: Multiply each pair of deviations together and sum all the products. This is the numerator.
Step 4: Separately sum the squared x-deviations and the squared y-deviations, multiply those two sums, and take the square root. This is the denominator.
Step 5: Divide the numerator by the denominator. If either variable is constant, the denominator is zero and r is undefined.
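The five steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation; the function name pearson_r and its error handling are choices made here, not something prescribed by the guide.

```python
import math


def pearson_r(xs, ys):
    """Pearson r for two equal-length sequences of numbers."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data pairs")

    # Step 1: means of x and y.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Step 2: deviations from the means.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Step 3: sum of the products of paired deviations (numerator).
    numerator = sum(a * b for a, b in zip(dx, dy))

    # Step 4: square root of the product of the two sums of squares.
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        # Assumption made here: treat a constant variable as an error.
        raise ValueError("r is undefined when either variable is constant")

    # Step 5: divide.
    return numerator / denominator


print(pearson_r([1, 2, 3], [3, 2, 1]))  # perfectly decreasing data
```

Raising an error for a constant variable is one possible design; returning NaN would be another reasonable choice.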
Worked Example
Consider two variables: hours studied (x) = [2, 4, 6, 8] and exam scores (y) = [55, 65, 75, 85]. The means are x̄ = 5 and ȳ = 70. The deviation pairs (xᵢ − x̄, yᵢ − ȳ) are (−3, −15), (−1, −5), (1, 5), (3, 15). The numerator is Σ[(xᵢ − x̄)(yᵢ − ȳ)] = 45 + 5 + 5 + 45 = 100. The sums of squares are Σ(xᵢ − x̄)² = 9 + 1 + 1 + 9 = 20 and Σ(yᵢ − ȳ)² = 225 + 25 + 25 + 225 = 500. So r = 100 / √(20 × 500) = 100 / 100 = 1.0, indicating a perfect positive linear relationship.
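The arithmetic in this example can be checked with a few lines of Python. The variable names below are illustrative, and the data are the hours and scores given in the example.

```python
import math

hours = [2, 4, 6, 8]
scores = [55, 65, 75, 85]

mean_x = sum(hours) / len(hours)
mean_y = sum(scores) / len(scores)

# Deviations of each value from its mean.
dx = [x - mean_x for x in hours]
dy = [y - mean_y for y in scores]

numerator = sum(a * b for a, b in zip(dx, dy))  # sum of deviation products
ss_x = sum(a * a for a in dx)                   # sum of squared x-deviations
ss_y = sum(b * b for b in dy)                   # sum of squared y-deviations

r = numerator / math.sqrt(ss_x * ss_y)

print("means:", mean_x, mean_y)
print("numerator:", numerator)
print("sums of squares:", ss_x, ss_y)
print("r:", r)
```

The printed intermediate values should match the ones worked out by hand: means of 5 and 70, a numerator of 100, sums of squares of 20 and 500, and r equal to 1.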
Interpreting the Value of r
As a rough guideline for the absolute value |r|: 0.9 or above is very strong, 0.7 to 0.9 is strong, 0.5 to 0.7 is moderate, 0.3 to 0.5 is weak, and below 0.3 is negligible. These cutoffs are conventions and vary between fields. The sign indicates direction: positive r means both variables tend to increase together, negative r means one tends to decrease as the other increases. Always visualize the data in a scatter plot before drawing conclusions, as r only captures linear associations.
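The guideline above can be encoded as a small lookup. This is a sketch only: the function name describe_r is made up for illustration, and assigning each boundary value (0.3, 0.5, 0.7, 0.9) to the stronger category is an assumption, since the guideline does not say which side the boundaries belong to.

```python
def describe_r(r):
    """Return (strength, direction) labels for a correlation coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    size = abs(r)
    # Assumption: boundary values fall into the stronger category.
    if size >= 0.9:
        strength = "very strong"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.3:
        strength = "weak"
    else:
        strength = "negligible"

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction


print(describe_r(0.8))    # ('strong', 'positive')
print(describe_r(-0.25))  # ('negligible', 'negative')
```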
Coefficient of Determination (r²)
Squaring r gives r², the coefficient of determination, which represents the proportion of variance in one variable that is predictable from the other through a linear fit. For example, r = 0.8 means r² = 0.64, so 64% of the variation in y is explained by the linear relationship with x. Because it is expressed as a share of variance, r² is often easier to interpret in practical terms than r itself.
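One way to see why r² reads as "variance explained" is to fit a least-squares line and compare the leftover variation with the total variation. The sketch below does that for a small made-up data set (the x and y values are hypothetical, chosen only for illustration); for simple linear regression, 1 − SS_res / SS_tot should agree with r².

```python
import math

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)  # total variation in y (SS_tot)

r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

# Least-squares line y = intercept + slope * x.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Variation in y left over after the linear fit (SS_res).
ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
explained_share = 1 - ss_res / s_yy

print("r squared:      ", round(r_squared, 6))
print("explained share:", round(explained_share, 6))
```

Both printed values should be approximately 0.6 for this data set.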
Limitations and Assumptions
Pearson r assumes both variables are continuous, the relationship is linear, and there are no extreme outliers (which can heavily distort r). Approximate normality matters mainly when testing r for statistical significance or building confidence intervals, not for computing r itself. Correlation does not imply causation: a high r between two variables may reflect a third confounding variable. For ordinal data, or for relationships that are monotonic but not linear, Spearman's rank correlation is more appropriate.
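The difference between Pearson and Spearman can be illustrated on data that rise steadily but not in a straight line. The sketch below computes Spearman's coefficient as Pearson r applied to ranks; the helper names are made up for illustration, the data are hypothetical, and the simple ranking used here assumes there are no tied values.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical monotonic but non-linear data: y is the cube of x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print("Pearson: ", pearson_r(x, y))     # below 1, roughly 0.94
print("Spearman:", spearman_rho(x, y))  # 1.0, since the ordering agrees exactly
```

With tied values, ranks are normally averaged across the tie, which this sketch does not do.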