How to Calculate the Correlation Coefficient

Understand how to calculate the Pearson correlation coefficient r from scratch. Learn the formula, step-by-step process, and how to interpret the strength and direction of a linear relationship.

What Is the Correlation Coefficient?

The Pearson correlation coefficient r quantifies the strength and direction of a linear relationship between two continuous variables. It ranges from −1 to +1: a value of +1 indicates a perfect positive linear relationship, −1 indicates a perfect negative relationship, and 0 indicates no linear relationship. It is one of the most widely used statistics in data analysis, research, and machine learning.

The Pearson r Formula

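The formula is: r = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]. The numerator is the sum of the products of each pair's deviations from their respective means, which is the covariance before dividing by n − 1. The denominator is the square root of the product of the two sums of squared deviations, which is the product of the two standard deviations before the same division by n − 1. Because that factor cancels, r equals the covariance divided by the product of the standard deviations, which makes it unit-free and bounded between −1 and +1.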
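The unit-free property can be checked numerically. The sketch below is only an illustration: the function name pearson_from_cov and the data are hypothetical, and it assumes the sample (n − 1) versions of covariance and standard deviation. It computes r as covariance over the product of standard deviations, then re-expresses x in different units to show that r does not change.

```python
import math
import statistics


def pearson_from_cov(xs, ys):
    """r as sample covariance divided by the product of sample standard deviations."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    # Sample covariance, using the n - 1 divisor.
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    # statistics.stdev also uses the n - 1 divisor, so the factors cancel.
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical data: time spent in hours and some outcome measure.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
outcome = [2.1, 3.9, 6.2, 7.8, 10.1]

r_hours = pearson_from_cov(hours, outcome)

# Re-express the same times in minutes; only the unit changes.
minutes = [h * 60 for h in hours]
r_minutes = pearson_from_cov(minutes, outcome)

# The two values agree up to floating-point rounding.
print(math.isclose(r_hours, r_minutes, rel_tol=1e-9))
```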

Step-by-Step Calculation

Step 1: Calculate the mean x̄ of the x-values and ȳ of the y-values.
Step 2: For each data pair (xᵢ, yᵢ), compute the deviations from the means: (xᵢ − x̄) and (yᵢ − ȳ).
Step 3: Multiply each pair of deviations together and sum all the products. This is the numerator.
Step 4: Separately sum the squared x-deviations and the squared y-deviations, multiply those two sums, and take the square root. This is the denominator.
Step 5: Divide the numerator by the denominator.
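The five steps above can be sketched as a small Python function. This is a minimal illustration, not a reference implementation: the name pearson_r and the error handling (rejecting mismatched lengths and constant variables, where the denominator would be zero) are assumptions added here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data pairs")

    # Step 1: means of x and y.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Step 2: deviations from the means.
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]

    # Step 3: sum of the products of paired deviations (numerator).
    numerator = sum(a * b for a, b in zip(dx, dy))

    # Step 4: square root of the product of the summed squared deviations.
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")

    # Step 5: divide.
    return numerator / denominator


# Hypothetical usage: y falls by exactly 1 whenever x rises by 1.
print(pearson_r([1, 2, 3], [3, 2, 1]))
```

A perfectly decreasing pair such as the one in the usage line should give a value of −1.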

Worked Example

Consider two variables: hours studied (x) = [2, 4, 6, 8] and exam scores (y) = [55, 65, 75, 85]. The means are x̄ = 5 and ȳ = 70. The paired deviations (xᵢ − x̄, yᵢ − ȳ) are (−3, −15), (−1, −5), (1, 5), (3, 15). The numerator is Σ[(xᵢ − x̄)(yᵢ − ȳ)] = 45 + 5 + 5 + 45 = 100. The sums of squares are Σ(xᵢ − x̄)² = 9 + 1 + 1 + 9 = 20 and Σ(yᵢ − ȳ)² = 225 + 25 + 25 + 225 = 500. So r = 100 / √(20 × 500) = 100 / 100 = 1.0, indicating a perfect positive linear relationship. This is expected, because every score lies exactly on the line y = 5x + 45.
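The arithmetic in this example can be reproduced with a few lines of Python. The variable names are illustrative; the data are the hours and scores given above.

```python
import math

hours = [2, 4, 6, 8]
scores = [55, 65, 75, 85]

x_bar = sum(hours) / len(hours)      # 5.0
y_bar = sum(scores) / len(scores)    # 70.0

# Paired deviations from the means.
deviations = [(x - x_bar, y - y_bar) for x, y in zip(hours, scores)]

numerator = sum(dx * dy for dx, dy in deviations)   # 100.0
sum_sq_x = sum(dx ** 2 for dx, _ in deviations)     # 20.0
sum_sq_y = sum(dy ** 2 for _, dy in deviations)     # 500.0

r = numerator / math.sqrt(sum_sq_x * sum_sq_y)      # 1.0
print(r)
```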

Interpreting the Value of r

As a rough guideline: |r| ≥ 0.9 is very strong, 0.7 ≤ |r| < 0.9 is strong, 0.5 ≤ |r| < 0.7 is moderate, 0.3 ≤ |r| < 0.5 is weak, and |r| < 0.3 is negligible. These cutoffs are conventions rather than fixed rules, and different fields draw them differently. The sign indicates direction: positive r means both variables tend to increase together, negative r means one tends to decrease as the other increases. Always visualize the data in a scatter plot before drawing conclusions, as r only captures linear associations.
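The guideline can be written as a small lookup. This is a sketch only: the function name describe_r is hypothetical, and it assumes that a value sitting exactly on a boundary (for example 0.7) belongs to the stronger category.

```python
def describe_r(r):
    """Return (strength, direction) labels for a correlation value."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    size = abs(r)
    # Assumption: boundary values fall into the stronger category.
    if size >= 0.9:
        strength = "very strong"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.3:
        strength = "weak"
    else:
        strength = "negligible"

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"

    return strength, direction


print(describe_r(0.8))     # ('strong', 'positive')
print(describe_r(-0.95))   # ('very strong', 'negative')
```

The labels describe only the size and sign of r, so they are no substitute for looking at the scatter plot.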

Coefficient of Determination (r²)

Squaring r gives r², the coefficient of determination, which represents the proportion of variance in one variable that is predictable from the other through a straight-line fit. For example, r = 0.8 means r² = 0.64, so 64% of the variation in y is explained by the linear relationship with x. Because it is expressed as a share of variance, r² is often easier to interpret in practical terms than r itself, although it discards the sign and therefore the direction of the relationship.
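The "proportion of variance explained" reading can be checked against a least-squares line. The sketch below uses hypothetical data and assumes a simple linear regression of y on x. It compares r² with 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals around the fitted line and SS_tot is the total sum of squares of y.

```python
import math

# Hypothetical data, roughly linear with some noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)   # total sum of squares of y

r = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares line y = intercept + slope * x.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Variation in y left unexplained by the line.
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - ss_res / s_yy

# The two quantities agree up to floating-point rounding.
print(math.isclose(r ** 2, explained, rel_tol=1e-9))
```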

Limitations and Assumptions

Pearson r is designed for continuous variables with a linear relationship, and it is sensitive to extreme outliers, which can heavily distort its value. Approximate normality of both variables matters mainly when r is tested for statistical significance or reported with a confidence interval. Correlation does not imply causation: a high r between two variables may reflect a third confounding variable. For ordinal data or for relationships that are monotonic but not linear, Spearman's rank correlation is more appropriate.
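Spearman's rank correlation can be obtained by replacing each value with its rank and applying the Pearson formula to the ranks. The sketch below is a minimal illustration under that definition: the helper names average_ranks and spearman_rho are hypothetical, tied values are assumed to share the mean of their rank positions, and the data are made up to show a monotonic but non-linear relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs)
                    * sum((y - y_bar) ** 2 for y in ys))
    return num / den


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1   # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation as Pearson r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y always increases with x, but along a curve.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

print(pearson_r(xs, ys))      # below 1, because the relationship is curved
print(spearman_rho(xs, ys))   # 1.0, because the ordering is preserved exactly
```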
