Inter-Rater Reliability Calculator Formula
Understand the math behind the inter-rater reliability calculator. Each variable explained with a worked example.
Formulas Used
Cohen's Kappa
kappa = (p_observed - p_expected) / (1 - p_expected)

Observed Agreement
observed_agreement = p_observed * 100

Chance Agreement
chance_agreement = p_expected * 100

Total Observations
total_observations = n_total

Variables
| Variable | Description | Default |
|---|---|---|
| agree_both_yes | Both Raters Agree Yes | 40 |
| agree_both_no | Both Raters Agree No | 30 |
| rater1_yes_rater2_no | Rater 1 Yes, Rater 2 No | 10 |
| rater1_no_rater2_yes | Rater 1 No, Rater 2 Yes | 20 |
| n_total | Derived: agree_both_yes + agree_both_no + rater1_yes_rater2_no + rater1_no_rater2_yes | calculated |
| p_observed | Derived: (agree_both_yes + agree_both_no) / n_total | calculated |
| p_yes_r1 | Derived: (agree_both_yes + rater1_yes_rater2_no) / n_total | calculated |
| p_yes_r2 | Derived: (agree_both_yes + rater1_no_rater2_yes) / n_total | calculated |
| p_no_r1 | Derived: 1 - p_yes_r1 | calculated |
| p_no_r2 | Derived: 1 - p_yes_r2 | calculated |
| p_expected | Derived: p_yes_r1 * p_yes_r2 + p_no_r1 * p_no_r2 | calculated |
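A minimal Python sketch of these formulas, assuming the four cell counts as inputs; the function name and return order are illustrative, not the calculator's actual implementation.

```python
def cohens_kappa(agree_both_yes, agree_both_no,
                 rater1_yes_rater2_no, rater1_no_rater2_yes):
    # Total number of paired ratings
    n_total = (agree_both_yes + agree_both_no
               + rater1_yes_rater2_no + rater1_no_rater2_yes)

    # Observed proportion of agreement
    p_observed = (agree_both_yes + agree_both_no) / n_total

    # Marginal "yes"/"no" proportions for each rater
    p_yes_r1 = (agree_both_yes + rater1_yes_rater2_no) / n_total
    p_yes_r2 = (agree_both_yes + rater1_no_rater2_yes) / n_total
    p_no_r1 = 1 - p_yes_r1
    p_no_r2 = 1 - p_yes_r2

    # Agreement expected by chance alone
    p_expected = p_yes_r1 * p_yes_r2 + p_no_r1 * p_no_r2

    kappa = (p_observed - p_expected) / (1 - p_expected)
    return kappa, p_observed * 100, p_expected * 100, n_total

# Defaults from the table above: kappa = 0.400, 70% observed, 50% chance, n = 100
print(cohens_kappa(40, 30, 10, 20))
```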
How It Works
How Cohen's Kappa Works
Cohen's Kappa measures inter-rater agreement while correcting for chance. It is more robust than simple percent agreement.
Formula
Kappa = (P_observed - P_expected) / (1 - P_expected)
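The same relationship in LaTeX notation, with p_o for observed agreement and p_e for chance agreement built from each rater's marginal yes/no proportions (the superscripts simply label Rater 1 and Rater 2):

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_e = p_{\text{yes}}^{(1)} \, p_{\text{yes}}^{(2)} + p_{\text{no}}^{(1)} \, p_{\text{no}}^{(2)}
```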
Interpretation
As a rule of thumb, values above 0.60 indicate substantial agreement, and 0.80 or higher is recommended for high-stakes decisions (see the FAQ below).
Worked Example
Two graders evaluate 100 essays: both say pass (40), both say fail (30), only Rater 1 passes (10), only Rater 2 passes (20).
1. Total: 40 + 30 + 10 + 20 = 100
2. P_observed = (40 + 30) / 100 = 0.70
3. P(R1 yes) = 50/100 = 0.50, P(R2 yes) = 60/100 = 0.60
4. P_expected = 0.50 x 0.60 + 0.50 x 0.40 = 0.30 + 0.20 = 0.50
5. Kappa = (0.70 - 0.50) / (1 - 0.50) = 0.20 / 0.50 = 0.400
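The same arithmetic checked as a short script; the values are copied from the example above and the variable names are only for illustration.

```python
# Worked example: 100 essays graded by two raters
both_pass, both_fail = 40, 30
only_r1_pass, only_r2_pass = 10, 20

n = both_pass + both_fail + only_r1_pass + only_r2_pass    # 100
p_observed = (both_pass + both_fail) / n                   # 0.70
p_yes_r1 = (both_pass + only_r1_pass) / n                  # 0.50
p_yes_r2 = (both_pass + only_r2_pass) / n                  # 0.60
p_expected = p_yes_r1 * p_yes_r2 + (1 - p_yes_r1) * (1 - p_yes_r2)  # 0.50
kappa = (p_observed - p_expected) / (1 - p_expected)       # 0.400

print(round(kappa, 3))  # 0.4
```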
Frequently Asked Questions
When should I use Kappa instead of percent agreement?
Always use Kappa when reporting inter-rater reliability in research. Percent agreement inflates reliability by ignoring chance: in the worked example above, percent agreement is 70% while Kappa is only 0.40.
What is a good Kappa value?
Values above 0.60 indicate substantial agreement. For high-stakes decisions, aim for 0.80 or higher.
Can Kappa be used with more than two raters?
Cohen's Kappa is designed for two raters. For multiple raters, use Fleiss' Kappa instead.
Ready to run the numbers?
Open Inter-Rater Reliability Calculator