Sxx Variance Formula
x <- c(2, 4, 6, 8, 10)
Sxx <- sum((x - mean(x))^2)  # sum of squared deviations from the mean
print(Sxx) # 40
Most regression functions (such as lm() in R or scipy.stats.linregress in Python) compute Sxx, or an equivalent quantity, internally, but knowing how to compute it yourself helps you debug results or compute statistics by hand.
import numpy as np
x = np.array([2, 4, 6, 8, 10])
Sxx = np.sum((x - np.mean(x))**2)  # definitional form: squared deviations from the mean
# Or use the shortcut form: Sxx = np.sum(x**2) - (np.sum(x)**2)/len(x)
print(Sxx) # 40.0
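To see where Sxx enters a regression, here is a minimal sketch of a simple least-squares fit computed by hand, whose numbers can be compared against lm() or scipy.stats.linregress when debugging. It assumes the standard textbook formulas \( b_1 = S_{xy} / S_{xx} \) and \( \mathrm{SE}(b_1) = s_e / \sqrt{S_{xx}} \); the y values and the helper name simple_ols are made up for illustration.

```python
import math


def simple_ols(x, y):
    """Hypothetical helper: simple linear regression built from Sxx and Sxy."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxx: squared deviations of x; Sxy: cross-products of x and y deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return sxx, slope, intercept, se_slope


x = [2, 4, 6, 8, 10]
y = [3, 5, 8, 9, 12]  # illustrative values, not from the original text
sxx, slope, intercept, se_slope = simple_ols(x, y)
print(sxx, slope, intercept, se_slope)
```

Because Sxx sits in the denominator of the standard error, more spread in x gives a more precise slope estimate.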
To reduce computation, Sxx can be expressed in an algebraically equivalent form that uses only the sum of the data and the sum of their squares:
\[ S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n} \]
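The shortcut's savings come with a caveat in floating-point code. The sketch below, which assumes IEEE double-precision floats (Python's float), evaluates both forms on the data [4, 8, 6, 5, 3] and on the same data shifted by 10^9. The shift leaves Sxx unchanged, but in the shortcut form two huge, nearly equal sums are subtracted and the answer is lost to cancellation. The function names are illustrative.

```python
def sxx_definitional(x):
    """Two-pass form: sum of squared deviations from the mean."""
    mean = sum(x) / len(x)
    return sum((xi - mean) ** 2 for xi in x)


def sxx_shortcut(x):
    """One-pass form: sum of squares minus (sum)^2 / n."""
    return sum(xi * xi for xi in x) - sum(x) ** 2 / len(x)


data = [4.0, 8.0, 6.0, 5.0, 3.0]
shifted = [xi + 1e9 for xi in data]  # same spread, so the true Sxx is still 14.8

print(sxx_definitional(data), sxx_shortcut(data))
print(sxx_definitional(shifted), sxx_shortcut(shifted))
```

On the unshifted data both forms give 14.8 up to tiny rounding; on the shifted data only the definitional form stays close to 14.8.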
This is often called the shortcut (or computational) formula for Sxx and is derived as follows:
\[
\begin{aligned}
\sum (x_i - \bar{x})^2 &= \sum \left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) \\
&= \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2
\end{aligned}
\]
Since \( \sum x_i = n\bar{x} \), substitute:
\[ \sum (x_i - \bar{x})^2 = \sum x_i^2 - 2(n\bar{x})\bar{x} + n\bar{x}^2 = \sum x_i^2 - n\bar{x}^2 \]
And because \( \bar{x} = \frac{\sum x_i}{n} \), we have \( n\bar{x}^2 = n \cdot \frac{(\sum x_i)^2}{n^2} = \frac{(\sum x_i)^2}{n} \). Hence:
\[ S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n} \]
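As a sanity check on the algebra, the sketch below evaluates the three expressions from the derivation, \( \sum (x_i - \bar{x})^2 \), \( \sum x_i^2 - n\bar{x}^2 \), and \( \sum x_i^2 - (\sum x_i)^2/n \), using Python's fractions.Fraction so that no rounding can hide a discrepancy. Exact rational arithmetic is a choice made for this illustration; the function name is hypothetical.

```python
from fractions import Fraction


def sxx_three_ways(values):
    """Evaluate the three equivalent expressions for Sxx in exact arithmetic."""
    x = [Fraction(v) for v in values]
    n = len(x)
    x_bar = sum(x) / n
    definitional = sum((xi - x_bar) ** 2 for xi in x)
    via_mean = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    shortcut = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    return definitional, via_mean, shortcut


print(sxx_three_ways([4, 8, 6, 5, 3]))  # all three equal Fraction(74, 5), i.e. 14.8
```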
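This shortcut form needs only \( n \), \( \sum x_i \), and \( \sum x_i^2 \), so it can be computed in a single pass over the data, which is convenient for large datasets and is the only option when just those summary statistics are available. In floating-point code, however, subtracting two large, nearly equal quantities can lose precision when the values are large relative to their spread, so the definitional (two-pass) form is generally the more accurate choice there.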
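Because the shortcut depends only on \( n \), \( \sum x_i \), and \( \sum x_i^2 \), it can be fed from a single pass over a stream or from published summary statistics alone. A minimal sketch, assuming the data arrive as an iterable of numbers; the class name RunningSums and the function sxx_from_summary are hypothetical.

```python
class RunningSums:
    """Hypothetical accumulator: keeps n, sum(x), and sum(x^2) in one pass."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x_sq = 0.0

    def add(self, value):
        self.n += 1
        self.sum_x += value
        self.sum_x_sq += value * value

    def sxx(self):
        # Shortcut formula: sum of squares minus (sum)^2 / n
        return self.sum_x_sq - self.sum_x ** 2 / self.n


def sxx_from_summary(n, sum_x, sum_x_sq):
    """Sxx when only the summary statistics are known."""
    return sum_x_sq - sum_x ** 2 / n


acc = RunningSums()
for value in [4, 8, 6, 5, 3]:  # any iterable works, e.g. values parsed from a file
    acc.add(value)

print(acc.sxx())                     # approximately 14.8
print(sxx_from_summary(5, 26, 150))  # approximately 14.8
```

If the values are large relative to their spread, Welford's online algorithm (a different technique, not covered in this article) keeps the single pass while avoiding the cancellation in the final subtraction.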
Let’s take a small dataset: \( x = [4, 8, 6, 5, 3] \).
Step 1: Find \( n \) and \( \sum x_i \): \( n = 5 \), \( \sum x_i = 4 + 8 + 6 + 5 + 3 = 26 \).
Step 2: Find \( \bar{x} \): \( \bar{x} = 26 / 5 = 5.2 \).
Step 3: Calculate Sxx using the definitional method:
\[
\begin{aligned}
(4 - 5.2)^2 &= (-1.2)^2 = 1.44 \\
(8 - 5.2)^2 &= (2.8)^2 = 7.84 \\
(6 - 5.2)^2 &= (0.8)^2 = 0.64 \\
(5 - 5.2)^2 &= (-0.2)^2 = 0.04 \\
(3 - 5.2)^2 &= (-2.2)^2 = 4.84
\end{aligned}
\]
Sum: \( 1.44 + 7.84 + 0.64 + 0.04 + 4.84 = 14.8 \), so \( S_{xx} = 14.8 \).
Step 4: Compute the variance. Sample variance: \( s^2 = S_{xx} / (n-1) = 14.8 / 4 = 3.7 \). Sample standard deviation: \( s = \sqrt{3.7} \approx 1.9235 \).
Step 5: Verify with the computational formula: \( \sum x_i^2 = 16 + 64 + 36 + 25 + 9 = 150 \), \( (\sum x_i)^2 / n = 26^2 / 5 = 676 / 5 = 135.2 \), so \( S_{xx} = 150 - 135.2 = 14.8 \) ✅
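The five steps can be reproduced in a few lines and cross-checked against Python's standard-library statistics module, whose variance() and stdev() functions return the sample (n - 1) versions. A minimal sketch; the variable names are illustrative.

```python
import math
import statistics

x = [4, 8, 6, 5, 3]

# Steps 1 and 2: n, the sum, and the mean
n = len(x)
total = sum(x)
x_bar = total / n

# Step 3: definitional Sxx, one squared deviation per observation
squared_devs = [(xi - x_bar) ** 2 for xi in x]
sxx = sum(squared_devs)

# Step 4: sample variance and standard deviation (divide by n - 1)
s2 = sxx / (n - 1)
s = math.sqrt(s2)

# Step 5: computational formula as a check
sxx_check = sum(xi ** 2 for xi in x) - total ** 2 / n

print(n, total, x_bar, round(sxx, 4), round(s2, 4), round(s, 4))
print(math.isclose(sxx, sxx_check))
# Cross-check against the standard library's sample statistics
print(math.isclose(s2, statistics.variance(x)), math.isclose(s, statistics.stdev(x)))
```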
The Sxx variance formula is far more than a notational convenience; it is a fundamental building block in statistical analysis. By quantifying the total squared deviation from the mean, Sxx enables the calculation of variance, standard deviation, regression slope estimates, and the precision of those estimates. Its two forms, the definitional sum of squared differences and the computational shortcut, offer flexibility: the first favors numerical accuracy, the second needs only running totals. Mastery of Sxx is essential for anyone seeking to understand data variability and the mechanics of least-squares regression.
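import numpy as np

x = [4, 8, 6, 5, 3]
n = len(x)
sum_x = sum(x)                        # Σx = 26
sum_x_sq = sum(xi**2 for xi in x)     # Σx² = 150
Sxx = sum_x_sq - (sum_x**2) / n       # shortcut formula: 150 - 676/5
variance = Sxx / (n - 1)              # sample variance with n - 1 degrees of freedom
# Format to 4 significant digits; the raw floats carry a tiny rounding error
print(f"Sxx = {Sxx:.4g}, Variance = {variance:.4g}")  # Sxx = 14.8, Variance = 3.7
# Cross-check against NumPy's sample variance (ddof=1 divides by n - 1)
print(np.var(x, ddof=1))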
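While \( S_{xx} = \sum (x_i - \bar{x})^2 \) is the definition, it is not always the easiest to compute by hand, especially for large \( n \). Two algebraically equivalent alternatives, \( \sum x_i^2 - n\bar{x}^2 \) and \( \sum x_i^2 - (\sum x_i)^2 / n \), need only running totals and avoid rounding \( \bar{x} \) before squaring each deviation, which makes them faster and less error-prone for hand calculation.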