Principal Components Analysis (PCA)

Dimension reduction by linearly combining correlated variables
Variance unique to each variable is not partitioned from variance shared with the other variables
Communalities all set equal to 1 (range is 0 to 1)
(A variable's communality is the proportion of that variable's variance that is shared with, i.e. explained by, the components or common factors)
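A minimal NumPy sketch of why PCA communalities come out as 1, assuming simulated data (the data, seed, and variable names are illustrative, not from these notes). It takes loadings as $\boldsymbol{e}_k\sqrt{\lambda_k}$ and computes a variable's communality as the sum of its squared loadings over the retained components. With all $p$ components retained, this reproduces the diagonal of $\boldsymbol{R}$. Keeping fewer components gives values between 0 and 1.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 500 observations of 4 variables, two of them correlated
n, p = 500, 4
X = rng.normal(size=(n, p))
X[:, 1] += X[:, 0]  # make variables 0 and 1 correlated

R = np.corrcoef(X, rowvar=False)        # p x p correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)     # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # reorder so the largest comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings: column k is e_k scaled by sqrt(lambda_k)
loadings = eigvecs * np.sqrt(eigvals)

# Communality of each variable = sum of its squared loadings
communalities_all = (loadings ** 2).sum(axis=1)         # all p components kept
communalities_k2 = (loadings[:, :2] ** 2).sum(axis=1)   # only the first 2 kept

print(np.round(communalities_all, 6))  # every entry is 1
print(np.round(communalities_k2, 3))   # entries lie between 0 and 1
```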
Used to identify a smaller number of linear combinations that retain as much of the total variance of all the variables as possible
These linear combinations are called components
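Whether to use PCA depends on the goal:
    • Reduce data to scores on a smaller set of composite variables: PCA should be preferred
    • Study latent constructs: PCA should not be preferred (factor analysis, which partitions unique from shared variance, is the usual choice)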
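To make "scores on a smaller set of composite variables" concrete, here is a minimal NumPy sketch. It assumes simulated data and keeps an arbitrarily chosen `k = 2` components (the data, seed, and names are hypothetical). The variables are standardized, so their covariance matrix is $\boldsymbol{R}$. They are then projected onto the first $k$ eigenvectors. The resulting score columns are uncorrelated, and each has variance equal to its eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 300 observations of 5 variables driven by 2 underlying sources
n = 300
sources = rng.normal(size=(n, 2))
mixing = rng.normal(size=(2, 5))
X = sources @ mixing + 0.3 * rng.normal(size=(n, 5))

# Standardize so that the covariance matrix of Z equals the correlation matrix R
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                    # number of components kept (a judgment call)
scores = Z @ eigvecs[:, :k]              # n x k composite variables replacing the 5 originals

print(scores.shape)                                # (300, 2)
print(np.round(np.cov(scores, rowvar=False), 3))   # diagonal = lambda_1, lambda_2; off-diagonal ~ 0
```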
Compute the eigenvectors of the (square) correlation matrix $\boldsymbol{R}$
Eigenvectors (called components)
    • First eigenvector, $\boldsymbol{e}_1$, is the linear combination of observed variables that has maximum variance
    • Second eigenvector, $\boldsymbol{e}_2$, is the linear combination that is orthogonal to $\boldsymbol{e}_1$ (so its scores are uncorrelated with those of $\boldsymbol{e}_1$) and has maximum variance
    • $\boldsymbol{e}_3$ is the linear combination that is orthogonal to both $\boldsymbol{e}_1$ and $\boldsymbol{e}_2$ and has maximum variance
    • and so on, through $\boldsymbol{e}_p$
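The two defining properties of the eigenvectors can be checked numerically. This is a minimal NumPy sketch, assuming simulated correlated data (the covariance matrix, seed, and names are illustrative). It checks that the eigenvectors are mutually orthogonal and unit-length. It also checks that the linear combination weighted by $\boldsymbol{e}_1$ has variance $\lambda_1$, which is at least the variance produced by any of 1000 random unit-length weight vectors.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical correlated data: 400 observations of 3 variables
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[1.0, 0.6, 0.3],
                                 [0.6, 1.0, 0.5],
                                 [0.3, 0.5, 1.0]],
                            size=400)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized variables
R = np.corrcoef(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, E = eigvals[order], eigvecs[:, order]     # column j of E is e_{j+1}

# Orthogonality: E^T E is the identity matrix
print(np.round(E.T @ E, 10))

# e_1 maximizes variance among unit-length weight vectors
var_e1 = np.var(Z @ E[:, 0], ddof=1)               # equals lambda_1
w = rng.normal(size=(1000, 3))
w /= np.linalg.norm(w, axis=1, keepdims=True)      # 1000 random unit vectors
var_random = np.var(Z @ w.T, axis=0, ddof=1)       # variance of each random combination
print(var_e1 >= var_random.max())                  # True
```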
Compute the eigenvalues of the correlation matrix $\boldsymbol{R}$
Eigenvalues
    • Every eigenvector corresponds to an eigenvalue (a scalar)
    • The first eigenvector, $\boldsymbol{e}_1$, corresponds with the eigenvalue $\lambda_1$
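    • An eigenvalue equals the variance of its component, i.e. how much of the observed variance that eigenvector explains; because each successive component has the maximum remaining variance, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$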
    • Because $\boldsymbol{R}$ is analyzed, every variable has variance 1, so the total variance is $\operatorname{tr}(\boldsymbol{R}) = p$
    • Thus, the $p$ eigenvalues always sum to $p$
    • Proportion of variance explained by the $k$th component is $\frac{\lambda_k}{\sum_{i=1}^p \lambda_i}$
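The eigenvalue facts above can be verified with a minimal NumPy sketch, assuming simulated data with one shared source (the data, seed, and names are hypothetical). The sketch sorts the eigenvalues of $\boldsymbol{R}$ in descending order and confirms that they sum to $\operatorname{tr}(\boldsymbol{R}) = p$. It then computes each component's proportion of variance $\lambda_k / \sum_i \lambda_i$ and the cumulative proportion retained by the first $k$ components.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: 250 observations of 6 variables with some shared structure
n, p = 250, 6
common = rng.normal(size=(n, 1))
X = 0.8 * common + rng.normal(size=(n, p))

R = np.corrcoef(X, rowvar=False)
eigvals = np.linalg.eigvalsh(R)[::-1]   # eigvalsh returns ascending order; reverse it

print(np.round(eigvals, 3))             # lambda_1 >= lambda_2 >= ... >= lambda_p
print(eigvals.sum(), np.trace(R))       # both equal p (here 6), up to floating-point error

proportion = eigvals / eigvals.sum()    # lambda_k / sum_i lambda_i, which equals lambda_k / p
cumulative = np.cumsum(proportion)      # variance retained by the first k components
print(np.round(proportion, 3))
print(np.round(cumulative, 3))
```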
Which variables have approximately the same weight on a component?
That is, which variables are correlated with one another?
Which variables have clearly different weights?
That is, which are less correlated?
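A minimal NumPy sketch of reading the weights, assuming simulated data (the variables, seed, and noise levels are hypothetical). Here `x1`, `x2`, and `x3` share one source and are strongly correlated, while `x4` is unrelated noise. The first eigenvector gives roughly equal weights to the three correlated variables and a weight near 0 to `x4`. The sign of an eigenvector is arbitrary, so it is flipped for display only.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical data: x1, x2, x3 share a common source f; x4 is unrelated noise
n = 2000
f = rng.normal(size=n)
X = np.column_stack([f + 0.5 * rng.normal(size=n),
                     f + 0.5 * rng.normal(size=n),
                     f + 0.5 * rng.normal(size=n),
                     rng.normal(size=n)])

R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
e1 = eigvecs[:, np.argmax(eigvals)]   # eigenvector with the largest eigenvalue
e1 = e1 * np.sign(e1[0])              # eigenvector sign is arbitrary; fix it for display

print(np.round(R, 2))   # x1..x3 correlate near 0.8 with each other; x4 near 0 with all
print(np.round(e1, 2))  # roughly equal weights (about 0.58) on x1..x3, near 0 on x4
```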
Example: How do thermal inertia, location, crater diameter, dust coverage, and number of crater layers relate among themselves?