Dimension reduction by linearly combining correlated variables
Variable-unique variance not partitioned from shared variance
Communalities all set equal to 1 (range is 0 to 1)
(Communality is the proportion of a variable's total variance that is common, i.e., shared with the other variables through the components)
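As a small worked note (the notation here is assumed, not from these slides): if $a_{jk}$ denotes the weight (loading) of variable $j$ on component $k$, the communality of variable $j$ over $m$ retained components is

$$h_j^2 = \sum_{k=1}^{m} a_{jk}^2, \qquad 0 \le h_j^2 \le 1$$

and keeping all $p$ components, as PCA does, gives $h_j^2 = 1$ for every variable.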
Used to identify a smaller number of linear combinations that retain the maximum amount of variance of all the variables
These linear combinations are called components
The goals are:
- Reduce data to scores on a smaller set of composite variables: PCA should be preferred
- Study latent constructs: PCA should not be preferred
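A minimal sketch of the first goal, assuming NumPy and scikit-learn are available; the data `X`, the sample size, and the seed are placeholder choices, not part of these notes. The variables are standardized so the analysis reflects the correlation structure, then scores on two components are computed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # hypothetical data: n = 100 cases, p = 5 variables

Z = StandardScaler().fit_transform(X)  # standardize so the analysis runs on the correlation structure
pca = PCA(n_components=2)              # keep a smaller number of components
scores = pca.fit_transform(Z)          # composite-variable scores, shape (100, 2)
print(scores.shape)
```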
Compute correlation (square) matrix eigenvectors
Eigenvectors (called components)
- First eigenvector, $\boldsymbol{e}_1$, is the linear combination of observed variables that has maximum variance
- Second eigenvector, $\boldsymbol{e} _2$, is the linear combination that is orthogonal to (independent of) $\boldsymbol{e}_1$ and has maximum variance
- $\boldsymbol{e}_3$ is the linear combination that is orthogonal to both $\boldsymbol{e}_1$ and $\boldsymbol{e}_2$ and has maximum variance
- etc
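A hedged NumPy sketch of this step with placeholder data: form the correlation matrix, take its eigenvectors, sort them by decreasing eigenvalue, and check that successive eigenvectors are orthogonal.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))          # hypothetical data, p = 4 variables

R = np.corrcoef(X, rowvar=False)       # p x p correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)   # eigh assumes a symmetric matrix; eigenvalues come out ascending
order = np.argsort(eigvals)[::-1]      # reorder so e_1 goes with the largest eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

e1, e2 = eigvecs[:, 0], eigvecs[:, 1]
print(e1 @ e2)                         # ~0: successive eigenvectors are orthogonal
```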
Compute correlation (square) matrix eigenvalues
Eigenvalues
- Every eigenvector corresponds to an eigenvalue (a scalar)
- The first eigenvector, $\boldsymbol{e}_1$, corresponds to the eigenvalue $\lambda_1$
- An eigenvalue indicates how much of the observed variance is explained by its eigenvector; components are ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$
- We analyze $\boldsymbol{R}$, so every variable has variance 1 and the total variance is $p$ (the trace of $\boldsymbol{R}$)
- Thus, the $p$ eigenvalues always sum to $p$
- Proportion of variance explained by the $k$th component is $\frac{\lambda_k}{\sum_{i=1}^p \lambda_i} = \frac{\lambda_k}{p}$
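A short NumPy sketch of these eigenvalue facts, again with placeholder data: the eigenvalues of $\boldsymbol{R}$ sum to $p$, and dividing each by that sum gives the proportion of variance explained.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))                    # hypothetical data, p = 4 variables
p = X.shape[1]

R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # lambda_1 >= lambda_2 >= ... >= lambda_p

print(eigvals.sum(), p)                          # the eigenvalues sum to p (the trace of R)
print(eigvals / eigvals.sum())                   # proportion of variance explained by each component
```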
Which variables have approximately the same weight? That is, which are correlated?
Which variables do not have the same weight? That is, which are less correlated?
Example: How do thermal inertia, location, crater diameter, dust coverage, and number of crater layers relate to one another?
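A hedged sketch of how such weights might be inspected; the variable names and the generated data below are hypothetical placeholders, not actual crater measurements.

```python
import numpy as np

variables = ["thermal_inertia", "location", "crater_diameter",
             "dust_coverage", "n_crater_layers"]   # hypothetical variable names
rng = np.random.default_rng(3)
X = rng.normal(size=(150, len(variables)))         # hypothetical data, not real crater measurements

R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvecs = eigvecs[:, order]

# Variables with similar, sizable weights on the same component tend to be correlated;
# variables with near-zero or opposite-sign weights are less related on that component.
for name, weight in zip(variables, eigvecs[:, 0]):
    print(f"{name:16s} {weight:+.2f}")
```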