A Python implementation of Robust PCA using Principal Component Pursuit by alternating directions (ADMM). The theory and algorithm are described in: Robust Principal Component Analysis? (Candès et al., 2011)
Robust PCA decomposes a data matrix D into two components:
D = L + S
where:
- L is a low-rank matrix (captures the underlying structure)
- S is a sparse matrix (captures outliers/corruptions)
This is useful when your data has:
- Underlying patterns shared across observations (low-rank structure)
- Sparse corruptions, outliers, or anomalies
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ │ │ │ │ · · │
│ Corrupted │ = │ Low-rank │ + │ · ·│
│ Data │ │ Structure │ │ Sparse │
│ D │ │ L │ │ Outliers S │
└─────────────┘ └─────────────┘ └─────────────┘
from r_pca import RobustPCA, np
# Create data: 3 groups with constant values
data = np.hstack([
np.ones((100, 40)) * 10, # Group 1: all 10s
np.ones((100, 40)) * 20, # Group 2: all 20s
np.ones((100, 40)) * 30 # Group 3: all 30s
])
# Corrupt 5% of entries
mask = np.random.rand(*data.shape) < 0.05
data[mask] = 0
# Decompose into low-rank + sparse
rpca = RobustPCA(data)
L, S = rpca.fit(max_iter=500)
# L recovers the original structure
# S captures the corrupted entriesFull working version of the usage example above, with visualization:
python -m examples.readme_example| Metric | Value |
|---|---|
| ‖D - (L+S)‖_F | ≈ 0.0001 |
| ‖D - (L+S)‖_F / ‖D‖_F | ≈ 10⁻⁸ |
RPCA can separate sparse corruptions (text/watermarks) from the underlying low-rank image structure.
python -m examples.image_watermark_removal| Metric | Value |
|---|---|
| ‖D - (L+S)‖_F | ≈ 0.00001 |
| ‖D - (L+S)‖_F / ‖D‖_F | ≈ 10⁻⁷ |
How it works:
- Natural images have low-rank structure (smooth gradients, repeated patterns)
- Text/watermarks are sparse (only affect a small fraction of pixels)
- L recovers the clean image, S extracts the watermark
Real photographs from the Columbia Object Image Library. 72 views of a rotating object create natural low-rank structure.
python -m examples.coil20_recovery| Metric | Value |
|---|---|
| ‖D - (L+S)‖_F / ‖D‖_F | ≈ 10⁻⁶ |
| ‖L - D_clean‖_F / ‖D_clean‖_F | ≈ 0.13 |
How it works:
- 72 images of the same object from different angles → low-rank matrix
- 5% of pixels corrupted (set to 0) → sparse component
- RPCA recovers the clean images by exploiting multi-view redundancy
Parameters:
D: Input data matrix (numpy array)mu: Augmented Lagrangian parameter (default: auto-computed)lmbda: Sparsity regularization (default:1/sqrt(max(n,m)))
Run the ADMM algorithm to decompose D = L + S.
Returns: (L, S) - the low-rank and sparse components
This implementation follows Algorithm 1 (Principal Component Pursuit by Alternating Directions) from the published paper (Candès et al., 2011).
Note: The arXiv preprint (v1, 2009) contains a different formulation of the algorithm. This code implements the version from the final ACM publication, which has corrected signs in the ADMM updates.
while not converged:
L = svd_threshold(D - S + μ⁻¹Y, μ⁻¹) # Singular value thresholding
S = shrink(D - L + μ⁻¹Y, λμ⁻¹) # Element-wise soft thresholding
Y = Y + μ(D - L - S) # Dual variable updateThis solves the convex optimization problem:
minimize ||L||_* + λ||S||_1
subject to D = L + S
where ||L||_* is the nuclear norm (sum of singular values) and ||S||_1 is the element-wise L1 norm.
μ = (n × m) / (4 × ||D||_1)— controls convergence rateλ = 1 / √(max(n, m))— balances low-rank vs sparse (from the paper's theoretical results)
python -m pytest test_r_pca.py -v

