Matrix Decompositions#

This tutorial covers Panther’s randomized matrix decomposition algorithms: CQRRPT and RSVD.

Introduction to Randomized Matrix Decomposition**Mathematical Background**

RSVD computes a low-rank approximation of the SVD: \(A \approx U \Sigma V^T\) where:

  • \(U\) has orthonormal columns

  • \(\Sigma\) is diagonal with singular values

  • \(V\) has orthonormal columns

The algorithm uses random sampling to efficiently compute the dominant singular vectors.

Internal Algorithm Details

RSVD’s implementation uses a sophisticated blocked approach:

  1. Power Iteration Sketching: Creates initial random sketch \(\Omega \in \mathbb{R}^{n \times k}\) and applies alternating power iterations:

    \[\begin{split}\begin{align} Y^{(0)} &= A \Omega \\ Y^{(1)} &= A^T Y^{(0)} \\ Y^{(2)} &= A Y^{(1)} \\ &\vdots \end{align}\end{split}\]

    with periodic orthonormalization for numerical stability

  2. Range Finding: Computes orthonormal basis \(Q\) spanning the approximate range of \(A\):

    \[Y = A \Omega \quad \text{and} \quad Q, R = \text{QR}(Y)\]
  3. Blocked QB Decomposition: Uses adaptive blocking to build factorization \(A \approx Q B\):

    • Processes matrix in blocks of size 64 (or smaller near target rank)

    • For each block, finds range of current residual: \(A_{res} - Q_i B_i\)

    • Re-orthogonalizes against existing \(Q\) to maintain orthogonality

    • Updates residual: \(A_{res} \leftarrow A_{res} - Q_i B_i\)

  4. Adaptive Termination: Monitors approximation error using matrix norms:

    \[\text{error} = \sqrt{|\|A\|_F^2 - \|B\|_F^2|} \cdot \frac{\sqrt{\|A\|_F^2 + \|B\|_F^2}}{\|A\|_F}\]

    Terminates when error stops decreasing or falls below tolerance

  5. Final SVD: Performs exact SVD on the much smaller matrix \(B\):

    \[B = \tilde{U} \Sigma V^T\]

    Then reconstructs: \(U = Q \tilde{U}\)

This blocked approach is more memory-efficient and numerically stable than standard randomized SVD, especially for matrices where the target rank is a significant fraction of the matrix dimensions.——————————————-

Traditional matrix decompositions like QR and SVD have \(O(n^3)\) complexity for \(n \\times n\) matrices. Randomized methods can achieve near-optimal results with \(O(n^2)\) complexity, making them suitable for large-scale problems.

Panther provides two main randomized decompositions:

  • CQRRPT: CholeskyQR with Randomized column Pivoting for tall matrices

  • RSVD: Randomized Singular Value Decomposition

CholeskyQR with Randomized Pivoting (CQRRPT)#

Mathematical Background

CQRRPT computes a QR decomposition \(A = QR\) where:

  • \(Q\) has orthonormal columns

  • \(R\) is upper triangular

  • Column pivoting improves numerical stability

The algorithm uses random sampling and tournament pivoting to reduce communication costs.

Internal Algorithm Details

CQRRPT’s implementation follows this detailed process:

  1. Random Sketching Phase: First computes \(M_{sk} = S \cdot M\) where \(S\) is a \(d \times m\) sketching matrix with \(d = \lceil \gamma \cdot n \rceil\) (compression ratio)

  2. Pivoted QR on Sketch: Performs QR with column pivoting on the much smaller sketched matrix using LAPACK’s GEQP3 routine:

    \[M_{sk} \Pi = Q_{sk} R_{sk}\]

    where \(\Pi\) is the permutation matrix encoding column pivots

  3. Rank Estimation: Estimates numerical rank by examining diagonal elements of \(R_{sk}\):

    \[k = |\{i : |R_{sk}(i,i)| > \epsilon\}|\]

    with \(\epsilon = 10^{-6}\) for float32, \(10^{-12}\) for float64

  4. Column Selection: Selects the \(k\) most important columns: \(M_k = M[:, \Pi[:k]]\)

  5. Triangular Solve: Solves \(R_{sk}[:k,:k] \cdot M_{pre} = M_k^T\) to get preliminary factorization

  6. Cholesky Refinement: Computes \(G = M_{pre}^T M_{pre}\) and its Cholesky factorization \(G = L L^T\)

  7. Adaptive Rank Refinement: Monitors condition number via Cholesky diagonal:

    \[\text{condition} = \frac{\max(|L_{ii}|)}{\min(|L_{ii}|)} \leq \sqrt{\frac{\epsilon}{u}}\]

    where \(u\) is machine epsilon, stopping when condition becomes too large

  8. Final Factorization: Computes final \(Q_k\) and \(R_k\) using the refined rank

Basic Usage

import torch
import panther as pr

# Create a test matrix
m, n = 1000, 500
A = torch.randn(m, n, dtype=torch.float64)

# Standard CQRRPT
Q, R, P = pr.linalg.cqrrpt(A)

print(f"Original matrix: {A.shape}")
print(f"Q matrix: {Q.shape}")
print(f"R matrix: {R.shape}")
print(f"Permutation: {P.shape}")

# Verify decomposition: A[:, P] ≈ Q @ R
reconstruction_error = torch.norm(A[:, P] - Q @ R)
print(f"Reconstruction error: {reconstruction_error:.2e}")

Customizing CQRRPT Parameters

# CQRRPT with custom parameters
Q, R, P = pr.linalg.cqrrpt(
    A,
    gamma=1.5  # Oversampling parameter
)

# Check orthogonality of Q
orthogonality_error = torch.norm(Q.T @ Q - torch.eye(Q.shape[1], dtype=A.dtype))
print(f"Q orthogonality error: {orthogonality_error:.2e}")

Performance Comparison

import time
import torch
import panther as pr

def compare_qr_methods(matrix_sizes):
    """Compare CQRRPT with standard QR."""
    results = {}

    for m, n in matrix_sizes:
        print(f"\nTesting {m}x{n} matrix:")
        A = torch.randn(m, n, dtype=torch.float64)

        # Standard PyTorch QR
        start_time = time.time()
        Q_torch, R_torch = torch.linalg.qr(A)
        torch_time = time.time() - start_time

        # Panther CQRRPT
        start_time = time.time()
        Q_cqrrpt, R_cqrrpt, P_cqrrpt = pr.linalg.cqrrpt(A)
        cqrrpt_time = time.time() - start_time

        # Compare accuracy
        torch_error = torch.norm(A - Q_torch @ R_torch)
        cqrrpt_error = torch.norm(A[:, P_cqrrpt] - Q_cqrrpt @ R_cqrrpt)

        # Store results
        results[(m, n)] = {
            'torch_time': torch_time,
            'cqrrpt_time': cqrrpt_time,
            'speedup': torch_time / cqrrpt_time,
            'torch_error': torch_error.item(),
            'cqrrpt_error': cqrrpt_error.item()
        }

        # Print results
        print(f"  PyTorch QR: {torch_time:.4f}s, error: {torch_error:.2e}")
        print(f"  CQRRPT:    {cqrrpt_time:.4f}s, error: {cqrrpt_error:.2e}")
        print(f"  Speedup:   {torch_time / cqrrpt_time:.2f}x")

    return results


# Test different matrix sizes
sizes = [(500, 250), (1000, 500), (2000, 1000)]
comparison_results = compare_qr_methods(sizes)

Using CQRRPT for Least Squares

def solve_least_squares_cqrrpt(A, b):
    """Solve least squares problem using CQRRPT."""

    # Compute CQRRPT decomposition
    Q, R, P = pr.linalg.cqrrpt(A)

    # Solve R x = Q^T b for the permuted variables
    QtB = Q.T @ b
    QtB = QtB.unsqueeze(1)
    x_permuted = torch.linalg.solve_triangular(R, QtB, upper=True)

    # Unpermute the solution
    x = torch.zeros_like(x_permuted)
    x[P] = x_permuted

    return x

# Example: Solve linear regression problem
n_samples, n_features = 1000, 200
A = torch.randn(n_samples, n_features, dtype=torch.float64)
x_true = torch.randn(n_features, dtype=torch.float64)
b = A @ x_true + 0.01 * torch.randn(n_samples, dtype=torch.float64)

# Solve using CQRRPT
x_estimated = solve_least_squares_cqrrpt(A, b)

# Compare with true solution
solution_error = torch.norm(x_estimated - x_true)
print(f"Solution error: {solution_error:.4f}")

# Verify residual
residual = torch.norm(A @ x_estimated - b)
print(f"Residual norm: {residual:.4f}")

Randomized SVD (RSVD)#

Mathematical Background

RSVD computes a low-rank approximation of the SVD: \(A \approx U \Sigma V^T\) where:

  • \(U\) has orthonormal columns

  • \(\Sigma\) is diagonal with singular values

  • \(V\) has orthonormal columns

The algorithm uses random sampling to efficiently compute the dominant singular vectors.

Basic RSVD Usage

# Create a low-rank matrix plus noise
m, n, rank = 1000, 800, 50

# Generate low-rank matrix
U_true = torch.randn(m, rank)
V_true = torch.randn(n, rank)
sigma_true = torch.logspace(0, -2, rank)  # Decreasing singular values

A_lowrank = U_true @ torch.diag(sigma_true) @ V_true.T
noise = 0.01 * torch.randn(m, n)
A = A_lowrank + noise

print(f"Matrix shape: {A.shape}")
print(f"True rank: {rank}")

# Compute RSVD
target_rank = 60  # Slightly higher than true rank
U, S, V = pr.linalg.randomized_svd(A, k=target_rank, tol=1e-5)

# Note: RSVD uses blocked QB internally:
# - Processes in blocks of min(64, target_rank)
# - Adaptively determines actual rank based on tolerance
# - May return fewer than target_rank components if early termination occurs

print(f"\\nRSVD results:")
print(f"U shape: {U.shape}")
print(f"S shape: {S.shape}")
print(f"V shape: {V.shape}")

# Reconstruct and check error
A_reconstructed = U @ torch.diag(S) @ V.T
reconstruction_error = torch.norm(A - A_reconstructed)
frobenius_norm = torch.norm(A)

print(f"\\nReconstruction error: {reconstruction_error:.4f}")
print(f"Relative error: {reconstruction_error/frobenius_norm:.6f}")

Advanced RSVD Parameters

# RSVD with custom tolerance for higher accuracy
U, S, V = pr.linalg.randomized_svd(A, k=50, tol=1e-8)

# Analyze singular values
print("Top 10 singular values:")
for i in range(min(10, len(S))):
    print(f"  σ_{i+1}: {S[i]:.6f}")

# Compare with exact SVD (for smaller matrices)
if min(A.shape) <= 500:
    U_exact, S_exact, V_exact = torch.linalg.svd(A, full_matrices=False)

    # Compare singular values
    rank_to_compare = min(len(S), len(S_exact))
    sv_error = torch.norm(S[:rank_to_compare] - S_exact[:rank_to_compare])
    print(f"\\nSingular value error: {sv_error:.2e}")

Adaptive Rank Selection

def adaptive_rsvd(A, energy_threshold=0.99, max_rank=None):
    """Automatically determine rank based on energy threshold."""

    max_rank = max_rank or min(A.shape) // 2

    # Compute RSVD with generous rank
    U, S, V = pr.linalg.randomized_svd(A, k=max_rank, tol=1e-6)

    # Find rank that captures desired energy
    total_energy = torch.sum(S**2)
    cumulative_energy = torch.cumsum(S**2, dim=0)
    energy_ratio = cumulative_energy / total_energy

    # Find first index where energy threshold is exceeded
    rank_needed = torch.argmax((energy_ratio >= energy_threshold).float()) + 1

    print(f"Adaptive rank selection:")
    print(f"  Energy threshold: {energy_threshold}")
    print(f"  Selected rank: {rank_needed}")
    print(f"  Energy captured: {energy_ratio[rank_needed-1]:.6f}")

    # Return truncated decomposition
    return U[:, :rank_needed], S[:rank_needed], V[:, :rank_needed], rank_needed

# Example usage
U_adaptive, S_adaptive, V_adaptive, optimal_rank = adaptive_rsvd(A, energy_threshold=0.95)

A_adaptive = U_adaptive @ torch.diag(S_adaptive) @ V_adaptive.T
adaptive_error = torch.norm(A - A_adaptive)
print(f"Adaptive reconstruction error: {adaptive_error:.4f}")

RSVD for Principal Component Analysis

def rsvd_pca(X, n_components, center_data=True):
 """Perform PCA using randomized SVD."""

 # Center data
 if center_data:
     mean = torch.mean(X, dim=0)
     X_centered = X - mean
 else:
     mean = torch.zeros(X.shape[1])
     X_centered = X

 U, S, V = pr.linalg.randomized_svd(X_centered, k=n_components, tol=1e-6)

 components = V
 explained_variance = S**2 / (X.shape[0] - 1)
 total_variance = torch.sum(torch.var(X_centered, dim=0))
 explained_variance_ratio = explained_variance / total_variance

 X_transformed = X_centered @ components

 return {
     'components': components,
     'explained_variance': explained_variance,
     'explained_variance_ratio': explained_variance_ratio,
     'singular_values': S,
     'mean': mean,
     'X_transformed': X_transformed
 }

# Example: PCA on synthetic data
n_samples, n_features = 1000, 200

# Generate data with known structure
latent_dim = 10
latent_factors = torch.randn(n_samples, latent_dim)
loading_matrix = torch.randn(n_features, latent_dim)
X = latent_factors @ loading_matrix.T + 0.1 * torch.randn(n_samples, n_features)

# Perform RSVD-PCA
pca_result = rsvd_pca(X, n_components=15)

print(f"Data shape: {X.shape}")
print(f"Transformed shape: {pca_result['X_transformed'].shape}")
print(f"Cumulative explained variance: {torch.cumsum(pca_result['explained_variance_ratio'], 0)[:5]}")

Matrix Completion with RSVD

def matrix_completion_rsvd(A_observed, mask, rank, n_iterations=10):
    """Simple matrix completion using iterative RSVD."""

    # Initialize missing entries with column means
    A_filled = A_observed.clone()
    for j in range(A_filled.shape[1]):
        col_mask = mask[:, j]
        if col_mask.sum() > 0:
            col_mean = A_observed[col_mask, j].mean()
            A_filled[~col_mask, j] = col_mean

    for iteration in range(n_iterations):
        # Compute low-rank approximation
        U, S, V = pr.linalg.randomized_svd(A_filled, k=rank, tol=1e-6)
        A_lowrank = U @ torch.diag(S) @ V.T

        # Keep observed entries, update missing ones
        A_filled = torch.where(mask, A_observed, A_lowrank)

        # Compute objective (only on observed entries)
        if iteration % 2 == 0:
            objective = torch.norm((A_observed - A_lowrank)[mask])
            print(f"Iteration {iteration}: Objective = {objective:.6f}")

    return A_filled

# Example: Matrix completion
m, n, true_rank = 200, 150, 10

# Generate low-rank matrix
U_true = torch.randn(m, true_rank)
V_true = torch.randn(n, true_rank)
A_complete = U_true @ V_true.T

# Create random mask (observe 60% of entries)
mask = torch.rand(m, n) < 0.6
A_observed = torch.where(mask, A_complete, torch.tensor(0.0))

print(f"Matrix completion problem:")
print(f"  Matrix size: {A_complete.shape}")
print(f"  True rank: {true_rank}")
print(f"  Observed entries: {mask.sum()}/{mask.numel()} ({100*mask.float().mean():.1f}%)")

# Perform matrix completion
A_completed = matrix_completion_rsvd(A_observed, mask, rank=15, n_iterations=20)

# Evaluate completion quality
test_mask = ~mask
completion_error = torch.norm((A_complete - A_completed)[test_mask])
test_norm = torch.norm(A_complete[test_mask])
relative_error = completion_error / test_norm

print(f"  Completion results:")
print(f"  Test error: {completion_error:.4f}")
print(f"  Relative test error: {relative_error:.6f}")

GPU Acceleration#

Using CUDA for Large Matrices

if torch.cuda.is_available():
    device = torch.device('cuda')

    # Large matrix on GPU
    m, n = 5000, 3000
    A_gpu = torch.randn(m, n, device=device, dtype=torch.float32)

    print(f"GPU matrix shape: {A_gpu.shape}")
    print(f"GPU memory used: {torch.cuda.memory_allocated()/1024**2:.1f} MB")

    # RSVD on GPU
    start_time = time.time()
    U_gpu, S_gpu, V_gpu = pr.linalg.randomized_svd(A_gpu, k=100, tol=1e-5)
    gpu_time = time.time() - start_time

    print(f"GPU RSVD time: {gpu_time:.4f}s")

    # Memory cleanup
    del A_gpu, U_gpu, S_gpu, V_gpu
    torch.cuda.empty_cache()

Memory-Efficient Large Matrix Processing

def chunked_rsvd(A, rank, chunk_size=1000):
    """Process very large matrices in chunks."""

    m, n = A.shape

    if min(m, n) <= chunk_size:
        # Small enough to process directly
        return pr.linalg.randomized_svd(A, k=rank, tol=1e-6)

    # Process in chunks and combine
    if m >= n:
        # Tall matrix: chunk rows
        Q_list = []
        for i in range(0, m, chunk_size):
            end_i = min(i + chunk_size, m)
            A_chunk = A[i:end_i, :]

            # Random projection
            Omega = torch.randn(n, rank + 10)
            Y_chunk = A_chunk @ Omega
            Q_chunk, _ = torch.linalg.qr(Y_chunk)
            Q_list.append(Q_chunk)

        # Combine Q matrices
        Q_combined = torch.cat(Q_list, dim=0)

        # Final RSVD on smaller matrix
        B = Q_combined.T @ A
        U_B, S, V = pr.linalg.randomized_svd(B, k=rank, tol=1e-6)
        U = Q_combined @ U_B

    else:
        # Wide matrix: chunk columns
        # Similar process but transpose operations
        U, S, V = chunked_rsvd(A.T, rank)
        U, V = V.T, U.T

    return U, S, V

Practical Applications#

1. Image Compression

def compress_image_rsvd(image_tensor, compression_ratio=0.1):
    """Compress image using RSVD."""

    # image_tensor should be (height, width, channels)
    h, w, c = image_tensor.shape

    compressed_channels = []
    for channel in range(c):
        # Extract channel
        channel_data = image_tensor[:, :, channel]

        # Determine rank for compression
        max_rank = min(h, w)
        target_rank = int(max_rank * compression_ratio)

        # Apply RSVD
        U, S, V = pr.linalg.randomized_svd(channel_data, k=target_rank, tol=1e-6)

        # Reconstruct channel
        compressed_channel = U @ torch.diag(S) @ V.T
        compressed_channels.append(compressed_channel)

    # Combine channels
    compressed_image = torch.stack(compressed_channels, dim=2)

    # Calculate compression statistics
    original_elements = h * w * c
    compressed_elements = c * target_rank * (h + w + 1)
    actual_ratio = compressed_elements / original_elements

    return compressed_image, actual_ratio, target_rank

# Example usage (requires PIL/torchvision for real images)
# For demonstration, create a synthetic image
h, w, c = 256, 256, 3
synthetic_image = torch.randn(h, w, c)

compressed, ratio, rank = compress_image_rsvd(synthetic_image, compression_ratio=0.2)

print(f"Image compression:")
print(f"  Original size: {h}x{w}x{c}")
print(f"  Compression ratio: {ratio:.3f}")
print(f"  Rank used: {rank}")

compression_error = torch.norm(synthetic_image - compressed)
print(f"  Reconstruction error: {compression_error:.4f}")

2. Dimensionality Reduction Pipeline

class RSVDDimensionalityReducer:
    """Complete dimensionality reduction pipeline using RSVD."""

    def __init__(self, n_components, whiten=False):
        self.n_components = n_components
        self.whiten = whiten
        self.components_ = None
        self.mean_ = None
        self.singular_values_ = None

    def fit(self, X):
        """Fit the model with X."""
        # Center data
        self.mean_ = torch.mean(X, dim=0)
        X_centered = X - self.mean_

        # Compute RSVD
        U, S, V = pr.linalg.randomized_svd(X_centered, k=self.n_components, tol=1e-6)

        self.singular_values_ = S
        self.components_ = V  # Components as columns

        return self

    def transform(self, X):
        """Apply dimensionality reduction to X."""
        X_centered = X - self.mean_
        X_transformed = X_centered @ self.components_

        if self.whiten:
            X_transformed = X_transformed / self.singular_values_

        return X_transformed

    def fit_transform(self, X):
        """Fit model and transform X."""
        return self.fit(X).transform(X)

    def inverse_transform(self, X_transformed):
        """Transform back to original space."""
        if self.whiten:
            X_transformed = X_transformed * self.singular_values_

        X_reconstructed = X_transformed @ self.components_.T + self.mean_
        return X_reconstructed

# Example: Dimensionality reduction on synthetic data
n_samples, n_features = 1000, 500
n_informative = 50

# Generate data with structure
informative_data = torch.randn(n_samples, n_informative)
random_projection = torch.randn(n_informative, n_features)
X = informative_data @ random_projection + 0.1 * torch.randn(n_samples, n_features)

# Apply dimensionality reduction
reducer = RSVDDimensionalityReducer(n_components=100)
X_reduced = reducer.fit_transform(X)
X_reconstructed = reducer.inverse_transform(X_reduced)

print(f"Dimensionality reduction results:")
print(f"  Original shape: {X.shape}")
print(f"  Reduced shape: {X_reduced.shape}")
print(f"  Reconstruction error: {torch.norm(X - X_reconstructed):.4f}")
print(f"  Explained variance: {torch.sum(reducer.singular_values_**2):.2f}")

Computational Complexity and Algorithm Comparison#

Complexity Analysis

Both algorithms achieve significant computational savings over classical methods:

Complexity Comparison#

Algorithm

Classical

CQRRPT

RSVD

QR Decomposition

\(O(mn^2)\)

\(O(mnd + d^3)\)

N/A

SVD

\(O(mn \min(m,n))\)

N/A

\(O(mnk + k^3)\)

Memory

\(O(mn)\)

\(O(mn + dn)\)

\(O(mn + mk)\)

Where: - \(m, n\): matrix dimensions - \(d\): sketch size (\(d \approx 1.2n\) to \(2n\) for CQRRPT) - \(k\): target rank (typically \(k \ll \min(m,n)\))

Key Algorithmic Differences

  1. CQRRPT Features:

    • Exact decomposition with column pivoting for numerical stability

    • Adaptive rank detection using Cholesky-based condition monitoring

    • Randomized sketching design reduces computational complexity

    • Best for: Full-rank problems, least squares, when exact QR is needed

  2. RSVD Features:

    • Low-rank approximation optimized for dominant singular vectors

    • Blocked processing with adaptive error monitoring

    • Power iterations enhance accuracy for well-separated singular values

    • Best for: Dimensionality reduction, PCA, matrix completion

Choosing the Right Algorithm

def choose_algorithm(A, use_case):
    """Guide for selecting between CQRRPT and RSVD."""
    m, n = A.shape

    if use_case == "least_squares":
        # CQRRPT provides exact QR with pivoting
        return "CQRRPT", "Exact solution with numerical stability"

    elif use_case == "dimensionality_reduction":
        # RSVD excels at low-rank approximations
        return "RSVD", "Efficient low-rank approximation"

    elif use_case == "matrix_completion":
        # RSVD's iterative nature suits completion algorithms
        return "RSVD", "Low-rank structure assumption"

    elif use_case == "numerical_rank":
        # CQRRPT's rank detection is more reliable
        return "CQRRPT", "Robust rank estimation via pivoting"

    # Matrix shape considerations
    if min(m, n) < 1000:
        return "Classical", "Small matrices: use standard algorithms"
    elif m >> n or n >> m:
        return "CQRRPT", "Rectangular matrices benefit from sketching"
    else:
        return "RSVD", "Square-ish matrices: target rank matters"

# Example usage
A = torch.randn(2000, 800)
for use_case in ["least_squares", "dimensionality_reduction", "matrix_completion"]:
    alg, reason = choose_algorithm(A, use_case)
    print(f"{use_case}: Use {alg} - {reason}")

Performance Tips and Best Practices#

1. Choosing Parameters

def recommend_parameters(matrix_shape, target_rank):
    """Recommend RSVD parameters based on matrix properties."""
    m, n = matrix_shape
    min_dim = min(m, n)

    # Tolerance recommendations based on target rank
    if target_rank < min_dim * 0.1:
        tolerance = 1e-8  # High accuracy for low-rank approximations
    elif target_rank < min_dim * 0.5:
        tolerance = 1e-6  # Balanced accuracy
    else:
        tolerance = 1e-4  # Faster computation for high-rank

    return {
        'k': target_rank,
        'tol': tolerance
    }

2. Error Analysis

def analyze_decomposition_error(A, U, S, Vt, rank_range=None):
    """Analyze approximation error vs rank."""

    if rank_range is None:
        rank_range = range(1, len(S) + 1, max(1, len(S) // 20))

    errors = []
    for r in rank_range:
        A_approx = U[:, :r] @ torch.diag(S[:r]) @ V[:r, :].T
        error = torch.norm(A - A_approx)
        errors.append(error.item())

    return list(rank_range), errors

3. Numerical Stability Analysis

def numerical_stability_test():
    """Compare numerical stability of CQRRPT vs RSVD."""

    # Create ill-conditioned matrix
    m, n = 500, 300
    U_cond = torch.randn(m, n)
    # Create exponentially decaying singular values
    singular_values = torch.logspace(0, -10, n)  # condition number ≈ 10^10
    V_cond = torch.randn(n, n)

    A_ill = U_cond @ torch.diag(singular_values) @ V_cond.T

    print(f"Matrix condition number: {torch.linalg.cond(A_ill):.2e}")

    # Test CQRRPT stability
    Q_cqrrpt, R_cqrrpt, P_cqrrpt = pr.linalg.cqrrpt(A_ill)
    cqrrpt_ortho = torch.norm(Q_cqrrpt.T @ Q_cqrrpt - torch.eye(Q_cqrrpt.shape[1]))
    cqrrpt_recon = torch.norm(A_ill[:, P_cqrrpt] - Q_cqrrpt @ R_cqrrpt)

    # Test RSVD stability
    k = min(100, n-10)  # Conservative rank
    U_rsvd, S_rsvd, V_rsvd = pr.linalg.randomized_svd(A_ill, k=k, tol=1e-8)
    rsvd_ortho_u = torch.norm(U_rsvd.T @ U_rsvd - torch.eye(U_rsvd.shape[1]))
    rsvd_ortho_v = torch.norm(V_rsvd.T @ V_rsvd - torch.eye(V_rsvd.shape[1]))
    rsvd_recon = torch.norm(A_ill - U_rsvd @ torch.diag(S_rsvd) @ V_rsvd.T)

    print(f"\\nOrthogonality errors:")
    print(f"  CQRRPT Q^T Q: {cqrrpt_ortho:.2e}")
    print(f"  RSVD U^T U: {rsvd_ortho_u:.2e}")
    print(f"  RSVD V^T V: {rsvd_ortho_v:.2e}")

    print(f"\\nReconstruction errors:")
    print(f"  CQRRPT: {cqrrpt_recon:.2e}")
    print(f"  RSVD: {rsvd_recon:.2e}")

    # Rank detection comparison
    print(f"\\nRank detection:")
    print(f"  CQRRPT effective rank: {Q_cqrrpt.shape[1]}")
    print(f"  RSVD computed rank: {len(S_rsvd)}")
    print(f"  True numerical rank (σ > 1e-12): {(singular_values > 1e-12).sum()}")

numerical_stability_test()

4. Precision and Tolerance Guidelines

def precision_guidelines():
    """Guidelines for choosing tolerances and precision."""

    guidelines = {
        "float32": {
            "cqrrpt_eps": 1e-6,
            "rsvd_tol": 1e-5,
            "use_case": "Memory-constrained, moderate accuracy",
            "max_reliable_condition": 1e6
        },
        "float64": {
            "cqrrpt_eps": 1e-12,
            "rsvd_tol": 1e-8,
            "use_case": "High accuracy requirements",
            "max_reliable_condition": 1e12
        }
    }

    print("Precision Guidelines:")
    print("=" * 50)

    for dtype, params in guidelines.items():
        print(f"\\n{dtype.upper()}:")
        for key, value in params.items():
            print(f"  {key}: {value}")

    print(f"\\nAdaptive tolerance selection:")
    print(f"  For condition number C:")
    print(f"    - Low (C < 1e6): Use default tolerances")
    print(f"    - Medium (1e6 ≤ C < 1e10): Increase tolerance 10x")
    print(f"    - High (C ≥ 1e10): Consider regularization")

precision_guidelines()

This tutorial provides a comprehensive guide to using Panther’s matrix decomposition algorithms. The next tutorial will cover building complete neural networks with sketched layers.