Large-Scale CDL for Audio Processing

Project Objectives

Develop a Convolutional Dictionary Learning (CDL) framework that processes large-scale audio datasets with substantially lower computational cost and memory footprint. The goal is real-time audio processing through efficient sparse coding and dictionary learning algorithms.

Methodology

Linear Operators Framework

Implemented implicit matrix operations using LinearOperators, which apply convolution and correlation as functions instead of constructing and storing the corresponding large matrices. This reduced memory usage from 132 GB to 4 MB for 2-hour audio signals.
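The idea can be sketched with SciPy's LinearOperator class, assuming that is what "LinearOperators" refers to. The helper conv_operator, the single-atom setup, and the sizes below are illustrative assumptions, not the project's code. The forward product is computed as a convolution and the adjoint as a correlation, so the Toeplitz matrix is never stored.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator


def conv_operator(atom, n_activations):
    """Matrix-free operator for z -> atom * z (full convolution).

    Hypothetical helper: single atom, 1-D signal, real-valued data.
    """
    atom = np.asarray(atom, dtype=np.float64)
    out_len = n_activations + atom.size - 1

    def matvec(z):
        # Forward product D @ z, computed as a convolution.
        return np.convolve(atom, np.ravel(z))

    def rmatvec(y):
        # Adjoint product D.T @ y, computed as a correlation.
        return np.correlate(np.ravel(y), atom, mode="valid")

    return LinearOperator(
        (out_len, n_activations), matvec=matvec, rmatvec=rmatvec, dtype=np.float64
    )


rng = np.random.default_rng(0)
atom = rng.standard_normal(8)
D = conv_operator(atom, n_activations=100)
z = rng.standard_normal(100)
y = D.matvec(z)        # shape (107,)
back = D.rmatvec(y)    # shape (100,)
```

Only the 8 atom coefficients are kept in memory here, whereas the equivalent dense matrix would hold 107 x 100 entries.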

Iterative Solvers

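Developed optimized Conjugate Gradient and BiCGStab solvers for the sparse coding problem, achieving a 17.6x speedup through JIT compilation. The solvers address the optimization problem min_Z ½‖Y − D∗Z‖₂² + λ‖Z‖₁, where Y is the audio signal, D the dictionary of atoms, Z the sparse activations, ∗ denotes convolution, and λ sets the weight of the sparsity penalty.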
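The text does not say how the ℓ1 term is combined with a linear solver. One common arrangement, shown here purely as an assumption, is an ADMM splitting: the quadratic step becomes a symmetric positive-definite linear system solved by Conjugate Gradient, and the ℓ1 step is a soft-threshold. The function names, the single-atom setup, and the values of lam and rho are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_code(y, atom, lam=0.05, rho=1.0, n_iter=50):
    """ADMM sketch for min_z 0.5*||y - atom*z||^2 + lam*||z||_1 (single atom)."""
    n = y.size - atom.size + 1

    def D(z):   # convolution: activations -> signal
        return np.convolve(atom, z)

    def Dt(r):  # correlation: signal -> activations (adjoint of D)
        return np.correlate(r, atom, mode="valid")

    def normal_matvec(z):
        # Matrix-free D^T D + rho*I, which is symmetric positive definite.
        z = np.ravel(z)
        return Dt(D(z)) + rho * z

    A = LinearOperator((n, n), matvec=normal_matvec, dtype=np.float64)
    Dty = Dt(y)
    z = np.zeros(n)
    v = np.zeros(n)   # sparse copy of z
    u = np.zeros(n)   # scaled dual variable
    for _ in range(n_iter):
        z, _info = cg(A, Dty + rho * (v - u), x0=z)   # quadratic step
        v = soft_threshold(z + u, lam / rho)          # l1 step
        u = u + z - v                                 # dual update
    return v


def objective(y, atom, z, lam):
    """Value of 0.5*||y - atom*z||^2 + lam*||z||_1."""
    return 0.5 * np.sum((y - np.convolve(atom, z)) ** 2) + lam * np.sum(np.abs(z))


# Toy signal: three spikes convolved with a decaying sinusoid.
atom = np.exp(-0.3 * np.arange(16)) * np.sin(0.9 * np.arange(16))
atom /= np.linalg.norm(atom)
z_true = np.zeros(200)
z_true[[20, 90, 150]] = [1.0, -2.0, 1.5]
y = np.convolve(atom, z_true)
z_hat = sparse_code(y, atom)
```

BiCGStab could be swapped in for cg with the same call pattern, although CG is the natural choice when the system is symmetric positive definite, as it is in this sketch.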

Sparsity Exploitation

Leveraged the high sparsity of activation matrices (99% zeros) through intelligent encoding and sparse operations, enabling efficient storage and computation while maintaining signal quality.
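As a rough illustration of why sparsity helps, the sketch below stores only the positions and values of non-zero activations and rebuilds the signal by adding one scaled atom per non-zero entry. The encoding actually used in the project is not described, so encode_activations, synthesize, the int32/float32 storage types, and the single-atom setup are all assumptions.

```python
import numpy as np


def encode_activations(z):
    """Keep only the non-zero entries: (positions, values, original length)."""
    idx = np.flatnonzero(z)
    return idx.astype(np.int32), z[idx].astype(np.float32), z.size


def synthesize(idx, vals, n, atom):
    """Rebuild atom * z by adding one scaled atom per non-zero activation."""
    y = np.zeros(n + atom.size - 1)
    for i, a in zip(idx, vals):
        y[i:i + atom.size] += a * atom
    return y


rng = np.random.default_rng(0)
n = 10_000
z = np.zeros(n)
support = rng.choice(n, size=n // 100, replace=False)   # 1% non-zeros
z[support] = rng.standard_normal(support.size)
atom = rng.standard_normal(32)

idx, vals, length = encode_activations(z)
y = synthesize(idx, vals, length, atom)
dense_bytes = z.nbytes                    # 8 bytes per entry
sparse_bytes = idx.nbytes + vals.nbytes   # 4 + 4 bytes per non-zero
```

In this toy case the encoded activations take 800 bytes against 80,000 for the dense float64 vector, and synthesis costs one atom-length update per non-zero entry. Storing values as float32 introduces a small rounding error in the reconstruction.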

Key Results

Compression Rate: 98.95%
PSNR: 51.6 dB
SI-SDR Score: 105
Speed Improvement: 17.6x
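The exact definitions behind these figures are not given. For orientation, the sketch below uses the standard definitions of PSNR and SI-SDR and assumes that the compression rate is the percentage of storage saved. The peak value of 1.0 assumes audio normalized to [-1, 1], and all function names are illustrative.

```python
import numpy as np


def compression_rate(original_bytes, compressed_bytes):
    """Percentage of storage saved (assumed definition)."""
    return 100.0 * (1.0 - compressed_bytes / original_bytes)


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` assumes audio scaled to [-1, 1]."""
    mse = np.mean((reference - estimate) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)


def si_sdr(reference, estimate):
    """Scale-invariant signal-to-distortion ratio in dB."""
    # Project the estimate onto the reference to remove any gain mismatch.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(noise ** 2))


rng = np.random.default_rng(0)
ref = rng.uniform(-1.0, 1.0, size=16_000)
est = ref + 0.01 * rng.standard_normal(ref.size)

print(f"PSNR   = {psnr(ref, est):.1f} dB")
print(f"SI-SDR = {si_sdr(ref, est):.1f} dB")
```

Because SI-SDR rescales the reference before comparing, multiplying the estimate by a constant gain leaves the score unchanged, which is not true of PSNR.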

Performance Analysis

Systematic benchmarking of solvers (CG vs BiCGStab vs GMRES) under varying conditions
Optimization of hyperparameters: atom length (Td=5-1000), number of atoms (2-20), and sparsity levels
Validation on ESC-50 dataset with state-of-the-art compression results
Successful deployment tests on resource-constrained devices (Raspberry Pi 5)
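A solver comparison of the kind listed above could be set up as follows. The benchmark conditions used in the project are not specified, so the test system (a regularized normal operator built from one Hann-window atom), the problem size, and the helper benchmark_solvers are hypothetical. Each solver runs with SciPy's default tolerance and iteration limit.

```python
import time

import numpy as np
from scipy.sparse.linalg import LinearOperator, bicgstab, cg, gmres


def benchmark_solvers(atom, n, rho=1.0, seed=0):
    """Time CG, BiCGStab and GMRES on (D^T D + rho*I) x = b, matrix-free."""
    def normal_matvec(x):
        x = np.ravel(x)
        return np.correlate(np.convolve(atom, x), atom, mode="valid") + rho * x

    A = LinearOperator((n, n), matvec=normal_matvec, dtype=np.float64)
    b = np.random.default_rng(seed).standard_normal(n)
    results = {}
    for name, solver in [("cg", cg), ("bicgstab", bicgstab), ("gmres", gmres)]:
        start = time.perf_counter()
        x, info = solver(A, b)   # library-default tolerance and iteration cap
        elapsed = time.perf_counter() - start
        rel_res = np.linalg.norm(A.matvec(x) - b) / np.linalg.norm(b)
        results[name] = {"seconds": elapsed, "rel_residual": rel_res, "info": info}
    return results


atom = np.hanning(16)
atom /= np.linalg.norm(atom)
results = benchmark_solvers(atom, n=2_000)
for name, r in results.items():
    print(f"{name:9s} {r['seconds']:.4f}s  residual={r['rel_residual']:.2e}  info={r['info']}")
```

Sweeping the atom length and the number of atoms over the ranges listed above would mean calling benchmark_solvers in a loop and recording the results per configuration. Single-run timings are noisy, so repeated runs are advisable before drawing conclusions.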

Technical Skills & Tools

Python, Signal Processing, Iterative Optimization, Object-Oriented Programming, NumPy, SciPy, Memory Optimization, Numba

Innovation & Impact

Pioneered a scalable and reproducible solution for audio processing that can be extended to other domains
Enabled real-time processing capabilities through significant memory and computational optimizations
Developed a modular framework that can be easily adapted for various time series applications
Achieved state-of-the-art compression rates while maintaining high signal quality