I am a postdoctoral research associate at the Program for Applied and Computational Mathematics at Princeton University. My research focuses on signal processing, machine learning, and statistical data analysis. More generally, I'm interested in how to identify and extract discriminative information from signals while remaining invariant to less relevant sources of variability such as translation, frequency-shifting, and additive noise. Specific applications include audio classification for speech and music, classification of biomedical signals, and clustering of cryo-EM images according to molecular state.
Joint Time-Frequency Scattering for Audio Classification
J. Andén, V. Lostanlen, and S. Mallat. 2015 IEEE International Workshop on Machine Learning for Signal Processing, Sept. 17-20, 2015, Boston, USA. (Best Paper Award, 2nd Place) (pdf)
We introduce the joint time-frequency scattering transform which addresses the inadequacy of the standard time scattering transform in representing non-separable time-frequency structure. The joint scattering transform is shown to adequately characterize time-varying filters and frequency modulated excitations. In addition, reconstruction examples illustrate the importance of properly characterizing this time-frequency structure. Finally, experiments on phone segment classification in the TIMIT corpus demonstrate the state-of-the-art performance of the new representation.
Covariance Estimation Using Conjugate Gradient for 3D Classification in Cryo-EM
J. Andén, E. Katsevich, and A. Singer. 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), pp. 200-204, 2015, New York, USA.
To reconstruct different molecular structures from tomographic projections in cryo-EM, the projections must first be clustered according to the structure they depict. To do this, we estimate the covariance of the three-dimensional voxel structures of the molecules using a least-squares estimator. To solve the resulting large-scale linear system, we apply the conjugate gradient method. The resulting algorithm is fast and achieves state-of-the-art results on both simulated and experimental data.
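As a rough illustration of the conjugate gradient method mentioned above, the following minimal Python sketch solves a toy least-squares problem through its normal equations. It is not the covariance estimator from the paper: the matrix `M`, the data `y`, and the helper names `conjugate_gradient`, `matvec`, and `transpose` are hypothetical and chosen only for this example. The point it shows is that the method needs only a routine that applies the system matrix to a vector, which is what makes it practical for large systems.

```python
def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the initial guess x = 0
    p = list(r)                      # first search direction
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # stop once the residual norm is small enough
            break
        ap = apply_a(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def matvec(mat, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in mat]


def transpose(mat):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*mat)]


# Toy problem (hypothetical data): minimize ||M x - y||^2 by solving the
# normal equations (M^T M) x = M^T y without forming M^T M explicitly.
M = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
Mt = transpose(M)

solution = conjugate_gradient(lambda v: matvec(Mt, matvec(M, v)), matvec(Mt, y))
print(solution)
```

The data lie exactly on the line y = 1 + 2t, so the computed solution should be close to an intercept of 1 and a slope of 2.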
Deep Scattering Spectrum
J. Andén and S. Mallat. IEEE Transactions on Signal Processing, vol. 62, no. 16, pp. 4114-4128, Aug. 15, 2014. (pdf)
The scattering transform for temporal signals is defined and described in detail. Properties of this transform for audio are illustrated using examples of amplitude modulation and frequency component interference as well as reconstruction of signals from their scattering transforms. For frequency transposition invariance and frequency-warping stability, the separable time and frequency scattering transform is introduced. Finally, state-of-the-art results are obtained using these representations for the problems of musical genre classification and phone identification on the GTZAN and TIMIT datasets, respectively.
The scattering transform applied to fetal heart rate signals is shown to provide meaningful information on subject health by characterizing the multiscale temporal dynamics of the signal through scaling coefficients. Notably, when used to classify a subject as healthy or non-healthy, these coefficients are shown to reduce the false positive rate (fraction of healthy subjects classified as non-healthy) by almost 50% compared to standard FIGO (International Federation of Gynecology and Obstetrics) guidelines while maintaining a 100% true positive rate (fraction of non-healthy subjects classified as non-healthy).
In order to judge the similarity of several environmental sounds, the scattering transform is used to define a time-shift invariant metric stable to time-warping deformation. Additional frequency transposition invariance is obtained by applying a second scattering transform along log-frequency. This metric outperforms state-of-the-art methods based on bags-of-frames and dynamic time warping applied to mel-frequency cepstral coefficient (MFCC) or log-spectrogram features.
The constant-Q structure of the mel scale for high frequencies is shown to stabilize mel-based representations to small dilations in the input signal. Since the scattering transform relies similarly on a constant-Q filter bank, it inherits this stability. In addition, a modulated source-filter model is introduced to illustrate how the second-order scattering coefficients capture important timbral information such as attacks, tremolo, vibrato, and chord structure.
This paper introduces the scattering transform in the audio context, extending mel-frequency cepstral coefficients (MFCCs) by recovering the high-frequency information lost to temporal averaging. Compared to traditional MFCC and Delta-MFCC features, scattering coefficients show a significant improvement on the GTZAN genre classification task. The reconstruction of audio signals from scattering coefficients, using the algorithm developed by Irene Waldspurger, is described, with examples available online.
Together with Laurent Sifre, I have developed the ScatNet toolbox for calculating scattering transforms in MATLAB, complete with visualization and classification pipelines (affine space models and support vector classifiers) for duplicating the results of the above papers. Older MATLAB toolboxes for scattering computation and affine space classifiers are available, but are no longer supported.
To speed up computation and reduce memory size, I have introduced some changes to the popular LIBSVM library for support vector machine (SVM) training. The libsvm-compact package extends the library to handle precomputed Gaussian kernels, 32-bit precision, triangular kernels, and multi-core training as well as in-place routines for MATLAB.
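To make the idea of a precomputed Gaussian kernel concrete, here is a minimal Python sketch that builds the kernel matrix once so that a trainer could reuse it rather than recompute each entry. It does not use the libsvm-compact or LIBSVM interfaces; the function name `gaussian_kernel_matrix`, the bandwidth parameter `gamma`, and the sample points are assumptions made for illustration only.

```python
import math


def gaussian_kernel_matrix(points, gamma):
    """Return K with K[i][j] = exp(-gamma * ||points[i] - points[j]||^2)."""
    n = len(points)
    kernel = [[1.0] * n for _ in range(n)]   # diagonal entries are exp(0) = 1
    for i in range(n):
        for j in range(i):                   # the matrix is symmetric, so compute
            sq_dist = sum((a - b) ** 2       # only the lower triangle and mirror it
                          for a, b in zip(points[i], points[j]))
            kernel[i][j] = kernel[j][i] = math.exp(-gamma * sq_dist)
    return kernel


# Hypothetical feature vectors, used only to exercise the function.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
kernel = gaussian_kernel_matrix(points, gamma=0.5)
for row in kernel:
    print(row)
```

Because the matrix is symmetric, only about half of its entries need to be computed, which is one reason precomputing it can save time when the same kernel is reused across several training runs.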
My office is in Fine 215.
I can be reached at firstname.lastname@example.org.