Sarthak Yadav


I am a PhD Fellow at the Department of Electronic Systems, Aalborg University, and the Pioneer Center for Artificial Intelligence, Copenhagen. My research focuses on developing and understanding self-supervised audio and speech representations. I am advised by Prof. Zheng-Hua Tan, Prof. Lars Kai Hansen, and Prof. Sergios Theodoridis.

Previously, after completing my MSc(R) in Computing Science at the University of Glasgow under the guidance of Prof. Mary Ellen Foster, I worked as a Research Intern with the Speech and Audio Processing group at the Idiap Research Institute under the supervision of Dr. Mathew Magimai Doss, where I worked primarily on the explainability of speech- and biosignal-based DNNs for emotion recognition.

I also have extensive industry experience. As Lead Research Engineer at Staqu Technologies, I led the design and development of several large-scale, mission-critical intelligent systems spanning computer vision (e.g., violence recognition, scalable object detection, and multispectral geospatial imaging), biometrics (speaker and face recognition), and language understanding (ASR and NMT).

Updates
[06/24] Paper on selective SSMs accepted at INTERSPEECH 2024
[01/24] First paper of my PhD accepted at ICLR 2024
[06/23] Awarded DeiC compute grant for access to the LUMI Supercomputer
[10/22] Started as PhD Fellow at ES-AAU/Pioneer Center for AI
[05/22] MSc @ University of Glasgow Done!
[04/22] Started as Research Intern at the Idiap Research Institute

Email  |  CV  |  Google Scholar  |  LinkedIn  |  GitHub  |  Kaggle

Profile Photo
Publications
Audio xLSTMs: Learning Self-Supervised Audio Representations with xLSTMs
Sarthak Yadav, Sergios Theodoridis, and Zheng-Hua Tan
Under peer review

Abstract / BibTex / Code

Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
Sarthak Yadav and Zheng-Hua Tan
INTERSPEECH, 2024

Abstract / BibTex / Code

Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
Sarthak Yadav, Sergios Theodoridis, Lars Kai Hansen, and Zheng-Hua Tan
Twelfth International Conference on Learning Representations (ICLR), 2024

Abstract / BibTex / Code

Towards learning emotion information from short segments of speech
Tilak Purohit, Sarthak Yadav, Bogdan Vlasenko, S. Pavankumar Dubagunta, and Mathew Magimai Doss
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Abstract / BibTex

Comparing Biosignal and Acoustic Feature Representation for Continuous Emotion Recognition
Sarthak Yadav, Tilak Purohit, Zohreh Mostaani, Bogdan Vlasenko, and Mathew Magimai Doss
Proceedings of the 3rd International Multimodal Sentiment Analysis Workshop and Challenge (MuSe) @ the 30th ACM International Conference on Multimedia, 2022

Abstract / BibTex

Learning neural audio features without supervision
Sarthak Yadav and Neil Zeghidour
INTERSPEECH, 2022

Abstract / BibTex

Frequency and Temporal Convolutional Attention for Text-Independent Speaker Recognition
Sarthak Yadav and Atul Rai
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

Abstract / BibTex

Learning Discriminative Features for Speaker Identification and Verification
Sarthak Yadav and Atul Rai
INTERSPEECH, 2018

Abstract / BibTex

Prediction of Ubiquitination Sites Using UbiNets
Sarthak Yadav, Manoj Kumar Gupta, and Ankur Singh Bist
Advances in Fuzzy Systems, 2018

Abstract / BibTex



Projects and open-source contributions
Community Contributor, SpeechBrain

My PyTorch implementation of "LEAF: A Learnable Frontend for Audio Classification" (Zeghidour et al., 2021) was merged into SpeechBrain release v0.5.12.

Code / Paper

Masked Autoencoders in Jax

Jax/Flax implementation of the paper "Masked Autoencoders Are Scalable Vision Learners" (He et al., 2021). Pre-trained models coming soon!
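
As a rough illustration of the core idea the repository implements, here is a minimal sketch of MAE-style random patch masking in Jax. The function name and shapes are hypothetical, not the repository's actual API:

```python
import jax
import jax.numpy as jnp

def random_masking(rng, patches, mask_ratio=0.75):
    """Keep a random subset of patch embeddings, as in MAE (He et al., 2021).

    patches: (num_patches, dim) array of embedded patches.
    """
    num_patches = patches.shape[0]
    num_keep = int(num_patches * (1.0 - mask_ratio))
    # Shuffle patch indices; the first `num_keep` become the visible set
    # fed to the encoder.
    ids_shuffle = jax.random.permutation(rng, num_patches)
    visible = patches[ids_shuffle[:num_keep]]
    # argsort inverts the shuffle, letting the decoder scatter mask
    # tokens back into the original patch order.
    ids_restore = jnp.argsort(ids_shuffle)
    return visible, ids_restore

# Example: mask 75% of 196 patches with embedding dimension 768.
visible, ids_restore = random_masking(
    jax.random.PRNGKey(0), jnp.zeros((196, 768)))
```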

Code / Paper

audax: a home for audio ML in Jax

audax provides common audio feature extraction, popular learnable frontends (e.g., SincNet, LEAF), and pretrained supervised and self-supervised (e.g., COLA) models. Unlike popular frameworks, the objective is not to be an end-to-end, be-all-and-end-all DL framework, but to serve as a starting point for doing things the Jax way, through reference implementations and recipes built on the Jax / Flax / Optax stack.
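
For a flavour of what "the Jax way" means here, the following is a minimal, hypothetical training-step sketch in the Flax/Optax style. It is illustrative only and not audax's actual API; `apply_fn` stands for any Flax module's apply function and `tx` for any Optax optimizer:

```python
import jax
import optax

def train_step(params, opt_state, batch, apply_fn, tx):
    """One pure, jit-able step: the loss is an explicit function of the
    parameters, and all state (params, optimizer state) is passed in
    and returned rather than mutated in place."""
    def loss_fn(p):
        logits = apply_fn({'params': p}, batch['audio'])
        return optax.softmax_cross_entropy_with_integer_labels(
            logits, batch['label']).mean()

    loss, grads = jax.value_and_grad(loss_fn)(params)
    updates, opt_state = tx.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss
```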

Code

Raw waveform modelling using the LEAF front-end

PyTorch implementation of the paper "LEAF: A Learnable Frontend for Audio Classification" (Zeghidour et al., 2021). Supports training on GPUs and on a single TPU node via torch-xla. Pre-trained models for several datasets are released.
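
For context, LEAF replaces a fixed mel filterbank with a bank of learnable Gabor filters (followed by squared modulus, learnable Gaussian lowpass pooling, and PCEN compression). Below is a minimal sketch of the Gabor parameterization only; the names are hypothetical, and it is written with jax.numpy for consistency with the sketches above, even though the repository itself is PyTorch:

```python
import jax.numpy as jnp

def gabor_filterbank(t, center_freqs, bandwidths):
    """Complex Gabor filters: Gaussian envelopes modulated by sinusoids.

    t: (filter_len,) time axis in samples, centered at 0.
    center_freqs, bandwidths: (n_filters,) learnable parameters,
    initialized from a mel-scale filterbank in LEAF.
    """
    t = t[None, :]                   # (1, filter_len)
    sigma = bandwidths[:, None]      # (n_filters, 1)
    freq = center_freqs[:, None]     # (n_filters, 1)
    envelope = jnp.exp(-0.5 * (t / sigma) ** 2)
    carrier = jnp.exp(1j * 2.0 * jnp.pi * freq * t)
    return envelope * carrier        # (n_filters, filter_len)
```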

Code / Paper

Sound event recognition on FSD50K

PyTorch implementation of the paper "FSD50K: an Open Dataset of Human-Labeled Sound Events" (Fonseca et al., 2020).

Code / Paper


Template credits: Dr. Jon Barron