Keynote Talk

Detection and Localization of Sound Events

Prof. Tuomas Virtanen, Tampere University

Abstract:  With the emergence of advanced machine learning techniques and large-scale datasets, holistic analysis of realistic soundscapes is becoming increasingly appealing. In the case of everyday soundscapes this can mean not only recognizing which sounds are present in an acoustic scene, but also where they are located and when they occur. This talk will discuss the task of joint detection and localization of sound events, which addresses the above problem. State-of-the-art methods typically use spectral representations and deep neural networks based on convolutional, recurrent, and attention layers that share many similarities with neighboring fields. However, the task also has several unique challenges that require specific solutions. We will give an overview of the task setup for training machine learning models, acoustic features for representing multichannel signals, topologies of deep neural networks, and loss functions for training systems. Since the performance of these methods depends heavily on the training data used, we will also discuss datasets that can be used for the development of methods, and their preparation. We will discuss the recent DCASE evaluation campaign tasks that addressed the problem of joint detection and localization of sound events.
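As a concrete illustration of the kind of loss function the abstract alludes to, one common formulation in the sound event localization and detection (SELD) literature combines a frame-wise detection term with a localization term that is only evaluated where an event is actually active. The sketch below is illustrative, not the speaker's specific method; the function name, array shapes, and weighting are assumptions chosen for clarity.

```python
import numpy as np

def seld_loss(act_pred, act_true, doa_pred, doa_true, weight=1.0):
    """Illustrative joint SELD loss (not a specific published system):
    detection binary cross-entropy + activity-masked DOA regression.

    act_*: (frames, classes) event-activity probabilities / 0-1 labels
    doa_*: (frames, classes, 3) Cartesian direction-of-arrival vectors
    """
    eps = 1e-7
    # Detection term: frame-wise binary cross-entropy per class.
    bce = -np.mean(act_true * np.log(act_pred + eps)
                   + (1 - act_true) * np.log(1 - act_pred + eps))
    # Localization term: mean squared error on DOA vectors, masked so
    # that only frames where the event is truly active contribute.
    mask = act_true[..., None]  # broadcast over the 3 DOA coordinates
    mse = np.sum(mask * (doa_pred - doa_true) ** 2) / (np.sum(mask) * 3 + eps)
    return bce + weight * mse
```

Masking the localization term is one of the task-specific choices the abstract hints at: a direction of arrival is undefined for an inactive source, so penalizing it there would only add noise to the gradient.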
 
Biography:  Tuomas Virtanen is Professor at Tampere University, Finland, where he leads the Audio Research Group. He received the M.Sc. and Doctor of Science degrees in information technology from Tampere University of Technology in 2001 and 2006, respectively. He has also worked as a research associate at the Cambridge University Engineering Department, UK. He is known for his pioneering work on single-channel sound source separation using techniques based on non-negative matrix factorization, and their application to noise-robust speech recognition and music content analysis. Recently he has made significant contributions to sound event detection in everyday environments. In addition to the above topics, his research interests include content analysis of audio signals in general and machine learning. He has authored more than 200 scientific publications on the above topics, which have been cited more than 14,000 times. He received the IEEE Signal Processing Society 2012 best paper award for his article “Monaural Sound Source Separation by Nonnegative Matrix Factorization with Temporal Continuity and Sparseness Criteria”, as well as several other best paper awards. He is an IEEE Fellow and the recipient of the ERC 2014 Starting Grant “Computational Analysis of Everyday Soundscapes”.