In connection with Sam Karimian-Azari’s Ph.D. defense, the two distinguished external assessment committee members will each give a talk on Wednesday, September 28. The talks will take place in room 4.513 from 10:00 to 12:00.
Title: Plenacoustic Processing and the Ray Space Transform
Speaker: Prof. Augusto Sarti
Abstract: The literature of acoustic signal processing tends to rely on divide-and-conquer strategies derived from Fourier Acoustics, and it therefore tends to inherit the limits that such a representation entails in terms of resolution, frequency and far-field operation. Are there viable alternatives to this choice? In this talk I will first discuss what we can do with Plane-Wave Decomposition (PWD) and the related ray-based representation. I will then introduce the ray space and show how this can help overcome the inherent limitations of such signal decomposition/processing tools. We will see, however, that more advantages come with rethinking our analysis approach and, in particular, our signal decomposition strategy. This we do by introducing a novel wave-field decomposition methodology based on Gabor frames, which is more suitable for local (in the space-time domain) representations. Based on this new framework for computational acoustics, I will introduce the ray-space transform and show how it can be used to approach a far wider range of problems efficiently and effectively, from simple source separation, to environment shape inference, to swift object-based manipulation of acoustic wavefields.
Title: Audio source separation: Challenges and recent advances
Speaker: Prof. Roland Badeau
Abstract: The classical problem of blind source separation (BSS) consists in recovering a number of unknown “source” signals from the observation of several “mixture” signals, assuming only that the source signals are mutually independent. Independent component analysis (ICA) is a classical approach for solving this problem when the mixture is linear, instantaneous and (over-)determined. However, in the field of audio source separation, several challenging issues remain: for instance, the mixture is convolutive because of reverberation, and it is often under-determined and time-varying; source signals are non-stationary, and they often overlap in the time-frequency domain. Therefore, audio source separation cannot be performed without exploiting some a priori knowledge about the source signals and about the mixture. In this talk, I will present some past and current investigations carried out at Telecom ParisTech to address these issues. Various parametric and probabilistic models will be introduced that make it possible to exploit the information available about the source signals and about the mixtures. An application to score-based separation of musical sources will be presented.
On September 27 2016, Sam Karimian-Azari will defend his Ph.D. thesis entitled Fundamental Frequency and Direction-of-Arrival Estimation for Multichannel Speech Enhancement at AD:MT, Aalborg University in Rendsburggade 14. The assessment committee consists of Assoc. Prof. Kamal Nasrollahi (chairman, AAU), Assoc. Prof. Roland Badeau (Télécom ParisTech), and Assoc. Prof. Augusto Sarti (Politecnico di Milano). He was supervised by Professor Mads Græsbøll Christensen and Assistant Prof. Jesper Rindom Jensen. A small reception will be held after the defense.
Abstract: Audio systems usually receive the speech signals of interest in the presence of noise. The noise has a profound impact on the quality and intelligibility of the speech signals, and it is therefore clear that the noisy signals must be cleaned up before being played back, stored, or analyzed. We can estimate the speech signal of interest from the noisy signals using a priori knowledge about it. A human speech signal is broadband and consists of both voiced and unvoiced parts. The voiced part is quasi-periodic with a time-varying fundamental frequency (or pitch, as it is commonly referred to). We model the periodic signals as a sum of harmonics. Therefore, we can pass the noisy signals through bandpass filters centered at the frequencies of the harmonics to enhance the signal. In addition, although the frequencies of the harmonics are the same across the channels of a microphone array, the multichannel periodic signals may have different phases due to the time differences of arrival (TDOAs), which are related to the direction-of-arrival (DOA) of the impinging sound waves. Hence, the outputs of the array can be steered to the direction of the signal of interest in order to align their time differences, which may further reduce the effects of noise. This thesis introduces a number of principles and methods to estimate periodic signals in noisy environments with application to multichannel speech enhancement. We propose model-based signal enhancement built on the model of periodic signals. Therefore, the parameters of the model must be estimated in advance. The signal of interest is often contaminated by different types of noise that may render many estimation methods suboptimal due to an incorrect white Gaussian noise assumption.
We therefore propose estimators that are robust against the noise and focus on statistical-based and filtering-based methods by imposing distortionless constraints with explicit relations between the parameters of the harmonics. The estimated fundamental frequencies are expected to be continuous over time. Therefore, we account for the time-varying fundamental frequency in the statistical methods in order to reduce the estimation error. We also propose a maximum likelihood DOA estimator that accounts for the noise statistics and the linear relationship between the TDOAs of the harmonics. The estimators have benefits compared to the state-of-the-art statistical-based methods in colored noise. Evaluations of the estimators against the minimum variance of the deterministic parameters and against the other methods confirm that the proposed estimators are statistically efficient in colored noise and computationally simple. Finally, we propose model-based beamformers for multichannel speech signal enhancement that exploit the estimated fundamental frequency and DOA of the signal of interest. This general framework is tailored to a number of beamformers that use the spectral and spatial information of the periodic signals, which are quasi-stationary in short intervals. Objective measures of speech quality and intelligibility confirm the advantage of the harmonic model-based beamformers over the traditional beamformers, which are non-parametric, and reveal the importance of an accurate estimate of the parameters of the model.
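The relationship between harmonics, TDOAs and DOA described in the abstract can be illustrated with a small sketch. It is not taken from the thesis: it assumes a far-field source, a uniform linear array, a speed of sound of 343 m/s, and noise-free complex amplitudes, and all function names are hypothetical. It shows that the phase shift of harmonic l between neighbouring microphones grows linearly with l, and that compensating for it (delay-and-sum steering) realigns the channels.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, an assumed value for air at room temperature


def adjacent_tdoa(spacing_m, doa_rad, c=SPEED_OF_SOUND):
    """TDOA between neighbouring microphones of a uniform linear array (far field)."""
    return spacing_m * math.sin(doa_rad) / c


def harmonic_phasors(amplitudes, f0_hz, tau_s, num_mics):
    """Complex amplitude of every harmonic at every microphone.

    Microphone m sees harmonic l delayed by m * tau_s, which multiplies
    its complex amplitude by exp(-j * 2*pi * l * f0 * m * tau).
    """
    return [
        [a * cmath.exp(-2j * math.pi * (l + 1) * f0_hz * m * tau_s)
         for l, a in enumerate(amplitudes)]
        for m in range(num_mics)
    ]


def delay_and_sum(phasors, f0_hz, tau_s):
    """Undo the per-microphone phase shifts and average across the array."""
    num_mics = len(phasors)
    aligned = []
    for l in range(len(phasors[0])):
        acc = 0j
        for m in range(num_mics):
            steer = cmath.exp(2j * math.pi * (l + 1) * f0_hz * m * tau_s)
            acc += steer * phasors[m][l]
        aligned.append(acc / num_mics)
    return aligned


# Three harmonics of a 200 Hz source arriving from 30 degrees at a 4-mic array.
amps = [1.0 + 0j, 0.5 + 0.2j, 0.25 - 0.1j]
tau = adjacent_tdoa(spacing_m=0.05, doa_rad=math.radians(30.0))
observed = harmonic_phasors(amps, f0_hz=200.0, tau_s=tau, num_mics=4)
aligned = delay_and_sum(observed, f0_hz=200.0, tau_s=tau)
```

With the correct fundamental frequency and TDOA, `aligned` matches `amps` up to rounding. With noise present, the same averaging would attenuate noise that is uncorrelated across microphones, which is the motivation for steering given in the abstract.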
At the 15th International Workshop on Acoustic Signal Enhancement 2016 (IWAENC), held September 13-16 in Xi’an, China, Audio Analysis Lab member Prof. Mads Græsbøll Christensen gave a keynote talk about the lab’s work. The slides can be downloaded here. IWAENC is a leading workshop in the signal processing community addressing the problems of acoustic signal processing.
Title: Statistical Parametric Speech Processing
Abstract: Parametric speech models have been around for many years but have always had their detractors. Two common arguments against such models are that it is too difficult to find their parameters and that the models do not take the complicated nature of real signals into account. In recent years, significant advances have been made in speech models and robust estimation using statistical principles, and it has been demonstrated that, regardless of any deficiencies in the model, the parametric methods outperform the more commonly used non-parametric methods (e.g., autocorrelation-based methods) for problems like pitch estimation. In this talk, state-of-the-art parametric speech models and statistical estimators for finding their parameters will be presented and their pros and cons discussed. The merits of the statistical, parametric approach to speech modeling will be demonstrated by showing how otherwise complicated problems can be solved comparatively easily this way. Examples of such problems are pitch estimation for non-stationary speech, distortionless speech enhancement, noise statistics estimation, speech segmentation, multi-channel modeling, and model-based localization and beamforming with microphone arrays.
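As a rough illustration of what a parametric pitch estimator can look like, the sketch below uses harmonic summation, a common approximation to the nonlinear least-squares estimator under a white-noise assumption. It is not the method presented in the talk: the signal is synthetic, the number of harmonics is assumed known, the search grid is coarse, and all names are hypothetical.

```python
import cmath
import math
import random


def harmonic_summation_cost(signal, f0_hz, fs_hz, num_harmonics):
    """Sum of the signal power projected onto the harmonics of a candidate pitch."""
    cost = 0.0
    for l in range(1, num_harmonics + 1):
        omega = 2.0 * math.pi * l * f0_hz / fs_hz
        proj = sum(x * cmath.exp(-1j * omega * n) for n, x in enumerate(signal))
        cost += abs(proj) ** 2
    return cost


def estimate_pitch(signal, fs_hz, candidates_hz, num_harmonics):
    """Pick the candidate fundamental frequency with the largest harmonic power."""
    return max(
        candidates_hz,
        key=lambda f0: harmonic_summation_cost(signal, f0, fs_hz, num_harmonics),
    )


# Synthetic voiced-speech-like segment: three harmonics of 200 Hz plus weak noise.
fs = 8000.0
true_f0 = 200.0
amplitudes = [1.0, 0.6, 0.3]
phases = [0.3, 1.1, -0.7]
rng = random.Random(0)  # fixed seed so the example is reproducible
signal = [
    sum(a * math.cos(2.0 * math.pi * (l + 1) * true_f0 * n / fs + p)
        for l, (a, p) in enumerate(zip(amplitudes, phases)))
    + rng.gauss(0.0, 0.05)
    for n in range(400)
]

candidates = [100.0 + 5.0 * k for k in range(61)]  # 100 Hz to 400 Hz in 5 Hz steps
estimate = estimate_pitch(signal, fs, candidates, num_harmonics=3)
```

Because the model ties all harmonics to a single parameter, candidates such as 100 Hz or 400 Hz, which match only one of the three harmonics, score lower than the true fundamental. A practical estimator would also have to select the number of harmonics and refine the grid estimate.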
We are happy to announce that as of September 1, there are two newly appointed Assistant Professors in the Audio Analysis Lab.
The first is Jesper Rindom Jensen, who was previously a postdoc with the lab. He has been with the lab since its inception and is a founding member. Prior to becoming assistant professor, he held an individual postdoc grant from the Danish Council for Independent Research for three years. Jesper Rindom Jensen has worked on various aspects of audio and acoustic signal problems, including single- and multi-channel noise reduction, beamforming, localization and tracking with microphone arrays, and is appointed as Assistant Professor in Microphone Array Signal Processing.
The second is Jesper Kjær Nielsen, a long-time collaborator with the Audio Analysis Lab, who was previously with the Dept. of Electronic Systems. For the past few years, he has worked on industrial research projects with B&O. He is an expert in statistical methods for signal processing, having worked on a wide range of problems, including sinusoidal parameter estimation, interpolation and extrapolation, pitch estimation, and fast implementations. He joins the lab with an appointment as Assistant Professor in Statistical Signal Processing.
We congratulate them both on their appointments and welcome newcomer Jesper Kjær Nielsen to the lab!
During the award ceremony at EUSIPCO 2016 in Budapest, Hungary, Mads Græsbøll Christensen of the Audio Analysis Lab received the EURASIP Early Career Award for significant contributions to statistical processing of audio and speech signals.
The EURASIP Early Career Award is given to an outstanding researcher and engineer working within the technical scope of EURASIP at an early or mid-stage of their career whose current work shows not only significant scientific achievements but also high potential to advance scientific knowledge through novel, timely and significant endeavors. The award targets researchers under the age of forty. This was the first time the award was given.
The Audio Analysis Lab was well represented at this year’s EUSIPCO, which was held in Budapest, Hungary, with the following presentations of papers:
- Multi-Pitch Estimation of Audio Recordings Using a Codebook-Based Approach. Martin Weiss Hansen, Jesper Rindom Jensen and Mads Græsbøll Christensen (Aalborg University, Denmark)
- Computational Analysis of a Fast Algorithm for High-order Sparse Linear Prediction. Tobias Lindstrøm Jensen (Aalborg University, Denmark); Daniele Giacobello (SONOS, Inc. USA); Toon van Waterschoot (KU Leuven, Belgium); Mads Græsbøll Christensen (Aalborg University, Denmark)
- Ad Hoc Microphone Array Beamforming Using the Primal-Dual Method of Multipliers. Vincent Mohammad Tavakoli and Jesper Rindom Jensen (Aalborg University, Denmark); Richard Heusdens (Delft University of Technology, The Netherlands); Jacob Benesty (INRS-EMT, University of Quebec, Canada); Mads Græsbøll Christensen (Aalborg University, Denmark)
- Semi-non-intrusive objective intelligibility measure using spatial filtering in hearing aids. Charlotte Sørensen (Aalborg University & GN ReSound, Denmark); Jesper Bünsow Boldt (GN ReSound, Denmark); Fredrik Gran (GN ReSound, Denmark); Mads Græsbøll Christensen (Aalborg University, Denmark)
- Grid Size Selection for Nonlinear Least-Squares Optimization in Spectral Estimation and Array Processing. Jesper Kjær Nielsen (Aalborg University & Bang & Olufsen, Denmark); Tobias Lindstrøm Jensen, Jesper Rindom Jensen, Mads Græsbøll Christensen and Søren Holdt Jensen (Aalborg University, Denmark)
On August 19 2016 the annual Audio Analysis Workshop was held. This year’s edition was co-sponsored by the Audio Analysis Lab’s projects funded by the Danish Council for Independent Research and the Villum Foundation. It featured 14 scientific talks and 2 keynote talks with 18 participants from Lund University, Aalborg University, Delft University of Technology, GN ReSound, and Aston University. The two keynote talks were on the topic of Parkinson’s disease, how it affects the voice, and how it can be detected from the voice. The first keynote talk, entitled Braak’s hypothesis and its impact on research and treatment in Parkinson’s disease, was given by neurologist Lorenz Oppel, Aalborg University Hospital. The second keynote talk was given by Dr. Max Little, Aston University, and was entitled Algorithms for feature extraction in voice-based analysis of Parkinson’s disease. In his talk, Max gave an overview of his many years of research on the topic. The scientific talks were on varied topics, including fast implementations, microphone arrays, music analysis, measurement of speech intelligibility, multi-pitch estimation, sparse approximations, classification of height, weight, and other things from speech, room geometry estimation, and speech enhancement.
On June 2 2016 Professor Mads Græsbøll Christensen of the Audio Analysis Lab gave his inaugural lecture, entitled “Statistical Parametric Speech Processing: Solving Problems with the Model-Based Approach”, at AD:MT, Aalborg University. Below you can see a video recording of the lecture.
Next week, Audio Analysis Lab members will present a number of papers at this year’s installment of ICASSP, which will be held in Shanghai, China:
- EXPERIMENTAL STUDY OF GENERALIZED SUBSPACE FILTERS FOR THE COCKTAIL PARTY SITUATION
- KALMAN FILTER FOR SPEECH ENHANCEMENT IN COCKTAIL PARTY SCENARIOS USING A CODEBOOK-BASED APPROACH
- FAST AND STATISTICALLY EFFICIENT FUNDAMENTAL FREQUENCY ESTIMATION
- DOA ESTIMATION OF AUDIO SOURCES IN REVERBERANT ENVIRONMENTS
- A PARTITIONED APPROACH TO SIGNAL SEPARATION WITH MICROPHONE AD HOC ARRAYS
- VARIABLE SPAN FILTERS FOR SPEECH ENHANCEMENT