We are extremely proud to announce that Audio Analysis Lab founding member Assistant Professor Jesper Rindom Jensen has been named Teacher of the Year by the Study Board of Media Technology. The award follows multiple nominations by students for his tireless efforts in the Media Technology B.Sc. program and the Sound & Music Computing M.Sc. program, where he both teaches courses and supervises student projects. The study board is responsible for the curricula and the quality assurance of the following programs: IT, Communication and New Media; Lighting Design; Medialogy; Service Systems Design; and Sound and Music Computing.
Tutorial by Audio Analysis Lab at Interspeech 2017
We are pleased to announce that the Audio Analysis Lab will be giving a tutorial this year at Interspeech 2017. The tutorial is entitled Statistical Parametric Speech Processing: Solving Problems with the Model-based Approach and covers much of the lab’s research in present and past projects. Interspeech 2017 will be held in beautiful Stockholm, Sweden, August 20-24, and the tutorial will be held 9:30-12:00 on August 20. It will be given by Assistant Professor Jesper Rindom Jensen, Assistant Professor Jesper Kjær Nielsen, and Professor Mads Græsbøll Christensen. You can read more about this and the other tutorials here, and you can sign up at the Interspeech homepage here once registration opens. Below, you can also find additional information about the tutorial.
Title: Statistical Parametric Speech Processing: Solving Problems with the Model-based Approach
Organizers: Jesper Rindom Jensen, Jesper Kjær Nielsen, and Mads Græsbøll Christensen
Abstract: Parametric speech models have been around for many years but have always had their detractors. Two common arguments against such models are that it is too difficult to find their parameters and that the models do not take the complicated nature of real signals into account. In recent years, significant advances have been made in speech models and in robust, computationally efficient estimation using statistical principles, and it has been demonstrated that, regardless of any deficiencies in the model, the parametric methods outperform the more commonly used non-parametric methods (e.g., autocorrelation-based methods) for problems like pitch estimation. The application of these principles, however, extends well beyond that problem. In this tutorial, state-of-the-art parametric speech models and statistical estimators for finding their parameters will be presented and their pros and cons discussed. The merits of the statistical, parametric approach to speech modeling will be demonstrated via a number of well-known problems in speech, audio, and acoustic signal processing. Examples of such problems are pitch estimation for non-stationary speech, distortionless speech enhancement, noise statistics estimation, speech segmentation, multi-channel modeling, and model-based localization and beamforming with microphone arrays.
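To give a flavor of the model-based approach, here is a minimal sketch of one classic parametric pitch estimator: harmonic summation, a well-known approximation to the nonlinear least-squares (NLS) estimator for the harmonic model. The synthetic signal, search grid, and parameter values below are illustrative assumptions and are not taken from the tutorial material.

```python
import numpy as np

def harmonic_summation_pitch(x, fs, f0_grid, num_harmonics=5):
    """Pick the fundamental frequency that maximizes the summed power
    of the signal's projections onto its harmonics (an approximation
    to the NLS pitch estimator for the harmonic model)."""
    n = np.arange(len(x))
    best_f0, best_score = f0_grid[0], -np.inf
    for f0 in f0_grid:
        score = sum(
            np.abs(np.dot(x, np.exp(-2j * np.pi * l * f0 / fs * n))) ** 2
            for l in range(1, num_harmonics + 1)
        )
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0

# Synthetic voiced-speech-like signal: a 155 Hz fundamental plus two
# harmonics in white noise (all values chosen purely for illustration).
fs = 8000
t = np.arange(0, 0.04, 1 / fs)
x = sum(np.cos(2 * np.pi * 155 * l * t + 0.3 * l) / l for l in range(1, 4))
x = x + 0.2 * np.random.randn(len(t))

f0_grid = np.arange(60.0, 400.0, 0.5)
print(harmonic_summation_pitch(x, fs, f0_grid))  # expect roughly 155.0
```

Unlike an autocorrelation-based method, the estimate here follows directly from an explicit signal model, which is what makes the principled statistical extensions the tutorial covers (non-stationary pitch, colored noise, multiple channels) possible.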
Lab member inducted into the Danish Academy of Technical Sciences
Audio Analysis Lab founder and head Mads Græsbøll Christensen was inducted into the Danish Academy of Technical Sciences (ATV), along with 39 other new members, on April 26, 2017, during the annual meeting in Copenhagen. You can read the press release here.
The Danish Academy of Technical Sciences (ATV) is an independent, member-driven think tank. ATV’s vision is that Denmark shall be one of the five leading Science and Engineering regions in the world – to the benefit of future generations. In order to achieve this objective, ATV undertakes a number of activities to the advantage of businesses, knowledge institutions, and society as a whole. ATV has 800 members who are research directors, business executives, leading researchers, and experts within their fields.
YouTube channel launched
In an effort to increase the visibility of the lab and our research, we have launched the Audio Analysis Lab YouTube channel! On the channel, we will post videos about our research, ongoing and past. The videos will be based on our presentations of papers given at conferences, Ph.D. defenses, etc., but will also include demos. The newly launched channel already features the following videos:
- Estimation of Multi-Pitch Signals in Stereophonic Mixtures
- Pitch Estimation for Non-Stationary Speech
- Localization of Sound in Reverberant Environments
- Statistical Parametric Speech Processing
Below you can see one of the videos, and you can access the channel here or from the menu on the homepage.
ICASSP 2017
The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017 is being held March 5-9, 2017, in New Orleans, USA. As usual, the Audio Analysis Lab is well-represented at the top signal processing conference in the world with the following presentations:
- MODEL BASED BINAURAL ENHANCEMENT OF VOICED AND UNVOICED SPEECH
- LEAST 1-NORM POLE-ZERO MODELING WITH SPARSE DECONVOLUTION FOR SPEECH ANALYSIS
- PITCH-BASED NON-INTRUSIVE OBJECTIVE INTELLIGIBILITY PREDICTION
- DISTRIBUTED MAX-SINR SPEECH ENHANCEMENT WITH AD HOC MICROPHONE ARRAYS
- HARMONIC MINIMUM MEAN SQUARED ERROR FILTERS FOR MULTICHANNEL SPEECH ENHANCEMENT
- ESTIMATION OF MULTIPLE PITCHES IN STEREOPHONIC MIXTURES USING A CODEBOOK-BASED APPROACH
- FAST HARMONIC CHIRP SUMMATION
- GREEDY ALTERNATIVE FOR ROOM GEOMETRY ESTIMATION FROM ACOUSTIC ECHOES: A SUBSPACE-BASED METHOD
Audio Analysis Lab featured in BrainsBusiness newsletter
The Audio Analysis Lab and its activities were featured in BrainsBusiness’ newsletter in December. The article focuses on the research activities in signal processing for hearing aids and our research in voice analysis for diagnosis of Parkinson’s disease. You can read the article here. BrainsBusiness is a unique platform for ICT innovation in North Denmark through the interaction of industry and university and the link to public authorities. The overall aim of BrainsBusiness is to contribute to the North Denmark ICT cluster becoming recognised as one of the most attractive and competitive ICT clusters in Europe.
Open Position as Ph.D. Student in Signal Processing for Sound Zones
At the Faculty of Engineering and Science (as of 1 January 2017, the Technical Faculty of IT and Design), Department of Architecture, Design and Media Technology, a PhD stipend is available within the general study programme Electrical and Electronic Engineering. The stipend is open for appointment from 1 January 2017 or as soon as possible thereafter.
The position is with the research group Audio Analysis Lab. The PhD student will work on a research project entitled Signal Processing for Sound Zones.
Sound zones are spatially confined regions in which different audio content can be enjoyed within a shared acoustic environment. Thus, sound zones replace headphones as a means of creating an individualized listening experience while also allowing for social interaction. There are many potential applications of this concept, including home entertainment, museums, car cabins, and hospitals. Sound zones can be created using loudspeaker arrays (a number of loudspeakers organized in a geometry) by altering the phase and amplitude of the loudspeaker signals, as sketched below. The state of the art can, however, typically only achieve an attenuation of 10-15 dB of interfering sounds from other zones (depending on the setup), which means that the interference is clearly audible and annoying. In short, the concept, while promising, does not presently work well enough for most applications. This project aims at making high-quality sound zones feasible via advanced signal processing.
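As a rough illustration of how loudspeaker phases and amplitudes can shape the sound field, here is a minimal sketch of the well-known pressure-matching design at a single frequency. The transfer matrices are random placeholders standing in for measured or modeled room responses, and all dimensions and parameters are illustrative assumptions; the project is not committed to this particular method.

```python
import numpy as np

# Toy pressure-matching sound-zone design at a single frequency.
# G_b, G_d: acoustic transfer matrices (control points x loudspeakers)
# from the array to the "bright" and "dark" zones. In practice these are
# measured or modeled; random matrices here are purely illustrative.
rng = np.random.default_rng(0)
n_speakers, n_mics = 8, 16
G_b = rng.standard_normal((n_mics, n_speakers)) + 1j * rng.standard_normal((n_mics, n_speakers))
G_d = rng.standard_normal((n_mics, n_speakers)) + 1j * rng.standard_normal((n_mics, n_speakers))

p_b = np.ones(n_mics, dtype=complex)   # desired pressure in the bright zone
p_d = np.zeros(n_mics, dtype=complex)  # silence desired in the dark zone

# Stack both zones and solve a regularized least-squares problem for the
# complex loudspeaker weights (one amplitude and phase per loudspeaker).
G = np.vstack([G_b, G_d])
p = np.concatenate([p_b, p_d])
lam = 1e-2  # regularization limits the array effort
w = np.linalg.solve(G.conj().T @ G + lam * np.eye(n_speakers), G.conj().T @ p)

# Acoustic contrast (dB): energy delivered to the bright zone vs. the dark zone
contrast = 10 * np.log10(np.linalg.norm(G_b @ w) ** 2 / np.linalg.norm(G_d @ w) ** 2)
print(f"contrast: {contrast:.1f} dB")
```

The regularization term trades dark-zone attenuation against array effort; with real, reverberant room responses, the achievable contrast of designs like this is what the 10-15 dB figure above refers to.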
The successful applicant should have an M.Sc. (or equivalent) in engineering within signal processing. Prior experience with audio and acoustic signal processing is a plus but not required. Moreover, the successful applicant should be fluent in English, have strong programming and math skills, and be familiar with MATLAB (or similar tools). The applicant must submit his/her M.Sc. thesis (or a draft thereof) as part of the application. The degree must be completed at the time of the appointment.
The Audio Analysis Lab at Aalborg University conducts basic and applied research in signal processing theory and methods aimed at or involving analysis of audio signals. The research focuses on problems such as compression, analysis, classification, separation, and enhancement of audio signals, as well as localization, identification, and tracking using microphone arrays. The lab and its members are currently funded by grants from the Villum Foundation, the Danish Council for Strategic Research, the Danish Council for Independent Research, and Innovation Fund Denmark. The research projects are carried out in close collaboration with leading industrial partners and universities around the world.
Concerning the scientific aspects of the stipend, you may obtain further information from Professor Mads Græsbøll Christensen, Audio Analysis Lab, Department of Architecture, Design and Media Technology, phone: +45 9940 9793, email: mgc@create.aau.dk.
PhD stipends are allocated to individuals who hold a Master’s degree. PhD stipends are normally for a period of 3 years. It is a prerequisite for allocation of the stipend that the candidate will be enrolled as a PhD student at the Technical Doctoral School of IT and Design, in accordance with the regulations of Ministerial Order No. 1039 of August 27, 2013 on the PhD Programme at the Universities and Certain Higher Artistic Educational Institutions. According to the Ministerial Order, the progress of the PhD student shall be assessed every six months. It is a prerequisite for continuation of salary payment that the previous progress is approved at the time of the evaluation.
The qualifications of the applicant will be assessed by an assessment committee. On the basis of the committee’s recommendation, the Dean of the Faculty of Engineering and Science will make the decision on allocating the stipend.
For further information about stipends and salary as well as practical issues concerning the application procedure contact Ms. Bettina Wedde, The Faculty of Engineering and Science, email: bew@adm.aau.dk, phone: +45 9940 9909.
You can read more and apply at http://www.stillinger.aau.dk/vis-stilling/?vacancy=875102.
Talk by IEEE Distinguished Lecturer Ken Sugiyama
Talks by Augusto Sarti and Roland Badeau
In connection with Sam Karimian-Azari’s Ph.D. defense, the two distinguished external members of the assessment committee will each give a talk on Wednesday, September 28. The talks will take place in room 4.513 at 10:00-12:00.
Title: Plenacoustic Processing and the Ray Space Transform
Speaker: Prof. Augusto Sarti
Abstract: The literature on acoustic signal processing tends to rely on divide-and-conquer strategies derived from Fourier Acoustics, and it therefore tends to inherit the limits that such a representation entails in terms of resolution, frequency, and far-field operation. Are there viable alternatives to this choice? In this talk, I will first discuss what we can do with Plane-Wave Decomposition (PWD) and the related ray-based representation. I will then introduce the ray space and show how it can help overcome the inherent limitations of such signal decomposition/processing tools. We will see, however, that more advantages come with rethinking our analysis approach and, in particular, our signal decomposition strategy. We do this by introducing a novel wave-field decomposition methodology based on Gabor frames, which is more suitable for representations that are local in the space-time domain. Based on this new framework for computational acoustics, I will introduce the ray-space transform and show how it can be used to efficiently and effectively approach a far wider range of problems, from source separation, to environment shape inference, to swift object-based manipulation of acoustic wavefields.
Title: Audio source separation: Challenges and recent advances
Speaker: Prof. Roland Badeau
Abstract: The classical problem of blind source separation (BSS) consists in recovering a number of unknown “source” signals from the observation of several “mixture” signals, assuming only that the source signals are mutually independent. Independent component analysis (ICA) is a classical approach to solving this problem when the mixture is linear instantaneous and (over-)determined. However, in the field of audio source separation, several challenging issues remain: for instance, the mixture is convolutive because of reverberation, and it is often under-determined and time-varying; source signals are non-stationary, and they often overlap in the time-frequency domain. Therefore, audio source separation cannot be performed without exploiting some a priori knowledge about the source signals and about the mixture. In this talk, I will present some past and current investigations carried out at Telecom ParisTech to address these issues. Various parametric and probabilistic models will be introduced that make it possible to exploit the information available about the source signals and about the mixtures. An application to score-based separation of musical sources will be presented.
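For readers unfamiliar with the classical setting the abstract starts from, here is a minimal sketch of ICA applied to a linear instantaneous, determined mixture of two synthetic sources, using scikit-learn’s FastICA. The sources and mixing matrix are invented for illustration; as the abstract notes, real audio mixtures (convolutive, under-determined, time-varying) are much harder.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent synthetic sources: a sine and a sawtooth.
t = np.linspace(0, 1, 4000)
S = np.c_[np.sin(2 * np.pi * 7 * t), 2 * (t * 13 % 1) - 1]

# Linear instantaneous, determined mixture: each "microphone" observes
# a fixed weighted sum of the sources (mixing matrix chosen arbitrarily).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = S @ A.T

# ICA recovers the sources up to permutation and scaling by making the
# outputs as statistically independent (non-Gaussian) as possible.
S_hat = FastICA(n_components=2, random_state=0).fit_transform(X)

for i in range(2):
    corr = max(abs(np.corrcoef(S_hat[:, i], S[:, j])[0, 1]) for j in range(2))
    print(f"estimated source {i}: |correlation| with best true source = {corr:.3f}")
```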
Ph.D. Defense by Sam Karimian-Azari
On September 27, 2016, Sam Karimian-Azari will defend his Ph.D. thesis entitled Fundamental Frequency and Direction-of-Arrival Estimation for Multichannel Speech Enhancement at AD:MT, Aalborg University, Rendsburggade 14. The assessment committee is comprised of Assoc. Prof. Kamal Nasrollahi (chairman, AAU), Assoc. Prof. Roland Badeau (Télécom ParisTech), and Assoc. Prof. Augusto Sarti (Politecnico di Milano). He was supervised by Professor Mads Græsbøll Christensen and Assistant Prof. Jesper Rindom Jensen. A small reception will be held after the defense.
Abstract: Audio systems usually receive the speech signals of interest in the presence of noise. The noise has profound impacts on the quality and intelligibility of the speech signals, and it is therefore clear that the noisy signals must be cleaned up before being played back, stored, or analyzed. We can estimate the speech signal of interest from the noisy signals using a priori knowledge about it. A human speech signal is broadband and consists of both voiced and unvoiced parts. The voiced part is quasi-periodic with a time-varying fundamental frequency (or pitch, as it is commonly referred to). We model periodic signals as sums of harmonics. Therefore, we can pass the noisy signals through bandpass filters centered at the frequencies of the harmonics to enhance the signal. In addition, although the frequencies of the harmonics are the same across the channels of a microphone array, the multichannel periodic signals may have different phases due to the time-differences-of-arrival (TDOAs), which are related to the direction-of-arrival (DOA) of the impinging sound waves. Hence, the outputs of the array can be steered to the direction of the signal of interest in order to align their time differences, which may further reduce the effects of noise. This thesis introduces a number of principles and methods for estimating periodic signals in noisy environments, with application to multichannel speech enhancement. We propose model-based signal enhancement built on the model of periodic signals, whose parameters must therefore be estimated in advance. The signal of interest is often contaminated by different types of noise that may render many estimation methods suboptimal due to an incorrect white Gaussian noise assumption. We therefore propose estimators that are robust against such noise, focusing on statistical and filtering-based methods that impose distortionless constraints with explicit relations between the parameters of the harmonics. The estimated fundamental frequencies are expected to be continuous over time, so we account for the time-varying fundamental frequency in the statistical methods in order to reduce the estimation error. We also propose a maximum likelihood DOA estimator that takes the noise statistics and the linear relationship between the TDOAs of the harmonics into account. The estimators have benefits over state-of-the-art statistical methods in colored noise. Evaluations of the estimators against the minimum variance attainable for the deterministic parameters and against other methods confirm that the proposed estimators are statistically efficient in colored noise and computationally simple. Finally, we propose model-based beamformers for multichannel speech enhancement that exploit the estimated fundamental frequency and DOA of the signal of interest. This general framework is tailored to a number of beamformers that use the spectral and spatial information of the periodic signals, which are quasi-stationary over short intervals. Objective measures of speech quality and intelligibility confirm the advantage of the harmonic model-based beamformers over traditional non-parametric beamformers and reveal the importance of an accurate estimate of the parameters of the model.
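As a small illustration of the TDOA-alignment idea described in the abstract, here is a minimal sketch of a delay-and-sum beamformer steering a uniform linear array toward a harmonic source with a known DOA. The geometry, source, and noise level are invented for illustration and are deliberately much simpler than the harmonic model-based beamformers developed in the thesis.

```python
import numpy as np

fs, c = 16000, 343.0           # sample rate (Hz), speed of sound (m/s)
n_mics, d = 4, 0.05            # uniform linear array: 4 mics, 5 cm apart
doa = np.deg2rad(30)           # assumed known direction-of-arrival

# Harmonic source (two harmonics of 200 Hz), 50 ms long.
t = np.arange(0, 0.05, 1 / fs)
s = np.cos(2 * np.pi * 200 * t) + 0.5 * np.cos(2 * np.pi * 400 * t)

# Simulate the array: delay each channel by its TDOA (applied as a phase
# shift in the frequency domain) and add white noise.
rng = np.random.default_rng(1)
freqs = np.fft.rfftfreq(len(t), 1 / fs)
S = np.fft.rfft(s)
taus = np.arange(n_mics) * d * np.sin(doa) / c   # TDOA per microphone
X = np.stack([np.fft.irfft(S * np.exp(-2j * np.pi * freqs * tau), len(t))
              for tau in taus])
X = X + 0.3 * rng.standard_normal(X.shape)

# Delay-and-sum: undo each TDOA so the signal adds coherently while the
# noise, being independent across channels, is averaged down.
Y = sum(np.fft.rfft(X[m]) * np.exp(2j * np.pi * freqs * taus[m])
        for m in range(n_mics))
y = np.fft.irfft(Y / n_mics, len(t))

print("single-mic MSE :", np.mean((X[0] - s) ** 2))   # noise-dominated
print("beamformed MSE :", np.mean((y - s) ** 2))      # roughly n_mics x smaller
```

The thesis goes well beyond this fixed spatial filter by exploiting the harmonic structure of voiced speech in both the spectral and spatial dimensions.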