This study proposes a new spectral representation called the Zeros of Z-Transform (ZZT), an all-zero representation of the z-transform of a signal. In addition, new chirp group delay processing techniques are developed for analyzing the resonances of a signal. The combination of the ZZT representation with the chirp group delay processing algorithms provides a useful domain for studying the resonance characteristics of the source and filter components of speech. Using the two representations, effective algorithms are developed for source-tract decomposition of speech, glottal flow parameter estimation, formant tracking, and feature extraction for speech recognition.

The ZZT representation is mainly important for theoretical studies: studying the ZZT of a signal is essential for developing effective chirp group delay processing methods. Therefore, the ZZT representation of the source-filter model of speech is studied first to provide a theoretical background. We confirm through the ZZT representation that the anti-causality of the glottal flow signal introduces mixed-phase characteristics in speech signals. The ZZT of windowed speech signals is also studied, since windowing cannot be avoided in practical signal processing algorithms and its effect on the ZZT representation is drastic. We show that separate patterns exist in the ZZT representations of windowed speech signals for the glottal flow and the vocal tract contributions. A decomposition method for source-tract separation is developed based on these patterns.

We define chirp group delay as the group delay calculated on a circle other than the unit circle in the z-plane. The need to compute group delay on such a circle arises because group delay spectra are often very noisy and cannot be easily processed for formant tracking purposes (the reasons are explained through the ZZT representation). In this thesis, we propose methods to avoid such problems by modifying the ZZT of a signal and then computing the chirp group delay spectrum. New algorithms based on processing the chirp group delay spectrum are developed for formant tracking and feature estimation for speech recognition. The proposed algorithms are compared with state-of-the-art techniques, and equivalent or higher efficiency is obtained for all of them. The theoretical parts of the thesis further discuss a mixed-phase model for speech and phase processing problems in detail.

Index Terms: spectral representation, source-filter separation, glottal flow estimation, formant tracking, zeros of z-transform, group delay processing, phase processing.
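The two definitions given in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `zzt` and `chirp_group_delay`, the FFT size, and the radius values are hypothetical choices made for illustration, and the group delay is obtained here from a standard FFT identity rather than from the zeros themselves. The sketch assumes a finite frame x[0..N-1], whose z-transform becomes a polynomial in z after multiplication by z^(N-1), so its zeros can be found with NumPy's polynomial root finder.

```python
import numpy as np


def zzt(frame):
    """Zeros of the z-transform of a finite frame x[0..N-1] (illustrative helper)."""
    # X(z) = sum_n x[n] z^{-n}. Multiplying by z^{N-1} gives a polynomial in z
    # whose coefficients, highest power first, are x[0], x[1], ..., x[N-1].
    return np.roots(np.asarray(frame, dtype=float))


def chirp_group_delay(frame, radius=1.0, n_fft=512):
    """Group delay of X(z) sampled on the circle |z| = radius (illustrative helper)."""
    x = np.asarray(frame, dtype=float)
    n = np.arange(len(x), dtype=float)
    # X(R e^{jw}) is the Fourier transform of y[n] = x[n] R^{-n}.
    y = x * np.power(float(radius), -n)
    spectrum = np.fft.rfft(y, n_fft)
    ramp_spectrum = np.fft.rfft(n * y, n_fft)
    # Group delay = -d(phase)/dw = Re{ FT(n y[n]) / FT(y[n]) }.
    # This division blows up when a zero lies on (or very near) the circle,
    # which is the kind of spike that motivates moving off the unit circle.
    return np.real(ramp_spectrum / spectrum)


# Example: a truncated decaying exponential. Its zeros all lie on a circle
# whose radius equals the decay factor (0.9 here), so the group delay is
# evaluated on a circle that stays clear of them (radius 1.05, an arbitrary choice).
frame = 0.9 ** np.arange(16)
zeros = zzt(frame)
cgd = chirp_group_delay(frame, radius=1.05)
```

With `radius=1.0` the second function reduces to the ordinary group delay on the unit circle, which makes it easy to compare the two spectra for the same frame.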
Chapter I: Introduction
I.1. Motivations
The (hi)story of this study
I.2. Original contributions of the thesis
ZZT Representation of signals
Chirp group delay processing
Applications of ZZT and chirp group delay
I.3. Plan
Chapter II: State-of-the-art
II.1. Introduction
II.2. Glottal flow estimation and voice quality analysis
Glottal flow signal estimation methods
Glottal flow parameter estimation methods
Applications of glottal flow estimation in voice quality analysis for concatenative TTS
II.3. Formant Tracking
II.4. Phase Processing of Speech
Phase processing in sinusoidal/harmonic modeling
Phase processing in speech perception
Phase processing in speech analysis
Phase processing in automatic speech recognition
FIRST PART: SPECTRAL REPRESENTATION OF SPEECH BY ZEROS OF THE Z-TRANSFORM (ZZT) AND CHIRP GROUP DELAY
Chapter III: Zeros of the z-transform (ZZT) representation of speech
III.1. Introduction
III.2. Definition
Finding the roots of high degree polynomials
III.3. ZZT representation of speech signals
III.3.1. ZZT of some basic signals
ZZT of an exponential time series
ZZT of a damped sinusoid
III.3.2. ZZT of the glottal flow signal
Contribution of the first phase to the ZZT of LF model glottal flow signal
Contribution of the return phase to the ZZT of the LF model glottal flow signal
III.3.3. ZZT representation and source-filter model of speech
III.3.4. ZZT of windowed synthetic speech signals
Effect of window location on ZZT patterns
Effect of window function on ZZT patterns
Effect of window size on ZZT patterns
III.3.5. ZZT of aperiodic components in speech
III.3.6. Conclusion
Chapter IV: Chirp group delay processing of signals
IV.1. Introduction
IV.2. Methods proposed by Yegnanarayana and Murthy for group delay processing
Terminology
Difficulties in group delay processing
Processing group delay of the minimum-phase version of a signal
Modified group delay function
IV.3. Phase processing of mixed-phase signals
IV.4. Mixed-phase speech model
IV.5. Effects of windowing on group delay functions
Effects of window location on group delay functions
Effects of window size on group delay functions
Effects of window function on group delay functions
Group delay spectrogram
Conclusion
IV.6. Chirp group delay processing of speech
Chirp Group Delay of GCI-Synchronously Windowed Speech (CGDGCI)
Chirp Group Delay of The Zero-Phase Version (CGDZP)
IV.7. Conclusion
SECOND PART: APPLICATIONS OF ZZT AND CHIRP GROUP DELAY PROCESSING IN SPEECH ANALYSIS
Chapter V: Applications of ZZT and Chirp Group Delay Processing in Speech Analysis
V.1. ZZT-decomposition for source-filter separation of speech
V.1.1. The ZZT-decomposition algorithm
V.1.2. Examples and evaluation of the decomposition algorithm
Synthetic speech example
Real speech example
Robustness tests
Robustness to GCI detection errors
Robustness to F1 variations
Robustness to additive noise and return phase variations
V.1.3. Mixed-phase decomposition using complex cepstrum
Links between ZZT and complex cepstrum
V.1.4. Conclusions
V.2. Application to glottal flow parameter estimation
V.2.1. Testing the Fg estimation algorithm
Tests with synthetic speech
Tests with real speech
V.2.2. Conclusions
V.3. Application to formant tracking
V.3.1. Formant tracker – first version
V.3.2. Formant tracker – second version (DPPT)
Tests
Stimuli
Results
Discussion
V.3.3. Formant tracker – third version (Fast-DPPT)
Tests
Procedure and Stimuli
Results
V.4. A Linear Prediction (LP) algorithm to estimate the glottal flow component from speech signals
V.4.1. The MixLP algorithm
Tests
Conclusion
V.5. Application to speech recognition
V.5.1. Group delay based features
Computation of features for ASR
V.5.2. ASR experiments
ASR system
Speech Database
Experimental Results
V.5.3. Discussion and conclusion
Chapter VI: Conclusion and Future Works
VI.1. Conclusions
The ZZT representation and its applications
The chirp group delay (CGD) representation
Applications of ZZT and CGD
Other applications studied
VI.2. Future works
Appendix A: Window functions
Appendix B: Relation between poles and spectral peaks of an all-pole filter
Appendix C: Formant tracking examples
Appendix D: Publications not referred in the thesis manuscript
References