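Faculty
Zhong-Qiu Wang is an Associate Professor, Researcher, and Ph.D. supervisor in the Department of Computer Science and Engineering at Southern University of Science and Technology. Dr. Wang's research interests include, but are not limited to:
• Speech and audio signal processing
‐ Speech separation (covering speaker separation, speech enhancement, and speech dereverberation)
‐ Microphone array signal processing
‐ Robust speech recognition
‐ Hearing aid design
‐ Active noise control
‐ Computational audition
• Brain-computer interfaces
• Ocean acoustics
• Deep learning
• Artificial intelligence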
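Work Experience
• Jul. 2024 ~ Present, Department of Computer Science and Engineering @ Southern University of Science and Technology, China, Associate Professor
• Sep. 2021 ~ Jul. 2024, Language Technologies Institute @ Carnegie Mellon University, USA, Postdoctoral Researcher
• Jun. 2020 ~ Aug. 2021, Speech and Audio Research Group @ Mitsubishi Electric Research Laboratories, USA, Researcher
• May 2019 ~ Aug. 2019, Audio Understanding Research Group @ Google Research, USA, Research Intern
• May 2017 ~ Aug. 2017, Speech and Audio Research Group @ Mitsubishi Electric Research Laboratories, USA, Research Intern
• May 2016 ~ Aug. 2016, Audio and Acoustics Research Group @ Microsoft Research, USA, Research Intern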
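Education
• Aug. 2013 ~ May 2020, Department of Computer Science and Engineering @ The Ohio State University, USA, Ph.D.
• Aug. 2013 ~ Aug. 2017, Department of Computer Science and Engineering @ The Ohio State University, USA, M.S.
• Aug. 2009 ~ Jul. 2013, Department of Computer Science and Technology @ Harbin Institute of Technology, China, Bachelor's degree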
Journal Publications
[13] Z.-Q. Wang, S. Cornell, S. Choi, Y. Lee, B.-Y. Kim, and S. Watanabe, "TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation", in IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP), vol. 31, pp. 3221-3236, 2023. [Sound Demo] [Code]
[12] D. Petermann, G. Wichern, A. Subramanian, Z.-Q. Wang, and J. Le Roux, "Tackling The Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks", in IEEE/ACM TASLP, vol. 31, pp. 2592-2605, 2023.
[11] Z.-Q. Wang, G. Wichern, S. Watanabe, and J. Le Roux, "STFT-Domain Neural Speech Enhancement with Very Low Algorithmic Latency", in IEEE/ACM TASLP, vol. 31, pp. 397-410, 2022.
[10] K. Tan, Z.-Q. Wang, and D.L. Wang, "Neural Spectrospatial Filtering", in IEEE/ACM TASLP, vol. 30, pp. 605-621, 2022.
[9] Z.-Q. Wang, G. Wichern, and J. Le Roux, "Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation", in IEEE/ACM TASLP, vol. 29, pp. 3476-3490, 2021.
[8] Z.-Q. Wang, P. Wang, and D.L. Wang, "Multi-Microphone Complex Spectral Mapping for Utterance-Wise and Continuous Speech Separation", in IEEE/ACM TASLP, vol. 29, pp. 2001-2014, 2021. [Sound Demo]
[7] Z.-Q. Wang*, P. Wang*, and D.L. Wang, "Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR", in IEEE/ACM TASLP, vol. 28, pp. 1778-1787, 2020. [* denotes equal contribution, Sound Demo]
[6] H. Taherian, Z.-Q. Wang, J. Chang, and D.L. Wang, "Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement", in IEEE/ACM TASLP, vol. 28, pp. 1293-1302, 2020.
[5] Z.-Q. Wang and D.L. Wang, "Deep Learning Based Target Cancellation for Speech Dereverberation", in IEEE/ACM TASLP, vol. 28, pp. 941-950, 2020. [Data Generation Code]
[4] Y. Zhao, Z.-Q. Wang, and D.L. Wang, "Two-Stage Deep Learning for Noisy-Reverberant Speech Enhancement", in IEEE/ACM TASLP, vol. 27, pp. 53-62, 2019.
[3] Z.-Q. Wang and D.L. Wang, "Combining Spectral and Spatial Features for Deep Learning Based Blind Speaker Separation", in IEEE/ACM TASLP, vol. 27, pp. 457-468, 2019.
[2] Z.-Q. Wang, X. Zhang, and D.L. Wang, "Robust Speaker Localization Guided by Deep Learning Based Time-Frequency Masking", in IEEE/ACM TASLP, vol. 27, pp. 178-188, 2019.
[1] Z.-Q. Wang and D.L. Wang, "A Joint Training Framework for Robust Automatic Speech Recognition", in IEEE/ACM TASLP, vol. 24, pp. 796-806, 2016.
Letter Publications
[3] Z.-Q. Wang, "Mixture to Mixture: Leveraging Close-talk Mixtures as Weak-supervision for Speech Separation", in IEEE Signal Processing Letters (IEEE SPL), vol. 31, pp. 1715-1719, 2024. [Sound Demo]
[2] Z.-Q. Wang and S. Watanabe, "Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction", in IEEE SPL, vol. 29, pp. 1422-1426, 2022.
[1] Z.-Q. Wang, G. Wichern, and J. Le Roux, "On The Compensation Between Magnitude and Phase in Speech Separation", in IEEE SPL, vol. 28, pp. 2018-2022, 2021.
Conference Publications in ML/AI
[2] Z.-Q. Wang, A. Kumar, and S. Watanabe, "Cross-Talk Reduction", in International Joint Conference on Artificial Intelligence (IJCAI), pp. 5171-5180, 2024. [Sound Demo] [Poster]
[1] Z.-Q. Wang and S. Watanabe, "UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures", in Advances in Neural Information Processing Systems (NeurIPS), pp. 34021-34042, 2023. [Sound Demo] [Poster]
Conference Publications in Speech/Audio
[43] S. Wu, C. Wang, H. Chen, Y. Dai, C. Zhang, R. Wang, H. Lan, J. Du, C.-H. Lee, J. Chen, S. Watanabe, S. Siniscalchi, O. Scharenborg, Z.-Q. Wang, J. Pan, and J. Gao, "The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction", in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8351-8355, 2024.
[42] Y. Lee, S. Choi, B.-Y. Kim, Z.-Q. Wang, and S. Watanabe, "Boosting Unknown-Number Speaker Separation with Transformer Decoder-based Attractor", in ICASSP, pp. 446-450, 2024.
[41] K. Saijo, W. Zhang, Z.-Q. Wang, S. Watanabe, T. Kobayashi, and T. Ogawa, "A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction", in IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023.
[40] W. Zhang, K. Saijo, Z.-Q. Wang, S. Watanabe, and Y. Qian, "Toward Universal Speech Enhancement for Diverse Input Conditions", in ASRU, 2023.
[39] S. Cornell, M. Wiesner, S. Watanabe, D. Raj, X. Chang, P. Garcia, Y. Masuyama, Z.-Q. Wang, S. Squartini, and S. Khudanpur, "The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios", in Proceedings of CHiME Challenge, 2023.
[38] Y. Masuyama, X. Chang, W. Zhang, S. Cornell, Z.-Q. Wang, N. Ono, Y. Qian, and S. Watanabe, "Exploring The Integration of Speech Separation and Recognition with Self-Supervised Learning Representation", in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023.
[37] Z.-Q. Wang, S. Cornell, S. Choi, Y. Lee, B.-Y. Kim, and S. Watanabe, "TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation", in ICASSP, 2023.
[36] Z.-Q. Wang, S. Cornell, S. Choi, Y. Lee, B.-Y. Kim, and S. Watanabe, "Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling", in ICASSP, 2023.
[35] S. Cornell, Z.-Q. Wang, Y. Masuyama, S. Watanabe, M. Pariente, N. Ono, and S. Squartini, "Multi-Channel Speaker Extraction with Adversarial Training: The WAVlab Submission to The Clarity ICASSP 2023 Grand Challenge", in ICASSP, 2023.
[34] S. Cornell, Z.-Q. Wang, Y. Masuyama, S. Watanabe, M. Pariente, and N. Ono, "Multi-Channel Target Speaker Extraction with Refinement: The WAVLab Submission to The Second Clarity Enhancement Challenge", in Proceedings of Clarity Challenge, 2022. [Winner (1st/13 submissions) of The 2nd Clarity Enhancement Challenge, challenge description, workshop]
[33] S. Choi, Y. Lee, J. Park, H. Y. Kim, B.-Y. Kim, Z.-Q. Wang, and S. Watanabe, "An Empirical Study of Training Mixture Generation Strategies on Speech Separation: Dynamic Mixing and Augmentation", in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA-ASC), pp. 1071-1076, 2022.
[32] Y.-J. Lu, X. Chang, C. Li, W. Zhang, S. Cornell, Z. Ni, Y. Masuyama, B. Yan, R. Scheibler, Z.-Q. Wang, Y. Tsao, Y. Qian, and S. Watanabe, "ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding", in Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 5458-5462, 2022.
[31] Z.-Q. Wang and D.L. Wang, "Localization Based Sequential Grouping for Continuous Speech Separation", in ICASSP, pp. 281-285, 2022.
[30] Y.-J. Lu, Z.-Q. Wang, S. Watanabe, A. Richard, C. Yu, and Y. Tsao, "Conditional Diffusion Probabilistic Model for Speech Enhancement", in ICASSP, pp. 7402-7406, 2022.
[29] Y.-J. Lu, S. Cornell, X. Chang, W. Zhang, C. Li, Z. Ni, Z.-Q. Wang, and S. Watanabe, "Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge", in ICASSP, pp. 9201-9205, 2022. [Winner (1st/17 teams) of L3DAS22 Speech Enhancement Challenge, challenge rankings] [Code]
[28] D. Petermann, G. Wichern, Z.-Q. Wang, and J. Le Roux, "The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks", in ICASSP, pp. 526-530, 2022.
[27] O. Slizovskaia, G. Wichern, Z.-Q. Wang, and J. Le Roux, "Locate This, Not That: Class-Conditioned Sound Event DOA Estimation", in ICASSP, pp. 711-715, 2022.
[26] Z.-Q. Wang, G. Wichern, and J. Le Roux, "Convolutive Prediction for Reverberant Speech Separation", in WASPAA, pp. 56-60, 2021.
[25] G. Wichern, A. Chakrabarty, Z.-Q. Wang, and J. Le Roux, "Anomalous Sound Detection using Attentive Neural Processes", in WASPAA, pp. 186-190, 2021.
[24] Z.-Q. Wang and D.L. Wang, "Count and Separate: Incorporating Speaker Counting for Continuous Speech Separation", in ICASSP, pp. 11-15, 2021.
[23] Z.-Q. Wang, H. Erdogan, S. Wisdom, K. Wilson, D. Raj, S. Watanabe, Z. Chen, and J. R. Hershey, "Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement", in IEEE Spoken Language Technology Workshop (SLT), pp. 905-911, 2021.
[22] Z.-Q. Wang and D.L. Wang, "Multi-Microphone Complex Spectral Mapping for Speech Dereverberation", in ICASSP, pp. 486-490, 2020.
[21] H. Taherian, Z.-Q. Wang, and D.L. Wang, "Deep Learning Based Multi-Channel Speaker Recognition in Noisy and Reverberant Environments", in INTERSPEECH, pp. 4070-4074, 2019.
[20] Z.-Q. Wang, K. Tan, and D.L. Wang, "Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective", in ICASSP, pp. 71-75, 2019.
[19] Z.-Q. Wang and D.L. Wang, "Integrating Spectral and Spatial Features for Multi-Channel Speaker Separation", in INTERSPEECH, pp. 2718-2722, 2018.
[18] Z.-Q. Wang, X. Zhang, and D.L. Wang, "Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks", in INTERSPEECH, pp. 322-326, 2018.
[17] Z.-Q. Wang and D.L. Wang, "All-Neural Multi-Channel Speech Enhancement", in INTERSPEECH, pp. 3234-3238, 2018.
[16] Z.-Q. Wang, J. Le Roux, D.L. Wang, and J. R. Hershey, "End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction", in INTERSPEECH, pp. 2708-2712, 2018.
[15] Z.-Q. Wang, J. Le Roux, and J. R. Hershey, "Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation", in ICASSP, pp. 1-5, 2018. [Best Student Paper Award]
[14] Z.-Q. Wang, J. Le Roux, and J. R. Hershey, "Alternative Objective Functions for Deep Clustering", in ICASSP, pp. 686-690, 2018.
[13] Z.-Q. Wang and D.L. Wang, "On Spatial Features for Supervised Speech Separation and its Application to Beamforming and Robust ASR", in ICASSP, pp. 5709-5713, 2018.
[12] Z.-Q. Wang and D.L. Wang, "Mask Weighted STFT Ratios for Relative Transfer Function Estimation and its Application to Robust ASR", in ICASSP, pp. 5619-5623, 2018.
[11] I. Tashev, Z.-Q. Wang, and K. Godin, "Speech Emotion Recognition Based on Gaussian Mixture Models and Deep Neural Networks", in Information Theory and Applications Workshop (ITA), pp. 1-4, 2017.
[10] Y. Zhao, Z.-Q. Wang, and D.L. Wang, "A Two-stage Algorithm for Noisy and Reverberant Speech Enhancement", in ICASSP, pp. 5580-5584, 2017.
[9] X. Zhang, Z.-Q. Wang, and D.L. Wang, "A Speech Enhancement Algorithm by Iterating Single- and Multi-microphone Processing and its Application to Robust ASR", in ICASSP, pp. 276-280, 2017.
[8] Z.-Q. Wang and D.L. Wang, "Recurrent Deep Stacking Networks for Supervised Speech Separation", in ICASSP, pp. 71-75, 2017.
[7] Z.-Q. Wang and I. Tashev, "Learning Utterance-level Representations for Speech Emotion and Age/Gender Recognition using Deep Neural Networks", in ICASSP, pp. 5150-5154, 2017.
[6] Z.-Q. Wang and D.L. Wang, "Unsupervised Speaker Adaptation of Batch Normalized Acoustic Models for Robust ASR", in ICASSP, pp. 4890-4894, 2017.
[5] Z.-Q. Wang, Y. Zhao, and D.L. Wang, "Phoneme-Specific Speech Separation", in ICASSP, pp. 146-150, 2016. [NSF Student Travel Grant]
[4] Z.-Q. Wang and D.L. Wang, "Robust Speech Recognition from Ratio Masks", in ICASSP, pp. 5720-5724, 2016.
[3] D. Bagchi, M. Mandel, Z. Wang, Y. He, A. Plummer, and E. Fosler-Lussier, "Combining Spectral Feature Mapping and Multi-channel Model-based Source Separation for Noise-robust Automatic Speech Recognition", in ASRU, pp. 496-503, 2015.
[2] Z.-Q. Wang and D.L. Wang, "Joint Training of Speech Separation, Filterbank and Acoustic Model for Robust Automatic Speech Recognition", in INTERSPEECH, pp. 2839-2843, 2015.
[1] Y. Liu, Z. Wang, M. Guo, and P. Li, "Hidden Conditional Random Field for Lung Nodule Detection", in IEEE International Conference on Image Processing (ICIP), pp. 3518-3521, 2014.
Patents
[4] Z.-Q. Wang, G. Wichern, and J. Le Roux, "Method and System for Audio Signal Enhancement with Reduced Latency", US Patent Application 18/045,380, 2023.
[3] G. Wichern, A. Chakrabarty, Z.-Q. Wang, and J. Le Roux, "Method and System for Detecting Anomalous Sound", US Patent 11,978,476 B2, 2024.
[2] Z.-Q. Wang, G. Wichern, and J. Le Roux, "Method and System for Dereverberation of Speech Signals", US Patent 11,790,930 B2, 2023.
[1] J. Le Roux, J. R. Hershey, Z. Wang, and G. P. Wichern, "Methods and Systems for End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction", US Patent 10,529,349 B2, 2020.
Dissertation
[1] Z.-Q. Wang, "Deep Learning Based Array Processing for Speech Separation, Localization, and Recognition", Ph.D. Dissertation, The Ohio State University, Apr. 2020.
Professional Services
• Professional Membership
○ Committee Member, Audio and Acoustic Signal Processing Technical Committee (AASP-TC), IEEE Signal Processing Society, 2023.1 - 2025.12
• Conference Chair
○ Area Chair, "Speech Coding and Enhancement", Interspeech 2024
○ Area Chair, "Audio and Speech Source Separation", ICASSP 2024 and 2025
○ Challenge Organizer, "CHiME-7 Task 1: Distant automatic speech recognition with multiple devices in diverse scenarios", CHiME workshop 2023
○ Special Session Chair, "Resource-efficient real-time neural speech separation", ICASSP 2023
• Meta-Reviewer
○ WASPAA 2023, ICASSP 2023
• Journal Reviewer
○ IEEE/ACM TASLP
○ Neural Networks
○ Speech Communication
○ Journal of The Acoustical Society of America
○ IEEE SPL
○ IEEE Open Journal of Signal Processing
○ Journal of Signal Processing Systems
○ EURASIP Journal on Audio, Speech, and Music Processing
○ Pattern Recognition Letters
○ Digital Signal Processing
○ IET Signal Processing
○ Electronics Letters
• Conference Reviewer
○ ICASSP, Interspeech, SLT, WASPAA, ASRU, CHiME workshop, IALP