Speech and Audio Recognition (2025 Fall)

Course Information

Semester: Fall 2025
Instructor: Prof. Inkyu An
Department: School of Computer Science, Kookmin University

Course Description

This course covers fundamental concepts and state-of-the-art techniques in speech and audio processing, with emphasis on modern deep learning approaches. Students will learn digital signal processing, classical and modern speech recognition methods, audio source separation, and cutting-edge technologies including self-supervised learning, multi-channel processing, and diffusion-based text-to-speech synthesis. The course bridges theoretical foundations with practical applications in robot audition and audio-visual systems.

Course Objectives

By the end of this course, students will be able to:

Apply digital signal processing techniques for audio analysis
Apply speech and audio recognition technologies

Weekly Class Plan

Week 1: Introduction

Week 2: Digital Signal Processing

Week 3: Speech Recognition 1

Week 4: Speech Recognition 2

Week 5: Self-Supervised Models for Audio

Week 6: Source Separation 1

Week 7: Source Separation 2

Week 8: Midterm Exam

Week 9: Audio-Visual Deep Learning

Week 10: Robot Audition

Week 11: Multi-channel Audio Processing

Week 12: Sound Source Localization

Week 13: Multi-channel Speech Separation & ASR

Week 14: Diffusion-based TTS

Week 15: Final Project

Tools and Frameworks

Python (NumPy, SciPy, scikit-learn)
Deep learning frameworks (PyTorch)
Speech processing libraries (librosa, Kaldi, ESPnet)
Audio analysis tools

Course Materials

Contact Information

Professor: Inkyu An
Email: inkyu.an@kookmin.ac.kr
Office: Room 450, Engineering Building
Office Hours: TBA