Speech and Audio Recognition (2025 Fall)

Course Information

  • Semester: Fall 2025
  • Instructor: Prof. Inkyu An
  • Department: School of Computer Science, Kookmin University

Course Description

This course covers fundamental concepts and state-of-the-art techniques in speech and audio processing, with emphasis on modern deep learning approaches. Students will learn digital signal processing, classical and modern speech recognition methods, audio source separation, and cutting-edge technologies including self-supervised learning, multi-channel processing, and diffusion-based text-to-speech synthesis. The course bridges theoretical foundations with practical applications in robot audition and audio-visual systems.

Course Objectives

By the end of this course, students will be able to:

  • Understand and use digital signal processing techniques for audio analysis
  • Apply speech and audio recognition technologies

Weekly Class Plan

Week 1: Introduction
Week 2: Digital Signal Processing
Week 3: Speech Recognition 1
Week 4: Speech Recognition 2
Week 5: Self-Supervised Models for Audio
Week 6: Source Separation 1
Week 7: Source Separation 2
Week 8: Midterm Exam
Week 9: Audio-Visual Deep Learning
Week 10: Robot Audition
Week 11: Multi-channel Audio Processing
Week 12: Sound Source Localization
Week 13: Multi-channel Speech Separation & ASR
Week 14: Diffusion-based TTS
Week 15: Final Project

Tools and Frameworks

  • Python (NumPy, SciPy, scikit-learn)
  • Deep learning frameworks (PyTorch)
  • Speech processing libraries (Librosa, Kaldi, ESPnet)
  • Audio analysis tools

Course Materials

Lecture Notes

Lecture notes will be uploaded here throughout the semester.

Assignments

Assignment materials and programming exercises will be posted here.

Additional Materials

Supplementary materials, datasets, and resources will be available here.

Contact Information

Professor: Inkyu An
Email: inkyu.an@kookmin.ac.kr
Office: Room 450, Engineering Building
Office Hours: TBA