Speech and Audio Recognition (2025 Fall)
Course Information
- Semester: Fall 2025
- Instructor: Prof. Inkyu An
- Department: School of Computer Science, Kookmin University
Course Description
This course covers fundamental concepts and state-of-the-art techniques in speech and audio processing, with emphasis on modern deep learning approaches. Students will learn digital signal processing, classical and modern speech recognition methods, audio source separation, and cutting-edge technologies including self-supervised learning, multi-channel processing, and diffusion-based text-to-speech synthesis. The course bridges theoretical foundations with practical applications in robot audition and audio-visual systems.
Course Objectives
By the end of this course, students will be able to:
- Use digital signal processing techniques for audio analysis
- Apply speech and audio recognition technologies
Weekly Class Plan
Tools and Frameworks
- Python (NumPy, SciPy, scikit-learn)
- Deep learning frameworks (PyTorch)
- Speech processing libraries (Librosa, Kaldi, ESPnet)
- Audio analysis tools
Course Materials
Lecture Notes
Lecture notes will be uploaded here throughout the semester.
Assignments
Assignment materials and programming exercises will be posted here.
Additional Materials
Supplementary materials, datasets, and resources will be available here.
Contact Information
Professor: Inkyu An
Email: inkyu.an@kookmin.ac.kr
Office: Room 450, Engineering Building
Office Hours: TBA