Bringing together experts in multimodal signal processing, this book provides a detailed introduction to the area, with a focus on the analysis, recognition and interpretation of human communication. The technology described has powerful applications. For instance, automatic analysis of the outputs of cameras and microphones in a meeting can make sense of what is happening: who spoke, what they said, whether there was an active discussion, and who was dominant in it. These analyses are layered to move from basic interpretations of the signals to richer semantic information. The book covers the necessary analyses in a tutorial manner, going from basic ideas to recent research results. It includes chapters on advanced speech processing and computer vision technologies, language understanding, interaction modeling and abstraction, as well as meeting support technology. This guide connects fundamental research with a wide range of prototype applications to support and analyze group interactions in meetings.
Steve Renals is Director of the Institute for Language, Cognition and Computation (ILCC) and Professor of Speech Technology in the School of Informatics at the University of Edinburgh. He has over 150 publications in speech and language processing, is the co-editor-in-chief of the ACM Transactions on Speech and Language Processing and has led several large projects in the field. With Herve Bourlard, he was the joint coordinator of the AMI and AMIDA European Integrated Projects, which form the basis for the book.
Herve Bourlard is Director of the Idiap Research Institute in Switzerland, Professor at the Swiss Federal Institute of Technology at Lausanne (EPFL) and founding Director of the Swiss National Center of Competence in Research on Interactive Multimodal Information Management (NCCR IM2). He has over 250 publications, has initiated and coordinated numerous international research projects and is the recipient of several scientific and entrepreneurship awards.
Jean Carletta is a Senior Research Fellow at the Human Communication Research Centre, University of Edinburgh. She was the scientific manager of the AMI and AMIDA Integrated Projects. A former Marshall Scholar, she has been on the editorial boards of Computational Linguistics and Language Resources and Evaluation.
Andrei Popescu-Belis is a senior researcher at the Idiap Research Institute, a lecturer at EPFL, and the head of Idiap's Natural Language Processing group. His research interests are in natural language processing, information retrieval, language resources, and the evaluation of linguistic and interactive systems.
1. Multimodal signal processing for human meetings: an introduction (Andrei Popescu-Belis and Jean Carletta)
2. Data collection (Jean Carletta and Mike Lincoln)
3. Microphone arrays and beamforming (Iain McCowan)
4. Speaker diarization (Fabio Valente and Gerald Friedland)
5. Speech recognition (Thomas Hain and Philip N. Garner)
6. Sampling techniques for audio-visual tracking and head pose estimation (Jean-Marc Odobez and Oswald Lanz)
7. Video processing and recognition (Pavel Zemcik, Sebastien Marcel and Jozef Mlich)
8. Language structure (Tilman Becker and Theresa Wilson)
9. Multimodal analysis of small-group conversational dynamics (Daniel Gatica-Perez, Rieks op den Akker and Dirk Heylen)
10. Summarization (Thomas Kleinbauer and Gabriel Murray)
11. User requirements for meeting support technology (Denis Lalanne and Andrei Popescu-Belis)
12. Meeting browsers and meeting assistants (Steve Whittaker, Simon Tucker and Denis Lalanne)
13. Evaluation of meeting support technology (Simon Tucker and Andrei Popescu-Belis)
14. Conclusion and perspectives (Herve Bourlard and Steve Renals)