The field of machine learning has matured to the point where many sophisticated learning approaches can be applied in practice. It is therefore critical that researchers have the proper tools to evaluate learning approaches and understand the underlying issues. This book examines various aspects of the evaluation process, with an emphasis on classification algorithms. The authors describe several techniques for classifier performance assessment, error estimation and resampling, and statistical significance testing, as well as for selecting appropriate domains for evaluation. They also present a unified evaluation framework and highlight how the different components of evaluation are closely interrelated and interdependent. The techniques presented in the book are illustrated using R and WEKA, giving readers better practical insight and easing implementation. Aimed at researchers in the theory and applications of machine learning, this book offers a solid basis for conducting performance evaluations of algorithms in practical settings.
Nathalie Japkowicz is an Associate Professor at the School of Information Technology and Engineering of the University of Ottawa. She is a former assistant professor at Dalhousie University and a former lecturer at Ohio State University. Japkowicz has co-organized numerous workshops on classifier evaluation and the class imbalance problem at AAAI and ICML, and has published many articles in peer-reviewed journals and conference proceedings. Mohak Shah is a Postdoctoral Fellow at the Centre for Intelligent Machines at McGill University. He is a former CIHR Postdoctoral Fellow at the CHUL Genomics research centre and Laval University in Quebec. He was named the Arnold Smith Commonwealth Scholar in 2002 and a National Scholar in India in 1995. Shah has served on the program committees of various conferences and symposia, and has reviewed for major journals and conferences in the field.
1. Introduction; 2. Machine learning and statistics overview; 3. Performance measures I; 4. Performance measures II; 5. Error estimation; 6. Statistical significance testing; 7. Data sets and experimental framework; 8. Recent developments; 9. Conclusion; Appendix A: statistical tables; Appendix B: additional information on the data; Appendix C: two case studies.