Tackle a variety of tasks in natural language processing by learning how to use the R language and tidy data principles. This practical guide provides examples and resources to help you get up to speed with dplyr, broom, ggplot2, and other tidy tools from the R ecosystem. You'll discover how tidy data principles can make text mining easier, more effective, and more consistent by employing tools already in wide use. Text Mining with R shows you how to manipulate, summarize, and visualize the characteristics of text, and how to apply sentiment analysis, tf-idf, and topic modeling. Along with tidy data methods, you'll also examine several beginning-to-end tidy text analyses on data sources ranging from Twitter to NASA datasets. These analyses bring together multiple text mining approaches covered in the book.
Get real-world examples for implementing text mining using tidy R packages.
Understand natural language processing concepts like sentiment analysis, tf-idf, and topic modeling.
Learn how to analyze unstructured, text-heavy data using the R language and ecosystem.
Julia Silge is a data scientist at Datassist, where her work involves analyzing and modeling complex data sets while communicating about technical topics with diverse audiences. She has a PhD in Astrophysics, as well as abiding affections for Jane Austen and making beautiful charts. Julia worked in academia and ed tech before moving into data science and discovering R.
David Robinson is a data scientist at Stack Overflow. He has a PhD in Quantitative and Computational Biology from Princeton University, where he worked with Professor John Storey on genomic analysis. He enjoys working on and blogging about statistics, R programming, and text mining, including a popular analysis of Donald Trump's Twitter account (performed according to the tidy data principles described in this book).