Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2

By: Arun C. Murthy (author), Joseph Niemiec (author), Douglas Eadline (author), Jeff Markham (author), Vinod Kumar Vavilapalli (author)
Paperback

Only 1 in stock

£25.49 RRP £29.99  You save £4.50 (15%) With FREE Saver Delivery


"This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm."
- From the Foreword by Raymie Stata, CEO of Altiscale

The Insider's Guide to Building Distributed, Big Data Applications with Apache Hadoop (TM) YARN

Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop (TM) YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.

YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment. You'll find many examples drawn from the authors' cutting-edge experience, first as Hadoop's earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward and helping customers succeed with it.

Coverage includes

  • YARN's goals, design, architecture, and components, and how it expands the Apache Hadoop ecosystem
  • Exploring YARN on a single node
  • Administering YARN clusters and Capacity Scheduler
  • Running existing MapReduce applications
  • Developing a large-scale clustered YARN application
  • Discovering new open source frameworks that run under YARN

About the Authors

Arun Murthy has contributed to Apache Hadoop full-time since the inception of the project in early 2006. He is a long-term Hadoop committer and a member of the Apache Hadoop Project Management Committee. Previously, he was the architect and lead of the Yahoo Hadoop MapReduce development team and was ultimately responsible, technically, for providing Hadoop MapReduce as a service for all of Yahoo, currently running on nearly 50,000 machines. Arun is the founder and architect of Hortonworks Inc., a software company that is helping to accelerate the development and adoption of Apache Hadoop. Hortonworks was formed by the key architects and core Hadoop committers from the Yahoo! Hadoop software engineering team in June 2011. Funded by Yahoo! and Benchmark Capital, one of the preeminent technology investors, Hortonworks aims to ensure that Apache Hadoop becomes the standard platform for storing, processing, managing, and analyzing big data.

Vinod Kumar Vavilapalli has been contributing to the Apache Hadoop project full-time since mid-2007. At the Apache Software Foundation, he is a long-term Hadoop contributor, Hadoop committer, member of the Apache Hadoop Project Management Committee, and a foundation member. Vinod is a MapReduce and YARN go-to guy at Hortonworks Inc. For more than five years, he has been working on Hadoop. He was involved in HadoopOnDemand, Hadoop-0.20, CapacityScheduler, Hadoop security, and MapReduce, and is now a lead developer and the project lead for Apache Hadoop YARN. Before Hortonworks, he was at Yahoo!, working in the Grid team that made Hadoop what it is today, running at large scale, up to tens of thousands of nodes. Vinod loves reading books of all kinds and is passionate about using computers to change the world for the better, bit by bit. He has a bachelor's degree in computer science and engineering from the Indian Institute of Technology Roorkee. He can be reached on Twitter at @tshooter.

Douglas Eadline, Ph.D., began his career as a practitioner and a chronicler of the Linux Cluster HPC revolution and now documents big data analytics. Starting with the first Beowulf How To document, Doug has written hundreds of articles, white papers, and instructional documents covering virtually all aspects of HPC computing. Prior to starting and editing the popular ClusterMonkey.net website in 2005, he served as editor-in-chief for ClusterWorld magazine, and was senior HPC editor for Linux Magazine. Currently, he is a consultant to the HPC industry and writes a monthly column in HPC Admin magazine. Both clients and readers have recognized Doug's ability to present a "technological value proposition" in a clear and accurate style. He has practical, hands-on experience in many aspects of HPC, including hardware and software design, benchmarking, storage, GPU, cloud, and parallel computing. He is the author of Hadoop Fundamentals LiveLessons (video) from Addison-Wesley.

Joseph Niemiec is a big data solutions engineer whose focus is on designing Hadoop solutions for many Fortune 1000 companies. In this position, Joseph has worked with customers to build multiple YARN applications, providing a unique perspective on moving customers beyond batch processing, and has worked on YARN development directly. An avid technologist, Joseph has been focused on technology innovations since 2001. His interest in data analytics originally started in game score optimization as a teenager, and has shifted to helping customers adopt new technology innovations such as Hadoop and, most recently, building new data applications using YARN.

Jeff Markham is a solution engineer at Hortonworks Inc., the company promoting open source Hadoop. Previously, he was with VMware, Red Hat, and IBM, helping companies build distributed applications with distributed data. He has written articles on Java application development and has spoken at several conferences and to Hadoop User Groups. Jeff is a contributor to Apache Pig and Apache HDFS.


Foreword by Raymie Stata
Foreword by Paul Dix
Preface
Acknowledgments
About the Authors

Chapter 1: Apache Hadoop YARN: A Brief History and Rationale
  • Introduction
  • Apache Hadoop
  • Phase 0: The Era of Ad Hoc Clusters
  • Phase 1: Hadoop on Demand
  • Phase 2: Dawn of the Shared Compute Clusters
  • Phase 3: Emergence of YARN
  • Conclusion

Chapter 2: Apache Hadoop YARN Install Quick Start
  • Getting Started
  • Steps to Configure a Single-Node YARN Cluster
  • Run Sample MapReduce Examples
  • Wrap-up

Chapter 3: Apache Hadoop YARN Core Concepts
  • Beyond MapReduce
  • Apache Hadoop MapReduce
  • Apache Hadoop YARN
  • YARN Components
  • Wrap-up

Chapter 4: Functional Overview of YARN Components
  • Architecture Overview
  • ResourceManager
  • YARN Scheduling Components
  • Containers
  • NodeManager
  • ApplicationMaster
  • YARN Resource Model
  • Managing Application Dependencies
  • Wrap-up

Chapter 5: Installing Apache Hadoop YARN
  • The Basics
  • System Preparation
  • Script-based Installation of Hadoop 2
  • Script-based Uninstall
  • Configuration File Processing
  • Configuration File Settings
  • Start-up Scripts
  • Installing Hadoop with Apache Ambari
  • Wrap-up

Chapter 6: Apache Hadoop YARN Administration
  • Script-based Configuration
  • Monitoring Cluster Health: Nagios
  • Real-time Monitoring: Ganglia
  • Administration with Ambari
  • JVM Analysis
  • Basic YARN Administration
  • Wrap-up

Chapter 7: Apache Hadoop YARN Architecture Guide
  • Overview
  • ResourceManager
  • NodeManager
  • ApplicationMaster
  • YARN Containers
  • Summary for Application-writers
  • Wrap-up

Chapter 8: Capacity Scheduler in YARN
  • Introduction to the Capacity Scheduler
  • Capacity Scheduler Configuration
  • Queues
  • Hierarchical Queues
  • Queue Access Control
  • Capacity Management with Queues
  • User Limits
  • Reservations
  • State of the Queues
  • Limits on Applications
  • User Interface
  • Wrap-up

Chapter 9: MapReduce with Apache Hadoop YARN
  • Running Hadoop YARN MapReduce Examples
  • MapReduce Compatibility
  • The MapReduce ApplicationMaster
  • Calculating the Capacity of a Node
  • Changes to the Shuffle Service
  • Running Existing Hadoop Version 1 Applications
  • Running MapReduce Version 1 Existing Code
  • Advanced Features
  • Wrap-up

Chapter 10: Apache Hadoop YARN Application Example
  • The YARN Client
  • The ApplicationMaster
  • Wrap-up

Chapter 11: Using Apache Hadoop YARN Distributed-Shell
  • Using the YARN Distributed-Shell
  • Internals of the Distributed-Shell
  • Wrap-up

Chapter 12: Apache Hadoop YARN Frameworks
  • Distributed-Shell
  • Hadoop MapReduce
  • Apache Tez
  • Apache Giraph
  • Hoya: HBase on YARN
  • Dryad on YARN
  • Apache Spark
  • Apache Storm
  • REEF: Retainable Evaluator Execution Framework
  • Hamster: Hadoop and MPI on the Same Cluster
  • Wrap-up

Appendix A: Supplemental Content and Code Downloads
  • Available Downloads

Appendix B: YARN Installation Scripts
  • install-hadoop2.sh
  • uninstall-hadoop2.sh
  • hadoop-xml-conf.sh

Appendix C: YARN Administration Scripts
  • configure-hadoop2.sh

Appendix D: Nagios Modules
  • check resource manager.sh
  • check data node.sh
  • check resource manager old space pct.sh

Appendix E: Resources and Additional Information

Appendix F: HDFS Quick Reference
  • Quick Command Reference

Index

Product Details

  • ISBN13: 9780321934505
  • Format: Paperback
  • Number Of Pages: 400
  • ID: 9780321934505
  • Weight: 524
  • ISBN10: 0321934504

Delivery Information

  • Saver Delivery: Yes
  • 1st Class Delivery: Yes
  • Courier Delivery: Yes
  • Store Delivery: Yes

Prices are for internet purchases only. Prices and availability in WHSmith Stores may vary significantly.