Beginning Apache Pig: Big Data Processing Made Easy

Balaswamy Vaddeman

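  • Publisher: Apress
  • Publication date: 2016-12-16
  • List price: $1,650
  • Member price: $1,568 (5% off)
  • Language: English
  • Pages: 274
  • Binding: Paperback
  • ISBN: 1484223365
  • ISBN-13: 9781484223369
  • Related categories: Big Data
  • Overseas special-order title (must be checked out separately)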

Product Description

Learn to use Apache Pig to develop lightweight big data applications easily and quickly. This book shows you many optimization techniques and covers every context where Pig is used in big data analytics. Beginning Apache Pig shows you how Pig is easy to learn and requires relatively little time to develop big data applications.
The book is divided into four parts: the complete features of Apache Pig; integration with other tools; how to solve complex business problems; and optimization of tools.
You'll discover topics such as MapReduce and why it cannot meet every business need; the features of Pig Latin, such as data types and the load, store, join, group, and order operations; how Pig workflows can be created; submitting Pig jobs using Hue; and working with Oozie. You'll also see how to extend the framework by writing UDFs and custom load, store, and filter functions. Finally, you'll cover different optimization techniques such as gathering statistics about a Pig script, join strategies, parallelism, and the role of data formats in good performance.

What You Will Learn
• Use all the features of Apache Pig
• Integrate Apache Pig with other tools
• Extend Apache Pig
• Optimize Pig Latin code
• Solve different use cases for Pig Latin

Who This Book Is For
All levels of IT professionals: architects, big data enthusiasts, engineers, developers, and big data administrators
