Hadoop: Data Processing and Modelling

Tanmay Deshpande, Sandeep Karanth, Gerald Turkington

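  • Publisher: Packt Publishing
  • Publication date: 2017-07-14
  • List price: $3,790
  • VIP price: $3,601 (5% off)
  • Language: English
  • Pages: 1006
  • Binding: Paperback
  • ISBN: 1787125165
  • ISBN-13: 9781787125162
  • Related category: Hadoop
  • Ordered from the supplier once you place your order (approx. 3~4 weeks)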

Product Description

Unlock the power of your data with the Hadoop 2.X ecosystem and its data warehousing techniques across large data sets.

About This Book
* Conquer the mountain of data using Hadoop 2.X tools
* The authors succeed in creating a context for Hadoop and its ecosystem
* Hands-on examples and recipes give the bigger picture and help you master the Hadoop 2.X data processing platforms
* Overcome challenging data processing problems with this exhaustive course on Hadoop 2.X

Who This Book Is For
This course is for Java developers with scripting experience who want a career shift to the Hadoop and Big Data segment of the IT industry. Whether you are a novice in Hadoop or an expert, this book aims to take you to the most advanced level in Hadoop 2.X.

What You Will Learn
* Best practices for setup and configuration of Hadoop clusters, tailoring the system to the problem at hand
* Integration with relational databases, using Hive for SQL queries and Sqoop for data transfer
* Installing and maintaining a Hadoop 2.X cluster and its ecosystem
* Advanced data analysis using Hive, Pig, and MapReduce programs
* Machine learning principles with libraries such as Mahout, and batch and stream data processing using Apache Spark
* Understand the changes involved in the move from Hadoop 1.0 to Hadoop 2.0
* Dive into YARN and Storm, and use YARN to integrate Storm with Hadoop
* Deploy Hadoop on Amazon Elastic MapReduce, discover HDFS replacements, and learn about HDFS Federation

In Detail
As Marc Andreessen has said, "Data is eating the world," and the age of Big Data bears this out: businesses produce data in huge volumes every day, and this rising tide of data needs to be organized and analyzed in a more secure way. With proper and effective use of Hadoop, you can build new and improved models and, based on them, make the right decisions.

The first module, Hadoop Beginner's Guide, walks you through understanding Hadoop and how to use it, with very detailed instructions. Commands are explained in sections called "What just happened" for more clarity and understanding.

The second module, Hadoop Real-World Solutions Cookbook, 2nd edition, is an essential tutorial for effectively implementing a big data warehouse in your business, with detailed practices on the latest technologies such as YARN and Spark. Big data has become a key basis of competition and of new waves of productivity growth.

Once you are familiar with the basics and have implemented end-to-end big data use cases, you will start exploring the third module, Mastering Hadoop. If you need to take your Hadoop skill set to the next level after you nail the basics and the advanced concepts, this course is indispensable. When you finish it, you will be able to tackle real-world scenarios and become a big data expert, using the tools and knowledge gained from the various step-by-step tutorials and recipes.

Style and Approach
This course covers everything from the basic concepts of Hadoop through to the advanced mechanisms you need to master to become a big data expert. The goal is to help you learn the essentials through step-by-step tutorials and then move on to recipes with various real-world solutions. It covers the important aspects of Hadoop, from system design and Hadoop configuration to machine learning principles with various libraries, in chapters illustrated with code fragments and schematic diagrams. This is a comprehensive course for exploring Hadoop from the basics to the most advanced techniques available in Hadoop 2.X.
