Serverless Analytics with Amazon Athena: Query structured, unstructured, or semi-structured data in seconds without setting up any infrastructure

Anthony Virtuoso, Mert Turkay Hocanin, Aaron Wishnick

  • Publisher: Packt Publishing
  • Publication date: 2021-11-19
  • Language: English
  • Pages: 438
  • Binding: Quality paper (also called trade paper)
  • ISBN: 1800562349
  • ISBN-13: 9781800562349

Product description

Get more from your data with Amazon Athena's ease of use, interactive performance, and pay-per-query pricing
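Pay-per-query pricing means each query is billed by the amount of data it scans. The sketch below estimates that charge for a single query, assuming a default price of 5 USD per TB scanned (rates vary by Region, so treat the default as a placeholder) and Athena's billing rule that scanned bytes are rounded up to the nearest megabyte with a 10 MB minimum per query. The function name `estimate_query_cost` and the binary MB/TB unit sizes are assumptions made for illustration.

```python
import math

BYTES_PER_MB = 2**20  # assumption: binary megabytes
BYTES_PER_TB = 2**40  # assumption: binary terabytes
MIN_BILLED_MB = 10    # Athena bills at least 10 MB per query


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = 5.0) -> float:
    """Rough cost estimate for one query under per-TB-scanned pricing.

    price_per_tb defaults to 5.0 USD, an assumed placeholder; check current
    pricing for your Region before relying on the result.
    """
    # Round scanned bytes up to whole megabytes, then apply the 10 MB floor.
    billed_mb = max(math.ceil(bytes_scanned / BYTES_PER_MB), MIN_BILLED_MB)
    # Convert billed megabytes to terabytes and multiply by the unit price.
    return billed_mb * BYTES_PER_MB / BYTES_PER_TB * price_per_tb
```

Because the estimate grows linearly with bytes scanned, anything that reduces the data a query reads, such as a columnar format like Parquet or partitioning, lowers the cost in proportion.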


Key Features:

  • Explore the promising capabilities of Amazon Athena and Athena's Query Federation SDK
  • Use Athena to prepare data for common machine learning activities
  • Learn best practices for connecting your applications to Athena, along with key security considerations


Book Description:

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using SQL, without needing to manage any infrastructure.
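Because Athena is serverless, an application only submits SQL and then polls for the outcome. The following is a minimal sketch, assuming a boto3 Athena client (for example, `boto3.client("athena")`) is passed in. The helper name `run_athena_query` and its return shape are hypothetical; `start_query_execution`, `get_query_execution`, and `get_query_results` are the boto3 Athena client calls it relies on. It reads only the first page of results; larger result sets would need pagination via `NextToken`.

```python
import time
from typing import Any


def run_athena_query(client: Any, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> dict:
    """Submit a query, wait for it to finish, and return the first page of rows.

    `client` is assumed to be a boto3 Athena client; this helper is hypothetical.
    """
    # Athena runs queries asynchronously: submission returns an execution ID at once.
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},  # S3 prefix for results
    )
    query_id = response["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = execution["Status"].get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {query_id} ended in state {state}: {reason}")

    # Each row is a list of {"VarCharValue": ...} cells; for SELECT queries
    # the first row holds the column names.
    results = client.get_query_results(QueryExecutionId=query_id)
    rows = [[cell.get("VarCharValue") for cell in row["Data"]]
            for row in results["ResultSet"]["Rows"]]
    return {
        "query_id": query_id,
        "bytes_scanned": execution.get("Statistics", {}).get("DataScannedInBytes", 0),
        "rows": rows,
    }
```

A call might look like `run_athena_query(boto3.client("athena"), "SELECT status, count(*) FROM access_logs GROUP BY status", "weblogs", "s3://example-bucket/athena-results/")`, where the database, table, and bucket names are placeholders.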


This book begins with an overview of the serverless analytics experience offered by Athena and teaches you how to build and tune an S3 data lake using Athena, including how to structure your tables using open-source file formats such as Parquet. You'll learn how to build, secure, and connect to a data lake with Athena and AWS Lake Formation. Next, you'll cover key tasks such as ad hoc data analysis, working with ETL pipelines, monitoring KPIs and alerting on breaches using Amazon CloudWatch metrics, running customizable connectors with AWS Lambda, and more. Moving on, you'll work through simple integrations, learn to troubleshoot and tune common Athena issues, and examine the most common reasons queries fail. You will also review tips for diagnosing and correcting failing queries in pursuit of operational excellence. Finally, you'll explore advanced concepts such as Athena Query Federation and Athena ML to generate powerful insights without touching a single server.


By the end of this book, you'll be able to build and use a data lake with Amazon Athena to add data-driven features to your app and perform the kind of ad hoc data analysis that often precedes many of today's ML modeling exercises.


What You Will Learn:

  • Secure your data and manage the cost of querying it
  • Use Athena ML and User Defined Functions (UDFs) to add advanced features to your reports
  • Write your own Athena Connector to integrate with a custom data source
  • Discover your datasets on S3 using AWS Glue Crawlers
  • Integrate Amazon Athena into your applications
  • Set up AWS Identity and Access Management (IAM) policies to limit access to tables and databases in the AWS Glue Data Catalog
  • Run Athena queries from an Amazon SageMaker notebook
  • Get to grips with using Athena for ETL pipelines


Who this book is for:

Business intelligence (BI) analysts, application developers, and system administrators who are looking to generate insights from an ever-growing sea of data while controlling costs and limiting operational burden will find this book helpful. Basic SQL knowledge is needed to get the most out of this book.
