Apache Airflow Best Practices: A practical guide to orchestrating data workflow with Apache Airflow

Dylan Intorf, Dylan Storey, Kendrick van Doorn

  • Publisher: Packt Publishing
  • Publication date: 2024-10-31
  • Price: $1,800
  • VIP price: $1,710 (5% off)
  • Language: English
  • Pages: 188
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1805123750
  • ISBN-13: 9781805123750
  • Ships immediately (fewer than 3 in stock)

Product Description

Confidently orchestrate your data pipelines with Apache Airflow by applying industry best practices and scalable strategies

Key Features:

- Understand the steps for migrating from Airflow 1.x to 2.x and explore the new features and improvements in version 2.x

- Learn Apache Airflow workflow authoring through real-world use cases

- Uncover strategies to operationalize your Airflow instance and pipelines for resilient operations and high throughput

- Purchase of the print or Kindle book includes a free PDF eBook

Book Description:

Data professionals face the monumental task of managing complex data pipelines, orchestrating workflows across diverse systems, and ensuring scalable, reliable data processing. This definitive guide to mastering Apache Airflow, written by experts in engineering, data strategy, and problem-solving across tech, financial, and life sciences industries, is your key to overcoming these challenges. It covers everything from the basics of Airflow and its core components to advanced topics such as custom plugin development, multi-tenancy, and cloud deployment.

Starting with an introduction to data orchestration and the significant updates in Apache Airflow 2.0, this book takes you through the essentials of DAG authoring, managing Airflow components, and connecting to external data sources. Through real-world use cases, you'll gain practical insights into implementing ETL pipelines and machine learning workflows in your environment. You'll also learn how to deploy Airflow in cloud environments, tackle operational considerations for scaling, and apply best practices for CI/CD and monitoring.

By the end of this book, you'll be proficient in operating and using Apache Airflow, authoring high-quality workflows in Python for your specific use cases, and making informed decisions crucial for production-ready implementation.

What You Will Learn:

- Explore the new features and improvements in Apache Airflow 2.0

- Design and build data pipelines using DAGs

- Implement ETL pipelines, ML workflows, and other advanced use cases

- Develop and deploy custom plugins and UI extensions

- Deploy and manage Apache Airflow in cloud environments such as AWS, GCP, and Azure

- Describe a path for scaling your environment over time

- Apply best practices for monitoring and maintaining Airflow

Who this book is for:

This book is for data engineers, developers, IT professionals, and data scientists who want to optimize workflow orchestration with Apache Airflow. It's perfect for those who recognize Airflow's potential and want to avoid common implementation pitfalls. Whether you're new to data, an experienced professional, or a manager seeking insights, this guide will support you. A functional understanding of Python, some business experience, and basic DevOps skills are helpful. While prior experience with Airflow is not required, it is beneficial.

Table of Contents

- Getting Started with Airflow 2.0

- Core Airflow Concepts

- Components of Airflow

- Basics of Airflow and DAG Authoring

- Connecting to External Sources

- Extending Functionality with UI Plugins

- Writing and Distributing Custom Providers

- Orchestrating a Machine Learning Workflow

- Using Airflow as a Driving Service

- Airflow Ops: Development and Deployment

- Airflow Ops Best Practices: Observation and Monitoring

- Multi-Tenancy in Airflow

- Migrating Airflow
